West China Journal of Stomatology, 2023, Vol. 41, Issue 6: 686-693. doi: 10.7518/hxkq.2023.2023124

• Clinical Research •

Prediction model of dental caries in 12-year-old children in Sichuan Province based on machine learning

Yan Xinmiao1, Sun Taolan1, Lu Yuhang1, Tan Xin1, Wang Zhuo2, Li Miaojing3

  1. School of Public Health, Chengdu Medical College, Chengdu 610500, China
  2. Sichuan Center for Disease Control and Prevention, Chengdu 610500, China
  3. College of Health and Intelligent Engineering, Chengdu Medical College, Chengdu 610500, China
  • Received: 2023-04-19 Revised: 2023-08-14 Published: 2023-12-01 Released: 2023-11-27
  • Corresponding author: Li Miaojing, E-mail: limiaojing@aliyun.com
  • About the author: Yan Xinmiao, Master's degree, E-mail: yanxinmiao@cmc.edu.cn
  • Supported by:
    Scientific Research Project of Sichuan Provincial Health Commission (20PJ122)

Abstract:

Objective To construct a prediction model of dental caries in children using machine learning algorithms, identify the risk factors for dental caries in children, and propose targeted measures and policy suggestions for improving children's oral health. Methods Stratified cluster random sampling was used. Taking into account the different policies and measures implemented across Sichuan Province, 12-year-old students from 3-4 randomly selected middle schools in eight cities of Sichuan Province received a questionnaire survey, an oral examination, and a physical examination. Risk factors for dental caries in 12-year-old children were analyzed by multivariate logistic regression. The dataset was randomly divided into a training set and a validation set at a ratio of 7∶3. Four machine learning models, namely random forest, decision tree, extreme gradient boosting (XGBoost), and logistic regression, were built in R version 4.1.1, and their predictive performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). Results A total of 4 439 eligible 12-year-old children were included, and the prevalence of caries in permanent teeth was 50.93%. Multivariate logistic regression showed that body mass index, father's highest educational level, mother's highest educational level, whether the child brushed teeth, number of brushings per day, use of toothpaste when brushing, brushing duration, rinsing the mouth after meals, eating before bed after brushing, sweet drinks, snacks, visits to a dental clinic, and age at which brushing started were associated with dental caries in children (P<0.05). The AUC values of random forest, decision tree, logistic regression, and XGBoost for predicting dental caries in children were 0.840, 0.755, 0.799, and 0.794, respectively. In the random forest model, the variable with the highest contribution was eating before bed after brushing. Conclusion A random forest-based prediction model of dental caries in children was established and showed good predictive performance. On this basis, targeted preventive measures against the main factors influencing dental caries in children would help reduce its incidence.

Key words: dental caries in children, machine learning, random forest, influencing factors, prediction model
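
The evaluation procedure described in the abstract (a random 7∶3 split into training and validation sets, followed by comparing models by AUC) can be illustrated with a minimal sketch. This is not the authors' code: the study used R 4.1.1, while the sketch below uses plain Python, and the function names `split_train_validation` and `roc_auc`, the fixed seed, and the rounding rule for the split point are all assumptions made for illustration. The AUC is computed with the rank-based (Mann-Whitney) formula, which is one standard way to obtain it.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Randomly split records into training and validation sets.

    Hypothetical helper: the seed and the rounding of the split point
    are assumptions, not details reported in the abstract.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula.

    labels: 0/1 integers (1 = caries present); scores: predicted risk.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the run of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

In use, each fitted model would score the validation records, and `roc_auc(validation_labels, model_scores)` would be computed once per model so that the four values can be compared, as the abstract does for random forest, decision tree, logistic regression, and XGBoost.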

CLC number: