West China Journal of Stomatology (华西口腔医学杂志) ›› 2025, Vol. 43 ›› Issue (6): 871-880. doi: 10.7518/hxkq.2025.2025135

• Clinical Research •

Machine learning-based prediction model for first permanent molar caries in 9-year-old children in Suzhou

Chen Lingzhi1, Wang Xiaqin1, Zhu Kaifei2, Ren Kun3, Wu Zhen1

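  1. Dept. of Dentistry, Suzhou Wuzhong People’s Hospital, Suzhou 215128, China
  2. Suzhou Wuzhong Center for Disease Control and Prevention, Suzhou 215128, China
  3. Key Laboratory, School of Engineering, Suzhou University of Science and Technology, Suzhou 215000, China
  • Received: 2025-04-10 Revised: 2025-11-01 Publication date: 2025-12-01 Release date: 2025-11-27
  • Corresponding author: Wu Zhen E-mail: clz823173371@si-na.com; baikaishui7707@126.com
  • About the author: Chen Lingzhi, nurse-in-charge, bachelor's degree, E-mail: clz823173371@si-na.com
  • Funding:
    Suzhou Wuzhong District Science and Technology Plan (Medical and Health Field) Youth Project (WZYW2021017)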

Machine learning-based prediction model for caries in the first molars of 9-year-old children in Suzhou

Chen Lingzhi1, Wang Xiaqin1, Zhu Kaifei2, Ren Kun3, Wu Zhen1

  1. Dept. of Dentistry, Suzhou Wuzhong People’s Hospital, Suzhou 215128, China
  2. Suzhou Wuzhong Center for Disease Control and Prevention, Suzhou 215128, China
  3. The Key Laboratory of Engineering of Suzhou University of Science and Technology, Suzhou 215000, China
  • Received: 2025-04-10 Revised: 2025-11-01 Online: 2025-12-01 Published: 2025-11-27
  • Contact: Wu Zhen E-mail: clz823173371@si-na.com; baikaishui7707@126.com
  • Supported by:
    Suzhou Wuzhong District Science and Technology Plan Youth Project (WZYW2021017)

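Abstract:

Objective To build a machine learning-based prediction model for first permanent molar caries in 9-year-old children in Suzhou and to screen for risk factors. Methods Stratified random cluster sampling was used to randomly select 9-year-old pupils from 38 primary schools in 14 townships and subdistricts of Wuzhong District for oral examination and a questionnaire survey. Multivariate Logistic regression was used to analyze risk factors for caries. The data set was randomly split 8∶2 into a training set and a validation set, and five machine learning algorithms were built in R 4.3.1: random forest, decision tree, extreme gradient boosting (XGBoost), Logistic regression, and light gradient boosting machine (LightGBM). The area under the receiver operating characteristic curve (AUC) was used to evaluate the predictive performance of the five models. Shapley additive explanations (SHAP) were used to quantify the marginal contribution of each feature to the caries prediction model. Results A total of 7 225 eligible samples were included, and the prevalence of first permanent molar caries was 54.96%. Multivariate Logistic regression showed that sweet drinks, sweet snacks and candy, snacking frequency, and snacking before bed after brushing, among other factors, were associated with first permanent molar caries (P<0.05). The AUC values of the decision tree, Logistic regression, LightGBM, random forest, and XGBoost models were 75.5%, 83.9%, 88.6%, 88.9%, and 90.1%, respectively. Among the one-hot-encoded variables, frequent sweet intake (e.g., sweet snacks and candy ≥2 times per day, mother's sugary food intake ≥2 times per day) and poor oral hygiene habits (e.g., frequent snacking before bed after brushing, irregular toothbrushing) had positive SHAP values. Conclusion The prediction model for first permanent molar caries in 9-year-old children in Suzhou built with the XGBoost algorithm shows good predictive performance. Frequent sweet intake and poor oral hygiene habits have a strong positive effect on first permanent molar caries and are key drivers; they can inform the design of targeted interventions.

Keywords: first permanent molar, machine learning, influencing factors, prediction model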

Abstract:

Objective This study aimed to use machine learning algorithms to build a prediction model for first permanent molar caries in 9-year-old children in Suzhou and to screen for risk factors. Methods Stratified random cluster sampling was applied to randomly select 9-year-old students from 38 primary schools in 14 townships and subdistricts of Wuzhong District for oral examination and a questionnaire survey. Multivariate Logistic regression was used to analyze the risk factors for caries. The data set was randomly divided into a training set and a validation set at a ratio of 8∶2, and R 4.3.1 was used to build five machine learning algorithms: random forest, decision tree, extreme gradient boosting (XGBoost), Logistic regression, and light gradient boosting machine (LightGBM). The predictive performance of the five models was evaluated using the area under the receiver operating characteristic curve (AUC). Shapley additive explanations (SHAP) were used to quantify the marginal contribution of each feature to the caries prediction model. Results This study included 7 225 samples that met the inclusion criteria. The prevalence of first permanent molar caries was 54.96%. Multivariate Logistic regression analysis showed that sweet drinks, sweet snacks and candy, snacking frequency, and snacking before bed after brushing, among other factors, were associated with the occurrence of first permanent molar caries (P<0.05). The AUC values of the decision tree, Logistic regression, LightGBM, random forest, and XGBoost models were 75.5%, 83.9%, 88.6%, 88.9%, and 90.1%, respectively. Among the one-hot-encoded variables, frequent sweet intake (such as sweet snacks and candy ≥2 times a day, or mother’s sugary diet ≥2 times a day) and poor oral hygiene habits (such as frequent snacking before bed after brushing and irregular toothbrushing) had positive SHAP values. Conclusion The XGBoost-based model shows good predictive performance for first permanent molar caries in 9-year-old children in Suzhou. Frequent sweet intake and poor oral hygiene habits have a strong positive effect on the risk of first permanent molar caries and are key drivers that can inform the design of targeted interventions.
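
The evaluation procedure described in the abstract (a random 8∶2 split, then AUC on the held-out part) can be illustrated with a minimal sketch. The study itself reports using R 4.3.1; the Python below uses only the standard library, and the function names `random_split` and `roc_auc` as well as the toy labels and scores are assumptions made for illustration, not the authors' code or data.

```python
import random

def random_split(n_samples, train_frac=0.8, seed=0):
    """Shuffle sample indices and cut them into training and validation parts."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    cut = round(train_frac * n_samples)
    return idx[:cut], idx[cut:]

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# An 8:2 split of the 7 225 samples reported in the abstract.
train_idx, valid_idx = random_split(7225)
print(len(train_idx), len(valid_idx))  # 5780 1445

# Toy validation labels and predicted risks (made up for illustration).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise definition used here is equivalent to the area under the ROC curve and is easy to verify by hand, but it scales quadratically with sample size; a rank-based formula would be preferable on large validation sets.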

Key words: first permanent molar, machine learning, influencing factor, prediction model
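
To make the "marginal contribution" quantified by SHAP more concrete, the sketch below computes exact Shapley values for a toy risk score over two one-hot features. It is a simplified illustration under stated assumptions: the feature names, the coefficients in `toy_risk`, and the single all-zero baseline are hypothetical and are not taken from the study, and practical SHAP implementations average over a background data set rather than a single baseline.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to baseline,
    obtained by enumerating every subset of the other features."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from baseline.
        return predict([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_risk(features):
    """Hypothetical risk score; the coefficients are made up for illustration."""
    sweets_ge2_per_day, bedtime_snack = features
    return (0.2 + 0.3 * sweets_ge2_per_day + 0.2 * bedtime_snack
            + 0.1 * sweets_ge2_per_day * bedtime_snack)

# A child with both habits, compared with a child with neither.
phi = shapley_values(toy_risk, x=[1, 1], baseline=[0, 0])
print(phi)
```

Both values are positive, meaning each habit pushes the toy score upward, and together they sum to the difference between the prediction for `x` and the prediction for the baseline. That additivity is the property that lets SHAP values be read as per-feature contributions.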

CLC number: