West China Journal of Stomatology ›› 2025, Vol. 43 ›› Issue (6): 871-880. doi: 10.7518/hxkq.2025.2025135

• Clinical Research •

Machine learning-based prediction model for caries in the first molars of 9-year-old children in Suzhou

Chen Lingzhi1, Wang Xiaqin1, Zhu Kaifei2, Ren Kun3, Wu Zhen1

  1. Dept. of Dentistry, Suzhou Wuzhong People’s Hospital, Suzhou 215128, China
  2. Suzhou Wuzhong Center for Disease Control and Prevention, Suzhou 215128, China
  3. The Key Laboratory of Engineering of Suzhou University of Science and Technology, Suzhou 215000, China
  • Received: 2025-04-10 Revised: 2025-11-01 Online: 2025-12-01 Published: 2025-11-27
  • Contact: Wu Zhen E-mail: clz823173371@si-na.com; baikaishui7707@126.com
  • Supported by:
    Suzhou Wuzhong District Science and Technology Plan Youth Project(WZYW2021017)

Abstract:

Objective This study aimed to use machine learning algorithms to build a prediction model for first permanent molar caries in 9-year-old children in Suzhou and to screen for risk factors. Methods Stratified random cluster sampling was used to select 9-year-old students from 38 primary schools in 14 townships and subdistricts of Wuzhong District for oral examination and a questionnaire survey. Multivariate logistic regression was used to analyze risk factors for caries. The data set was randomly divided into training and validation sets at a ratio of 8∶2, and R 4.3.1 was used to build five machine learning models: random forest, decision tree, extreme gradient boosting (XGBoost), logistic regression, and light gradient boosting machine (LightGBM). The predictive performance of the five models was evaluated using the area under the receiver operating characteristic curve (AUC). The marginal contribution of each feature to the caries prediction model was quantified with Shapley additive explanations (SHAP). Results A total of 7 225 participants who met the inclusion criteria were included. The prevalence of first permanent molar caries was 54.96%. Multivariate logistic regression showed that consumption of sweet drinks, consumption of desserts and candy, snacking frequency, and snacking before bed after toothbrushing were associated with first permanent molar caries (P<0.05). The AUC values of the decision tree, logistic regression, LightGBM, random forest, and XGBoost models were 75.5%, 83.9%, 88.6%, 88.9%, and 90.1%, respectively. Among the one-hot-encoded variables, high-frequency sugar intake (such as desserts and candy ≥2 times a day and mother’s sugary diet ≥2 times a day) and poor oral hygiene habits (such as frequent snacking before bed after toothbrushing and irregular toothbrushing) had the highest positive SHAP values. Conclusion The XGBoost algorithm showed good predictive performance for first permanent molar caries in 9-year-old children. High-frequency sugar intake and poor oral hygiene habits have a strong positive effect on the risk of first permanent molar caries and are key drivers that can inform targeted interventions.
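
The evaluation steps named in the Methods (an 8∶2 random split and AUC scoring) can be illustrated with a minimal sketch. This is not the authors' code: the study used R 4.3.1, whereas the sketch below is plain Python, the function names `split_8_2` and `auc` are hypothetical, the split is a simple unstratified shuffle (the abstract does not say whether the split was stratified), and the labels and scores are toy values rather than study data.

```python
import random


def split_8_2(records, seed=0):
    """Shuffle a copy of the records and split it 8:2 into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example only: 1 = caries, 0 = caries-free; scores are made-up model outputs.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
toy_auc = auc(toy_labels, toy_scores)

# Split sizes for a data set the same size as the study sample (7 225 records).
train, valid = split_8_2(range(7225))
```

In the toy example, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9 (about 88.9%). An AUC of 90.1%, as reported for XGBoost, means that a randomly chosen child with caries receives a higher predicted risk than a randomly chosen caries-free child about 90% of the time.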

Key words: first permanent molar, machine learning, influencing factor, prediction model
