Lung cancer

Comparative analysis of three data mining techniques in diagnosis of lung cancer

Li, Di (a); Li, Zunshui (a); Ding, Mingcui (a); Ni, Ran (b); Wang, Jing (b); Qu, Lingbo (a,c); Wang, Wei (a); Wu, Yongjun (a,d)

Author Information
(a) College of Public Health, Zhengzhou University
(b) The First Affiliated Hospital of Zhengzhou University
(c) Henan Joint International Research Laboratory of Green Construction of Functional Molecules and Their Bioanalytical Applications
(d) The Key Laboratory of Nanomedicine and Health Inspection of Zhengzhou, Zhengzhou, Henan, China

Received 18 December 2019
Accepted 13 March 2020

Correspondence to Yongjun Wu, 100 Kexue Avenue, Zhengzhou City, China, Tel: +86 37167781450; e-mail: [email protected]

European Journal of Cancer Prevention: January 2021 - Volume 30 - Issue 1 - p 15-20
doi: 10.1097/CEJ.0000000000000598

Abstract

The development of lung cancer generates a large amount of abnormal information, and extracting useful knowledge from this mass of information is an urgent problem. Data mining technology has become a popular tool for medical classification and prediction. However, each technique has its advantages and disadvantages, so this study applied several data mining methods step by step and compared the prediction results of the resulting models. A total of 180 patients with lung cancer and 243 individuals with benign lung disease were recruited from the First Affiliated Hospital of Zhengzhou University from October 2014 to March 2016. Prediction models based on epidemiological data, clinical features and tumor markers were developed using an artificial neural network (ANN), the decision tree C5.0 and a support vector machine (SVM). There were significant differences between the lung cancer group and the benign lung disease group in seven tumor markers and 10 epidemiological and clinical indicators. The accuracy rates of ANN, C5.0 and SVM were 76.47%, 89.92% and 85.71%, respectively.
Receiver operating characteristic (ROC) curve analysis showed that the area under the ROC curve (AUC) was 0.811 (0.770–0.847) for ANN, 0.897 (0.864–0.924) for C5.0 and 0.878 (0.843–0.908) for SVM. The decision tree C5.0 model had the lowest error rate and the highest accuracy, and it could be used in the diagnosis of lung cancer.

Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved.
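
The two metrics reported in the abstract, accuracy and AUC, can be illustrated with a minimal sketch. It assumes binary labels (1 = lung cancer, 0 = benign) and a continuous score from each model. The function names `accuracy` and `roc_auc`, the 0.5 decision threshold and the toy data are hypothetical; they are not the study's data, software or method. The AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data only (assumption): 4 positive and 4 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]

# Hypothetical threshold of 0.5 turns scores into predicted labels.
y_pred = [int(s >= 0.5) for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")  # accuracy = 0.7500
print(f"AUC      = {roc_auc(y_true, scores):.4f}")   # AUC      = 0.9375
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a model comparison may report both.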