Feature Article

Developing a Classification Algorithm for Prediabetes Risk Detection From Home Care Nursing Notes Using Natural Language Processing

Jeon, Eunjoo PhD, RN; Kim, Aeri BSN, RN; Lee, Jisoo BSN, RN; Heo, Hyunsook MSN, RN; Lee, Hana MPH, RN; Woo, Kyungmi PhD, RN, CCM

Author Affiliations: Technology Research, SamsungSDS (Dr Jeon); College of Nursing, Seoul National University (Mss Kim, J. Lee, and H. Lee and Dr Woo); and Seoul National University Hospital (Ms Heo), Seoul, South Korea.

E.J. and A.K. contributed equally to this work.

The authors have disclosed that they have no significant relationships with, or financial interest in, any commercial companies pertaining to this article.

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (no. 810-20210012). A.K., J.L., and H.L. received a scholarship from the BK21 Education Program (Center for Human-Caring Nursing Leaders for the Future). The funding source had no role in the conducting of this study; study design; data collection, management, and analysis; interpretation of the results; preparation and review of the manuscript; and decision to publish.

Corresponding author: Kyungmi Woo, PhD, RN, CCM, The Research Institute of Nursing Science, College of Nursing, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul, South Korea 03080 ([email protected]).

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s Web site (www.cinjournal.com).

CIN: Computers, Informatics, Nursing, January 17, 2023. DOI: 10.1097/CIN.0000000000001000

Abstract

This study developed and validated a rule-based classification algorithm for prediabetes risk detection using natural language processing from home care nursing notes.
First, we developed a list of prediabetes-related symptomatic terms in English and Korean. Second, we used natural language processing to preprocess the notes. Third, we created a rule-based classification algorithm with 31 484 notes, excluding 315 instances of missing data. The final algorithm was validated by measuring accuracy, precision, recall, and the F1 score against a gold standard testing set (400 notes). The developed term list comprised 11 categories, with 1639 words in Korean and 1181 words in English. The rule-based classification algorithm identified one or more prediabetes-related symptoms in 42.2% of the notes. The algorithm achieved high performance when applied to the gold standard testing set. We proposed a rule-based natural language processing algorithm that classifies the prediabetes risk group according to whether the home care nursing notes contain prediabetes-related symptomatic terms. White-space tokenization and the rule-based algorithm were used to detect the prediabetes symptomatic terms. Applying this algorithm to electronic health record systems will increase the possibility of preventing diabetes onset through early detection of risk groups and provision of tailored intervention.

Copyright © 2023 Wolters Kluwer Health, Inc. All rights reserved.
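The pipeline summarized in the abstract (white-space tokenization, dictionary-based rule matching, and validation against a gold standard) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the term dictionary `TERMS`, its three categories, and the function names `tokenize`, `matched_categories`, `classify`, and `evaluate` are hypothetical and chosen only for illustration. Exact matching on white-space tokens is also a simplifying assumption; Korean tokens with attached particles would likely need additional rules.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical term dictionary (category -> symptomatic terms).
# Illustrative only; it does not reproduce the study's 11-category term list.
TERMS: Dict[str, Set[str]] = {
    "thirst": {"thirsty", "thirst", "갈증"},
    "fatigue": {"fatigue", "tired", "피로"},
    "urination": {"polyuria", "빈뇨"},
}

PUNCTUATION = ".,;:!?()"


def tokenize(note: str) -> List[str]:
    """Split a note on white space, then lowercase and strip edge punctuation."""
    tokens = (raw.strip(PUNCTUATION).lower() for raw in note.split())
    return [tok for tok in tokens if tok]


def matched_categories(note: str) -> Set[str]:
    """Return the categories whose terms appear as exact tokens in the note."""
    tokens = set(tokenize(note))
    return {cat for cat, terms in TERMS.items() if tokens & terms}


def classify(note: str) -> bool:
    """Rule: flag the note if it contains at least one symptomatic term."""
    return bool(matched_categories(note))


def evaluate(predicted: Sequence[bool], gold: Sequence[bool]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 against gold labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    notes = [
        "Pt reports feeling thirsty and tired.",
        "Wound dressing changed, no complaints.",
        "환자 갈증 호소",
    ]
    for note in notes:
        print(classify(note), sorted(matched_categories(note)))
```

In a validation step like the one described, `evaluate` would be called with the algorithm's predictions and the manually annotated labels for the gold standard testing set; the labels and counts used to exercise the sketch are invented and do not reflect the study's data.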