

Machine learning in chronic obstructive pulmonary disease

Zhang, Bochao; Wang, Jiping; Chen, Jing; Ling, Zongquan; Ren, Yuhao; Xiong, Daxi; Guo, Liquan

Editor(s): Guo, Lishao

Chinese Medical Journal: August 10, 2022
doi: 10.1097/CM9.0000000000002247

The World Health Organization has projected that by 2030, chronic obstructive pulmonary disease (COPD) will be the third-leading cause of mortality and the seventh-leading cause of morbidity worldwide.[1] Acute exacerbations of chronic obstructive pulmonary disease (AECOPD) are associated with an accelerated decline in lung function, diminished quality of life, and higher mortality.[2] Accurate early detection of AECOPD contributes to better management and lower mortality. In recent years, there has been great progress in developing and applying machine learning (ML) to tackle COPD more effectively. ML is an important part of predictive analytics, and its predictive models improve as the number of cases increases.

Drawing on current research into the ML techniques commonly used for COPD, summarized in [Supplementary Tables 1 and 2], this article discusses the application of ML to COPD prevention and control from three aspects: scale assessment and patient classification, pulmonary function assessment, and methods for predicting acute exacerbations. It also analyzes the shortcomings and challenges of current ML technology in COPD prevention and diagnosis [Supplementary Figure 1].

To better identify patients at risk of severe AECOPD, scales specific to the acute exacerbation phase have been developed from large-scale clinical studies and analyses, including the COPD and Asthma Physiology Score, the BAP-65 score, and the AECOPD-F score.

Based on the comparison summarized in [Supplementary Table 3], a comprehensive, integrated scoring system appears to have clear advantages over a single assessment index. It more conveniently guides medical practitioners to take targeted treatment measures and to adjust the intensity of clinical treatment in a timely manner, improving medical care and patient prognosis.

The assessment and grading of patients with acute exacerbations are still in the exploratory stage and need to be combined with clinical objective indicators such as pulmonary function tests, comorbidities, and biomarkers to facilitate accurate phenotypic classification, severity assessment, and treatment guidance for patients.

Pikoula et al[3] showed that COPD can present with various phenotypes, etiologies, and prognostic features. Using cluster analysis of electronic health record data, the same study identified and characterized patient groups in terms of demographics, comorbidities, risk of death, and exacerbations, and compared the associations of AECOPD with respiratory and cardiovascular mortality across these groups. The study showed that patients could be classified into five phenotypes without specific testing. However, it also showed that the boundaries between phenotypes are not clear and that complex patients may belong to multiple phenotypes.
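The clustering step can be illustrated with a minimal k-means sketch. It assumes numeric, already-cleaned patient features; the function name and the column meanings in the example are hypothetical, and the original study's pipeline (variable selection, encoding of categorical EHR fields, choice of clustering algorithm) may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_patients(features: np.ndarray, n_clusters: int = 5, seed: int = 0) -> np.ndarray:
    """Standardize patient features, then assign each patient to one of n_clusters groups."""
    # Standardizing keeps features with large units (e.g. age) from dominating distances.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(scaled)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical columns: age, BMI, comorbidity count, exacerbations in the past year.
    centers = np.array([
        [60, 22, 1, 0],
        [75, 30, 4, 1],
        [55, 18, 0, 3],
        [80, 25, 6, 4],
        [68, 35, 2, 0],
    ], dtype=float)
    X = np.vstack([c + rng.normal(scale=0.5, size=(40, 4)) for c in centers])
    labels = cluster_patients(X)
    print(np.bincount(labels))  # patients per cluster
```

Real EHR phenotypes overlap far more than these synthetic groups, which is consistent with the unclear phenotype boundaries reported above.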

As the studies summarized in [Supplementary Table 4] show, patients can be classified according to their clinical presentation and disease progression. In latent class analysis, a probabilistic model is fitted to a multivariable dataset so that different patient characteristics are combined to predict class membership, providing an important reference for individualized treatment.
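Latent class analysis on binary indicators can be sketched as a Bernoulli mixture model fitted by expectation-maximization (EM). This minimal illustration assumes binary clinical indicators that are conditionally independent within each latent class; the function name `fit_lca` and the example indicators are hypothetical, and dedicated LCA software adds model-selection statistics (e.g. for choosing the number of classes) that are not shown here.

```python
import numpy as np


def fit_lca(X, n_classes, n_iter=200, seed=0, eps=1e-9):
    """Fit a latent class model to binary indicators X (n_patients x n_items) by EM.

    Returns class weights pi (K,), per-class item probabilities theta (K x J),
    and posterior class memberships resp (n x K).
    """
    X = np.asarray(X, dtype=float)
    n, j = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, j))
    for _ in range(n_iter):
        # E-step: log P(x_i | class k) assuming items are independent within a class.
        log_lik = X @ np.log(theta + eps).T + (1 - X) @ np.log(1 - theta + eps).T
        log_joint = log_lik + np.log(pi + eps)
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update class sizes and per-class item endorsement probabilities.
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X) / (nk[:, None] + eps)
    return pi, theta, resp


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical indicators: chronic cough, sputum, dyspnea, frequent exacerbations.
    true_theta = np.array([[0.9, 0.8, 0.9, 0.7], [0.1, 0.2, 0.1, 0.1]])
    z = rng.integers(0, 2, size=300)
    X = (rng.random((300, 4)) < true_theta[z]).astype(int)
    pi, theta, resp = fit_lca(X, n_classes=2)
    print(np.round(pi, 2))
    print(np.round(theta, 2))
```

Each row of `resp` gives a patient's probability of belonging to each class, so a patient can be assigned softly rather than forced into a single group.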

AECOPD occurs mostly in the elderly, and when pulmonary function has deteriorated to the point that patients can no longer perform pulmonary function breath tests, physicians cannot ascertain their actual level of pulmonary function. ML algorithms that predict lung function can therefore support clinical decision making and improve clinical practice.

Because patients in poor health during acute exacerbations cannot cooperate effectively with pulmonary function tests, Chen et al[4] developed a prediction model based on multiple-output support vector regression that predicted pulmonary function indices from demographic and inflammatory parameters. However, the small sample size and the predominance of men in the study limit the model's predictive value for women.
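A minimal sketch of multi-output regression with support vector machines, assuming scikit-learn. Note that `MultiOutputRegressor` fits one independent SVR per target (for example, FEV1 and FVC), which is a simplification: a dedicated multiple-output SVR may optimize all outputs jointly. The kernel, hyperparameters, input variables, and function name here are assumptions, not those of Chen et al.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_lung_function_model(C: float = 10.0, epsilon: float = 0.05) -> MultiOutputRegressor:
    """One RBF-kernel SVR per lung-function target, each on standardized inputs."""
    base = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, epsilon=epsilon))
    # MultiOutputRegressor clones `base` once per column of the target matrix.
    return MultiOutputRegressor(base)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical inputs: e.g. age, height, C-reactive protein, neutrophil ratio.
    X = rng.normal(size=(200, 4))
    # Hypothetical targets standing in for two pulmonary function indices.
    Y = np.column_stack([X[:, 0] + 0.5 * X[:, 1], X[:, 2] - X[:, 3]])
    model = build_lung_function_model().fit(X[:150], Y[:150])
    print(model.predict(X[150:155]).round(2))
```

With a small, male-dominated training set like the one described above, any such model would be expected to generalize poorly to under-represented groups.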

Home telemonitoring uses electronic devices and information technology for wireless information exchange and allows regular collection of clinical data. Telehealth-supported chronic care management services promote patient self-management, improve disease control, enhance quality of life, and prevent hospital admissions.

Wu et al[5] made reliable predictions of future AECOPD events using wearable devices, home air quality sensors, smartphone apps, and supervised prediction algorithms. The model's accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve all exceeded 0.9. However, because of the limitations of the air quality sensors, environmental data were collected only in the user's bedroom, which limits the representativeness of the predictions.
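The four reported metrics follow from a binary confusion matrix and from the ranking of predicted scores. The sketch below assumes scikit-learn and a 0.5 decision threshold (an assumption; studies often tune this); the function name is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def evaluate_classifier(y_true, y_score, threshold: float = 0.5) -> dict:
    """Accuracy, sensitivity, specificity (at a threshold) and threshold-free ROC AUC."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    # With labels=[0, 1], ravel() returns counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of true exacerbations detected
        "specificity": tn / (tn + fp),  # share of non-events correctly ruled out
        "auc": roc_auc_score(y_true, y_score),  # uses the scores, not the threshold
    }


if __name__ == "__main__":
    print(evaluate_classifier([0, 0, 0, 0, 1, 1, 1, 1],
                              [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]))
```

In this example, one positive and one negative fall on the wrong side of the threshold, giving accuracy, sensitivity, and specificity of 0.75, while only one of the 16 positive/negative pairs (0.4 vs 0.6) is ranked in the wrong order, giving an AUC of 15/16 = 0.9375.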

As our review shows [Supplementary Table 5], a growing number of smart sensing technologies can continuously track body movements and detect patients' vital signs. In the literature we examined, most patients self-managed, but some older patients with poor mobility tended to reject, or adapt poorly to, the monitoring equipment, so experimental data had to be measured multiple times and then screened and collated. In addition, unreliable measurement of physiological data during AECOPD and the presence of targeted therapy reduce the accuracy of the algorithms: patient self-management may interrupt what would otherwise be the natural course of deterioration, weakening the relationship between some signs and admission while possibly strengthening the relationship between some components of the algorithm and its predictive decisions.

Respiratory sounds are important clinical signs of the lungs. They are physiological sound signals produced by the respiratory system during exchange with the external environment and carry a large amount of physiological and pathological information. Traditional stethoscopes struggle to capture certain weak sound signals, and diagnostic results can easily be influenced by the physician's subjective experience. Rapid advances in electronic technology and ML algorithms have greatly aided breath sound analysis, helping doctors diagnose respiratory diseases and automatically identify the condition of patients with COPD.

Altan et al[6] developed a method that focuses directly on pulmonary sounds. The study analyzed 12 distinct pulmonary sounds associated with varying degrees of COPD severity, assessed COPD severity using multichannel pulmonary sounds, and applied three-dimensional second-order difference plots to extract characteristic abnormalities in the sounds. These abnormalities on the chaotic plots were extracted using cuboid- and octant-based quantization methods. A deep extreme learning machine (Deep-ELM) classifier was then used to predict COPD severity. The model performed well, with accuracy, sensitivity, and specificity all >0.9.
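The feature-extraction and classification steps can be sketched as follows. This assumes that a 3D second-order difference plot uses consecutive first differences of the signal, (d[n], d[n+1], d[n+2]) with d[n] = x[n+1] - x[n], and that octant-based quantization counts the fraction of points in each of the eight sign-defined octants; cuboid-based quantization is not shown. The classifier is a basic single-hidden-layer extreme learning machine (ELM), not the multi-layer Deep-ELM used in the study, and all names are hypothetical.

```python
import numpy as np


def sodp_octant_features(signal) -> np.ndarray:
    """Fraction of 3D second-order-difference points falling in each of the 8 octants."""
    x = np.asarray(signal, dtype=float)
    d = np.diff(x)  # first differences: d[n] = x[n+1] - x[n]
    pts = np.column_stack([d[:-2], d[1:-1], d[2:]])  # points (d[n], d[n+1], d[n+2])
    # Encode the sign pattern of each point as an octant index 0..7.
    octant = (pts[:, 0] >= 0) * 4 + (pts[:, 1] >= 0) * 2 + (pts[:, 2] >= 0).astype(int)
    counts = np.bincount(octant, minlength=8)
    return counts / counts.sum()


class ExtremeLearningMachine:
    """Single-hidden-layer ELM: random input weights, least-squares output weights."""

    def __init__(self, n_hidden: int = 50, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_ = np.unique(y)
        T = (np.asarray(y)[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        # Hidden-layer weights are random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights are the least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X) -> np.ndarray:
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

In practice, one feature vector would be computed per auscultation channel and the vectors concatenated before training; Deep-ELM variants stack several ELM-based layers on top of this basic building block.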

These studies [Supplementary Table 6] have several limitations: small sample sizes and single-center data collection; patients' rejection of auscultation devices; incomplete feature sets and the risk of converging to a local optimum; and the “black box” nature of the models, which prevents clinical staff from understanding the rationale behind an outcome.

When a patient is too ill to perform pulmonary function tests effectively, severity must be evaluated from clinical features. Peng et al[7] collected medical records and selected 28 features, including vital signs, medical history, comorbidities, and various inflammatory indicators. The performance of their C5.0 model exceeded that of C4.5, classification and regression tree (CART), and Iterative Dichotomiser 3 (ID3) models. The study demonstrated that the C5.0 decision tree classifier performed best and could help respiratory physicians quickly assess AECOPD severity at an early stage. Inflammatory indicators such as interleukins were not included because only a small number of patients had undergone the relevant tests.
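The tree algorithms compared above differ mainly in their splitting criteria: ID3 uses information gain, C4.5 (on which C5.0 builds) uses the gain ratio, and CART uses Gini impurity. The pure-Python sketch below computes these criteria for one categorical feature; it is illustrative only, omits C5.0 additions such as boosting and winnowing, and the example labels and feature values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values) -> float:
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(labels) -> float:
    """Gini impurity, the CART splitting criterion."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(labels, feature) -> float:
    """ID3 criterion: entropy reduction from splitting `labels` by the values of `feature`."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(labels, feature) -> float:
    """C4.5-style criterion: information gain normalized by the split's own entropy."""
    split_info = entropy(feature)
    return information_gain(labels, feature) / split_info if split_info > 0 else 0.0
```

For labels `["severe", "severe", "mild", "mild"]`, a feature `["high", "high", "low", "low"]` and a patient-ID-like feature `[1, 2, 3, 4]` both have an information gain of 1.0, but their gain ratios are 1.0 and 0.5, respectively. This penalty on many-valued features is one reason C4.5-style trees can be preferred over ID3.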

As summarized in [Supplementary Table 7], these studies have several limitations: clinical tests are not performed uniformly across hospitals and physicians, so the number and importance of features differ; some indicators, such as depression, anxiety, and mobility, are difficult to quantify; and each model is suitable only for patients who meet specific criteria, which limits its generalizability.

Chest computed tomography (CT) is an advanced imaging technique widely used to detect lung texture abnormalities and to assess COPD status. However, chest CT produces a large amount of image data in which pathophysiological irregularities cannot be identified with the naked eye, which highlights the need for ML algorithms to assist decision making. ML can be used for automated analysis of pulmonary function tests and for differential diagnosis of COPD. In recent years, weakly supervised learning in particular has developed rapidly and been applied to chest CT because of its convenience, wide coverage, and good performance.

In a recent prospective study,[8] support vector machine and logistic regression algorithms were used to analyze chest CT images and assess pulmonary ventilation function in COPD. The assessment model (a quadratic support vector machine) was validated in 27 COPD patients using 87 image features, achieving an accuracy of 88% and an area under the receiver operating characteristic curve (AUC) of 0.82. Although these results are encouraging, the sample size was small and most patients had moderate-to-severe COPD, so eligible patients with mild COPD should be included in future studies. Similarly, Sun et al[9] developed weakly supervised deep learning models that use CT image data to automatically detect and stage spirometry-defined COPD in the general population.
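In common toolkits, a “quadratic” SVM denotes an SVM with a degree-2 polynomial kernel. The sketch below assumes scikit-learn, standardized features, and stratified cross-validated AUC as the evaluation metric; the kernel settings (`coef0=1.0`, `C=1.0`) and function names are assumptions rather than those of the cited study, and with only 27 patients any such estimate would be highly variable.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def quadratic_svm(C: float = 1.0):
    """SVM with kernel (gamma * <x, x'> + 1)^2 applied to standardized features."""
    return make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2, coef0=1.0, C=C))


def cross_validated_auc(X, y, n_splits: int = 5, seed: int = 0) -> np.ndarray:
    """Per-fold ROC AUC from stratified k-fold CV, scored on the SVM decision function."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(quadratic_svm(), X, y, cv=cv, scoring="roc_auc")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical texture features; the label depends quadratically on two of them.
    X = rng.normal(size=(120, 6))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.4).astype(int)
    print(cross_validated_auc(X, y).round(2))
```

Fitting the scaler inside the pipeline keeps each fold's standardization statistics out of its test data, which matters most when the sample is as small as in the study above.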

As summarized in [Supplementary Table 8], these studies have limitations: (1) deep learning incurs high training costs and memory requirements; (2) a large amount of data is required to achieve stable model performance; and (3) the “black box” nature of deep learning, and of ML more broadly, may produce results that clinicians find difficult to interpret or trust.

Researchers have classified and diagnosed patients with COPD from multiple perspectives, including genes, insurance data, social factors, and biomarkers.

In a study by Ma et al,[10] 101 single nucleotide polymorphisms (SNPs) were genotyped by MassARRAY analysis, and six prediction models combining SNPs and clinical information were developed and evaluated to predict the development of COPD. The models performed well in COPD risk prediction, compensating for the lack of pulmonary function tests at early disease stages.
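Combining genotypes with clinical variables requires a numeric encoding of each SNP. A minimal sketch, assuming additive coding (the count of minor alleles: 0, 1, or 2) and scikit-learn; the two classifiers compared here are illustrative and are not the six models evaluated by Ma et al, and the function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def encode_genotypes(genotypes, minor_alleles) -> np.ndarray:
    """Additive coding: copies of the minor allele (0, 1 or 2) per SNP.

    genotypes: one list per patient of two-letter genotype strings such as "AG".
    minor_alleles: the minor-allele letter for each SNP, in the same order.
    """
    return np.array([[g.count(m) for g, m in zip(row, minor_alleles)] for row in genotypes])


def compare_models(X, y, cv: int = 5) -> dict:
    """Mean cross-validated ROC AUC for each candidate classifier."""
    models = {
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    return {name: cross_val_score(m, X, y, cv=cv, scoring="roc_auc").mean()
            for name, m in models.items()}


if __name__ == "__main__":
    print(encode_genotypes([["AA", "CT"], ["AG", "TT"], ["GG", "CC"]], ["G", "T"]))
```

The encoded SNP matrix can then be stacked column-wise with clinical variables (for example with `np.hstack`) before calling `compare_models`.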

As summarized in [Supplementary Table 9], these studies and predictive models provide physicians with decision support in various aspects and to varying degrees, can save valuable healthcare resources, and can help governments extend needed care to COPD patients.

ML techniques, and artificial intelligence applications in medicine more broadly, are an increasingly important topic, although ML currently has difficulty guaranteeing generalization and providing valid information about relationships among high-dimensional features.[11] In the future, ML models and algorithms validated on large real-world datasets and continuously monitored clinical data, combined with cloud platforms and Internet of Things (IoT) engineering, should improve computational speed and data-processing capacity for remote applications. Such systems could help physicians gain multiple perspectives in diagnosis and assessment, improve patient survival, reduce the waste of medical resources, and potentially elevate COPD management to a new level, bringing effective help to physicians and rehabilitation to COPD patients.


This work was supported by a grant from Suzhou Special Technical Project for Diagnosis and Treatment of Key Clinical Diseases (No. LCZX201931).

Conflicts of interest



1. Qureshi H, Sharafkhaneh A, Hanania NA. Chronic obstructive pulmonary disease exacerbations: latest evidence and clinical implications. Ther Adv Chronic Dis 2014;5:212–227. doi: 10.1177/2040622314532862.
2. Mekov E, Miravitlles M, Petkov R. Artificial intelligence and machine learning in respiratory medicine. Expert Rev Respir Med 2020;14:559–564. doi: 10.1080/17476348.2020.1743181.
3. Pikoula M, Quint JK, Nissen F, Hemingway H, Smeeth L, Denaxas S. Identifying clinically important COPD sub-types using data driven approaches in primary care population based electronic health records. BMC Med Inform Decis Mak 2019;19:86. doi: 10.1186/s12911-019-0805-0.
4. Chen J, Yang Z, Yuan Q, Xiong DX, Guo LQ. Prediction models for pulmonary function during acute exacerbation of chronic obstructive pulmonary disease. Physiol Meas 2021;41:125010. doi: 10.1088/1361-6579/abc792.
5. Wu CT, Li GH, Huang CT, Cheng YC, Chen CH, Chien JY, et al. Acute exacerbation of a chronic obstructive pulmonary disease prediction system using wearable device data, machine learning, and deep learning: development and Cohort Study. JMIR Mhealth Uhealth 2021;9:e22591. doi: 10.2196/22591.
6. Altan G, Kutlu Y, Gökçen A. Chronic obstructive pulmonary disease severity analysis using deep learning on multi-channel lung sounds. Turk J Electr Eng Comput Sci 2020;28:2979–2996. doi: 10.3906/elk-2004-68.
7. Peng J, Chen C, Zhou M, Xie X, Zhou Y, Luo CH. A machine-learning approach to forecast aggravation risk in patients with acute exacerbation of chronic obstructive pulmonary disease with clinical indicators. Sci Rep 2020;10:3118. doi: 10.1038/s41598-020-60042-1.
8. Westcott A, Capaldi DPI, McCormack DG, Ward AD, Fenster A, Parraga G. Chronic obstructive pulmonary disease: thoracic CT texture analysis and machine learning to predict pulmonary ventilation. Radiology 2019;293:676–684. doi: 10.1148/radiol.2019190450.
9. Sun J, Liao X, Yan Y, Zhang X, Sun J, Tan W, et al. Detection and staging of chronic obstructive pulmonary disease using a computed tomography-based weakly supervised deep learning approach. Eur Radiol 2022;32:1–11. doi: 10.1007/s00330-022-08632-7.
10. Ma X, Wu Y, Zhang L, Yuan W, Yan L, Fan S, et al. Comparison and development of machine learning tools for the prediction of chronic obstructive pulmonary disease in the Chinese population. J Transl Med 2020;18:146. doi: 10.1186/s12967-020-02312-0.
11. Shortliffe EH. Artificial intelligence in medicine: weighing the accomplishments, hype, and promise. Yearb Med Inform 2019;28:257–262. doi: 10.1055/s-0039-1677891.


Copyright © 2022 The Chinese Medical Association, produced by Wolters Kluwer, Inc. under the CC-BY-NC-ND license.