Chest Imaging

Deep Learning–Based Automatic CT Quantification of Coronavirus Disease 2019 Pneumonia: An International Collaborative Study

Yoo, Seung-Jin MD; Qi, Xiaolong MD, PhD; Inui, Shohei MD; Kim, Hyungjin MD, PhD; Jeong, Yeon Joo MD, PhD; Lee, Kyung Hee MD, PhD; Lee, Young Kyung MD, PhD; Lee, Bae Young MD, PhD; Kim, Jin Yong MD, MPH; Jin, Kwang Nam MD, PhD; Lim, Jae-Kwang MD; Kim, Yun-Hyeon MD, PhD; Kim, Ki Beom MD; Jiang, Zicheng MD; Shao, Chuxiao MD; Lei, Junqiang MD; Zou, Shengqiang MD; Pan, Hongqiu MD; Gu, Ye MD; Zhang, Guo MD; Goo, Jin Mo MD, PhD; Yoon, Soon Ho MD, PhD

Journal of Computer Assisted Tomography: May/June 2022 - Volume 46 - Issue 3 - p 413-422
doi: 10.1097/RCT.0000000000001303


Coronavirus disease 2019 (COVID-19) is rapidly spreading worldwide, causing substantial morbidity and mortality on a global scale, and was declared a pandemic as of March 12, 2020. The clinical manifestation of COVID-19 varies from an asymptomatic form to a critical form that causes respiratory and multiorgan failure, requiring mechanical ventilation and support in an intensive care unit.1 Four fifths of COVID-19 patients experience mild disease, whereas one fifth of COVID-19 patients have severe to critical illness.2 Elderly patients and those with comorbidities are at a higher risk of severe disease, developing acute respiratory distress syndrome, and death.1–4

Another prognostic indicator in patients with COVID-19 may be the radiologic extent of pneumonia, in accordance with experiences from the earlier SARS and MERS outbreaks.5 Chest radiography is an easily accessible imaging modality, but its sensitivity is only 30% to 70% in detecting COVID-19 pneumonia.6,7 Chest computed tomography (CT) provides a comprehensive evaluation of the pulmonary manifestations of COVID-19,8 and typical CT findings are bilateral predominant ground-glass opacities (GGOs) with or without consolidation in the peripheral lungs.9,10 COVID-19 patients with severe to critical illness were found to have a larger extent of pulmonary disease based on a visual assessment, along with lymphadenopathy, pleural effusion, and traction bronchiectasis.10–13 Chest CT is indicated to assess COVID-19 patients with moderate-to-severe disease or at risk for progression.12,14 The visual evaluation of disease extent requires considerable reader experience and is prone to intrareader and interreader variability. Meanwhile, CT quantification of COVID-19 pneumonia provides better prognostication than visual scoring, independent of reader experience or variability.15

The purpose of our study was to develop and validate the automatic quantification of the extent (%) and weight (g) of COVID-19 pneumonia on CT images.


The institutional review board of each of the participating hospitals approved this retrospective study, and the requirement for patient consent was waived due to the retrospective nature of data collection (IRB No. H-2003-022-1106).

Study Population

We retrospectively collected 194 anonymized chest CT scans of 146 RT-PCR (reverse transcription–polymerase chain reaction)–proven COVID-19 patients (mean age, 47.2 ± 18.1 years; male-to-female ratio, 59%:41%) that were obtained at 14 Korean and Chinese institutions from January 23 to March 15, 2020. Eighteen CT scans from 15 patients with poor image quality due to respiratory artifacts were excluded (Fig. 1A). A total of 176 CT scans from 131 patients were included in this study, of whom 12, 5, and 3 patients had additional follow-up chest CT scans once, twice, and thrice, respectively. All 176 chest CT scans were performed using 1 of the 17 multidetector CT scanners (Supplementary Materials). Twenty CT scans from 17 patients overlapped entirely with 2 previous studies that reported a qualitative evaluation of radiologic findings in 9 Korean patients and a human-derived quantitative evaluation of the radiologic extent of COVID-19 in 17 Korean and Chinese patients.4,6

FIGURE 1 - Diagram of data selection in the internal (A) and external (B) data sets. ICD, International Classification of Diseases.

Preparation of Training CT Data

The CT images were uploaded to a commercially available software program for semiautomatic segmentation (MEDIP PRO v2.0.0.0; MEDICALIP Co Ltd, Seoul, South Korea). The lung parenchyma was segmented by a previously developed deep neural network (DeepCatch v1.0.0.0; MEDICALIP Co Ltd, Seoul, South Korea), which automatically extracts lung parenchyma with an accuracy higher than 99% in CT images containing extensive lung disease.16 All parenchymal abnormalities of COVID-19 were initially segmented by 2 technicians who had experience with lung parenchymal and lesion segmentation in thousands of CT images and received detailed instructions for segmentation, with active feedback between the technicians and radiologists. After reviewing the tentative lung and lesion masks, 1 of the 2 thoracic radiologists (S.H.Y. and S.J.Y. with 15 and 5 years of experience with chest CT interpretation, respectively) determined the presence of COVID-19 lesions and adjusted the masks in every axial CT image slice. The adjustment was further supplemented by modification of the mask on coronal and sagittal images. The radiologists excluded parenchymal lesions other than COVID-19, such as peripheral reticulations and honeycombing, tuberculous sequelae, calcified nodules, dependent densities, pleural effusions, and areas of motion artifacts.

Development of the Deep Neural Network

The 176 CT scans were randomly partitioned at the study level into 1 of the 3 following data sets: 146 cases for the training set, 10 cases for the tuning set, and 20 cases for the internal test set (Fig. 1A). A majority of the CT scans consisted of 1-mm-section CT images with standard- to low-dose CT protocols. Data were preprocessed by changing the Hounsfield unit values of the area outside the lung to −3024, and axial slices without the lung were not included in the training set. In total, 24,915 slices of axial data with areas of pneumonia and 30,711 slices of axial data without areas of pneumonia were available in the data set. To minimize the possibility of reduced performance due to an unbalanced data set, for every epoch, all positive slices were included, whereas 24,915 of the 30,711 negative slices were randomly selected. Details about the development of the 2D U-Net are described in Fig. 2 and the Supplementary Materials.
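The preprocessing and per-epoch balancing described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: slices are plain nested lists, the function names are hypothetical, and the number of negative slices drawn per epoch is assumed to equal the number of positive slices.

```python
import random

OUTSIDE_LUNG_HU = -3024  # fill value stated in the text


def mask_outside_lung(slice_hu, lung_mask):
    """Replace every voxel outside the lung mask with the fill value."""
    return [
        [hu if inside else OUTSIDE_LUNG_HU for hu, inside in zip(hu_row, mask_row)]
        for hu_row, mask_row in zip(slice_hu, lung_mask)
    ]


def has_lung(lung_mask):
    """True when at least one voxel of the slice belongs to the lung."""
    return any(any(row) for row in lung_mask)


def sample_epoch(positive_ids, negative_ids, rng):
    """All positive slices plus an equally sized random draw of negatives (assumption)."""
    count = min(len(positive_ids), len(negative_ids))
    drawn_negatives = rng.sample(negative_ids, k=count)
    epoch = list(positive_ids) + drawn_negatives
    rng.shuffle(epoch)
    return epoch


# Toy slice of 2 x 3 voxels in which only the middle column is lung.
slice_hu = [[-50, -800, 40], [-60, -750, 30]]
lung_mask = [[0, 1, 0], [0, 1, 0]]
masked = mask_outside_lung(slice_hu, lung_mask)

# Toy slice identifiers standing in for pneumonia-positive and negative slices.
positives = [f"pos{i}" for i in range(5)]
negatives = [f"neg{i}" for i in range(8)]
epoch = sample_epoch(positives, negatives, random.Random(0))
```

Redrawing the negative slices at every epoch lets the network see all negatives over time while each epoch stays balanced.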

FIGURE 2 - 2D U-Net architecture for COVID-19 pneumonia segmentation.

The 2D U-Net was distributed as free standalone software (MEDIP COVID19) on March 18, 2020, and updated to the current version, v1.2.0.0, on April 27, 2020. The software automatically calculated the extent (%) and weight (g) of pneumonia in 1 minute on a computer with the recommended specifications (Supplementary Materials).
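The two output metrics can be illustrated with a short sketch. The formulas here are assumptions, because the exact definitions are given only in the Supplementary Materials: extent is taken as the lesion volume divided by the lung volume, and weight is taken as the lesion volume multiplied by a density approximated from the attenuation as (HU + 1000)/1000 g/mL. All function and variable names are hypothetical.

```python
def extent_percent(lung_mask, lesion_mask):
    """Percentage of lung voxels that are also labeled as lesion."""
    lung_voxels = sum(1 for lung in lung_mask if lung)
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    lesion_voxels = sum(
        1 for lung, lesion in zip(lung_mask, lesion_mask) if lung and lesion
    )
    return 100.0 * lesion_voxels / lung_voxels


def weight_grams(hu_values, lesion_mask, voxel_volume_ml):
    """Sum of voxel volume times approximate density over lesion voxels.

    Assumed density model: air (-1000 HU) is 0 g/mL and water (0 HU) is 1 g/mL.
    """
    total = 0.0
    for hu, lesion in zip(hu_values, lesion_mask):
        if lesion:
            density = max(0.0, (hu + 1000.0) / 1000.0)
            total += density * voxel_volume_ml
    return total


# Toy volume flattened to 8 voxels, each 1 mm^3 (0.001 mL).
hu_values = [-900, -500, -500, 0, -850, -880, -700, -300]
lung_mask = [1, 1, 1, 1, 1, 1, 1, 1]
lesion_mask = [0, 1, 1, 1, 0, 0, 0, 0]

extent = extent_percent(lung_mask, lesion_mask)
weight = weight_grams(hu_values, lesion_mask, voxel_volume_ml=0.001)
```

Under this density assumption, a dense consolidation contributes more weight than a ground-glass opacity of the same volume, which is why the two metrics can diverge.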

External Validation

We used 5 data sets for external validation (Fig. 1B and Supplementary Materials). Data were in DICOM or NIfTI format. The first data set included 101 nonenhanced chest CT scans of RT-PCR–proven Japanese COVID-19 cases with 74 asymptomatic and 27 symptomatic patients on the Diamond Princess cruise ship.11 The second and third data sets were public CT data sets that comprised 100 single axial scan images from the Italian Society of Medical and Interventional Radiology’s collection of representative images of COVID-19 patients17 and 9 volumetric CT scans from Radiopaedia,18 respectively. The fourth data set was a public data set comprising 20 COVID-19 CT scans from China.19 The fifth data set was obtained from the deidentified “COVID data save lives” data, with 261 volumetric chest CT scans, from HM Hospitales, Spain.20 The publicly available data sets are endorsed for use as a benchmark for segmentation algorithms with appropriate citation, and no written permission was required for their use.

Statistical Analysis

The primary measures for the performance of the network were correlation coefficients for the extent and weight of COVID-19 pneumonia between the reference data sets and 2D U-Net. The following served as reference values: in the internal data set, radiologists’ mask-driven extent and weight; in the first external data set, radiologists’ visual CT severity scores (semiquantitative lobe-basis estimate of CT abnormalities; Supplementary Materials); and in the second, third, and fourth external data sets, the extent and weight calculated with the provided mask drawn by experienced radiologists at each institution. Intraclass correlation coefficients (ICCs) using a 2-way mixed model (absolute agreement type) were used for the internal data set and the second, third, and fourth external validation data sets, whereas Pearson correlation coefficients were used in the first external validation data set.
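The difference between the two correlation measures can be shown with a small sketch. It assumes the single-measurement form of the absolute-agreement ICC, often written ICC(A,1), because the text does not state whether single or average measures were reported. The function names and toy values are hypothetical.

```python
import math


def icc_absolute_agreement(ratings):
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    ratings holds one row per scan and one column per rater
    (for example, reference value and network value).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data: the second rater is always exactly 1 unit higher than the first.
reference = [1.0, 2.0, 3.0]
network = [2.0, 3.0, 4.0]
icc_value = icc_absolute_agreement([list(pair) for pair in zip(reference, network)])
r_value = pearson_r(reference, network)
```

In this toy case the Pearson coefficient is 1 while the ICC is 2/3, because absolute agreement penalizes a systematic offset. This is consistent with using the ICC where both values are on the same scale and the Pearson coefficient where the reference is a visual score on a different scale.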

Secondary measures included differences in the extent and weight of pneumonia and the morphological similarity of lung opacities between human experts and 2D U-Net. The degree of these differences was evaluated using Bland-Altman plots. The limits of agreement (LOA) in the difference between the masks were explored using the original values and square root transformation of the extent and weight of pneumonia to check whether the degree of the difference depended on the extent and the weight.21 Morphological similarity was assessed using the Dice similarity coefficient (DSC), sensitivity, and positive predictive value (PPV) between the network-driven and reference masks per patient.22,23
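The three overlap measures named above follow directly from voxel counts of true positives, false positives, and false negatives. The sketch below uses flattened binary masks and hypothetical names; it illustrates the standard definitions, not the authors' code.

```python
def overlap_metrics(predicted_mask, reference_mask):
    """Return (Dice, sensitivity, PPV) for two flattened binary masks."""
    pairs = list(zip(predicted_mask, reference_mask))
    tp = sum(1 for p, r in pairs if p and r)      # lesion in both masks
    fp = sum(1 for p, r in pairs if p and not r)  # lesion only in the prediction
    fn = sum(1 for p, r in pairs if not p and r)  # lesion only in the reference

    def ratio(numerator, denominator):
        # Undefined ratios (empty masks) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    dice = ratio(2 * tp, 2 * tp + fp + fn)
    sensitivity = ratio(tp, tp + fn)
    ppv = ratio(tp, tp + fp)
    return dice, sensitivity, ppv


# Toy masks of 6 voxels: 2 voxels overlap, 1 is over-segmented, 1 is missed.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
dice, sensitivity, ppv = overlap_metrics(predicted, reference)
```

Sensitivity falls when lesions are missed and PPV falls when non-lesion tissue is labeled as pneumonia, so the two values indicate the direction of a segmentation error that the Dice coefficient alone does not.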

Multivariable logistic regression was conducted to evaluate risk factors for the presence of symptoms in the Japanese data set and the composite outcome including respiratory failure, intensive care unit admission, and mortality in the Spanish data set. Multivariable logistic regression analyses were done, including variables previously reported to be risk factors of mortality of COVID-19 patients14,24 and AI-driven pneumonia extent and weight, which were divided into 4 quartiles. The added value of pneumonia extent or weight to a clinical model for predicting the composite outcome was examined using a receiver operating characteristic curve analysis in the Spanish data set.
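Two steps of this analysis can be sketched briefly: dividing a continuous variable into quartiles, and relating a reported odds ratio to its 95% confidence interval. The sketch assumes Wald-type intervals, which are symmetric around the coefficient on the log scale, and uses the Python default quartile definition, which may differ slightly from that of SPSS. The function names are hypothetical; the numeric check uses the Q4 extent values reported in Table 2.

```python
import bisect
import math
import statistics


def quartile_of(value, cut_points):
    """Return 1 to 4; a value equal to a cut point falls in the lower quartile."""
    return bisect.bisect_left(cut_points, value) + 1


def wald_odds_ratio(beta, standard_error, z=1.96):
    """Odds ratio with a Wald 95% CI from a logistic regression coefficient."""
    return (
        math.exp(beta),
        math.exp(beta - z * standard_error),
        math.exp(beta + z * standard_error),
    )


def log_scale_midpoint(lower, upper):
    """Geometric mean of the CI bounds; equals the OR for a Wald interval."""
    return math.sqrt(lower * upper)


# Quartile assignment on toy values.
values = [1, 2, 3, 4, 5, 6, 7, 8]
cut_points = statistics.quantiles(values, n=4)
assigned = [quartile_of(v, cut_points) for v in values]

# Consistency check on a reported result (Table 2, extent Q4 vs Q1).
reported_or, lower, upper = 5.523, 1.069, 28.529
midpoint = log_scale_midpoint(lower, upper)
implied_se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
rebuilt = wald_odds_ratio(math.log(reported_or), implied_se)
```

The wide intervals in Tables 2 and 3 correspond to large standard errors on the log scale, as expected when each quartile contains few patients.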

SPSS version 25 (IBM Corp, Armonk, NY) was used for all statistical analyses. The comparison of AUC values was done using MedCalc for Windows, version 15.0 (MedCalc Software, Ostend, Belgium).


Data Selection

A flow diagram of the data sets is shown in Figure 1. Of the 100 CT images of the second external data set, one CT image was excluded due to absence of the reference mask. Of the 20 CT scans of the fourth external data set, 10 CT scans were excluded because the Hounsfield unit values of the NIfTI file were not preserved. In the fifth external data set, 19 follow-up CT scans, 16 CT scans without information on International Classification of Diseases codes, 66 CT scans taken 8 or more days after the admission date, 23 CT scans taken more than 2 days after the event of respiratory failure (PaO2 ≤60 mm Hg), and 22 CT scans without information on age and laboratory findings were excluded, resulting in a total of 115 CT scans in 115 patients. Among the 115 patients, 34 patients developed the composite outcome, including respiratory failure, intensive care unit admission, or mortality during admission (Supplementary Materials and Supplementary Table 1).

Internal Validation

The ICCs for pneumonia extent and weight between the 2D U-Net and reference were 0.990 and 0.993 in the test data set (Table 1). The mean differences in pneumonia extent and weight between the reference and 2D U-Net were 0.7% and 7.04 g, respectively. The magnitude of the measurement difference between the reference sources and 2D U-Net depended on the extent and weight (Fig. 3): 10% extent (95% LOA, −4.0% to 5.4%); 50% extent (95% LOA, −9.7% to 11.1%); 100 g (95% LOA, −41.4 g to 55.5 g); 500 g (95% LOA, −101.3 g to 115.4 g). The DSC, sensitivity, and PPV for the 2D U-Net masks relative to the reference masks were 77.8% ± 17.1%, 81.4% ± 10.2%, and 80.3% ± 20.6%, respectively (Table 1).
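The dependence of the limits of agreement on the magnitude of the measurement can be illustrated with a sketch. The conventional limits are the mean difference ± 1.96 standard deviations. The back-transformation from the square-root scale shown here is an assumption about how magnitude-dependent limits could be obtained, not the procedure of reference 21. Names and toy values are hypothetical.

```python
import math
import statistics


def limits_of_agreement(differences):
    """Conventional 95% LOA: mean difference +/- 1.96 standard deviations."""
    mean_diff = statistics.fmean(differences)
    sd_diff = statistics.stdev(differences)
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


def sqrt_scale_loa_at(reference, predicted, magnitude):
    """LOA at a given magnitude, derived on the square-root scale (assumption).

    The limits are computed for sqrt(predicted) - sqrt(reference) and then
    converted back to the original unit around the chosen magnitude.
    """
    root_diffs = [math.sqrt(p) - math.sqrt(r) for r, p in zip(reference, predicted)]
    low, high = limits_of_agreement(root_diffs)
    root = math.sqrt(magnitude)
    lower = max(root + low, 0.0) ** 2 - magnitude
    upper = (root + high) ** 2 - magnitude
    return lower, upper


# Toy paired measurements (for example, pneumonia extent in %).
reference = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
predicted = [1.2, 3.5, 10.0, 15.0, 27.0, 34.0]

fixed_loa = limits_of_agreement([p - r for r, p in zip(reference, predicted)])
loa_at_10 = sqrt_scale_loa_at(reference, predicted, 10.0)
loa_at_50 = sqrt_scale_loa_at(reference, predicted, 50.0)
```

With the square-root model the interval widens as the magnitude grows, which corresponds to the curved lines in the Bland-Altman plots, whereas the conventional limits are constant and correspond to the horizontal lines.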

TABLE 1 - Performance of the Network Predicting COVID-19 Pneumonia Extent and Weight in CT Scans, Evaluated With Correlation Coefficients, Calculations of Spatial Overlaps, and Bland-Altman Analysis in Internal and External Validation Data Sets

Correlation coefficients

| Data set | Coefficient | Extent | Weight |
|---|---|---|---|
| Internal validation | ICC (95% CI) | 0.990 (0.974–0.996) | 0.993 (0.983–0.997) |
| Japan (1st external) | Pearson correlation coefficient | 0.908 | 0.899 |
| Italy (2nd external) | ICC (95% CI) | 0.949 (0.922–0.966) | 0.981 (0.971–0.987) |
| Radiopaedia (3rd external) | ICC (95% CI) | 0.965 (0.854–0.992) | 0.978 (0.907–0.995) |
| China (4th external) | ICC (95% CI) | 0.959 (0.796–0.990) | 0.993 (0.967–0.998) |

Calculation of spatial overlaps of AI-driven mask and reference mask

| Data set | Dice | Sensitivity | PPV |
|---|---|---|---|
| Internal validation | 77.8% ± 17.1% | 81.4% ± 10.2% | 80.3% ± 20.6% |
| Italy (2nd external) | 73.4% ± 14.0% | 70.6% ± 18.3% | 80.6% ± 10.1% |
| Radiopaedia (3rd external) | 71.9% ± 25.9% | 79.9% ± 11.8% | 71.9% ± 27.6% |
| China (4th external) | 77.0% ± 10.9% | 70.7% ± 15.4% | 87.7% ± 5.4% |

Bland-Altman analysis, 95% LOA

| Magnitude | Internal validation | Italy (2nd external) | Radiopaedia (3rd external) | China (4th external) |
|---|---|---|---|---|
| 10% extent | −4.0% to 5.4% | −10.1% to 15.1% | −4.0% to 7.4% | −4.6% to 8.4% |
| 50% extent | −9.7% to 11.1% | −25.6% to 30.6% | −11.0% to 14.4% | −12.6% to 16.4% |
| 10 g | | −3.6 to 3.5 g | | |
| 50 g | | −7.9 to 7.8 g | | |
| 100 g | −41.4 to 55.5 g | | −57.9 to 97.2 g | −32.4 to 68.2 g |
| 500 g | −101.3 to 115.4 g | | −153.8 to 193.1 g | −94.5 to 130.3 g |

FIGURE 3 - Bland-Altman plot and LOA for pneumonia extent (A) and weight (B) between the reference and 2D U-Net masks in the test data set. Horizontal lines, LOA from the model using the original values of extent and weight of pneumonia; curved lines, LOA from the model using the square root transformation of the extent and weight of pneumonia.

External Validation

In the first external validation (Japanese data set), the Pearson correlation coefficients between the visual CT severity score and the pneumonia extent and weight were 0.908 and 0.899, respectively (Table 1). In the other external validation data sets, the ICCs between the U-Net and reference values were between 0.949 and 0.965 (extent) and between 0.978 and 0.993 (weight) (Table 1). The LOA of pneumonia extent and weight for these external data sets are described in Figure 4 and Table 1. The magnitude of the measurement difference between the reference sources and 2D U-Net depended on the extent and weight in every external validation data set, as follows: Italian data set: 10% extent (95% LOA, −10.1% to 15.1%); 50% extent (95% LOA, −25.6% to 30.6%); 10 g (95% LOA, −3.6 g to 3.5 g); and 50 g (95% LOA, −7.9 g to 7.8 g); Radiopaedia data set: 10% extent (95% LOA, −4.0% to 7.4%); 50% extent (95% LOA, −11.0% to 14.4%); 100 g (95% LOA, −57.9 g to 97.2 g); and 500 g (95% LOA, −153.8 g to 193.1 g); and Chinese data set: 10% extent (95% LOA, −4.6% to 8.4%); 50% extent (95% LOA, −12.6% to 16.4%); 100 g (95% LOA, −32.4 g to 68.2 g); and 500 g (95% LOA, −94.5 g to 130.3 g). The DSCs for the 2D U-Net masks relative to the reference masks for the second, third, and fourth external validation sets were 73.4% ± 14.0%, 71.9% ± 25.9%, and 77.0% ± 10.9%, respectively (Table 1).

FIGURE 4 - Bland-Altman plots and LOA of pneumonia extent and weight between the reference and 2D U-Net masks in the Italian (A and B), Radiopaedia (C and D), and Chinese (E and F) external validation data sets, respectively. Horizontal lines, LOA from the model using the original values of extent and weight of pneumonia; curved lines, LOA from the model using the square root transformation of the extent and weight of pneumonia.

Clinical Validation

Multivariable logistic regression analysis for identifying risk factors for the presence of symptoms in the Japanese data set was done with previously reported risk factors for poor prognosis (age, presence of comorbidities, lymphocyte count, lactate dehydrogenase [LDH]), sex, and AI-driven pneumonia extent and weight.24,25 When adjusted for age, sex, presence of comorbidities, lymphocyte count, and LDH, the pneumonia extent or weight in the top quartile (Q4) was an independent risk factor for symptomatic presentation (extent, Q1 vs Q4: odds ratio [OR], 5.523; confidence interval [CI], 1.069–28.529; P = 0.041; weight, Q1 vs Q4: OR, 10.561; CI, 1.544–72.242; P = 0.016) (Table 2).

TABLE 2 - Multivariable Logistic Regression Analysis of Risk Factors for the Presence of Symptoms in the Japanese Data Set
Multivariable analysis 1, pneumonia extent model; multivariable analysis 2, pneumonia weight model.

| Variables | OR (analysis 1) | 95% CI (analysis 1) | P (analysis 1) | OR (analysis 2) | 95% CI (analysis 2) | P (analysis 2) |
|---|---|---|---|---|---|---|
| Age, y | | | | | | |
| Q1 (≤47) (reference) | | | 0.101 | | | 0.058 |
| Q2 (48–67) | 0.347 | 0.070–1.729 | 0.196 | 0.295 | 0.057–1.533 | 0.147 |
| Q3 (68–75) | 0.093 | 0.014–0.603 | 0.013 | 0.065 | 0.009–0.470 | 0.007 |
| Q4 (>75) | 0.168 | 0.021–1.323 | 0.090 | 0.094 | 0.010–0.844 | 0.035 |
| Male (vs female) | 1.139 | 0.342–3.790 | 0.832 | 1.081 | 0.317–3.691 | 0.901 |
| Presence of comorbidity (vs absence of comorbidity) | 1.282 | 0.327–5.022 | 0.721 | 1.376 | 0.352–5.378 | 0.646 |
| Lymphocyte count, /μL | | | | | | |
| Q4 (≥1891) (reference) | | | 0.095 | | | 0.081 |
| Q3 (1428–1890) | 1.915 | 0.301–12.191 | 0.491 | 2.029 | 0.31–13.288 | 0.461 |
| Q2 (1041–1427) | 4.771 | 0.724–31.426 | 0.104 | 6.582 | 0.959–45.189 | 0.055 |
| Q1 (<1041) | 8.358 | 1.420–49.180 | 0.019 | 8.924 | 1.431–55.66 | 0.019 |
| LDH | | | | | | |
| Q1 (≤173) (reference) | | | 0.115 | | | 0.062 |
| Q2 (174–191) | 5.077 | 0.947–27.215 | 0.058 | 5.541 | 1.051–29.201 | 0.043 |
| Q3 (192–230) | 0.706 | 0.125–4.001 | 0.694 | 0.570 | 0.095–3.407 | 0.538 |
| Q4 (>230) | 1.332 | 0.208–8.542 | 0.763 | 1.317 | 0.218–7.949 | 0.764 |
| AI-driven pneumonia extent, % | | | | | | |
| Q1 (≤0.1) (reference) | | | 0.144 | | | |
| Q2 (0.2–0.3) | 0.525 | 0.041–6.456 | 0.621 | | | |
| Q3 (0.4–1.7) | 2.742 | 0.580–12.967 | 0.203 | | | |
| Q4 (>1.7) | 5.523 | 1.069–28.529 | 0.041 | | | |
| AI-driven pneumonia weight, g | | | | | | |
| Q1 (≤1.1) (reference) | | | | | | 0.106 |
| Q2 (1.2–6.8) | | | | 2.115 | 0.348–12.858 | 0.416 |
| Q3 (6.9–34.8) | | | | 3.736 | 0.659–21.194 | 0.137 |
| Q4 (>34.8) | | | | 10.561 | 1.544–72.242 | 0.016 |

P values below 0.05 indicate statistical significance (boldface in the original table).

With regard to the composite outcome, multivariable logistic regression analyses also included sex, presence of comorbidity, lymphocyte count, and LDH. The pneumonia extent, as compared between the lowest quartile (Q1) and the third and fourth quartiles (Q3 and Q4), was an independent risk factor for a poor prognosis (Q1 vs Q3: OR, 9.992; CI, 1.773–56.320; P = 0.009; Q1 vs Q4: OR, 9.365; CI, 1.393–62.966; P = 0.021). Similarly, the pneumonia weight, as compared between the lowest quartile (Q1) and the third and fourth quartiles (Q3 and Q4), was independently associated with a poor prognosis (Q1 vs Q3: OR, 10.620; CI, 1.702–66.286; P = 0.011; Q1 vs Q4: OR, 7.085; CI, 1.149–43.675; P = 0.035) (Table 3). In receiver operating characteristic curve analyses, the AUC of the clinical model (age, sex, comorbidity, lymphocyte count, LDH) predicting the composite outcome was 0.682. When either pneumonia extent or weight was added into the clinical model, AUCs nonsignificantly increased to 0.713 (pneumonia extent, P = 0.1466) and 0.699 (pneumonia weight, P = 0.2331) (Supplementary Figure 1).
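The AUC values compared above can be understood through the rank-based definition: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. The sketch below uses hypothetical names and toy data; it does not reproduce the statistical comparison of AUCs, which the study performed in MedCalc.

```python
def auc_from_scores(scores, outcomes):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    Ties between an event and a non-event count as half a correct pair.
    """
    event_scores = [s for s, y in zip(scores, outcomes) if y]
    other_scores = [s for s, y in zip(scores, outcomes) if not y]
    if not event_scores or not other_scores:
        raise ValueError("both outcome classes are required")
    correct = 0.0
    for event in event_scores:
        for other in other_scores:
            if event > other:
                correct += 1.0
            elif event == other:
                correct += 0.5
    return correct / (len(event_scores) * len(other_scores))


# Toy predicted risks and observed composite outcomes (1 = event).
predicted_risk = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]
auc = auc_from_scores(predicted_risk, observed)
```

Comparing a clinical model with and without a CT metric means computing this value for each model's predicted risks on the same patients and then testing the difference with a method for correlated AUCs.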

TABLE 3 - Multivariable Logistic Regression Analysis of Risk Factors for the Composite Outcome in the Spanish Data Set
Multivariable analysis 1, pneumonia extent model; multivariable analysis 2, pneumonia weight model.

| Variables | OR (analysis 1) | 95% CI (analysis 1) | P (analysis 1) | OR (analysis 2) | 95% CI (analysis 2) | P (analysis 2) |
|---|---|---|---|---|---|---|
| Age, y | | | | | | |
| Q1 (≤56) (reference) | | | | | | |
| Q2 (57–69) | 32.332 | 4.626–225.980 | <0.001 | 24.369 | 3.698–160.565 | 0.001 |
| Q3 (70–80) | 4.376 | 0.668–28.658 | 0.124 | 3.768 | 0.603–23.549 | 0.156 |
| Q4 (>80) | 22.591 | 3.037–168.068 | 0.002 | 19.050 | 2.670–135.895 | 0.003 |
| Male (vs female) | 0.598 | 0.208–1.720 | 0.340 | 0.438 | 0.149–1.290 | 0.134 |
| Presence of comorbidity (vs absence of comorbidity) | 1.824 | 0.583–5.710 | 0.302 | 1.785 | 0.605–5.272 | 0.294 |
| Lymphocyte count, /μL | | | | | | |
| Q4 (>1635) (reference) | | | | | | |
| Q3 (1101–1635) | 2.499 | 0.513–12.172 | 0.257 | 1.964 | 0.418–9.225 | 0.392 |
| Q2 (801–1100) | 1.944 | 0.397–9.532 | 0.412 | 1.534 | 0.346–6.798 | 0.573 |
| Q1 (≤800) | 0.477 | 0.099–2.310 | 0.358 | 0.348 | 0.074–1.649 | 0.184 |
| LDH | | | | | | |
| Q1 (≤385.50) (reference) | | | | | | |
| Q2 (385.51–531.70) | 1.583 | 0.334–7.497 | 0.563 | 2.035 | 0.446–9.274 | 0.359 |
| Q3 (531.71–727.00) | 1.278 | 0.229–7.121 | 0.780 | 1.564 | 0.293–8.340 | 0.601 |
| Q4 (>727.00) | 1.671 | 0.340–8.221 | 0.528 | 2.097 | 0.464–9.479 | 0.336 |
| AI-driven pneumonia extent, % | | | | | | |
| Q1 (≤2.4) (reference) | | | | | | |
| Q2 (2.5–6.7) | 1.272 | 0.223–7.264 | 0.786 | | | |
| Q3 (6.8–16.7) | 9.992 | 1.773–56.320 | 0.009 | | | |
| Q4 (>16.7) | 9.365 | 1.393–62.966 | 0.021 | | | |
| AI-driven pneumonia weight, g | | | | | | |
| Q1 (≤46.7) (reference) | | | | | | |
| Q2 (46.8–175.2) | | | | 2.309 | 0.436–12.218 | 0.325 |
| Q3 (175.3–369.5) | | | | 10.620 | 1.702–66.276 | 0.011 |
| Q4 (>369.5) | | | | 7.085 | 1.149–43.675 | 0.035 |

P values below 0.05 indicate statistical significance (boldface in the original table).
Comorbidity includes hypertension, diabetes mellitus type 2, coronary artery disease, heart failure, cerebrovascular disease, and chronic kidney disease.


The 2D U-Net developed in this study was trained with CT images with COVID-19 pneumonia that were obtained from 17 CT scanners with varying CT parameters, including devices from the 5 major CT vendors that accounted for approximately 90% of the global market25 in 2018. The correlations observed between the 2D U-Net and reference in the Korean-Chinese test data set were relatively well reproduced in the external validation data sets. The degree of correlation was slightly lower in the external data sets than in the test data set. In terms of LOA, the external validation data sets seemed subpar compared with the test data set, especially the Italian external validation data set. This may partly originate from differences in the individual standard of each radiologist in discriminating subtle GGO and normal lung parenchyma, which resulted in discrepancies in the individual standards of producing the reference masks (Supplementary Figure 2).

There are several previous studies associated with AI-based COVID-19 pneumonia segmentation on chest CT.26–29 However, most of them preferentially addressed the technical aspect of AI-based CT segmentation, while omitting or briefly demonstrating clinical performance in a limited data set. For example, Chaganti et al26 developed an AI-based lung and lesion segmentation tool on chest CT using chest CT scans of various diseases, including COVID-19 pneumonia. When the predicted percentage of COVID-19 pneumonia and the extent-based CT severity scores were compared with the ground truth, the Pearson correlation coefficient was more than 0.90. Our study thoroughly tested our algorithm with extensive external validation data sets to demonstrate the high performance of the algorithm in various institutions. Also, we added a concept of pneumonia weight, which reflects the dense pneumonic consolidation on CT scans. In the multivariable analysis, both extent and weight of COVID-19 pneumonia calculated by our algorithm on chest CT scans were significant factors for predicting poor prognosis, as were other clinical indicators.

Recent publications have indicated a close relationship between clinical and radiologic severity in COVID-19.10,30 When clinical severity was categorized into mild, severe, and critical cases, severe to critical COVID-19 cases had more frequent bilateral disease2 and a greater extent of COVID-19 than mild COVID-19 cases.9 These results imply that CT severity can be a surrogate parameter for estimating the pneumonia burden of COVID-19. Indeed, quantitative CT metrics for COVID-19 pneumonia were associated with a poor prognosis.31–33 To validate the usefulness of CT severity for severity stratification or prognostication, CT severity should be assessed using a uniform measuring tool in multiple cohorts with a sufficient number of cases. Such cohorts should be collected at multiple centers, requiring an accurate, reproducible, and easily accessible measuring tool. A deep neural network is a potential candidate for this purpose and needs to be validated with multiple external data.

Recent studies of severe COVID-19 patients reported that older age and several laboratory findings such as lymphocytopenia, elevated neutrophil-to-lymphocyte ratio, elevated LDH, elevated C-reactive protein, elevated procalcitonin, elevated d-dimers, elevated serum ferritin, and cardiac troponin were predictive parameters for risk stratification of COVID-19 patients.14,34–36 Our multivariable analyses showed partially consistent results with those studies regarding clinical and laboratory parameters. In addition, automatically quantified pneumonia extent and weight were also predictive factors for the symptom presence and composite outcome of patients with COVID-19 pneumonia. These results suggest that the AI-driven extent and weight of pneumonia measured in CT scans could be useful for risk stratification or monitoring of COVID-19 pneumonia, particularly for patients at risk for disease progression or moderate-to-severe respiratory impairment. Nevertheless, as the AI-driven outcomes were not based on a perfect segmentation of CT abnormalities, the results of the software should be checked by a radiologist and interpreted with consideration of relevant clinical factors.

Measurement differences between the reference and 2D U-Net values inevitably occurred and depended on the extent and weight. The DSC, sensitivity, and PPV between the reference masks and the 2D U-Net masks were slightly lower than 80% (Table 1). Differences mainly occurred under the following circumstances: the current version of 2D U-Net sometimes misrecognized partial volumes of respiratory and cardiac motion artifacts or pulmonary vessels in the basal lungs as pneumonia (Figs. 5 and 6), whereas minute lesions in the apical end of lung parenchyma tended to be missed. Despite these limitations, the potential utility of our software has been shown in a series of studies.33,37,38

FIGURE 5 - Representative images of a 32-year-old man with COVID-19 pneumonia in the test data set. A chest CT image shows peripheral and peribronchial GGOs in both lungs (A). The reference mask (C) and 2D U-Net mask (D) match in most areas of the lesions except the blurry peripheral margin of the GGO visible in the subtracted mask (B). The DSC, sensitivity, and PPV were 90.2%, 91.4%, and 88.9%, respectively. Figure 5 can be viewed online in color.

FIGURE 6 - Representative images of COVID-19 pneumonia in the fourth external validation data set. A chest CT image shows peribronchial and subpleural GGOs in the right and left lower lobes (A). Subtraction (B) of the reference mask (C) and 2D U-Net mask (D) shows inaccurate lesion segmentation by 2D U-Net in the left retrocardiac and basal lung as pneumonia, whereas it was actually an artifact due to cardiac and diaphragm motion. However, other areas of pneumonia were segmented precisely. The DSC, sensitivity, and PPV were 84.9%, 77.0%, and 94.4%, respectively. Figure 6 can be viewed online in color.

Several limitations exist in this study. First, we retrospectively collected data from 146 patients, and their CT images may not cover a diverse range of radiologic manifestations of COVID-19. Second, we did not include sufficient numbers of COVID-19 patients who had underlying parenchymal disease (eg, extensive metastasis), although some CT images containing pulmonary lesions other than COVID-19 pneumonia were included in the 2D U-Net training set. Third, we tentatively validated correlations of pneumonia extent and weight with clinical parameters or outcomes. The 3D U-Net performed better in terms of minimizing misperceptions (unpublished data), but required higher computer specifications than 2D U-Net. Software based on 2D U-Net also requires high computer specifications with a particular GPU version, limiting its accessibility. Implementing this program in CT scanner consoles may be considered as a way of expanding its accessibility in locations where CT scanners exist.

In conclusion, the extent and weight of COVID-19 pneumonia on CT images were automatically quantifiable and independently associated with symptoms and prognosis. The quantification of COVID-19 pneumonia at multiple sites using a uniform measuring method can facilitate research toward better severity stratification and prognostication of COVID-19 patients at risk for morbidity and mortality.


The authors gratefully acknowledge the provision of deidentified “COVID data save lives” data by HM Hospitales and Andrew Dombrowski, PhD (Compecs, Inc) for his assistance in improving the use of English in this manuscript.


1. Wu Z, McGoogan JM. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA. 2020;323:1239–1242.
2. Guan W-J, Ni Z-Y, Hu Y, et al. Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med. 2020;382:1708–1720.
3. Yang X, Yu Y, Xu J, et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet. Respir Med. 2020;8:475–481.
4. Yoon SH, Lee KH, Kim JY, et al. Chest radiographic and CT findings of the 2019 novel coronavirus disease (COVID-19): analysis of nine patients treated in Korea. Korean J Radiol. 2020;21:494–500.
5. Hosseiny M, Kooraki S, Gholamrezanezhad A, et al. Radiology perspective of coronavirus disease 2019 (COVID-19): lessons from severe acute respiratory syndrome and Middle East respiratory syndrome. Am J Roentgenol. 2020;214:1078–1082.
6. Choi H, Qi X, Yoon SH, et al. Extension of coronavirus disease 2019 (COVID-19) on chest CT and implications for chest radiograph interpretation. Radiol Cardiothorac Imaging. 2020;2:e204001.
7. Wong HYF, Lam HYS, Fong AH-T, et al. Frequency and distribution of chest radiographic findings in patients positive for COVID-19. Radiology. 2020;296:E72–E78.
8. Fang Y, Zhang H, Xie J, et al. Sensitivity of chest CT for COVID-19: comparison to RT-PCR. Radiology. 2020;296:E115–E117.
9. Chung M, Bernheim A, Mei X, et al. CT imaging features of 2019 novel coronavirus (2019-nCoV). Radiology. 2020;295:202–207.
10. Zhao W, Zhong Z, Xie X, et al. Relation between chest CT findings and clinical conditions of coronavirus disease (COVID-19) pneumonia: a multicenter study. Am J Roentgenol. 2020;214:1072–1077.
11. Inui S, Fujikawa A, Jitsu M, et al. Chest CT findings in cases from the cruise ship Diamond Princess with coronavirus disease (COVID-19). Radiol Cardiothorac Imaging. 2020;2:e200110.
12. Li K, Wu J, Wu F, et al. The clinical and chest CT features associated with severe and critical COVID-19 pneumonia. Invest Radiol. 2020;55:327–331.
13. Lyu P, Liu X, Zhang R, et al. The performance of chest CT in evaluating the clinical severity of COVID-19 pneumonia: identifying critical cases based on CT characteristics. Invest Radiol. 2020;55:412–421.
14. Zhou F, Yu T, Du R, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395:1054–1062.
15. Yin X, Min X, Nan Y, et al. Assessment of the severity of coronavirus disease: quantitative computed tomography parameters versus semiquantitative visual score. Korean J Radiol. 2020;21:998.
16. Yoo S-J, Yoon SH, Lee JH, et al. Automated lung segmentation on chest computed tomography images with extensive lung parenchymal abnormalities using a deep neural network. Korean J Radiol. 2020;22:476–488.
17. Medseg. COVID-19 CT segmentation dataset [Medseg Web site]. 2020. Accessed June 13, 2021.
18. Bell DJ. COVID-19 [Radiopaedia Web site]. 2020. Accessed June 13, 2021.
19. Jun M, Cheng G, Yixin W, et al. COVID-19 CT Lung and Infection Segmentation Dataset [Zenodo Web site]. Published April 20, 2020. Accessed June 13, 2021.
20. HMhospitales. Covid Data Save Lives [HM hospitales Web site]. 2020. Accessed June 13, 2021.
21. Yoon J-H, Yoon SH, Hahn S. Development of an algorithm for evaluating the impact of measurement variability on response categorization in oncology trials. BMC Med Res Methodol. 2019;19:1–14.
22. Mansoor A, Bagci U, Foster B, et al. Segmentation and image analysis of abnormal lungs at CT: current approaches, challenges, and future trends. Radiographics. 2015;35:1056–1076.
23. Taha AA, Hanbury A. Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Med Imaging. 2015;15:29.
24. Yan L, Zhang H-T, Goncalves J, et al. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell. 2020;2:283–288.
25. Davidson A. CT equipment market: made in China… and the USA [IHS Markit]. 2018. Accessed June 13, 2021.
26. Chaganti S, Grenier P, Balachandran A, et al. Automated quantification of CT patterns associated with COVID-19 from chest CT. Radiol Artif Intell. 2020;2:e200048.
27. Chassagnon G, Vakalopoulou M, Battistella E, et al. AI-driven quantification, staging and outcome prediction of COVID-19 pneumonia. Med Image Anal. 2021;67:101860.
28. Goncharov M, Pisov M, Shevtsov A, et al. CT-Based COVID-19 triage: deep multitask learning improves joint identification and severity quantification. Med Image Anal. 2021;71:102054.
29. Pu J, Leader JK, Bandos A, et al. Automated quantification of COVID-19 severity and progression using chest CT images. Eur Radiol. 2021;31:436–446.
30. Yang R, Li X, Liu H, et al. Chest CT severity score: an imaging tool for assessing severe COVID-19. Radiol Cardiothorac Imaging. 2020;2:e200047.
31. Colombi D, Bodini FC, Petrini M, et al. Well-aerated lung on admitting chest CT to predict adverse outcome in COVID-19 pneumonia. Radiology. 2020;296:E86–E96.
32. Matos J, Paparo F, Mussetto I, et al. Evaluation of novel coronavirus disease (COVID-19) using quantitative lung CT and clinical data: prediction of short-term outcome. Eur Radiol Exp. 2020;4:1–10.
33. Park B, Park J, Lim J-K, et al. Prognostic implication of volumetric quantitative ct analysis in patients with COVID-19: a multicenter study in Daegu, Korea. Korean J Radiol. 2020;21:1256.
34. Qin C, Zhou L, Hu Z, et al. Dysregulation of immune response in patients with coronavirus 2019 (COVID-19) in Wuhan, China. Clin Infect Dis. 2020;71:762–768.
35. Tan L, Wang Q, Zhang D, et al. Lymphopenia predicts disease severity of COVID-19: a descriptive and predictive study. Signal Transduct Target Ther. 2020;5:33.
36. Velavan TP, Meyer CG. Mild versus severe COVID-19: laboratory markers. Int J Infect Dis. 2020;95:304–307.
37. Hahm CR, Lee YK, Oh DH, et al. Predictive Parameters for the Worsening Clinical Course of Mild COVID-19 Pneumonia. Research Square. Preprint. Aug 07, 2020. Accessed March 2, 2022.
38. Yoon SH, Kim M. Anterior pulmonary ventilation abnormalities in COVID-19. Radiology. 2020;297:E276–E277.

COVID-19; pneumonia; deep learning; computed tomography

Copyright © 2022 Wolters Kluwer Health, Inc. All rights reserved.