Original Articles

Automated Detection and Quantification of COVID-19 Airspace Disease on Chest Radiographs

A Novel Approach Achieving Expert Radiologist-Level Performance Using a Deep Convolutional Neural Network Trained on Digital Reconstructed Radiographs From Computed Tomography-Derived Ground Truth

Mortani Barbosa, Eduardo J. Jr MD; Gefter, Warren B. MD; Ghesu, Florin C. PhD; Liu, Siqi PhD; Mailhe, Boris PhD; Mansoor, Awais PhD; Grbic, Sasa PhD; Vogt, Sebastian PhD

doi: 10.1097/RLI.0000000000000763

Abstract

Since its emergence in Wuhan, China, in December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its associated disease, COVID-19, have spread rapidly throughout the world. In March 2020, the World Health Organization declared the COVID-19 outbreak a pandemic. COVID-19 has resulted in a global health and economic crisis, with millions of cases and deaths, despite substantial containment and mitigation measures.1,2

As COVID-19 creates an unprecedented strain on health care systems, chest imaging (including chest radiography [CXR] and chest computed tomography [CT]) has been increasingly used in suspected and confirmed cases.3,4 The clinical presentation of COVID-19 varies from mild upper respiratory symptoms to severe dyspnea, fever, and multifocal pneumonia (hallmark of severe disease). The most severe cases result in respiratory failure requiring admission to a critical care unit.4,5

The reference standard diagnosis of COVID-19 relies on identification of the virus on nasopharyngeal swabs via reverse transcriptase-polymerase chain reaction testing (RT-PCR). However, false-negative rates of up to 30% have been reported, particularly in early disease.4–11 Chest radiography and chest CT, therefore, may play an important role in the diagnosis of COVID-19, excluding other acute cardiopulmonary abnormalities, while aiding in patient management. Several studies have detailed the imaging features of COVID-19 pneumonia using CXR and chest CT, with CT being commonly performed in China and South Korea for diagnosis and management.12–19

The most typical chest CT feature of COVID-19-associated pneumonia is airspace disease (AD; consisting of ground-glass opacities with or without consolidations), often bilateral, multifocal, peripheral, and lower lung zone predominant. This pattern resembles SARS, H1N1 influenza, and cytomegalovirus infections more than other common lower respiratory viruses.12–15,20 With disease progression, there may be increasing perilobular linear opacities and consolidation in a pattern characteristic of organizing pneumonia or, in the most severe cases, of diffuse alveolar damage.12,14 Chest radiography and chest CT are increasingly being used to monitor disease progression in hospitalized patients with COVID-19.21–23

Despite lower sensitivity and specificity than CT, CXR is often the first and only imaging study performed in COVID-19 patients and likely will be increasingly used as the number of COVID-19 patients increases.23 Among CXR's roles in the evaluation and management of patients with symptomatic COVID-19 is assessing the severity of lung involvement. This can inform triage decisions for patient management; monitoring of disease course in hospitalized patients, particularly in the intensive care unit (ICU); and risk stratification/prognosis.

Several semiquantitative scoring systems have been developed to measure the extent/severity of COVID-19 lung disease, and these have been shown to have predictive value concerning risk stratification and patient outcomes.24–35 The latter include disease progression, need for ICU admission and ventilatory support, and mortality. In general, these semiquantitative methods require visual estimates of the percentage and type of parenchymal involvement summed across variable numbers of lung zones, steps that are time-consuming and prone to subjectivity.

Computational methods using deep learning offer the potential for more objective, consistent quantification as well as improved efficiency. In addition to their use in the diagnosis of COVID-19 and differentiation from other types of pneumonia on CT and CXR,36–39 deep neural networks have been applied to CT for automatic segmentation of COVID-19-related quantitative biomarkers, such as the percentage of opacity (PO).36,40–43 The COVID-19 Reporting and Data System using Artificial Intelligence (CO-RADS-AI) algorithm36,44 predicts the likelihood of COVID-19 diagnosis (CO-RADS score), as well as provides a CT severity score reflecting the degree of parenchymal involvement by lobe.

Although deep neural networks have also been used to diagnose COVID-19 from CXR, most studies formulated the problem simply as an image-wise binary classification.39,45,46 Automatically quantifying COVID-19 severity from CXR remains challenging, given the difficulty of establishing a high-confidence ground truth for training the neural networks. A deep learning algorithm measuring the extent of COVID-19 lung involvement, trained on CXRs and using radiology reports as the gold standard, has been shown to be an independent predictor of adverse patient outcomes (ICU admission, mortality) and comparable with a semiquantitative scoring method.47

Here, we report a novel deep learning approach providing a method for CXR quantification of COVID-19 lung involvement using a convolutional neural network (CNN) trained on radiographic images derived directly from CT scans (digitally reconstructed radiographs [DRRs]) and thus based upon a superior ground truth. Our goals were (1) to assess the accuracy of COVID-19 AD quantification on CXR by this AI CNN against CT-derived AD volumetric quantification and AD area quantification projected on DRRs and (2) to compare the performance of COVID-19 AD quantification by the AI CNN on CXR against expert human readers, using as ground-truth AD volumetric quantification on CT.

MATERIALS AND METHODS

Patient and Image Selection

This single-institution retrospective study obtained institutional review board approval with waiver of informed consent and was Health Insurance Portability and Accountability Act compliant. We randomly selected 86 patients with the following inclusion criteria: positive RT-PCR for SARS-CoV-2 and a pair of CXR and chest CT performed within 48 hours of each other. All scans were obtained from March to May 2020. Figure 1 details the study design and inclusion and exclusion criteria. Table 1 details the demographics and clinical features of the cohort.

FIGURE 1:
Flow diagram with inclusion and exclusion criteria in our cohort (n = 86).
TABLE 1 - Demographics of the Cohort (n = 86)
Total (n = 86)
Age, mean (range), y 59 (25–93)
Sex
 Male 51% (44/86)
 Female 49% (42/86)
Race/ethnicity
 White non-Hispanic 14% (12/86)
 White Hispanic 1% (1/86)
 Black or African American 75% (64/86)
 Black Hispanic 1% (1/86)
 Asian 2% (2/86)
 East Indian 1% (1/86)
 Other/unknown 6% (5/86)
Smoking status
 Never smoked 48% (41/86)
 Former smoker 27% (23/86)
 Current smoker 8% (7/86)
 Unknown 17% (15/86)
Outcome
 Discharged 72% (62/86)
 Deceased 23% (20/86)
 Still admitted (as of July 13, 2020) 5% (4/86)

The CXR and CT images were deidentified using a standard anonymization profile in Sectra PACS and transferred through a secure file exchange to a computational cluster for imaging processing.

All CXRs were digital radiographs performed in the anterior-posterior (AP) projection. Chest CTs were acquired either with or without intravenous contrast (the former to evaluate for pulmonary embolism as well); however, the CT reconstruction parameters are otherwise essentially identical. All CT protocols used kVp automated selection (range, 100–120) and tube current modulation (reference mAs of 100) to minimize radiation dose, with slight differences according to the CT scanner model and manufacturers. All protocols used iterative reconstruction with high spatial resolution algorithms and 1-mm slice thickness without interslice gap to optimize lung parenchymal assessment.

Expert Human Quantification of AD on CXR

Two expert readers (subspecialty trained thoracic radiologists with 12 and >30 years of experience) read each CXR independently, blinded to the results of the paired chest CT or any clinical information, except for being aware that each patient tested positive for SARS-CoV-2 by RT-PCR. Each reader segmented AD on a single frontal CXR using manual annotations on ITK-Snap.48 Although readers had no access to previous comparisons, they exercised their best judgment to separate AD from presumably chronic findings such as scarring.

CT 3-Dimensional Whole-Lung and 3-Dimensional Volumetric Segmentation of AD

All 86 CT datasets were annotated manually by a trained annotator with 2 years of experience in annotating pulmonary abnormalities in chest CT. The annotations were supervised and revised by a radiologist with 6 years of experience in chest CT. The CT readers were instructed to segment AD (ground-glass opacities and consolidations) and to exclude masses, nodules, and areas of fibrosis or scarring. A subset of 13 outlier cases was further reviewed by 2 expert thoracic radiologists. ITK-Snap was used for manual AD segmentation.48

Creation of DRRs and Projection of CT-Derived 3-Dimensional Volume Quantification to 2-Dimensional Area Quantification of Airspace Opacity

Digitally reconstructed radiographs are generated as integrals over synthetic projection lines through the CT volume under a parallel projection geometry. The resolution of the resulting DRR is enhanced using a deep super-resolution CNN built with stacked DenseNet49 blocks and voxel interpolation layers, yielding isotropic resolution and reducing the difference in resolution between DRR and digital CXRs. The super-resolution CNN was trained on 2650 DRRs mixed with COVID-related abnormalities, non-COVID-related abnormalities and healthy control subjects. Finally, a frequency subband normalization50 is applied to reduce the noise and increase the contrast of the image (Fig. 2). Supplemental Figure 1, https://links.lww.com/RLI/A595 demonstrates the comparative appearance of the DRR versus the respective CXR, for 2 patient examples.

FIGURE 2:
Schematic illustration of the data flow for 3D AD POv quantification and projection on super resolution up-sampled DRR for POa quantification based on CT-derived 3D AD POv mask.

The whole lung segmentation and 3-dimensional (3D) AD segmentation in chest CTs are converted to volumetric masks, which can be projected in the 2D DRR in 2 different ways:

  1. Anteroposterior thickness projection: For both whole lung and AD segmentation, the corresponding binary mask from CT can be projected using a line integral that measures the depth/thickness of the mask along the AP axis under a parallel projection geometry.
  2. Anteroposterior intensity projection: Similar to above using intensity values of the CT voxels rather than depth/thickness of the mask.
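The two projections listed above can be illustrated with a minimal NumPy sketch. It assumes the CT volume and masks are arrays indexed (z, y, x) with y as the AP axis; the function names, the `AP_AXIS` constant, and the `hu_offset` parameter (a hypothetical shift that makes air-like voxels contribute little) are illustrative choices and are not taken from the study's implementation.

```python
import numpy as np

AP_AXIS = 1  # assumption: arrays are indexed (z, y, x), y = anterior-posterior


def thickness_projection(mask, spacing_ap_mm):
    """Parallel AP line integral of a binary mask: mask depth in mm per pixel."""
    return mask.astype(np.float64).sum(axis=AP_AXIS) * spacing_ap_mm


def intensity_projection(volume_hu, mask, hu_offset=0.0):
    """Parallel AP sum of CT values inside the mask (hu_offset is hypothetical)."""
    shifted = volume_hu.astype(np.float64) + hu_offset
    return np.where(mask, shifted, 0.0).sum(axis=AP_AXIS)


# Toy volume: 2 slices, 4 voxels deep along AP, 3 voxels wide.
mask = np.zeros((2, 4, 3), dtype=bool)
mask[0, 1:3, 1] = True                      # lesion is 2 voxels thick along AP
volume = np.full(mask.shape, -700.0)        # ground-glass-like attenuation

thickness = thickness_projection(mask, spacing_ap_mm=1.5)
intensity = intensity_projection(volume, mask, hu_offset=1000.0)
print(thickness[0, 1], intensity[0, 1])
```

Both functions return a 2D coronal map of the same size, so either one can be binarized with a single cutoff to give a 2D mask.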

Segmentation of Whole Lungs on DRR and CXR

To train a lung segmentation neural network for the DRRs, we used the anteroposterior thickness projection derived from the 3D CT lung segmentation to establish the ground-truth annotations for DRR. A binarization of the projected mask is performed at a cutoff value of 38 mm, established based on a qualitative assessment of optimal visibility of the lung boundary for a range of values. Ground-glass opacities and consolidations were considered equal in the projected masks. A deep convolutional segmentation network51 is used to learn the mapping between input DRR and the established binary masks defining the area of the lungs. The trained network is also used for predicting the binary lung masks on the CXRs without further training.

Segmentation of COVID-19 AD on DRR and CXR

We performed an intensity projection of the 3D ground-truth mask describing the COVID-19 affected lung parenchyma. The resulting projection image is thresholded at 25,000 projected Hounsfield units (HU) to obtain a binary mask for a given DRR. The threshold value was selected to minimize the mean absolute error (MAE) between PO–volume (POv) (ground truth on CT) and PO–area (POa) on DRR. A deep CNN is used to learn the mapping between the input DRR and the binary masks defining the area affected by AD. The output of the system is a pixel-wise probability map of AD, constrained to the estimated area of the lung parenchyma. Because the selected threshold does not guarantee that every lesion quantified in the 3D CT is visible in the DRR, there is a certain degree of label noise that can lead to a per-sample bias in the model estimation. We mitigated this limitation by training an ensemble of models and averaging their outputs to obtain an improved estimate.52 As with lung segmentation on CXR, the trained DRR lesion segmentation network was applied to the CXRs without further training.

The CNN used in this work resembles the UNet architecture.51 It is composed of an encoder based on the ResNet-18 architecture53 and a decoder with skip connections. The encoder parameters were pretrained on ImageNet.54 All other network parameters were initialized randomly. The initial convolutional layer of the encoder, with a kernel size of 7 × 7 and stride 2, is followed by 4 residual blocks with ReLU activation, outputting 64, 128, 256, and 512 feature maps, respectively. Each block halves the resolution of the input feature maps using strided convolution. The decoder is composed of 5 convolutional blocks, each consisting of a layer that concatenates the upsampled output feature maps of the previous decoder block with the encoder output feature maps of the same resolution, followed by a convolutional layer and a ReLU activation. The final layer uses softmax activation to project the output feature maps of the last decoder block to a 2-channel output map of the same size as the input image. The same architecture is used for both the lesion segmentation model and the lung segmentation model.

We use the Generalized Dice as the loss function.55 Adam with AMSGrad56 was used to optimize the loss function with a batch size of 4 samples and a learning rate of 0.001. To avoid overfitting, we used a validation dataset containing 182 DRRs to select the model with early stopping. The processing time of the system to compute POa, including segmenting both the lungs and the lesions, was 52 milliseconds per CXR on an RTX 2080 Ti GPU.
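The threshold selection described in this section (choosing the binarization cutoff that minimizes the MAE between POv on CT and POa on DRR) can be sketched as a simple search over candidate values. This is only an illustration under assumed inputs: the function names, the candidate list, and the toy arrays are hypothetical, and the study does not describe its search procedure in this level of detail.

```python
import numpy as np


def percentage_of_opacity(lesion, lung):
    """Percentage of lung pixels that are also marked as lesion."""
    lung_count = np.count_nonzero(lung)
    if lung_count == 0:
        return 0.0
    return 100.0 * np.count_nonzero(lesion & lung) / lung_count


def sweep_threshold(projections, lung_masks, pov_values, candidates):
    """Return (threshold, MAE) for the candidate minimizing mean |POa - POv|."""
    best_t, best_mae = None, np.inf
    for t in candidates:
        errors = [
            abs(percentage_of_opacity(proj >= t, lung) - pov)
            for proj, lung, pov in zip(projections, lung_masks, pov_values)
        ]
        mae = float(np.mean(errors))
        if mae < best_mae:
            best_t, best_mae = t, mae
    return best_t, best_mae


# Toy case: one 2x2 projected lesion map, whole image is lung, POv = 50%.
projections = [np.array([[10.0, 20.0], [30.0, 40.0]])]
lung_masks = [np.ones((2, 2), dtype=bool)]
pov_values = [50.0]

best_t, best_mae = sweep_threshold(
    projections, lung_masks, pov_values, candidates=[5.0, 25.0, 45.0]
)
print(best_t, best_mae)
```

In the toy case only the middle candidate marks exactly half of the lung pixels, so it is the one returned.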

COVID-19 AD Severity Measures

The metrics used to assess the severity of AD are detailed below.

Percentage of opacity–volume: The POv is measured on CT scans and quantifies the percentage volume of the lung parenchyma that is affected by AD:

\[ \mathrm{POv} = 100 \times \frac{\text{Volume of Airspace Disease}}{\text{Total Lung Parenchyma Volume}} \]

Percentage of opacity–area: The POa is measured on DRRs and CXRs and quantifies the percentage area of the lung parenchyma that is affected by AD:

\[ \mathrm{POa} = 100 \times \frac{\text{Area of Airspace Disease}}{\text{Total Lung Parenchyma Area}} \]

The MAE is the average, across patients, of the absolute difference between POa and POv for a pair of CXR and CT scans of the same patient acquired within 48 hours.
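As a small worked illustration of these definitions, the sketch below computes a percentage of opacity from an affected and a total measurement (volume or area) and the MAE over paired values. The function names and the numbers are hypothetical and serve only to show the arithmetic.

```python
def percentage_of_opacity(affected, total):
    """100 x affected / total; applies to volumes (POv) or areas (POa)."""
    if total <= 0:
        raise ValueError("total lung measurement must be positive")
    return 100.0 * affected / total


def mean_absolute_error(poa_values, pov_values):
    """Mean of |POa - POv| over paired per-patient values."""
    if len(poa_values) != len(pov_values) or not poa_values:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [abs(a - v) for a, v in zip(poa_values, pov_values)]
    return sum(diffs) / len(diffs)


# Hypothetical example: 230 mL of airspace disease in 1000 mL of lung.
pov = percentage_of_opacity(230.0, 1000.0)
# Two hypothetical patients with (POa, POv) pairs of (10, 20) and (30, 25).
mae = mean_absolute_error([10.0, 30.0], [20.0, 25.0])
print(pov, mae)
```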

In addition to the POv and POa values, we computed the spatial distribution of AD (Table 3) along the AP, cranio-caudal, and transverse symmetry axes, as well as between central and peripheral zones in CT, DRR, and CXR. For splitting central versus peripheral, a 3D mesh that splits the lungs into a 75%:25% ratio of central/peripheral volumes was obtained by progressively shrinking the original lung surfaces homogeneously using a distance transform.
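The central versus peripheral split can be approximated by computing, for every lung voxel, its distance to the nearest non-lung voxel and then cutting at the depth that leaves roughly 75% of the lung inside. The sketch below is a simplified illustration, not the 3D mesh procedure used in the study: it works on a voxel mask, uses a brute-force distance computation that is only practical for small arrays (a dedicated Euclidean distance transform routine would normally be used instead), and all names are hypothetical.

```python
import numpy as np


def distance_to_background(mask):
    """Brute-force Euclidean distance from each mask voxel to the nearest
    background voxel (only suitable for small demonstration arrays)."""
    mask = np.asarray(mask, dtype=bool)
    inside = np.argwhere(mask)
    outside = np.argwhere(~mask)
    dist = np.zeros(mask.shape, dtype=np.float64)
    if inside.size == 0 or outside.size == 0:
        return dist
    diff = inside[:, None, :] - outside[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    dist[tuple(inside.T)] = nearest
    return dist


def split_central_peripheral(mask, central_fraction=0.75):
    """Split a mask into a deep (central) part and a shallow rind."""
    mask = np.asarray(mask, dtype=bool)
    dist = distance_to_background(mask)
    # Depth below which the shallowest (1 - central_fraction) of voxels lie.
    cutoff = np.quantile(dist[mask], 1.0 - central_fraction)
    central = mask & (dist > cutoff)
    peripheral = mask & ~central
    return central, peripheral


# Toy "lung": a filled disk of radius 15 pixels in a 41 x 41 image.
yy, xx = np.mgrid[:41, :41]
lung = (yy - 20) ** 2 + (xx - 20) ** 2 <= 15 ** 2
central, peripheral = split_central_peripheral(lung)
central_share = central.sum() / lung.sum()
print(round(float(central_share), 3))
```

Because distances on a pixel grid take repeated values, the resulting central share is close to, but not exactly, the requested fraction.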

TABLE 3 - Spatial Distribution of AD in the Cohort
POv From CT
Inferior: 12% Superior: 9%
Anterior: 4% Posterior: 16%
Right: 11% Left: 9%
Central: 16% Rind: 4%
POa From DRR—CT Projection
Inferior: 26% Superior: 18%
Anterior: n/a Posterior: n/a
Right: 24% Left: 20%
Central: 37% Rind: 7%
POa From DRR—CNN Prediction
Inferior: 6% Superior: 18%
Anterior: n/a Posterior: n/a
Right: 14% Left: 9%
Central: 21% Rind: 3%
AD indicates airspace disease; POv, percentage of opacity–volume; CT, computed tomography; POa, percentage of opacity–area; DRR, digitally reconstructed radiograph; CNN, convolutional neural network.

Statistical Analysis

The overall study design, including the composition of the testing set, is depicted in Figure 3. We compared the POa values derived from the 2D annotations on CXR or DRR against the POv derived from the 3D CT annotations to calculate MAE and Pearson correlation coefficients, which are also reported with 95% bootstrap intervals obtained from 1000 resamples. The human expert POas were obtained using the average (Reader Avg), the intersection (Reader Intersect), and the union (Reader Union) of the expert annotations. Both the single-network CNN system (Single CNN) and the ensemble CNN system (Ensemble CNN) were used for computing the POa automatically on the DRR and CXR images. The ensemble CNN system averages the predicted POa values from 3 CNNs with the same architecture but different parameter initializations. A related-sample (paired) t test with 1-tailed P values assessed statistical significance (P < 0.05) between each pair of expert POa results and CNN system POa results. The statistical analysis was implemented in Python 3.6.1 (statsmodels v 0.12 package).
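The bootstrap summaries and the paired comparison described above can be sketched with NumPy as follows. This is a minimal illustration and not the authors' statsmodels code: the percentile bootstrap, the fixed seed, the choice to compare per-patient absolute errors in the t statistic, and all function names and numbers are assumptions.

```python
import numpy as np


def mae(poa, pov):
    return float(np.mean(np.abs(poa - pov)))


def pearson(poa, pov):
    return float(np.corrcoef(poa, pov)[0, 1])


def bootstrap_interval(metric, poa, pov, n_resamples=1000, level=0.95, seed=0):
    """Point estimate plus percentile bootstrap interval over patients."""
    poa = np.asarray(poa, dtype=np.float64)
    pov = np.asarray(pov, dtype=np.float64)
    rng = np.random.default_rng(seed)
    n = len(poa)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)   # resample patients with replacement
        stats[i] = metric(poa[idx], pov[idx])
    alpha = (1.0 - level) / 2.0
    low, high = np.quantile(stats, [alpha, 1.0 - alpha])
    return metric(poa, pov), float(low), float(high)


def paired_t_statistic(errors_a, errors_b):
    """t statistic of a related-sample t test on paired per-patient errors."""
    d = np.asarray(errors_a, dtype=np.float64) - np.asarray(errors_b, dtype=np.float64)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))


# Hypothetical per-patient values (percent), for illustration only.
poa = [10, 20, 30, 40, 50, 60, 70, 80]
pov = [12, 18, 33, 41, 47, 66, 70, 75]
point, low, high = bootstrap_interval(mae, poa, pov)
t_stat = paired_t_statistic([3, 4, 5, 6], [1, 2, 3, 3])
print(point, low, high, t_stat)
```

A 1-tailed P value would then be read from a Student t distribution with n − 1 degrees of freedom, for example through a statistics library.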

FIGURE 3:
Schematic illustration of the study design with comparison of expert reader annotation on CXR, CNN prediction on CXR, CNN prediction on DRR, DRR POa derived from CT projection, and CT-derived POv (ground truth). All the data components shown in this diagram are in the testing set.

RESULTS

Our cohort of 86 patients, all of whom had a positive confirmatory test (RT-PCR) for SARS-CoV-2, had a mean age of 59 years (range, 25–93 years) and a balanced sex distribution. Of note, African Americans comprised 75% of the cohort. Almost half of the cohort were never-smokers (48%). We assessed outcomes through July 13, 2020. Most of the cohort recovered and were discharged from the hospital (72%), whereas 23% died of COVID-19 manifestations or related complications, and 5% remained in the hospital at that time (Table 1, Fig. 1).

The multicontinent training dataset used for generating the DRRs contains 1929 CT images obtained from multiple sites as described in the Supplementary Table, https://links.lww.com/RLI/A596. The training set contains 1005 COVID-19 patients, 267 patients with different types of interstitial lung diseases, 147 patients with other types of non-COVID-19 pneumonia, and 510 healthy control patients. We added 727 control CXR images to improve accuracy on mildly affected patients. A validation set consisting of 182 CT images was randomly chosen for model selection (Supplemental Table, https://links.lww.com/RLI/A596).

For our study cohort, the distributions of AD measured via POv (ground-truth POv) and POa (DRR POa) are summarized in Table 2, and the spatial distribution of AD is summarized in Table 3. In our cohort, AD ranged from 0% to 97% for POv and from 0% to 92% for DRR POa, with mean (SD) of 23% (21%) (POv) and 49% (26%) (DRR POa). The average spatial distribution of AD was computed along multiple axes of symmetry: anterior (4%) versus posterior (16%) for CT only, as well as right (11%–24%) versus left (9%–20%), inferior (6%–26%) versus superior (9%–18%), and central (16%–37%) versus rind (3%–7%) for CT, DRR, and CXR. The MAE and Pearson coefficient between the ground-truth DRR POa and POv are 5.72% and 0.97, respectively.

TABLE 2 - CT and CXR Characteristics of the Cohort Regarding Severity of POv and POa (% of AD Involvement)
POv and POa Percentage Involvement of Segmented Lung Parenchyma
Measure             Modality    Min    Max    Mean    Standard Deviation
Ground-truth POv    CT          0      97%    23%     21%
DRR POa             DRR         0      92%    49%     26%
CT indicates computed tomography; CXR, chest radiography; POv, percentage of opacity–volume; POa, percentage of opacity–area; AD, airspace disease; DRR, digitally reconstructed radiograph.

The ground-truth POv in our cohort of 86 patients is 4.44% greater than the ground-truth POa on average. There were 16 of 86 patients (18.61%) with POa > POv, 65 of 86 patients (75.58%) with POa < POv, and 5 patients (5.81%) with POa = POv = 0. Supplemental Figures 2, https://links.lww.com/RLI/A597 and 3, https://links.lww.com/RLI/A598 demonstrate error analysis cases in which the CNN overestimated AD on DRR (POa DRR CNN), in comparison with the ground truth from the CT projection (POa DRR GT). Supplemental Figures 4, https://links.lww.com/RLI/A599 and 5, https://links.lww.com/RLI/A600 demonstrate error analysis cases in which there was underestimation.

The 2 expert human readers demonstrated high interreader agreement (r = 0.82) for AD quantification on CXRs. Against the ground truth of CT-derived POv, the MAE of the average human reader was 11.98% (13.15% and 12.14% for readers 1 and 2, respectively), with a correlation of 0.77 (0.70 and 0.76 for readers 1 and 2, respectively). The MAEs of the reader intersection and union were 15.91% and 11.04%, with correlations of 0.73 and 0.77, respectively. Figures 4 to 6 demonstrate patient examples of AD quantification on CT, DRR projection, DRR CNN prediction, expert readers on CXR, and CXR CNN prediction. Because it is nontrivial to evaluate the pixel-wise overlap between the ground-truth segmentation on DRR and the results obtained on CXR, we computed an overlap score (OS) and the pixel-wise segmentation accuracy (ACC) for measuring (1) the interreader variability on CXR (OS = 0.693, ACC = 0.965); (2) the pixel-wise agreement between the readers and the single CNN system (OS = 0.742, ACC = 0.957); and (3) the agreement between the single CNN system and the DRR ground truth (OS = 0.714, ACC = 0.974). The OS was computed by averaging the Dice scores on the foreground and the background area within the lung regions. The system segmentation performance on DRR was in the same range as the interreader variability on CXR.
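The overlap score and pixel-wise accuracy described above can be written compactly for two binary segmentations and a lung mask. The sketch below is one possible reading of those definitions: restricting the accuracy to lung pixels, returning a Dice of 1 when both regions are empty, and the function names are all assumptions rather than details given in the study.

```python
import numpy as np


def dice(a, b):
    """Dice coefficient of two boolean masks (1.0 if both are empty)."""
    denom = np.count_nonzero(a) + np.count_nonzero(b)
    if denom == 0:
        return 1.0
    return 2.0 * np.count_nonzero(a & b) / denom


def overlap_score(seg_a, seg_b, lung):
    """Mean of foreground and background Dice, both restricted to the lung."""
    foreground = dice(seg_a & lung, seg_b & lung)
    background = dice(~seg_a & lung, ~seg_b & lung)
    return 0.5 * (foreground + background)


def pixel_accuracy(seg_a, seg_b, lung):
    """Fraction of lung pixels on which the two segmentations agree."""
    return np.count_nonzero((seg_a == seg_b) & lung) / np.count_nonzero(lung)


# Toy example: six lung pixels, the two segmentations disagree on one of them.
lung = np.ones((1, 6), dtype=bool)
seg_a = np.array([[1, 1, 1, 0, 0, 0]], dtype=bool)
seg_b = np.array([[1, 1, 0, 0, 0, 0]], dtype=bool)

os_value = overlap_score(seg_a, seg_b, lung)
acc_value = pixel_accuracy(seg_a, seg_b, lung)
print(round(os_value, 4), round(acc_value, 4))
```

Averaging foreground and background Dice keeps the score informative when the lesion occupies only a small part of the lung, a case in which plain accuracy is dominated by the background.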

FIGURE 4:
CT-derived 3D volume (POv) airspace quantification (ground truth) on axial (A, E), sagittal (B, F), and coronal (C, G) MPRs and on VR mask (D, H) for 2 patients in our cohort.
FIGURE 5:
DRR (A), intensity/AP thickness AD mask from CT (B), DRR + AD mask (C), DRR CNN prediction (D), CXR (E), CXR reader 1 (F), CXR reader 2 (G), and CXR CNN prediction (H) for an example patient in our cohort.
FIGURE 6:
DRR (A), intensity/AP thickness AD mask from CT (B), DRR + AD mask (C), DRR CNN prediction (D), CXR (E), CXR reader 1 (F), CXR reader 2 (G), and CXR CNN prediction (H) for another example patient in our cohort.

As shown in Table 4, the single CNN's POa quantification on CXR achieved an MAE of 9.78% and a correlation of 0.78 compared with the ground-truth POv derived from CT. With the model ensemble, the MAE was further reduced to 9.56%, whereas the correlation increased to 0.81. Compared with the CXR results, the same CNN systems achieved lower MAE (9.26% and 7.72%) and higher correlation (0.86 and 0.87) on DRR images. As shown in Table 5, the related-sample t test comparing the average, union, and intersection of the CXR readers with the CNN prediction on CXR demonstrated that the ensemble CNN overall performed statistically better than the expert human readers when compared with the POv ground truth (P = 0.01 against the reader average). The superiority of the ensemble CNN system over the single CNN system was not statistically significant (P = 0.35).

TABLE 4 - 95% Bootstrapped Mean Absolute Errors (MAEs) and Pearson Coefficients of the CNN Systems and the Expert Readers on CXR Against the Ground Truth of POv Measured on CT
Description         Modality    95% Bootstrap MAE         95% Bootstrap Pearson
Reader avg CXR      CXR         11.98% (11.05%–12.47%)    0.77 (0.70–0.82)
Reader inter CXR    CXR         15.91% (14.68%–16.63%)    0.73 (0.66–0.78)
Reader union CXR    CXR         11.04% (10.22%–11.50%)    0.77 (0.69–0.82)
Single CNN DRR      DRR         9.27% (8.56%–9.67%)       0.86 (0.84–0.90)
Ensemble CNN DRR    DRR         7.73% (6.74%–8.07%)       0.87 (0.84–0.92)
Single CNN CXR      CXR         9.78% (8.94%–10.22%)      0.78 (0.73–0.82)
Ensemble CNN CXR    CXR         9.56% (8.83%–10.00%)      0.81 (0.76–0.85)
CNN indicates convolutional neural network; CXR, chest radiograph; POv, percentage of opacity–volume; CT, computed tomography; DRR, digitally reconstructed radiograph.

TABLE 5 - Statistical Comparison of Readers (Average, Intersection, Union) Versus CNNs (Single, Ensemble) for Quantification of POa on CXR
Comparison                          t       P
Reader avg vs single CNN            1.68    0.05
Reader avg vs ensemble CNN          2.46    0.01*
Reader intersect vs single CNN      3.71    <0.01*
Reader intersect vs ensemble CNN    4.68    <0.01*
Reader union vs single CNN          1.31    0.10
Reader union vs ensemble CNN        1.90    0.03*
Single CNN vs ensemble CNN          0.38    0.35
*P < 0.05.
t Scores and the 1-tailed P values from the related sample t test.
CNN indicates convolutional neural network; POa, percentage of opacity–area; CXR, chest radiography.

DISCUSSION

The rising number of COVID-19 cases will likely be paralleled by an increasing number of CXRs performed for diagnostic evaluation, underscoring the need to maximize CXR use and value and reduce variability in interpretation in suspected and confirmed COVID-19 patients. Although neither as sensitive nor as specific, CXRs will be used in much greater numbers than chest CTs, further supporting the need to augment CXR capability in the COVID-19 pandemic.57,58 Airspace disease is the hallmark of pulmonary involvement in COVID-19, supporting the concept that AD quantification on CT and CXR carries diagnostic (including in differential diagnosis) and potential prognostic value.31,40,59

Quantitative assessment of the extent of AD on CXR by deep CNN, as opposed to subjective evaluation by human readers, possesses 4 major strengths: (1) it provides disease severity quantification, not just binary output of disease present/absent, thereby carrying potential prognostic and management implications23; (2) it increases the consistency of AD evaluation over human readers, given high interreader and even intrareader variability; (3) it can increase reading efficiency because most computational algorithms can generate results in a small fraction of the time a human reader would take to perform a similar task; and (4) it provides an objective measure to monitor patient evolution.

Although previous studies attempted to obtain severity scores and predict patient outcomes using manually generated semiquantitative subjective scoring systems,13,25,35,47 most AI publications on COVID-19 CXR imaging lack explicit AD quantification, instead focusing on classifying images based on patient RT-PCR status (positive vs negative) and using primarily CTs rather than much more widely available CXRs.37,46,60–62 Moreover, when CXRs have been used for quantification of AD, the ground truth has been human annotation on the CXRs, which depends on considerable expertise and carries a nonnegligible error, even when performed by highly experienced expert thoracic radiologists.

Our research is innovative in that it leverages a superior modality (CT) to provide a much more accurate ground truth of AD quantification than can be obtained from CXR, while gauging the performance of human readers and deep CNN in quantifying AD on less accurate CXRs. For that purpose, a crucial element is to project the 3D volume into a 2D coronal image that is as similar as possible to a CXR, which we labeled DRR, and which to our knowledge is unique in the COVID-19 literature. This step introduces an intrinsic error due to the information loss resulting from the conversion of 3D volumetric AD (in CT) into 2D area AD (in DRR). Nonetheless, by knowing the ground truth derived from CT, it is possible to obtain a binarization threshold on either the thickness or the intensity projection maps by sweeping candidate thresholds over the training dataset to minimize this intrinsic error. Without considering the image intensity information lost because of the projection, the intrinsic error was estimated to be at most 5.72% in our cohort. Although the CNNs were not explicitly optimized to output POv, the optimized training target helped lower the theoretical MAE bound between POa and POv for the CNN systems.

Our ensemble CNN system, in particular, performed significantly better than the average of 2 highly trained expert human readers, with an MAE of 9.56%, which is 3.84 percentage points higher than the estimated lower bound of 5.72%. The ensemble CNN outperformed the single network by approximately 0.2 percentage points of MAE on the cohort (9.56% vs 9.78%), although the difference is not statistically significant. A clinically meaningful metric was the time required for the CNN to compute AD on each CXR: 52 milliseconds per radiograph, versus several minutes for the human readers.

Our study has several limitations. Given that we selected a subset of patients who fulfilled multiple inclusion criteria (positive RT-PCR for SARS-CoV-2 and paired chest CT and CXR performed within 48 hours of each other), and because of the single-center retrospective design, we have a relatively small sample size. Our small sample lacked a control group who tested negative for SARS-CoV-2 by RT-PCR, precluding evaluation of diagnostic utility; however, our sample contained the entire range of AD (0%–97%), allowing full assessment of the performance of our algorithms from normal to very severe pulmonary involvement. The AD annotation on CT was performed by 2 human readers, without automated algorithms; however, outlier cases were secondarily reviewed by 2 additional expert radiologists and corrected. Although every patient in the cohort had COVID-19 at the time the CXR and CT were obtained, it is possible that not all AD detected was a manifestation of COVID-19, although every reader attempted to separate AD from other patterns of disease that are presumably chronic. Similarly, the CNN system also cannot exclude all the findings that are not related to COVID-19 without previous imaging information and a knowledge model of the data generation mechanism. The system is therefore designed for AD quantification only, rather than for differential diagnosis.

In summary, we have devised a novel approach to improve the CXR quantification of AD in patients with COVID-19, leveraging quantification derived from a superior modality (CT) via the novel intermediate step of projecting the CT-derived AD mask into parallel projection coronal DRR. This approach provides a better ground truth, with a more accurate and quantitative understanding of the error accrued by both human readers and a CNN applied to CXR for AD 2D quantification. Furthermore, we showed that CNN is at least as accurate as expert human readers for the task of CXR-based AD 2D quantification. Such a system, when deployed in a high-volume clinical setting, could substantially increase the consistency of CXR interpretations, and reduce reporting times, improving radiologist efficiency and throughput. Moreover, by providing quantitative measurements of the extent and spatial distribution of AD that correlate with physiologic impairment, such a system could guide patient management and generate prognostic information. This is particularly true when applied longitudinally on serial CXRs, as it can objectively quantify disease course over time. This approach may generate a prognostic imaging biomarker in predicting the need for ICU admission and mortality.

ACKNOWLEDGMENTS

We gratefully acknowledge the contributions of multiple frontline hospitals to this collaboration. In addition, we would like to acknowledge Rochelle Yang, for her contributions regarding data selection, curation, and manuscript revision; Sebastian Piat, Guillaume Chabin, Vishwanath RS, and Abishek Balachandran, for their contributions regarding data analysis; and Steffen Kappler and Dorin Comaniciu, for their contributions regarding study design and execution.

REFERENCES

1. Novel Coronavirus (2019-nCov). World Health Organization. March 29, 2020. https://www.who.int/emergencies/diseases/novel-coronavirus-2019. Accessed July 24, 2020.
2. Situation report-69. World Health Organization. March 29, 2020. https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200329-sitrep-69-covid-19.pdf?sfvrsn=8d6620fa_4. Accessed July 24, 2020.
3. Coronavirus disease 2019 (COVID-19)—evaluating and testing PUI. Centers for Disease Control and Prevention. March 29, 2020. https://www.cdc.gov/coronavirus/2019-nCoV/hcp/clinical-criteria.html. Accessed July 24, 2020.
4. Wu Z, McGoogan JM. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA. 2020;323:1239–1242.
5. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395:497–506.
6. Xu XW, Wu XX, Jiang XG, et al. Clinical findings in a group of patients infected with the 2019 novel coronavirus (SARS-Cov-2) outside of Wuhan, China: retrospective case series. BMJ. 2020;368:m606.
7. Ai T, Yang Z, Hou H, et al. Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology. 2020;296:E32–E40.
8. Tahamtan A, Ardebili A. Real-time RT-PCR in COVID-19 detection: issues affecting the results. Expert Rev Mol Diagn. 2020;20:453–454.
9. Long C, Xu H, Shen Q, et al. Diagnosis of the coronavirus disease (COVID-19): rRT-PCR or CT? Eur J Radiol. 2020;126:108961.
10. He JL, Luo L, Luo Z, et al. Diagnostic performance between CT and initial real-time RT-PCR for clinically suspected 2019 coronavirus disease (COVID-19) patients outside Wuhan, China. Respir Med. 2020;168:105980.
11. Waller JV, Allen IE, Lin KK, et al. The limited sensitivity of chest computed tomography relative to reverse transcription polymerase chain reaction for severe acute respiratory syndrome coronavirus-2 infection: a systematic review on COVID-19 diagnostics. Invest Radiol. 2020;55:754–761.
12. Rodriguez-Morales AJ, Cardona-Ospina JA, Gutiérrez-Ocampo E, et al. Clinical, laboratory and imaging features of COVID-19: a systematic review and meta-analysis. Travel Med Infect Dis. 2020;34:101623.
13. Yoon SH, Lee KH, Kim JY, et al. Chest radiographic and CT findings of the 2019 novel coronavirus disease (COVID-19): analysis of nine patients treated in Korea. Korean J Radiol. 2020;21:494–500.
14. Xu X, Yu C, Qu J, et al. Imaging and clinical features of patients with 2019 novel coronavirus SARS-CoV-2. Eur J Nucl Med Mol Imaging. 2020;47:1275–1280.
15. Albarello F, Pianura E, Di Stefano F, et al. 2019-novel coronavirus severe adult respiratory distress syndrome in two cases in Italy: an uncommon radiological presentation. Int J Infect Dis. 2020;93:192–197.
16. Li K, Wu J, Wu F, et al. The clinical and chest CT features associated with severe and critical COVID-19 pneumonia. Invest Radiol. 2020;55:327–331.
17. Lyu P, Liu X, Zhang R, et al. The performance of chest CT in evaluating the clinical severity of COVID-19 pneumonia: identifying critical cases based on CT characteristics. Invest Radiol. 2020;55:412–421.
18. Wu J, Wu X, Zeng W, et al. Chest CT findings in patients with coronavirus disease 2019 and its relationship with clinical features. Invest Radiol. 2020;55:257–261.
19. Xiong Y, Sun D, Liu Y, et al. Clinical and high-resolution CT features of the COVID-19 infection: comparison of the initial and follow-up changes. Invest Radiol. 2020;55:332–339.
20. Koo HJ, Lim S, Choe J, et al. Radiographic and CT features of viral pneumonia. Radiographics. 2018;38:719–739.
21. Simpson S, Kay FU, Abbara S, et al. Radiological Society of North America expert consensus statement on reporting chest CT findings related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA—secondary publication. J Thorac Imaging. 2020;35:219–227.
22. Jin YH, Cai L, Cheng ZS, et al; for the Zhongnan Hospital of Wuhan University Novel Coronavirus Management and Research Team, Evidence-Based Medicine Chapter of China International Exchange and Promotive Association for Medical and Health Care (CPAM). A rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-nCoV) infected pneumonia (standard version). Mil Med Res. 2020;7:4.
23. Yuan M, Yin W, Tao Z, et al. Association of radiologic findings with mortality of patients infected with 2019 novel coronavirus in Wuhan, China. PLoS One. 2020;15:e0230548.
24. Cozzi D, Albanesi M, Cavigli E, et al. Chest X-ray in new coronavirus disease 2019 (COVID-19) infection: findings and correlation with clinical outcome. Radiol Med. 2020;125:730–737.
25. Borghesi A, Maroldi R. COVID-19 outbreak in Italy: experimental chest X-ray scoring system for quantifying and monitoring disease progression. Radiol Med. 2020;125:509–513.
26. Borghesi A, Zigliani A, Golemi S, et al. Chest X-ray severity index as a predictor of in-hospital mortality in coronavirus disease 2019: a study of 302 patients from Italy. Int J Infect Dis. 2020;96:291–293.
27. Borghesi A, Zigliani A, Masciullo R, et al. Radiographic severity index in COVID-19 pneumonia: relationship to age and sex in 783 Italian patients. Radiol Med. 2020;125:461–464.
28. Yasin R, Gouda W. Chest X-ray findings monitoring COVID-19 disease course and severity. Egypt J Radiol Nucl Med. 2020;51:193.
29. Orsi MA, Oliva G, Toluian T, et al. Feasibility, reproducibility, and clinical validity of a quantitative chest x-ray assessment for COVID-19. Am J Trop Med Hyg. 2020;103:822–827.
30. Bagnera S, Bisanti F, Tibaldi C, et al. Performance of radiologists in the evaluation of the chest radiography with the use of a “new software score” in coronavirus disease 2019 pneumonia suspected patients. J Clin Imaging Sci. 2020;10:40.
31. Baratella E, Crivelli P, Marrocchio C, et al. Severity of lung involvement on chest X-rays in SARS-coronavirus-2 infected patients as a possible tool to predict clinical progression: an observational retrospective analysis of the relationship between radiological, clinical, and laboratory data. J Bras Pneumol. 2020;46:e20200226.
32. Toussie D, Voutsinas N, Finkelstein M, et al. Clinical and chest radiography features determine patient outcomes in young and middle-aged adults with COVID-19. Radiology. 2020;297:E197–E206.
33. Xiao N, Cooper JG, Godbe JM, et al. Chest radiograph at admission predicts early intubation among inpatient COVID-19 patients. Eur Radiol. 2020;1–8.
34. Balbi M, Caroli A, Corsi A, et al. Chest X-ray for predicting mortality and the need for ventilatory support in COVID-19 patients presenting to the emergency department. Eur Radiol. 2020;1–14.
35. Wong HYF, Lam HYS, Fong AH, et al. Frequency and distribution of chest radiographic findings in patients positive for COVID-19. Radiology. 2020;296:E72–E78.
36. Lessmann N, Sánchez CI, Beenen L, et al. Automated assessment of COVID-19 reporting and data system and chest CT severity scores in patients suspected of having COVID-19 using artificial intelligence. Radiology. 2021;298:E18–E28.
37. Li L, Qin L, Xu Z, et al. Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy. Radiology. 2020;296:E65–E71.
38. Hwang EJ, Kim H, Yoon SH, et al. Implementation of a deep learning–based computer-aided detection system for the interpretation of chest radiographs in patients suspected for COVID-19. Korean J Radiol. 2020;21:1150–1160.
39. Ozturk T, Talo M, Yildirim EA, et al. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput Biol Med. 2020;121:103792.
40. Chaganti S, Balachandran A, Chabin G, et al. Quantification of tomographic patterns associated with COVID-19 from chest CT. arXiv. Published online April 2, 2020. arXiv:2004.01279v5.
41. Liu S, Georgescu B, Xu Z, et al. 3D tomographic pattern synthesis for enhancing the quantification of COVID-19. arXiv. Published online May 5, 2020. arXiv:2005.01903.
42. Shan F, Gao Y, Wang J, et al. Lung infection quantification of COVID-19 in CT images with deep learning. arXiv. Published online March 10, 2020. arXiv:2003.04655.
43. Zhang K, Liu X, Shen J, et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell. 2020;181:1423–1433.e11.
44. Prokop M, van Everdingen W, van Rees Vellinga T, et al. CO-RADS: a categorical CT assessment scheme for patients suspected of having COVID-19-definition and evaluation. Radiology. 2020;296:E97–E104.
45. Narin A, Kaya C, Pamuk Z. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. arXiv. Published online March 24, 2020. arXiv:2003.10849.
46. El Asnaoui K, Chawki Y. Using X-ray images and deep learning for automated detection of coronavirus disease. J Biomol Struct Dyn. 2020;1–12.
47. Mushtaq J, Pennella R, Lavalle S, et al. Initial chest radiographs and artificial intelligence (AI) predict clinical outcomes in COVID-19 patients: analysis of 697 Italian patients. Eur Radiol. 2020;1–10.
48. Yushkevich PA, Gao Y, Gerig G. ITK-SNAP: an interactive tool for semi-automatic segmentation of multi-modality biomedical images. Annu Int Conf IEEE Eng Med Biol Soc. 2016;2016:3342–3345.
49. Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017:2261–2269. doi:10.1109/CVPR.2017.243.
50. Philipsen RH, Maduskar P, Hogeweg L, et al. Localized energy-based normalization of medical images: application to chest radiography. IEEE Trans Med Imaging. 2015;34:1965–1975.
51. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. arXiv. Published online May 18, 2015. arXiv:1505.04597.
52. Lakshminarayanan B, Pritzel A, Blundell C. Simple and scalable predictive uncertainty estimation using deep ensembles. arXiv. Published online November 4, 2017. arXiv:1612.01474v3.
53. He K, Zhang X, Ren S, et al. Identity mappings in deep residual networks. arXiv. Published online March 16, 2016. arXiv:1603.05027v3.
54. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009:248–255. doi:10.1109/CVPR.2009.5206848.
55. Sudre CH, Li W, Vercauteren T, et al. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. 2017. Available at: https://arxiv.org/abs/1707.03237v3.
56. Bock S, Goppold J, Weiß M. An improvement of the convergence proof of the ADAM-Optimizer. arXiv. Published online April 27, 2018. arXiv:1804.10587v1.
57. ACR recommendations for the use of chest radiography and computed tomography (CT) for suspected COVID-19 infection. American College of Radiology. March 11, 2020. https://www.acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection. Accessed July 24, 2020.
58. Rubin GD, Ryerson CJ, Haramati LB, et al. The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society. Radiology. 2020;296:172–180.
59. Chua F, Armstrong-James D, Desai S, et al. The role of CT in case ascertainment and management of COVID-19 pneumonia in the UK: insights from high-incidence regions. Lancet Respir Med. 2020;8:438–440.
60. Gozes O, Frid-Adar M, Greenspan H, et al. Rapid AI development cycle for the coronavirus (COVID-19) pandemic: initial results for automated detection & patient monitoring using deep learning CT image analysis. arXiv. Published online March 10, 2020. arXiv:2003.05037.
61. Ying S, Shuangjia Z, Li L, et al. Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images. medRxiv. 2020;1–10.
62. Kundu S, Elhalawani H, Gichoya J, et al. How might AI and chest imaging help unravel COVID-19's mysteries? Radiol Artif Intell. 2020;2:e200053.
Keywords:

artificial intelligence; deep learning; radiography, thoracic; tomography; COVID-19

Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved.