Interpretation of lung opacities in ICU supine chest radiographs remains challenging. We evaluated a prototype artificial intelligence algorithm to classify basal lung opacities according to underlying pathologies.
Retrospective study. The deep neural network was trained on two publicly available datasets including 297,541 images of 86,876 patients.
One hundred sixty-six patients received both supine chest radiograph and CT scans (reference standard) within 90 minutes without any intervention in between.
Measurements and Main Results:
Algorithm accuracy was referenced to board-certified radiologists who evaluated supine chest radiographs according to side-separate reading scores for pneumonia and effusion (0 = absent, 1 = possible, and 2 = highly suspected). Radiologists were blinded to the supine chest radiograph findings during CT interpretation. Performances of radiologists and the artificial intelligence algorithm were quantified by receiver-operating characteristic curve analysis. Diagnostic metrics (sensitivity, specificity, positive predictive value, negative predictive value, and accuracy) were calculated based on different receiver-operating characteristic operating points. Regarding pneumonia detection, radiologists achieved a maximum diagnostic accuracy of up to 0.87 (95% CI, 0.78–0.93) when considering only the supine chest radiograph reading score 2 as positive for pneumonia. Radiologist’s maximum sensitivity up to 0.87 (95% CI, 0.76–0.94) was achieved by additionally rating the supine chest radiograph reading score 1 as positive for pneumonia and taking previous examinations into account. Radiologic assessment essentially achieved nonsignificantly higher results compared with the artificial intelligence algorithm: artificial intelligence-area under the receiver-operating characteristic curve of 0.737 (0.659–0.815) versus radiologist’s area under the receiver-operating characteristic curve of 0.779 (0.723–0.836), diagnostic metrics of receiver-operating characteristic operating points did not significantly differ. Regarding the detection of pleural effusions, there was no significant performance difference between radiologist’s and artificial intelligence algorithm: artificial intelligence-area under the receiver-operating characteristic curve of 0.740 (0.662–0.817) versus radiologist’s area under the receiver-operating characteristic curve of 0.698 (0.646–0.749) with similar diagnostic metrics for receiver-operating characteristic operating points.
Considering the minor level of performance differences between the algorithm and radiologists, we regard artificial intelligence as a promising clinical decision support tool for supine chest radiograph examinations in the clinical routine with high potential to reduce the number of missed findings in an artificial intelligence–assisted reading setting.