This study aimed to develop a dual-input convolutional neural network (CNN)–based deep-learning algorithm that utilizes both anteroposterior (AP) and lateral elbow radiographs for the automated detection of pediatric supracondylar fracture in conventional radiography, and to assess its feasibility and diagnostic performance.
Materials and Methods
To develop the deep-learning model, 1266 pairs of AP and lateral elbow radiographs examined between January 2013 and December 2017 at a single institution were split into a training set (1012 pairs, 79.9%) and a validation set (254 pairs, 20.1%). We performed external tests using 2 distinct datasets: one temporally and the other geographically separated from the model development data. We used 258 pairs of radiographs examined in 2018 at the same institution as a temporal test set and 95 pairs examined between January 2016 and December 2018 at another hospital as a geographic test set. Images underwent preprocessing, including cropping and histogram equalization, and were input into a dual-input neural network constructed by merging 2 ResNet models. An observer study was performed by radiologists on the geographic test set. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the model and human readers were calculated and compared.
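As an illustration of the primary performance measure named above, the following minimal sketch computes the AUC as the probability that a randomly chosen fracture case receives a higher model score than a randomly chosen normal case, counting ties as one half. The function name `auc_from_scores` and the toy labels and scores are hypothetical and are not taken from the study; the study's actual software is not described here.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC: fraction of (fracture, normal) pairs in which the
    fracture case has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT study data): 1 = fracture, 0 = no fracture.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10]

toy_auc = auc_from_scores(labels, scores)
print(round(toy_auc, 3))  # 11 of 12 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is acceptable for test sets of a few hundred radiograph pairs; a sort-based implementation would be preferable for larger datasets.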
Results
Our trained model showed an AUC of 0.976 in the validation set, 0.985 in the temporal test set, and 0.992 in the geographic test set. In the geographic test set, the AUC of the model was comparable to those of the human readers, which ranged from 0.977 to 0.997 (all P > 0.05). The model had a sensitivity of 93.9%, a specificity of 92.2%, a PPV of 80.5%, and an NPV of 97.8% in the temporal test set, and a sensitivity of 100%, a specificity of 86.1%, a PPV of 69.7%, and an NPV of 100% in the geographic test set. The specificity and PPV of the model were significantly lower than those of all 3 human readers (all P < 0.05, McNemar test). In contrast, sensitivity and NPV did not differ significantly between the model and any of the 3 human readers (all P > 0.05).
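The reported percentages can be related to confusion-matrix counts with a short sketch. The counts below are an assumption: they are one reconstruction that reproduces the reported sensitivity, specificity, PPV, and NPV for test sets of 258 and 95 pairs, and they are not stated in the abstract. The sketch also includes an exact (binomial) form of the McNemar test on discordant pairs. Which variant of the test the study used is not specified, and the discordant counts `b` and `c` are purely hypothetical.

```python
from math import comb


def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar P value.

    b, c: numbers of discordant pairs (cases classified correctly by one
    rater but not by the other, in each direction).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# ASSUMPTION: reconstructed counts, chosen only because they reproduce the
# reported percentages; the abstract does not report them.
temporal = diagnostic_metrics(tp=62, fn=4, fp=15, tn=177)   # 258 pairs
geographic = diagnostic_metrics(tp=23, fn=0, fp=10, tn=62)  # 95 pairs

for name, m in (("temporal", temporal), ("geographic", geographic)):
    print(name, {k: round(100 * v, 1) for k, v in m.items()})

# HYPOTHETICAL discordant counts (not study data): among non-fracture cases,
# a reader is correct and the model wrong in 9, and the reverse in 1.
p_value = mcnemar_exact_p(b=9, c=1)
print(round(p_value, 4))
```

Under these assumed counts, the lower PPV in the geographic test set follows from 10 false positives among 33 positive calls, even though no fracture was missed.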
Conclusions
The proposed dual-input deep-learning model that interprets both AP and lateral elbow radiographs provided an accurate diagnosis of pediatric supracondylar fracture, comparable to that of radiologists.