This study aimed to compare the accuracy and efficiency of a convolutional neural network (CNN)–enhanced workflow for pancreas segmentation with those of radiologists, in the context of interreader reliability.
Volumetric pancreas segmentations on a data set of 294 portal venous computed tomography scans were performed by 3 radiologists (R1, R2, and R3) and by a CNN. Convolutional neural network segmentations were reviewed and, if needed, corrected (“corrected CNN [c-CNN]” segmentations) by radiologists. Ground truth was obtained from the radiologists' manual segmentations using the simultaneous truth and performance level estimation algorithm. Interreader reliability and the model's accuracy were evaluated with the Dice-Sørensen coefficient (DSC) and the Jaccard coefficient (JC). Equivalence was determined using two one-sided tests. Convolutional neural network segmentations below the 25th percentile DSC were reviewed to evaluate segmentation errors. The time required for manual segmentation was compared with the time required for c-CNN.
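The two overlap metrics named above can be illustrated with a minimal sketch. It assumes each segmentation is available as a binary voxel mask of identical shape; the function names, the toy masks, and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the study.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice-Sørensen coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def jaccard_coefficient(mask_a, mask_b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return np.logical_and(a, b).sum() / union


# Toy 3D masks (hypothetical data): the prediction oversegments the
# reference by one slab of voxels along the last axis.
reference = np.zeros((4, 4, 4), dtype=bool)
reference[1:3, 1:3, 1:3] = True   # 8 voxels
predicted = np.zeros((4, 4, 4), dtype=bool)
predicted[1:3, 1:3, 1:4] = True   # 12 voxels, 8 shared with reference

dsc = dice_coefficient(reference, predicted)
jc = jaccard_coefficient(reference, predicted)
print(f"DSC = {dsc:.3f}, JC = {jc:.3f}")
```

For any single pair of masks the two metrics are linked by JC = DSC / (2 − DSC), so JC is always the lower of the two unless the overlap is perfect. The relation holds per case and only approximately for averages over a data set.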
Pancreas volumes from 3 sets of segmentations (manual, CNN, and c-CNN) were noninferior to simultaneous truth and performance level estimation–derived volumes [76.6 cm³ (20.2 cm³), P < 0.05]. Interreader reliability was high (mean [SD] DSC between R2-R1, 0.87 [0.04]; R3-R1, 0.90 [0.05]; R2-R3, 0.87 [0.04]). Convolutional neural network segmentations were highly accurate (DSC, 0.88 [0.05]; JC, 0.79 [0.07]) and required minimal-to-no corrections (c-CNN: DSC, 0.89 [0.04]; JC, 0.81 [0.06]; equivalence, P < 0.05). Undersegmentation (n = 47 [64%]) was common in the 73 CNN segmentations below the 25th percentile DSC, but there were no major errors. Total inference time (minutes) for the CNN was 1.2 (0.3). Average time (minutes) taken by radiologists for c-CNN (0.6 [0.97]) was substantially lower than for manual segmentation (3.37 [1.47]; savings of 77.9%–87% [P < 0.0001]).
A convolutional neural network–enhanced workflow provides high accuracy and efficiency for volumetric pancreas segmentation on computed tomography.