A commentary on ‘Performance of CT-based deep learning in diagnostic assessment of suspicious lateral lymph nodes in papillary thyroid cancer: a prospective diagnostic study’

Dear Editor,
We read with great interest the article by Zheng et al. [1] recently published in an upcoming issue of International Journal of Surgery. The authors conducted a prospective diagnostic study with external validation to assess the efficacy of a 3D ResNet-based ensemble model, which uses contrast-enhanced computed tomography (CT) to differentiate benign from metastatic lateral lymph nodes (LNs) among suspicious LNs in papillary thyroid cancer (PTC). This innovative deep learning model is both noninvasive and time-efficient, presenting a viable approach for surgical planning before lateral neck dissection in PTC. However, we would like to discuss several concerns regarding the study.
First, unlike prior research employing deep learning models to analyze all detected LNs, this study specifically focused on LNs already stratified using CT-based criteria similar to the risk stratification system proposed by the Korean Society of Thyroid Radiology (KSThR) [2]. The authors included the suspicious LNs in the analysis while excluding the intermediate and probably benign LNs. The CT criteria for identifying suspicious LNs require the presence of at least one of the following characteristics: cystic change, strong enhancement, heterogeneous enhancement, or calcification, all of which are radiologically indicative of malignancy. Notably, two senior radiologists performed well in further distinguishing these suspicious LNs as benign or metastatic, with area under the curve (AUC) values of 0.832 and 0.770, respectively, without significant differences from the ensemble model. However, to the best of our knowledge, there are no reported CT criteria for further classifying LNs that have already been stratified as suspicious. It would improve the understanding of conventional CT features if the authors could describe the radiologic criteria the radiologists used to carry out this further differentiation of suspicious LNs in PTC.
Moreover, there might be selection bias in the studied LNs due to the use of ultrasound (US)-guided fine-needle aspiration (FNA) and washout thyroglobulin examination as the reference for malignancy, even though this method can achieve very high accuracy compared with surgical pathology [3]. Because only LNs that could be relocated on US and were accessible to US-guided biopsy were enrolled, a more operator-dependent process than CT, we believe that only a proportion of the suspicious LNs discovered on CT were incorporated in this study. Consequently, this raises the question of whether the ensemble model could be applied to other suspicious LNs that did not undergo FNA but are important for determining the extent of surgery. Notably, one of the advantages of CT is its ability to evaluate the status of LNs that are not suitable for FNA. From these points of view, we suggest combining FNA and neck dissection pathology as the reference standard in further studies to show whether this model could be applied to all suspicious LNs detected by CT. Additionally, we would be interested in the percentage of suspicious LNs that underwent FNA and the criteria for selecting LNs for FNA in the prospective setting.
Finally, it was highlighted that the majority of LNs included in the present study had a short-axis diameter of 8 mm or less, which is an improvement over a previous deep learning model by Lee et al. [4], in which only LNs with a size greater than 8-10 mm were assessed. However, there is another concern about the size of the LNs being examined. Although information on the short-axis diameter had already been incorporated into the radiologic data during segmentation for the deep learning model, LN short-axis diameter reappeared as one of the six clinical risk factors included in the ensemble model. We question the necessity of including short-axis diameter as both a radiologic and a clinical determinant within the same model.
Overall, an in-depth understanding of the radiological features derived from preoperative contrast-enhanced CT could contribute toward accurately distinguishing between benign and malignant LNs among the suspicious ones, and CT could be complementary to, or even a potential substitute for, US. While we have expressed specific concerns, it is worth clarifying that we do not contest the authors' conclusions. This useful deep learning model can help in the treatment planning of lateral neck dissection and in avoiding unnecessary FNA in patients with PTC.