Clinical Nuclear Medicine:
December 2019 - Volume 44 - Issue 12 -
Non–small cell lung cancer (NSCLC) is a heterogeneous group of diseases that includes adenocarcinoma (ADC), squamous cell carcinoma (SCC), adenosquamous cell carcinoma, large cell carcinoma, and sarcomatoid carcinoma. The most common histological subtypes are ADC and SCC. Because treatment-related outcomes and chemotherapeutic regimens differ between lung ADC and SCC, histopathologic diagnosis is essential prior to treatment initiation. CT-guided needle biopsy is the criterion standard for histological classification, but it has several limitations: it is invasive, cannot provide spatial information, typically cannot be repeated, does not allow for whole-body assessment, and can cause complications.1 Whole-tumor radiomics can characterize lesions noninvasively and repeatedly. This technique has been termed “virtual biopsy.”2 “Radiomics” refers to the extraction of many quantitative features such as pixel intensity, shape, and texture; these are used to convert standard clinical imaging data to higher-dimensional mineable data.3 High-throughput radiomics has recently emerged as a powerful approach for identifying imaging biomarkers that can be used to build decision-support systems for cancer. The recent explosion of medical imaging data has created a fertile environment for machine learning and medical informatics.4,5
18F-FDG PET/CT has been widely used for staging NSCLC. Several studies have reported that glucose metabolism differs between ADC and SCC.6–9 Also, radiomic approaches using FDG PET/CT data have identified differences in the textural features of ADC and SCC.1,10,11 Therefore, we hypothesized that a machine-learning approach could differentiate ADC from SCC, and we explored whether a machine-learning algorithm using PET-based radiomic features could distinguish the 2 subtypes.
PATIENTS AND METHODS
We retrospectively reviewed the pretreatment FDG PET/CT scans of 556 consecutive NSCLC patients taken between January 2013 and December 2016 at a single institution. We subsequently evaluated 509 patients after excluding 47 patients with histological subtypes other than ADC or SCC. An additional 113 patients (102 with ADC and 11 with SCC) were excluded because their tumors were too small for accurate texture analysis (see Measurement of PET Parameters for technical details). The final cohort comprised 396 patients. Histological subclassification was performed based on the 2015 World Health Organization Classification of Lung Tumors.12 This retrospective study was approved by the ethics committee of our institution; the requirement for informed consent was waived given the retrospective nature of the work.
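The cohort flow described above can be tallied with a few lines of arithmetic. This is only a consistency check on the counts stated in the text; the variable names are illustrative.

```python
# Counts as stated in the text.
reviewed = 556                        # consecutive NSCLC patients reviewed
other_subtypes = 47                   # subtypes other than ADC or SCC
too_small = {"ADC": 102, "SCC": 11}   # tumors too small for texture analysis

after_histology = reviewed - other_subtypes
final_cohort = after_histology - sum(too_small.values())

print(after_histology, final_cohort)  # prints: 509 396
```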
FDG PET/CT Acquisition
FDG PET/CT was performed using a Discovery ST or STE PET/CT scanner (GE Healthcare, Milwaukee, Wis). All patients fasted for at least 6 hours before FDG PET/CT; their blood glucose levels at the time of FDG injection were less than 150 mg/dL. Unenhanced CT was performed 60 minutes after the injection of 5 MBq/kg of FDG using a 16-slice helical CT scanner (120 kV; 30–100 mA in the AutomA mode; section width 3.75 mm). Emission PET data were acquired from the thigh to the head for 3.0 minutes per frame in 3-dimensional mode. Attenuation-corrected PET images (CT data were used for correction) were reconstructed using an ordered-subset expectation maximization algorithm (20 subsets, 2 iterations).
Measurement of PET Parameters
Various PET parameters of primary lung lesions were measured. A fixed SUV threshold of 2.5 was used to define the boundaries of volumes of interest (VOIs). The LIFEx package (version 4.00)13 was used to extract 40 radiomic features from PET images (Table 1). LIFEx calculates textural features only for VOIs of at least 64 voxels. Because of poor image matrix resolution, the PET VOI did not reach this 64-voxel minimum in 102 patients with ADC and 11 patients with SCC; the remaining 396 patients were evaluated.
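The VOI rule described above (a fixed SUV threshold of 2.5 and a 64-voxel minimum) can be illustrated with a minimal sketch. It is not the LIFEx implementation: the function names are hypothetical, the lesion is represented as a flat list of voxel SUVs from an already cropped region, and treating voxels exactly at the threshold as inside the VOI is an assumption.

```python
SUV_THRESHOLD = 2.5   # fixed SUV cutoff stated in the text
MIN_VOXELS = 64       # minimum VOI size for texture features stated in the text

def segment_voi(suv_values, threshold=SUV_THRESHOLD):
    """Mark voxels at or above the threshold (inclusive cutoff is an assumption)."""
    return [value >= threshold for value in suv_values]

def eligible_for_texture(mask, min_voxels=MIN_VOXELS):
    """A lesion qualifies for texture analysis if its VOI has enough voxels."""
    return sum(mask) >= min_voxels

# Hypothetical lesions: voxel SUVs inside a region cropped around the tumor.
small_lesion = [3.0] * 40 + [1.0] * 100   # 40 voxels above threshold
large_lesion = [3.0] * 80 + [1.0] * 100   # 80 voxels above threshold

small_ok = eligible_for_texture(segment_voi(small_lesion))
large_ok = eligible_for_texture(segment_voi(large_lesion))
```

Under this rule the 40-voxel lesion would be excluded and the 80-voxel lesion retained, which mirrors why small or faintly FDG-avid tumors dropped out of the cohort.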
Machine Learning Approach and Statistical Analyses
A total of 4 clinical and 40 radiomic features were used to predict tumor histological subtype with machine-learning approaches. The classification target was the ADC histological subtype. The clinical features were age, sex, tumor size, and smoking status. A feature reduction step was needed to select a subset of features that increases predictive accuracy. A ranking-based feature selection method using the Gini coefficient14 was applied to reduce feature dimensionality. The Gini coefficient is a measure of the inequality of a distribution. It is defined as a ratio with values between 0 and 1, where 0 denotes that all elements belong to a single class (or that only 1 class exists) and 1 denotes that the elements are randomly distributed across various classes. Clinical and radiomic features were ranked by the Gini coefficient score derived from their association with histological class. To identify the optimal number of features, 9 feature subsets were evaluated, with sizes from 5 to 40 in steps of 5 plus the full set of 44.
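A ranking of this kind can be sketched as follows. The sketch scores each feature by the decrease in Gini impurity obtained from a single split at the feature's median. This is one common formulation and an assumption here; the scoring used by the software in the study may differ (for example, in how continuous features are discretized). The feature names and toy values are hypothetical.

```python
from collections import Counter
from statistics import median

def gini_impurity(labels):
    """1 - sum of squared class proportions; 0 when only one class is present."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(values, labels):
    """Impurity decrease from splitting a feature at its median (assumed split rule)."""
    cut = median(values)
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted

def rank_features(table, labels):
    """Return (feature name, score) pairs sorted from most to least informative."""
    scores = {name: gini_gain(values, labels) for name, values in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data, not study data.
labels = ["SCC", "SCC", "SCC", "ADC", "ADC", "ADC"]
table = {
    "suv_max": [14.0, 13.0, 12.0, 9.0, 8.0, 7.0],  # separates the classes
    "noise": [1.0, 9.0, 2.0, 8.0, 3.0, 7.0],       # nearly uninformative
}
ranked = rank_features(table, labels)

# Subset sizes evaluated in the text: 5 to 40 in steps of 5, plus all 44 features.
subset_sizes = list(range(5, 45, 5)) + [44]
```

Taking the top k entries of `ranked` for each k in `subset_sizes` would give the nested feature subsets that are then passed to the classifiers.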
Five different machine-learning algorithms for binary classification were evaluated: a random forest, a neural network, a naive Bayes method, logistic regression, and a support vector machine. Model performance was internally validated by repeated random sampling: the data were randomly split into a training set (70%) and a testing set (30%), and the entire procedure was repeated 100 times. To compare the predictive performances of the models and feature subsets, we drew receiver operating characteristic curves and measured the areas under the curve (AUCs). We computed the following performance measures: AUC, accuracy, F1 score, precision (also called positive predictive value), and recall (also known as sensitivity). The F1 score (also known as F score or F measure) is the harmonic mean of precision and recall.
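The repeated random-sampling validation can be sketched in plain Python. This is a simplified illustration, not the workflow used in the study: the "model" is a hypothetical one-feature scorer, the data are synthetic, the splits are unstratified (whether the study stratified is not stated), and AUC is computed as the probability that a positive case outscores a negative one.

```python
import random

def auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit(values, labels):
    """Toy model: learn whether larger values point to the positive class."""
    pos = [x for x, y in zip(values, labels) if y == 1]
    neg = [x for x, y in zip(values, labels) if y == 0]
    return 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0

def repeated_holdout(values, labels, repeats=100, train_fraction=0.7, seed=0):
    """Average test AUC over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    n_train = round(train_fraction * len(labels))
    aucs = []
    for _ in range(repeats):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        y_train = [labels[i] for i in train]
        y_test = [labels[i] for i in test]
        if len(set(y_train)) < 2 or len(set(y_test)) < 2:
            continue  # skip degenerate splits that lack one class
        direction = fit([values[i] for i in train], y_train)
        test_scores = [direction * values[i] for i in test]
        aucs.append(auc(test_scores, y_test))
    return sum(aucs) / len(aucs)

# Synthetic data: 100 positives centered at 1.0 and 100 negatives centered at 0.0.
data_rng = random.Random(42)
values = [data_rng.gauss(1.0, 1.0) for _ in range(100)] + [data_rng.gauss(0.0, 1.0) for _ in range(100)]
labels = [1] * 100 + [0] * 100

mean_auc = repeated_holdout(values, labels)
```

Averaging over many random splits reduces the dependence of the estimate on any single partition, although it remains an internal validation on one dataset.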
The machine-learning approach was performed using Orange version 3.16 software (Bioinformatics Laboratory at the University of Ljubljana, Slovenia), an open-source data-mining and visualization package.15
We show means ± SDs of continuous variables and percentages of categorical variables. Differences between the 2 groups were compared using the t test for continuous variables and the χ2 test for dichotomous variables. All tests were 2-sided. Confidence intervals are reported at the 95% level, and P < 0.05 was considered to reflect statistical significance.
RESULTS
The clinical characteristics of the 396 patients are summarized in Table 2. The PET radiomics cohort consisted of 288 males and 108 females aged 67.3 ± 10.5 years (range, 23–89 years). Of these, 210 had ADC and 186 had SCC. A total of 205 patients had a smoking history. The tumor size measured by CT was 4.7 ± 2.2 cm (range, 1.2–14.2 cm). Patients with SCCs had larger tumors than patients with ADCs (5.3 ± 2.4 vs 4.2 ± 1.8 cm, P < 0.001).
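The tumor size comparison can be checked approximately from the summary statistics alone. The sketch below computes a Welch-type t statistic from the reported means, SDs, and group sizes, and uses a normal approximation for the 2-sided P value, which is reasonable at these sample sizes. Both choices are assumptions; the text does not state which t test variant was used.

```python
from math import sqrt
from statistics import NormalDist

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t statistic for two independent groups with unequal variances."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

def two_sided_p_normal(t):
    """2-sided P value under a normal approximation to the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# Reported tumor sizes: SCC 5.3 ± 2.4 cm (n = 186) vs ADC 4.2 ± 1.8 cm (n = 210).
t_stat = welch_t(5.3, 2.4, 186, 4.2, 1.8, 210)
p_value = two_sided_p_normal(t_stat)
```

The resulting statistic is a little above 5, which is consistent with the reported P < 0.001.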
Radiomic Feature Selection and Receiver Operating Characteristic Curve Analysis
Clinical and radiomic features were ranked using the Gini coefficient (Table 3). Sex, SUVmax, gray-level zone length nonuniformity (GLZLM_ZLNU), gray-level nonuniformity for zone (GLZLM_GLNU), and total lesion glycolysis (TLG) were the 5 best predictors of tumor histological subtype. Figure 1 shows the differences in the radiomic features of ADC and SCC. SUVmax (14.2 ± 6.1 vs 9.6 ± 4.5, P < 0.001), TLG (604.4 ± 809.7 vs 274.1 ± 499.0, P < 0.001), GLZLM_ZLNU (51.2 ± 77.7 vs 29.6 ± 59.7, P = 0.002), and GLZLM_GLNU (13.5 ± 14.8 vs 6.9 ± 6.8, P < 0.001) were significantly higher for SCCs than for ADCs. The overall classification performances of the 5 machine-learning methods were compared by calculating the AUCs of 9 feature subsets with 5, 10, 15, 20, 25, 30, 35, 40, and 44 features (Table 4, Fig. 2). The logistic regression model outperformed the other classifiers when the 15-feature subset was used (AUC = 0.859, accuracy = 0.769, F1 score = 0.774, precision = 0.804, recall = 0.746), followed by the neural network model (AUC = 0.854, accuracy = 0.772, F1 score = 0.777, precision = 0.807, recall = 0.750).
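As a worked check of the F1 definition (the harmonic mean of precision and recall), the reported precision and recall values can be plugged into the formula. The reported metrics are presumably averages over the repeated splits, so agreement with a single-formula calculation is a consistency check rather than an exact identity.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Precision and recall as reported in the text.
f1_logistic = round(f1_score(0.804, 0.746), 3)
f1_neural_net = round(f1_score(0.807, 0.750), 3)

print(f1_logistic, f1_neural_net)  # prints: 0.774 0.777
```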
DISCUSSION
We developed and validated a PET-based radiomics model for the prediction of NSCLC histological subtype. The logistic regression model effectively differentiated ADC from SCC. Sex, SUVmax, gray-level zone length nonuniformity, gray-level nonuniformity for zone, and TLG were highly associated with tumor histological subtype.
Although ADC and SCC are categorized as NSCLC, their biological features, clinical characteristics, and treatment-related outcomes differ, allowing machine learning to distinguish the subtypes. Of the clinical characteristics, female sex best correlated with the ADC subtype. Sex differences in NSCLC have been widely reported.16–18 We previously showed that textural features differed by sex and histological subtype.10 We here confirm that sex is important when comparing ADC and SCC; more females than males had ADCs.
We found that the SUVmax and TLG were among the top 5 features correlating with histological subtypes of lung cancer. The glycolytic phenotypes of lung ADC and SCC differ. Schuurbiers et al8 suggested that ADCs engage in glycolysis under normoxic conditions, whereas SCCs experience diffusion-limited hypoxia, resulting in a very high anaerobic glycolytic rate. Our previous PET study showed that SCCs have considerably higher metabolic rates than ADCs.6
PET-based radiomics analysis assesses the textural features of entire tumors noninvasively. Several studies have shown that ADCs and SCCs differ in terms of PET textural features.1,10,11 However, no single feature adequately describes the pathological phenotype because tumors exhibit multiple tissue patterns. Thus, a combination of different textural features (the radiomics signature) is needed to describe the lesion. We used machine-learning approaches to select radiomics features distinguishing ADCs from SCCs; the diagnostic performances were promising.
Our study has several limitations. First, the study population was relatively small. Although we initially evaluated 509 patients, radiomic features could be extracted for only 396. Many ADC cases with faint FDG uptake could not be subjected to textural analysis, and tumor lesions too small for texture analysis were also excluded. With the increasing use of lung cancer screening, many small lesions are likely to be discovered at an early stage. Machine-learning tools that can accommodate smaller lesions would therefore be an important direction for future research. Second, the lack of external validation and the retrospective nature of the work limit the generalizability of our results. Although internal validation was performed, external validation in a larger cohort is necessary.
In conclusion, a machine-learning approach with PET-based radiomics successfully identified the histological subtypes of lung cancer. PET-based radiomic features may help clinicians improve the histopathologic diagnosis of lung cancer in a noninvasive manner.
REFERENCES
1. Kirienko M, Cozzi L, Rossi A, et al. Ability of FDG PET and CT radiomics features to differentiate between primary and metastatic lung lesions. Eur J Nucl Med Mol Imaging
2. Lambin P, Leijenaar RTH, Deist TM, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol
3. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology
4. Choi H. Deep learning in nuclear medicine and molecular imaging: current perspectives and future directions. Nucl Med Mol Imaging
5. Lee JG, Jun S, Cho YW, et al. Deep learning in medical imaging: general overview. Korean J Radiol
6. Koh YW, Lee SJ, Park SY. Differential expression and prognostic significance of GLUT1 according to histologic type of non–small-cell lung cancer and its association with volume-dependent parameters. Lung Cancer
7. Kim DH, Jung JH, Son SH, et al. Prognostic significance of intratumoral metabolic heterogeneity on 18F-FDG PET/CT in pathological N0 non–small cell lung cancer. Clin Nucl Med
8. Schuurbiers OC, Meijer TW, Kaanders JH, et al. Glucose metabolism in NSCLC is histology-specific and diverges the prognostic potential of 18FDG-PET for adenocarcinoma and squamous cell carcinoma. J Thorac Oncol
9. Meijer TW, Schuurbiers OC, Kaanders JH, et al. Differences in metabolism between adeno- and squamous cell non–small cell lung carcinomas: spatial distribution and prognostic value of GLUT1 and MCT4. Lung Cancer
10. Koh YW, Lee D, Lee SJ. Intratumoral heterogeneity as measured using the tumor-stroma ratio and PET texture analyses in females with lung adenocarcinomas differs from that of males with lung adenocarcinomas or squamous cell carcinomas. Medicine (Baltimore)
11. Ha S, Choi H, Cheon GJ, et al. Autoclustering of non–small cell lung carcinoma subtypes on (18)F-FDG PET using texture analysis: a preliminary result. Nucl Med Mol Imaging
12. Travis WD, Brambilla E, Nicholson AG, et al. The 2015 World Health Organization Classification of Lung Tumors: impact of genetic, clinical and radiologic advances since the 2004 classification. J Thorac Oncol
13. Nioche C, Orlhac F, Boughdad S, et al. LIFEx: a freeware for radiomic feature calculation in multimodality imaging to accelerate advances in the characterization of tumor heterogeneity. Cancer Res
14. Gini C. Concentration and dependency ratios. Riv Pol Econ
15. Demšar J, Curk T, Erjavec A, et al. Orange: data mining toolbox in Python. J Mach Learn Res
16. Ben Aissa A, Mach N. Is lung cancer in women different? [in French]. Rev Med Suisse
17. Donington JS, Colson YL. Sex and gender differences in non–small cell lung cancer. Semin Thorac Cardiovasc Surg
18. Paggi MG, Vona R, Abbruzzese C, et al. Gender-related disparities in non–small cell lung cancer. Cancer Lett
Keywords: adenocarcinoma; machine learning; non–small cell lung cancer; PET; texture analysis
Copyright © 2019 Wolters Kluwer Health, Inc. All rights reserved.