Baseline and early digital [18F]FDG PET/CT and multiparametric MRI contain promising features to predict response to neoadjuvant therapy in locally advanced rectal cancer patients: a pilot study

Objective In this pilot study, we investigated the feasibility of response prediction using digital [18F]FDG PET/computed tomography (CT) and multiparametric MRI before, during, and after neoadjuvant chemoradiation therapy in locally advanced rectal cancer (LARC) patients and aimed to select the most promising imaging modalities and timepoints for further investigation in a larger trial.
Methods Rectal cancer patients scheduled to undergo neoadjuvant chemoradiation therapy were prospectively included in this trial, and underwent multiparametric MRI and [18F]FDG PET/CT before, 2 weeks into, and 6–8 weeks after chemoradiation therapy. Two groups were created based on pathological tumor regression grade, that is, good responders (TRG1-2) and poor responders (TRG3-5). Using binary logistic regression analysis with a cutoff value of P ≤ 0.2, promising predictive features for response were selected.
Results Nineteen patients were included. Of these, 5 were good responders, and 14 were poor responders. Patient characteristics of these groups were similar at baseline. Fifty-seven features were extracted, of which 13 were found to be promising predictors of response. Baseline [T2: volume, diffusion-weighted imaging (DWI): apparent diffusion coefficient (ADC) mean, DWI: difference entropy], early response (T2: volume change, DWI: ADC mean change) and end-of-treatment presurgical evaluation MRI (T2: gray level nonuniformity, DWI: inverse difference normalized, DWI: gray level nonuniformity normalized), as well as baseline (metabolic tumor volume, total lesion glycolysis) and early response PET/CT (Δ maximum standardized uptake value, Δ peak standardized uptake value corrected for lean body mass), were promising features.
Conclusion Both multiparametric MRI and [18F]FDG PET/CT contain promising imaging features to predict response to neoadjuvant chemoradiotherapy in LARC patients. A future larger trial should investigate baseline, early response, and end-of-treatment presurgical evaluation MRI and baseline and early response PET/CT.


Introduction
Patients diagnosed with locally advanced rectal cancer (LARC) are currently treated with neoadjuvant chemoradiotherapy (nCRT), prior to surgical resection. The goal of nCRT is to downsize and downstage rectal cancer, thereby improving the rate of complete resections and lowering the risk of local recurrence [1]. The majority of patients have a partial tumor response after nCRT [1], while in 15-20% this even results in a pathological complete response (pCR) of all tumor tissue [1,2]. Most recently, results from the RAPIDO trial demonstrate even higher rates of pCR (28%) after neoadjuvant short-course radiotherapy followed by chemotherapy [3]. Unfortunately, not all patients respond well to nCRT, but the exact number of nonresponders is uncertain [4].
Supplemental Digital Content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's website, www.nuclearmedicinecomm.com.
This is an open access article distributed under the Creative Commons Attribution License 4.0 (CCBY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
According to current guidelines, treatment stratification and response assessment are performed using MRI and in selected cases, rectoscopy [5]. MRI features include the tumor-node-metastasis stage, extramural vascular invasion (EMVI), and tumor distance to the mesorectal fascia [6]. Unfortunately, current imaging modalities are unable to predict response to nCRT accurately. In recent years, the watch-and-wait strategy has been implemented for patients with clinical complete response (cCR) after neoadjuvant therapy, with excellent long-term outcomes [1,7]. By means of improved stratification before or early after the onset of nCRT, a precise selection of patients might be possible. In patients predicted to respond well, the (watchful) waiting period before surgery could be prolonged, possibly increasing the rate of cCR. Accurate identification of cCR patients can prevent futile surgery and its associated morbidity and mortality [8]. In patients with a predicted poor response, unbeneficial continuation of nCRT, therapy-related toxicity and unwanted delay in the initiation of a potentially effective treatment could be avoided.
Currently, 2-[18F]fluoro-2-deoxy-d-glucose ([18F]FDG) PET combined with computed tomography (PET/CT) is advised in the national guideline for the detection of recurrence of rectal cancer in case of increased carcinoembryonic antigen levels [9]. Many MRI and [18F]FDG PET/CT features have been investigated separately to predict response to nCRT before or early after the onset of nCRT [10–21]. The combination of both modalities could have complementary value to predict response. Available data in the literature are insufficient to evaluate this approach, and no studies have investigated the application of digital PET/CT in this field [13,16,19]. Owing to its increased energy resolution and time-of-flight performance, digital PET/CT has the potential to improve the quantification of small or heterogeneous tumors and thereby provide more accurate metabolic information on tumor response, and might (in combination with multiparametric MRI) facilitate improved response prediction to nCRT.
In this pilot study, we investigate the feasibility of response prediction using digital [18F]FDG PET/CT and multiparametric MRI before, during, and after nCRT in LARC patients and aim to determine the most promising imaging modalities and time points for further investigation.

Patient population
A multicenter, nonrandomized prospective study was performed in patients admitted to the Leiden University Medical Center (n = 8), Haaglanden Medical Center (n = 6), Alrijne Hospital Leiderdorp (n = 4), and Groene Hart Hospital (n = 1), diagnosed with (biopsy proven) LARC and treated according to national guidelines. Eligible patients were selected at multidisciplinary meetings and asked for participation during their outpatient clinic visits. Treatment consisted of nCRT (25 × 2 Gy combined with 825 mg/m² bid capecitabine 5 days per week), followed by reevaluation after 6-8 weeks. Surgery followed within 4-6 weeks after reevaluation. In case of a near complete response, reevaluation was repeated after 6-8 weeks.
In the case of cCR, follow-up was initiated according to the watch-and-wait protocol [7]. The study was conducted in accordance with the Declaration of Helsinki, and was approved by the Leiden-Den Haag-Delft medical ethics review board and the local boards of participating centers. All subjects provided written informed consent. The study was registered in the Netherlands Trial Register (identification number NL-756). In addition to standard of care imaging (rectoscopy, MRI scan of abdomen, and CT scan of the chest and abdomen), all patients underwent [18F]FDG PET/CT and multiparametric MRI before nCRT, 10-14 days after nCRT onset (early response evaluation), and 6-8 weeks after the last treatment (end-of-treatment presurgical evaluation).

Data acquisition and image reconstruction
All digital [18F]FDG PET/CT scans of the lower abdomen were acquired on the same scanner, a Vereos PET/CT (Philips Healthcare, Best, the Netherlands). All acquisitions and reconstructions were in accordance with European Association of Nuclear Medicine (EANM) guidelines for tumor PET imaging version 2.0 [22]. Prior to PET/CT scanning, patients fasted for 6 h and were prehydrated using 1 L of water.

Volumes of interest (VOIs) were drawn manually (F.V. under the supervision of S.F.S.) to include the primary tumor on the DWI and T2 maps. Various quantitative features were extracted using 3DSlicer (version 4.11) [23] and PyRadiomics (version 3.0) running in Python (version 3.7; Python Software Foundation, Wilmington, Delaware, USA) [24]. First, following the methodology of Schurink et al. [19], the following features were extracted from the VOIs: T2 mesh volume, T2 entropy, DWI mesh volume, mean ADC, ADC entropy, and their respective response indices. Second, to allow full comparison to the results from Schurink et al. [19,20] and following recent promising results from Delli Pizzi et al. [25], 105 radiomic features were extracted from the T2 baseline images for additional radiomic analysis: shape (n = 14), first order (n = 18), gray level cooccurrence matrix (n = 22), gray level run length matrix (n = 16), gray level size zone matrix (n = 16), gray level dependence matrix (n = 14) and neighboring gray-tone difference matrix (n = 5) features. Images were interpolated to isotropic voxels of 2.00 × 2.00 × 2.00 mm³ using B-spline interpolation, with grids aligned by the input origin and only covering the VOI. Both T2 and DWI images were normalized to a mean of 300 and an SD of 100, allowing comparison of the relative gray values between patients [26]. Features were extracted using a fixed bin size, which was chosen such that most VOIs contained between 30 and 130 bins. This resulted in a bin size of 5 and 15 for T2 and DWI images, respectively.
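The intensity normalization and fixed-bin-size discretization described above can be illustrated with a minimal sketch. It is not the PyRadiomics implementation used in the study: the function names are hypothetical, the population SD is assumed for normalization, and bin edges are assumed to lie on multiples of the bin width.

```python
import math
import statistics


def normalize_intensities(values, target_mean=300.0, target_sd=100.0):
    """Rescale intensities to a fixed mean and SD (300 and 100 in the text).

    Assumption: the population SD is used; the text does not specify this.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("constant image: SD is zero")
    return [(v - mean) / sd * target_sd + target_mean for v in values]


def discretize_fixed_bin_width(values, bin_width):
    """Map each intensity to a 1-based bin index using a fixed bin width.

    Assumption: bin edges are multiples of bin_width, starting at the
    largest multiple that does not exceed the minimum intensity.
    """
    lowest_edge = math.floor(min(values) / bin_width) * bin_width
    return [int((v - lowest_edge) // bin_width) + 1 for v in values]


def count_bins(values, bin_width):
    """Number of bins spanned by the VOI intensities for a given bin width."""
    return max(discretize_fixed_bin_width(values, bin_width))
```

With such a helper, candidate bin widths could be compared by checking for how many VOIs `count_bins` falls between 30 and 130, which is the criterion stated in the text.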
PET/CT assessment was performed by a board-certified nuclear medicine physician (L.G., 25 years of experience), using Sectra IDS7 software (version 21.2; Sectra AB, Linköping, Sweden). VOIs were automatically delineated with an isocontour threshold of 50% of the maximum standardized uptake value (SUVmax) using IntelliSpace Portal (version 9.0; Koninklijke Philips N.V., Amsterdam, the Netherlands). The PET/CT features and their corresponding response indices were selected on the basis of earlier reports. Joye et al. pooled data from 25 studies investigating [18F]FDG PET/CT and found the following features to be promising predictors of response [17]: the post-therapy SUVmax, response indices of the SUVmax, the metabolic tumor volume [MTV, obtained using a peak standardized uptake value corrected for lean body mass (SULpeak) threshold of 50%] and total lesion glycolysis (TLG, SUVmean × MTV). All features were normalized to body weight, except SULpeak, which was normalized to lean body mass following the methodology described in PERCIST 1.0 and by O et al. [27]. They advise the use of SULpeak as exploratory data when the liver is not present in all scans. No radiomic feature analysis was performed on data from [18F]FDG PET/CT, as this has not been described in literature before.
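As a rough illustration of the PET/CT quantities named above, the sketch below derives SUVmax, SUVmean, MTV and TLG from a list of voxel SUVs using a 50% SUVmax isocontour, and computes a response index. The function names and the flat voxel list are hypothetical, the isocontour is applied without any connectivity constraint, and the response index is assumed to be the percentage change relative to baseline, a definition the text does not spell out.

```python
def isocontour_voi(suv_values, fraction=0.5):
    """Keep voxels with SUV at or above a fraction of SUVmax (50% in the text)."""
    threshold = fraction * max(suv_values)
    return [v for v in suv_values if v >= threshold]


def pet_features(suv_values, voxel_volume_ml, fraction=0.5):
    """Return SUVmax, SUVmean, MTV (mL) and TLG for one lesion."""
    voi = isocontour_voi(suv_values, fraction)
    suv_mean = sum(voi) / len(voi)
    mtv = len(voi) * voxel_volume_ml   # metabolic tumor volume
    tlg = suv_mean * mtv               # TLG = SUVmean x MTV, as in the text
    return {"suv_max": max(voi), "suv_mean": suv_mean, "mtv": mtv, "tlg": tlg}


def response_index(baseline, follow_up):
    """Percentage change relative to baseline (assumed definition)."""
    return (follow_up - baseline) / baseline * 100.0
```

For example, `response_index(18.1, 10.4)` gives a change of roughly -42.5% under this assumed definition.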

Pathology
Pathological assessment of the resection specimen was performed according to the Dutch national guidelines [9]. In addition to this, the extent of tumor regression was evaluated according to Mandard's tumor regression grade (TRG) by the local board-certified pathologist [28]. Mandard's TRG classifies response to given therapy into five classes based on the number of vital tumor cells and the extent of therapy-induced fibrosis. When classified TRG 1, no residual tumor cells were seen, and the patient is considered to have a pathologic complete response (pCR). A regrowth-free survival time of >6 months was considered a surrogate endpoint for TRG1 in patients with a cCR in watch-and-wait follow-up.

Statistical analysis
Statistical analysis was performed using SPSS (version 25; IBM SPSS, Inc., Chicago, Illinois, USA) and R (version 3.6.0; R Foundation for Statistical Computing, Vienna, Austria). For statistical analysis, patients were divided into two groups based on the pathological TRG or regrowth-free follow-up in the case of watch-and-wait: good responders (TRG1-2) and poor responders (TRG3-5). Descriptive data were displayed as mean ± SD or median (interquartile range), depending on the distribution of data. Non-parametric data were compared using the Mann-Whitney U test, whereas parametric data were compared using a t-test. Results were considered significant when P < 0.05. Promising imaging features were selected using binary logistic regression, after dividing each feature by its SD. Due to the small sample size and large number of tested features, MRI and PET/CT features were considered promising when a P value ≤0.2 was reached.
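The feature screening step can be sketched as follows. This is a minimal stand-in for the SPSS/R analysis, not the authors' code: it fits a univariable logistic regression by Newton-Raphson, assumes a Wald test for the P value (the text does not name the test), and does not handle perfectly separated data. All function names are hypothetical.

```python
import math
import statistics


def _information(b0, b1, x):
    """Entries (h00, h01, h11) of the 2x2 Fisher information matrix."""
    h00 = h01 = h11 = 0.0
    for xi in x:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return h00, h01, h11


def fit_univariable_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        g0 = sum(yi - pi for yi, pi in zip(y, p))            # score for b0
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))  # score for b1
        h00, h01, h11 = _information(b0, b1, x)
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det                     # Newton step
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    h00, h01, h11 = _information(b0, b1, x)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h00 / det)  # SE of b1 from the inverse information


def screen_feature(values, responder, p_cutoff=0.2):
    """Divide a feature by its SD, fit the model, and flag it if P <= cutoff."""
    sd = statistics.stdev(values)
    scaled = [v / sd for v in values]
    _, b1, se = fit_univariable_logistic(scaled, responder)
    p_value = math.erfc(abs(b1 / se) / math.sqrt(2.0))  # two-sided Wald test
    return {"odds_ratio_per_sd": math.exp(b1), "p_value": p_value,
            "promising": p_value <= p_cutoff}
```

Because each feature is divided by its SD first, the resulting odds ratio is expressed per SD increase, which makes odds ratios of features with different units comparable in a single plot.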
Unsupervised radiomic feature selection using redundancy filtering and factor analysis was performed using FMradio (Factor Modeling for Radiomics Data, package version 1.1.1; Amsterdam UMC, Amsterdam, the Netherlands), developed for R (version 3.6.0; R Foundation for Statistical Computing, Vienna, Austria) [29]. The large feature dimensionality compared to the small sample size might result in overfitting and deteriorate the generalizability of the radiomic model. Therefore, one feature was selected for every 10 subjects [30]. Features were scaled (centered around 0, variance of 1) to prevent the features with the largest values from dominating the analysis. Redundancy filtering on the Pearson correlation matrix was performed with a threshold of τ = 0.95 and from each group, one feature was retained. Factor analysis of the redundancy-filtered correlation matrix was performed and two factors (19 patients) were selected per sequence and time point. The sampling adequacy of the model was determined by the Kaiser-Meyer-Olkin measure, which had to be between 0.9 and 1.0. The features with the highest loading on the factors were selected.
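The idea behind redundancy filtering can be illustrated with a simplified greedy filter on pairwise Pearson correlations. This is a hypothetical stand-in and not the FMradio algorithm, which may choose the retained feature differently; the rounding rule in `max_features` is likewise an assumption made to reproduce the two factors reported for 19 patients.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("constant feature: correlation undefined")
    return cov / math.sqrt(var_a * var_b)


def redundancy_filter(features, tau=0.95):
    """Greedy filter: keep a feature only if |r| < tau with every kept feature.

    `features` maps a feature name to its per-patient values. From each
    group of highly correlated features, the first one encountered is kept.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < tau for k in kept):
            kept.append(name)
    return kept


def max_features(n_subjects, subjects_per_feature=10):
    """One feature per 10 subjects, rounded to the nearest integer (assumption)."""
    return max(1, round(n_subjects / subjects_per_feature))
```

Pearson correlation is unaffected by centering and scaling, so the filter gives the same result on raw and standardized features.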

Results
Nineteen patients were included in the period between July 2018 and March 2020. All patients completed chemoradiotherapy, and all but one underwent surgery after an average of 14.1 ± 6.6 weeks (one cCR patient in watch-and-wait). All but one patient completed all six imaging studies: in one patient the final [18F]FDG PET/CT was not performed due to logistical problems. Sixteen men and three women were included in this study with a median age of 63.1 (56.3-67.0) years. The median follow-up time was 11.6 (9.0-17.1) months. No recurrent disease was found. One patient had a cCR without regrowth during follow-up, 4 patients had a pTRG1, 9 pTRG3, 4 pTRG4, and 1 pTRG5. On the basis of the pTRG, five patients (26.3%) were good responders, and 14 (73.7%) were poor responders. There were no significant differences at baseline between groups regarding age, sex, cT stage, cN stage, EMVI, and tumor differentiation, as summarized in Table 1.

Quantitative features
A total of 57 quantitative features were extracted. Redundancy filtering and factor analysis of the radiomic feature sets were performed and Kaiser-Meyer-Olkin (KMO) measures were excellent (>0.96). The features corresponding best with the two factors per sequence and timepoint were included in the analysis.
Using binary logistic regression analysis with a predefined cutoff value of P ≤ 0.2, 13 features were found to be promising predictors of response. At baseline imaging, three MRI and two PET/CT features were found to be promising. At early response evaluation, no promising features were found; however, two MRI and two PET/CT response index features (early response evaluation relative to baseline) were found to be promising. At the end-of-treatment presurgical evaluation, three MRI features and one PET/CT feature were found to be promising, but no response index features were promising.
These results are shown in more detail in the forest plot in Fig. 1, which displays all features with their respective odds ratios and confidence intervals. Numerous features have favorable odds ratios; however, only 13 have a P ≤ 0.2. Detailed results from binary logistic regression analysis are displayed in Table 2. Figures 2 and 3 present examples of good and poor responders on sequential multimodality imaging.

Discussion
Results from this pilot study indicate that 13 out of 57 features are promising predictors of response, with baseline and early change showing the most clinically relevant features. As deduced from these results, end-of-treatment presurgical evaluation digital PET/CT was least likely to provide predictive (and clinically relevant) features. As far as we know, this is the first prospective study in LARC patients investigating the predictive value of multiparametric MRI and digital [18F]FDG PET/CT, at three set time points during neoadjuvant chemoradiation.
The results from this study confirm the feasibility of response prediction using digital [18F]FDG PET/CT and multiparametric MRI. These results are in line with previous reports from various small trials demonstrating the predictive value of various T2- and DW-MRI and [18F]FDG PET/CT features, which have up until now not resulted in clinically usable prediction models [14,17]. In contrast to our results, a recent study in 19 LARC patients found only baseline MTV and no early response evaluation features (2 weeks into nCRT) to be possible predictors of response [31]. In our study we also found baseline MTV to be a promising feature; however, we [19]. The second study found an AUC of 0.83 using clinical (T-stage, N-stage, age, sex, interval between nCRT and end-of-treatment presurgical evaluation) and baseline features (T2 entropy, ADC entropy, and SUVmean) [20]. Interestingly, models including radiomic features did not outperform the simpler model [20]. Moreover, radiomic analysis of PET/CT images (AUC 0.78) did outperform simpler features (SUVmean, TLG, and mean Hounsfield unit, AUC 0.50) [20]; however, PET/CT radiomic analyses were performed on the CT-only images, thus questioning the added value of PET. In our study, in which MRI-based radiomic features were analyzed, four out of 12 radiomic features were promising predictors of response (one baseline and three end-of-treatment presurgical evaluation features). Unfortunately, no AUC values were available due to the limited number of patients. Interestingly, the end-of-treatment presurgical evaluation [18F]FDG PET/CT was the least promising in this dataset. This might be due to the occurrence of radiation-induced proctitis interfering with end-of-treatment presurgical evaluation PET/CT, since inflammation results in increased uptake of [18F]FDG and is not yet present at early response evaluation.

Forest plot of investigated features. Figure shows odds ratio for TRG1-2 with 95% confidence intervals from binary logistic regression analyses on logarithmic scale (x-axis). ADC mean, mean apparent diffusion coefficient; DWI entropy, tumor entropy on diffusion-weighted imaging series; DWI volume, tumor volume on diffusion-weighted imaging series; MTV, metabolic tumor volume; SULpeak, peak standardized uptake value corrected for lean body mass; SUVmax, maximum standardized uptake value; T2 entropy, tumor entropy on T2 series; T2 volume, tumor volume on T2 series; TLG, total lesion glycolysis.
Although accurate response prediction is currently challenging, the significant number of unidentified complete responders who undergo surgical resection stresses the importance of accurate response assessment and prediction. Following our results, a future trial should include multiparametric MRI at all three time points, and [18F]FDG PET/CT at baseline and early response evaluation. Furthermore, the sample size should be sufficient to define cutoff values and develop accurate prediction models. While this study focused primarily on predicting response using imaging modalities, the (combined) use of readily available predictive features such as metabolomics and analysis of biopsy material, and the integration of these in prediction models might further increase the accuracy of response prediction.

MRI. Last, inter-observer variability has been introduced as the TRG was determined by various local pathologists; however, as the data were divided into only two groups, the influence of this was deemed minimal. Future studies should take these issues into account, and either further investigate the possible influence of various scanner types and acquisition protocols, perform the study on one MRI scanner within the same institute, or develop methods to harmonize the data. Also, a future study should consider the possible shift toward the use of more short-course radiotherapy combined with systemic chemotherapy following results from the RAPIDO trial, as opposed to CRT as described by current guidelines [3]. This issue is less relevant for pooling data from [18F]FDG PET/CT because data are (largely) harmonized by following the EANM guidelines and only one single PET/CT scanner was used in this study [22].
In conclusion, results from this study suggest that baseline, early response and end-of-treatment presurgical evaluation MRI and baseline and early response evaluation PET/CT features are promising to predict response to neoadjuvant therapy in rectal cancer patients. These results, in combination with the clinical need for improved treatment stratification, encourage further research into response prediction using [18F]FDG PET/CT and multiparametric MRI.

[18F]FDG PET/CT and T2 weighted MRI images before, during, and after neoadjuvant therapy of a patient with clinical complete response. A sixty-two-year-old man with cT4bN2M0 rectal cancer had a good response to a yiT1-2N0M0 which further regressed to a yiT0N0M0 6 months after chemoradiotherapy, and is currently still followed in the watch-and-wait after 12 months of recurrence-free follow-up. SUVmax was 18.1 at baseline, 10.4 at interim assessment, and too low to measure at reevaluation. Figure shows [18F]FDG PET/CT fusion (a-c) and PET-only (d-f) images as well as T2 weighted MRI (g-i) images before (a, d, g), during (b, e, h) and after (c, f, i) neoadjuvant chemoradiotherapy. CT, computed tomography; SUVmax, maximum standardized uptake value.
no. 857894). F.V., S.F., W.N., F.v.V., L.G., H.P., and D.H. wrote the main manuscript text and performed the analysis. F.V., F.v.V., F.P., L.G., and D.H. were involved in the trial design. All other authors reviewed and agreed upon the manuscript. The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of Leiden Den Haag Delft. Written informed consent was obtained from all subjects (patients) in this study. Data are available upon reasonable request from the corresponding author.

Conflicts of interest
There are no conflicts of interest.