Multiple myeloma (MM) is a heterogeneous disease with regard to symptoms, tumor genetics, and outcome.1 Precise diagnostics to classify disease subtypes and to predict the individual course of disease are of utmost importance to guide treatment accordingly, a principle known as precision oncology. In addition to serological and urine parameters, imaging and bone marrow (BM) biopsy results play decisive roles in the diagnosis of monoclonal plasma cell disorders. Parameters obtained from unguided BM biopsy at the posterior iliac crest, such as plasma cell infiltration (PCI)2–7 or cytogenetic aberrations,8–11 have also proven their value as biomarkers and are consequently now used for staging, risk stratification, and response assessment.9,10,12–15 However, tumor load distribution16,17 and genomic aberrations18–20 can be spatially heterogeneous. Invasive biopsies have the disadvantage that they cannot be performed both multifocally and in frequent repetition, which would be necessary to precisely evaluate and monitor the tumor load and to capture the complete genomic landscape in each patient. Here, whole-body magnetic resonance imaging (wb-MRI) is advantageous, as it allows noninvasive investigation of the complete BM of a patient and captures information on the spatial distribution and local characteristics of tumor manifestations. Consequently, a method to predict local biopsy results from MRI without multiple invasive biopsies would be of great value, provided that a link between local imaging findings and local histology or local genetic findings can be established.
Radiomics is a new image analysis approach that characterizes a structure in an image by calculating hundreds of mathematically defined radiomics features, which quantify the signal intensity, shape, and texture of the structure.21,22 For various oncologic entities, it has been reported that radiomics can predict tumor tissue characteristics, such as histologic or genetic results.23 For patients with MM, it was recently demonstrated that the combination of an nnU-Net24 for segmentation and a subsequent radiomics analysis21,22 makes it possible to automatically obtain an objective, in-depth characterization of the whole BM from wb-MRI.25
Given this clinical, biological, and technical background, the purpose of this study was to develop and test an automated deep learning and radiomics image analysis framework, which analyzes the pelvic BM from whole-body MRIs to predict the local tumor tissue characteristics PCI and cytogenetic aberrations from routine BM biopsy at the iliac crest.
Study Design and Algorithmic Concept
This study was designed as a retrospective study using multicentric data sets to establish and test a methodological concept for automatic tissue analysis from routine clinical MRIs and for prediction of BM biopsy results in patients with monoclonal plasma cell disorders. To achieve this goal, an automated multistep pipeline was established, which included automated pelvic BM segmentation by a deep learning algorithm,24 image normalization26 and resampling, radiomics feature calculation,27 and parameter prediction based on a machine learning model (Fig. 1). This study was approved by the institutional review board of the Medical Faculty, University of Heidelberg, Germany (S-537/2020), with a waiver of informed consent. The acquisition of imaging and clinical data was performed between 2008 and 2021. Specific planning, data annotation, establishment of algorithms, and data analysis for this project were performed between 2019 and 2022.
Patient Cohorts and Data Sets
There was no prespecified sample size for this study. As recommended, we aimed to assemble an overall data set that contains a large number of samples so that machine learning methods can be reasonably applied, that has a balanced distribution of patients with low and high PCI, and that allows for independent, external, multicentric testing of the final algorithms. Data from center 1 were derived from 5 different subsets, which had in part been included in other studies. Data from center 1 were used for training of the algorithms, and a hold-out subset was reserved for independent, internal testing (internal test set). Data from center 2 were used for external testing: once in a subset of MRIs with a homogenized MRI protocol and high imaging quality (center 2, high-quality test set), and once in a subset that did not fulfill these criteria (center 2, other test set). Finally, a multicentric data set was used to evaluate the performance in a very heterogeneous data set (multicenter test set). Details on the inclusion and exclusion process, including the respective flow charts, are reported in Supplemental Digital Content 1, https://links.lww.com/RLI/A806. An overview of all data sets is shown in Figure 2.
Coronal T1-weighted (T1w) turbo spin echo images that had been acquired with different MRI scanners and different MRI sequence parameters were included. Detailed information on all scanners and sequence parameters is reported in Supplemental Digital Content 2, https://links.lww.com/RLI/A807.
Image Segmentation and Training and Testing of nnU-Nets
The BM of the right and left hip bone and the medial part of the piriformis muscle were segmented on coronal T1w images. An initial subset of 127 MRIs was annotated manually to train a first nnU-Net,24 which was then used to automatically presegment the rest of the training data. After manual improvement by 1 experienced rater (7 years of experience in MRI segmentation), the final nnU-Net was trained on all training data (470 cases). Testing of the nnU-Net was performed in 3 independent test sets with overall 37 manually segmented cases as reference. Further details are reported in Supplemental Digital Content 3, https://links.lww.com/RLI/A808.
Before feature extraction, all images were resampled to a uniform voxel spacing and normalized to the mean signal intensity of the piriformis muscle to minimize heterogeneity between data sets caused by technical variations in image acquisition.26,28 The IBSI-conform29 and validated software MITK Phenotyping27 was used for radiomics feature calculation. A total of 260 radiomics features were calculated, comprising 91 first-order features and 169 texture features. Volume and shape features were omitted because these are not expected to carry disease-specific information in this setting, as whole BM spaces are analyzed. Four different radiomics models were trained for the PCI prediction task: with versus without inclusion of a data set from an older scanner, and with versus without addition of clinical features. Random forest regression-based models were trained to predict PCI from the radiomics (and clinical) features using the scikit-learn Python package.30 Random forest classifiers based on radiomics (and clinical) features were trained for the prediction of binary variables. All machine learning modeling was performed with Python (Version 3.8.10, Python Software Foundation, Delaware). The models were initially trained on the training set of center 1 only (168 cases) and tested on the center 1 test set (59 cases). Then, the models were retrained on all data from center 1 (227 cases) and finally tested on the external test sets (including a total of 143 cases). Further details on the radiomics analysis are reported in Supplemental Digital Content 4, https://links.lww.com/RLI/A809.
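The two central steps of this paragraph, intensity normalization to a reference tissue and random forest regression on a feature table, can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy image, the synthetic feature matrix, the simulated target, and the hyperparameters (`n_estimators=200`, `random_state=0`) are assumptions chosen for illustration, and only the case counts (168 training, 59 test) and the number of features (260) are taken from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def normalize_to_reference(image, reference_mask):
    """Divide all voxel intensities by the mean intensity inside a reference region."""
    reference_mean = image[reference_mask].mean()
    return image / reference_mean


# Toy image (assumption): "muscle" voxels have intensity 200, all other voxels 400.
image = np.full((4, 4, 4), 400.0)
muscle_mask = np.zeros_like(image, dtype=bool)
muscle_mask[:2] = True
image[muscle_mask] = 200.0
normalized = normalize_to_reference(image, muscle_mask)  # muscle -> 1.0, rest -> 2.0

# Synthetic stand-in for a radiomics feature table (rows: cases, columns: features).
rng = np.random.default_rng(0)
n_train, n_test, n_features = 168, 59, 260
X = rng.normal(size=(n_train + n_test, n_features))
# Hypothetical target: "PCI" driven by the first feature plus noise, clipped to 0-100 %.
y = np.clip(40 + 15 * X[:, 0] + rng.normal(scale=5, size=len(X)), 0, 100)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:n_train], y[:n_train])
predicted_pci = model.predict(X[n_train:])

# Impurity-based feature importances (often called Gini importance) from scikit-learn.
top_feature = int(np.argmax(model.feature_importances_))
```

In this synthetic setting the most important feature is the one that was used to generate the target, which is the behavior the importance ranking is meant to expose.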
Histological, Cytological, and Cytogenetic Data
Bone marrow biopsies were performed at the posterior iliac crest without image guidance. When assessing PCI from BM biopsy, in line with the International Myeloma Working Group recommendations,14 the higher value from the histological and cytological PCI was used. The 5 cytogenetic aberrations gain(1q), del(13q), del(17p), t(4;14), and t(14;16), which are currently used to form the respective risk stratifications in smoldering MM12 and in MM,10,31 were investigated. In addition to the prediction of the presence of each cytogenetic aberration individually, it was also predicted whether a high-risk cytogenetic status was present. Three different definitions for cytogenetic high-risk were used: high-risk cytogenetic status according to definition 1 (abbreviated “HR-C def 1”), based on the definition of cytogenetic high-risk aberrations in R-ISS [presence of any aberration of the following: del(17p), t(4;14), and t(14;16)]10; “HR-C def 2,” based on the definition of cytogenetic high-risk aberrations in R2-ISS [presence of any aberration of the following: gain(1q), del(17p), t(4;14), and t(14;16)]31; and “HR-C def 3,” based on the definition proposed for smoldering MM [presence of any aberration of the following: gain(1q), del(13q), del(17p), t(4;14), and t(14;16)].12
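The reference labels described above follow simple rules, which the following sketch expresses in code. The function names and the handling of a missing PCI value (`None`) are assumptions made for illustration; the aberration sets are the three definitions given in the text.

```python
# Aberration sets for the three high-risk definitions described in the text.
HIGH_RISK_DEFINITIONS = {
    "HR-C def 1": {"del(17p)", "t(4;14)", "t(14;16)"},
    "HR-C def 2": {"gain(1q)", "del(17p)", "t(4;14)", "t(14;16)"},
    "HR-C def 3": {"gain(1q)", "del(13q)", "del(17p)", "t(4;14)", "t(14;16)"},
}


def reference_pci(histological, cytological):
    """Return the higher of the two PCI values.

    Assumption: if one value is missing (None), the other one is used.
    """
    values = [v for v in (histological, cytological) if v is not None]
    return max(values) if values else None


def high_risk_status(detected_aberrations):
    """Map each definition to True if any of its aberrations was detected."""
    detected = set(detected_aberrations)
    return {
        name: bool(detected & aberrations)
        for name, aberrations in HIGH_RISK_DEFINITIONS.items()
    }


# A patient with an isolated gain(1q) is high risk under definitions 2 and 3 only.
status = high_risk_status(["gain(1q)"])
```

Because the three definitions are nested, a patient who is high risk under definition 1 is also high risk under definitions 2 and 3.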
Dice scores were calculated to quantify the agreement between automatic and manual segmentations. Pearson correlation was used to evaluate the correlation between predicted and actual PCI values. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the prediction of cytogenetic risk status and of individual cytogenetic aberrations. The Spearman correlation coefficient was used to evaluate the correlation between individual radiomics features and PCI. The Wilcoxon test was used to assess the difference in predicted PCI values between the cytogenetic high-risk and standard-risk groups, or between the groups with and without the respective cytogenetic aberration. The Gini feature importance was used to report the relative influence of a feature on the prediction model and was calculated as implemented in scikit-learn.30 The 95% confidence intervals (CIs) for Pearson correlation coefficients and AUROCs were calculated. P values <0.05 were considered statistically significant. The statistical analysis was performed with Python (Version 3.8.10; Python Software Foundation, Delaware) and R (Version 4.0.1; R Foundation for Statistical Computing, Vienna, Austria).
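As an illustration of the segmentation metric, the Dice score of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for two small toy masks; the function name and the convention of returning 1.0 for two empty masks are assumptions, not details taken from the study.

```python
import numpy as np


def dice_score(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    total = mask_a.sum() + mask_b.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * np.logical_and(mask_a, mask_b).sum() / total


# Toy masks: 4 voxels each, 3 of them shared.
manual = np.array([[0, 1, 1, 0], [0, 1, 1, 0]])
automatic = np.array([[0, 1, 1, 1], [0, 1, 0, 0]])
score = dice_score(manual, automatic)  # 2 * 3 / (4 + 4) = 0.75
```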
Study Cohort and Data Sets
A total of 672 MRIs from 512 patients (median age, 61 years; interquartile range, 53–67 years; 307 men) from 8 centers and 370 corresponding BM biopsies were included in this study. An overview of the data sets is displayed in Figure 2. Details on inclusion and exclusion at each stage are reported in the methods and in the supplements (Supplemental Digital Content 1, https://links.lww.com/RLI/A806). Table 1 reports descriptive information for each data set.
TABLE 1. Description of Study Cohorts

A. Data for Segmentation Experiments

| | Training Set for nnU-Net | Internal Test Set/Interrater Variability | Center 2 Test Set | Multicenter Test Set |
|---|---|---|---|---|
| n MRIs (n patients) | 470 MRIs (from 310 patients) | 8 wb-MRIs (from 8 patients) | 15 wb-MRIs (from 15 patients) | 14 wb-MRIs (from 14 patients) |
| Male sex, n (%) | | | | |
| Age in years^b | | | | |
| ISS I/II/III (n.a.) | | | | |
| On/after therapy or n.a. | | | | |
| Tumor load surrogates | | | | |
| PCI in %^b | 23 (12–50; 257) | | | |
| M-protein in g/L^b | 20 (11–35; 130) | | | |

B. Data for Radiomics Experiments

| | Internal Training Set^a (Radiomics) | Internal Test Set^a | Center 2, High-Quality Test Set | Center 2, Other Test Set | Multicenter Test Set |
|---|---|---|---|---|---|
| n MRIs (n patients) | 168 wb-MRIs (from 166 patients) | 59 wb-MRIs (from 59 patients) | 32 wb-MRIs (from 32 patients) | 75 wb-MRIs (from 75 patients) | 36 wb-MRIs (from 36 patients) |
| Male sex, n (%) | | | | | |
| Age in years^b | | | | | |
| ISS I/II/III (n.a.) | | | | | |
| On/after therapy or n.a. | | | | | |
| Tumor load surrogates | | | | | |
| PCI in %^b | | | | | |
| M-protein in g/L^b | 24 (12–41; 28) | 24 (14–43; 13) | 24 (10–36; 7) | 32 (15–40; 21) | 37 (21–46; 16) |
| HR-C def 1^c | | | | | |
| HR-C def 2^c | | | | | |
| HR-C def 3^c | | | | | |
| MRI biopsy interval (days)^b | | | | | |

Part A reports descriptive information for each data set included in the segmentation experiments, and Part B reports descriptive information for each data set included in the radiomics experiments.
^a Note: Initially, algorithms were trained on only the training set from center 1 and tested on the internal test set. Then, radiomics algorithms were retrained on all data from center 1, and the resulting models were tested on the external test sets.
^b Median (interquartile range; n missing [only reported in case there were any cases with missing information]).
^c Percentage with high risk (number high risk/number all).
Percentage with cytogenetic aberration (number cytogenetic aberration present/number cytogenetic aberration tested).
HR-C def 1, high-risk cytogenetic status according to definition 1, based on the definition of cytogenetic high-risk aberrations in R-ISS [presence of any aberration of the following: del(17p), t(4;14), and t(14;16)]10; HR-C def 2, based on the definition of cytogenetic high-risk aberrations in R2-ISS [presence of any aberration of the following: gain(1q), del(17p), t(4;14), and t(14;16)]31; HR-C def 3, based on the definition proposed for smoldering MM [presence of any aberration of the following: gain(1q), del(13q), del(17p), t(4;14), and t(14;16)].12
n.a., not available; MGUS, monoclonal gammopathy of unknown significance; SMM, smoldering multiple myeloma; %, percentage of this cohort; PCI, plasma cell infiltration in the bone marrow in %.
Quality of Automated Pelvic Bone Marrow Segmentation
The Dice scores for the automated pelvic BM segmentations and for the interrater variability between 2 radiologists are reported in Table 2. Figure 3 displays automated segmentations in 5 examples, including cases with the most severe pathologies.
TABLE 2. Quality of Automatic Segmentation and Interrater Variability

| | nnU-Net vs Radiologist: Right Pelvis | nnU-Net vs Radiologist: Left Pelvis | Interrater Variability^a: Right Pelvis | Interrater Variability^a: Left Pelvis |
|---|---|---|---|---|
| Internal test set^b | 0.96 ± 0.03 | 0.96 ± 0.03 | 0.88 ± 0.02 | 0.87 ± 0.02 |
| Center 2 test set^c | 0.93 ± 0.01 | 0.92 ± 0.01 | | |
| Multicenter test set^d | 0.89 ± 0.03 | 0.90 ± 0.03 | | |

Mean Dice scores (± standard deviation) are reported to assess quality of automatic bone marrow segmentation, and to compare it with the interrater variability of segmentations between 2 radiologists.
^a Between manual segmentations from 2 different radiologists.
^b Last 2 by date for each data set I–IV.
^c Newest 15 by date.
^d Newest 3 by date per center. From center 4 and center 7, only one processable data set was available. This data set comprises 14 MRIs from 6 centers, acquired with 5 different scanner models from 3 different vendors.
Quantitative Profiling of Bone Marrow Phenotypes Using Radiomics
A wide variety of different morphologic BM patterns can be observed in MRI in patients with monoclonal plasma cell disorders. Figure 4 displays several exemplary cases of these varying MRI BM patterns and the resulting radiomics signature from the pelvic BM for each case. These cases exemplify how differences in morphologic MRI patterns lead to differences in the extracted quantitative, objective radiomics profiles.
The differences between MRI morphologically normal-appearing BM, focal lesion pattern, and severe diffuse infiltration are unequivocal (Fig. 4, P1–P6). Besides those, there is a large group of patterns that might be classified as intermediate diffuse infiltration (Fig. 4, P7–P10). These are clearly heterogeneous and therefore might represent different MM disease subtypes. However, the complexity of such patterns can hardly be reported in a structured, reproducible manner based on a visual assessment by radiologists. Detailed, objective analysis of these complex patterns and their systematic correlation with tumor tissue characteristics (assessed by BM biopsy at the iliac crest) are an optimal use case for machine learning–based image analysis approaches such as deep learning and radiomics.
Automatic Prediction of Plasma Cell Infiltration
Four different models were trained and tested. In the first scenario, only data sets I to III of the internal data sets were used, whereas data set IV, which was acquired with an older scanner and had markedly lower image quality, was omitted. In the second scenario, all data sets I to IV from the internal data set were included. In both scenarios, one model was trained on radiomics features only, and an additional model was trained on both the radiomics features and the clinical features age and body mass index. The correlations between the predicted PCI values and the actual PCI values for the different prediction models are reported in Table 3. The model based on data set I–III using only radiomics features without clinical parameters showed the highest correlation coefficient between predicted PCI and actual PCI (r = 0.71, P < 0.001) on the internal data set. The correlation between predicted PCI and actual PCI on the external data sets was weaker than in the internal data set. However, on all external data sets, the model trained on radiomics features from data set I–III without clinical parameters predicted PCI values that were significantly correlated with the actual PCI values (all P's ≤ 0.01), with correlation coefficients between 0.30 and 0.56. The model including additional clinical features performed quite similarly to the model without clinical features, with the main difference that it performed somewhat better in the data set from center 2 with variable imaging quality (r = 0.38 vs r = 0.30). Addition of data set IV to enlarge the training data set did not markedly change the performance of the PCI prediction models, either when using only radiomics features or when additionally including the clinical features age and body mass index.
As a benchmark for the interpretation of the correlation coefficients between predicted PCI and actual PCI, we investigated the correlation between PCI from histological and cytological assessment, and found the correlation coefficient to be 0.53 (P < 0.001).
TABLE 3. Accuracy of the Prediction of Plasma Cell Infiltration of Different Models

| | Model 1: Trained on Data Set I–III; Radiomics Features | Model 2: Trained on Data Set I–III; Radiomics and Clinical Features* | Model 3: Trained on Data Set I–IV; Radiomics Features | Model 4: Trained on Data Set I–IV; Radiomics and Clinical Features* |
|---|---|---|---|---|
| Internal test set | 0.71 [0.51, 0.83] | 0.66 [0.44, 0.80] | 0.56 [0.35, 0.71] | 0.56 [0.35, 0.72] |
| Center 2, high-quality subset | 0.45 [0.12, 0.69] | 0.42 [0.08, 0.67] | 0.42 [0.09, 0.67] | 0.38 [0.04, 0.64] |
| Center 2, other subset | 0.30 [0.07, 0.49] | 0.38 [0.13, 0.59] | 0.22 [−0.01, 0.43] | 0.39 [0.15, 0.60] |
| Multicenter test set | 0.57 [0.30, 0.76] | 0.58 [0.07, 0.85] | 0.45 [0.15, 0.68] | 0.58 [0.08, 0.85] |

The correlation coefficient r and the P values from Pearson correlation are reported for each prediction model on each test set.
*Composition of test sets center 2, other and multicenter deviated from the other analyses, as body mass index was not available for all patients in these data sets.
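For readers who want to reproduce this kind of summary, the following sketch computes a Pearson correlation coefficient and an approximate 95% confidence interval. The interval uses the Fisher z-transformation, which is a common choice but an assumption here, since the text does not state how the intervals were derived. The function names and example values are hypothetical.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation (needs n > 3)."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical actual vs predicted values for five cases.
actual = [1, 2, 3, 4, 5]
predicted = [2, 4, 5, 4, 5]
r = pearson_r(actual, predicted)
lower, upper = fisher_ci(r, n=len(actual))
```

The interval width shrinks with the square root of n − 3, which is why small test sets produce wide intervals even for moderate correlation coefficients.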
Figure 5 visualizes the 15 most important radiomics features (according to the PCI prediction model based on the internal training data set using radiomics features only) for the internal training set and the internal test set and provides a quantitative analysis of how each of the features is correlated with the PCI in each data set. When investigating visual patterns in the radiomics heat map of the internal training set, no general, continuous trend of each radiomics feature from patients with low PCI toward patients with high PCI can be observed. Rather, especially in patients with low to intermediate PCI, different patterns in the radiomics signatures can be found. The fact that quite heterogeneous quantitative imaging features are found in patients with low to intermediate PCI is very much in line with the observation that there are very heterogeneous visual patterns in the BM in such patients, as demonstrated in the exemplary cases in Figure 4. However, especially in patients with (very) high PCI, a rather distinct pattern in the radiomics heat map becomes apparent (Fig. 5). This comprises lower feature values for features such as “first-order numeric mode value”/“first-order histogram mode value,” “first-order numeric 30th percentile,” or “first-order numeric minimum,” representing a rather hypointense BM signal.
Prediction of Cytogenetic Risk Status and Cytogenetic Aberrations
First, we investigated whether cytogenetic aberrations or cytogenetic high-risk status can be predicted by training individual models for this task. Three prediction models were trained to predict the presence or absence of a high-risk cytogenetic status according to 3 different established definitions (abbreviated HR-C def 1–3, as defined in the methods). Four prediction models were trained to predict the presence or absence of individual cytogenetic aberrations (Table 4). Although the models showed some discriminative ability in the internal test set with AUROCs ranging between 0.57 and 0.76, these models did not generalize to the multiple external test sets.
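To make the reported metric concrete, the AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below implements this pairwise definition directly; the function name and the toy labels and scores are assumptions for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs that are ranked correctly.

    labels: iterable of 0/1 values; scores: predicted scores (higher = more likely positive).
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
value = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Under this reading, a value of 0.5 corresponds to chance-level ranking, and values below 0.5 mean that the model ranks negative cases above positive cases more often than not.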
TABLE 4. Prediction of Cytogenetic Risk Status and Cytogenetic Aberrations

| AUROC [95% Confidence Interval] | Internal Test Set | Center 2, High-Quality Test Set | Center 2, Other Test Set | Multicenter Test Set |
|---|---|---|---|---|
| HR-C def 1 | 0.57 [0.25, 0.89] | 0.41 [0.14, 0.69] | 0.67 [0.51, 0.83] | 0.62 [0.41, 0.84] |
| HR-C def 2 | 0.73 [0.46, 1.00] | 0.57 [0.33, 0.80] | 0.48 [0.33, 0.62] | 0.50 [0.30, 0.71] |
| HR-C def 3 | 0.67 [0.39, 0.95] | 0.53 [0.29, 0.76] | 0.36 [0.20, 0.53] | 0.56 [0.36, 0.76] |
| | 0.66 [0.39, 0.94] | 0.61 [0.39, 0.83] | 0.37 [0.22, 0.51] | 0.45 [0.23, 0.67] |
| | 0.65 [0.42, 0.89] | 0.85 [0.68, 1.00] | 0.53 [0.39, 0.68] | 0.30 [0.12, 0.48] |
| | 0.63 [0.43, 0.83] | 0.43 [0, 0.97] | 0.49 [0.29, 0.70] | 0.49 [0.10, 0.88] |
| | 0.76 [0.39, 1.00] | 0.45 [0.11, 0.77] | 0.57 [0.37, 0.77] | 0.57 [0.39, 0.74] |

As only 1 patient had shown a t(14;16) in the training set, we did not try to train a machine learning model to predict the presence of a t(14;16) due to the imbalance in the training set.
HR-C def 1, high-risk cytogenetic status according to definition 1 [presence of any aberration of the following: del(17p), t(4;14), and t(14;16)]; HR-C def 2, presence of any aberration of the following: gain(1q), del(17p), t(4;14), and t(14;16); HR-C def 3, presence of any aberration of the following: gain(1q), del(13q), del(17p), t(4;14), and t(14;16).
Second, we investigated whether there is a connection between the predicted PCI and the presence of cytogenetic high-risk status/presence of cytogenetic aberrations (Fig. 6). All test sets were merged for this analysis. We found that patients with a cytogenetic high-risk status according to classification 1 showed a significantly higher predicted PCI than patients with cytogenetic standard risk status according to classification 1 (median predicted PCI 46% vs 38%, P = 0.01). Patients with a t(4;14) showed a significantly higher predicted PCI than patients without t(4;14) (median predicted PCI 47% vs 39%, P = 0.04). For the other cytogenetic risk classifications and cytogenetic aberrations, findings were not statistically significant in this data set.
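A group comparison of this kind can be sketched as a two-sided Wilcoxon rank-sum test. The version below uses the normal approximation without tie or continuity correction, which is a simplification and an assumption about the test variant; the example values are hypothetical and are not study data. For real analyses, an established implementation such as `scipy.stats.mannwhitneyu` would be the more robust choice.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Two-sided rank-sum test via normal approximation (no tie or continuity correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2  # Mann-Whitney U of group A
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical predicted PCI values (%) for a high-risk and a standard-risk group.
high_risk = [46, 52, 61, 48, 55]
standard_risk = [38, 41, 35, 44, 39]
u_statistic, p_value = rank_sum_test(high_risk, standard_risk)
```

Dividing the U statistic by the product of the two group sizes gives the same pairwise ranking probability that underlies the AUROC.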
Monoclonal plasma cell disorders can present with a wide variety of complex patterns in the BM (Fig. 4). However, beyond the fact that patients with focal lesions have adverse outcomes,32–39 it is not well understood which exact subpattern might point to specific disease subtypes, such as groups with certain genetic alterations or a certain outcome. We hypothesized that machine learning image analysis algorithms could help to close this gap. In the present study, we applied the recently presented concept for automatic, objective, quantitative BM profiling from wb-MRI scans25 to a large data set to learn about associations between local imaging patterns and local tumor tissue characteristics, with the ultimate goal of predicting local BM biopsy results noninvasively from MRI.
Automated Bone Marrow Segmentation
Establishing precise, automatic BM segmentation is an indispensable prerequisite to bring radiomics analysis for MM into clinical practice. Earlier approaches to automatic BM segmentation40–42 reported results that were markedly worse than the benchmark for manual segmentation set by interrater experiments. Recently, the first BM segmentation algorithms were presented that allowed BM segmentation from T1w images25 and ADC maps43 with a quality similar to manual segmentations by a radiologist, and that performed relatively robustly even in external multicentric test sets. In the current study, we trained an nnU-Net on 470 cases, with a wide variety of pathologies and several different MRI protocols and scanners represented in the training data sets, to perform individual segmentation of the right and left hip bone from T1w images. We found the algorithm to perform segmentations with very high quality, surpassing the benchmark set by an interrater experiment and performing very robustly even in cases with severe pathologies, including paramedullary lesions, and in multicentric data.
Prediction of Plasma Cell Infiltration
A connection between stages of diffuse infiltration severity in MRI and PCI,39,44–47 as well as between signal intensities/ADC-values and PCI,43,48,49 has been described. However, to the best of our knowledge, these have not yet been used to predict PCI from MRI. The models established in this study predicted PCI values that are significantly correlated (r between 0.66 and 0.71) to the actual PCI values on the internal data set. As expected and commonly observed in machine learning, the accuracy of the prediction models declined in the external test sets (r between 0.30 and 0.58). The best model showed a significant correlation between predicted and actual PCI values in all external test sets (all P's ≤ 0.01), demonstrating the external generalizability of the PCI prediction model.
Experts have recommended the addition of clinical features to the radiomics features to improve the predictive performance.50 As age and BMI are connected to the morphology of BM in MRI,51,52 we trained models that account for these factors. This did not markedly change the performance in our study. The addition of more training data that had been obtained with an older scanner and had lower imaging quality did not markedly change the prediction performance either. When interpreting the performance of the presented algorithms, it needs to be considered that the PCI value from biopsy itself has a certain level of uncertainty. It is well known that a biopsy taken at one position without image guidance is not necessarily representative of the tumor load of the whole patient.16 The PCI value also depends on the technique of the biopsy and its evaluation: Joshi and colleagues53 reported that the mean PCI was 13.1% when assessed from BM aspirates but 31.8% when assessed from trephine. In line with their study, we observed a correlation coefficient of 0.53 between histologically and cytologically assessed PCI in our data set. The fact that the result from the single-site biopsy is not necessarily representative, together with the moderate correlation between the PCI values obtained by different techniques when analyzing BM samples, puts the r values of the predictions by our algorithms, ranging from 0.30 to 0.71, into perspective. Although it must be assumed that cases with nonrepresentative PCI values from biopsies in the training set have impeded our modeling, we assume that the use of a high number of training cases has somewhat limited the influence of such outliers.
Prediction of Cytogenetic Aberrations
Few earlier works had reported connections between morphologic BM MRI patterns and cytogenetic aberrations/gene expression profiling data from biopsies at the iliac crest.44,47,54–56 The models established in this study to predict the cytogenetic results showed low to moderate discriminative ability in the internal test set. However, none of the models generalized to all 3 external test sets. Thereby, our results challenge the report from 1 earlier publication: in a radiomics study based on 89 patients without external test set, the authors had concluded that high-risk cytogenetic status at the iliac crest could be predicted by radiomics from spinal MRI with good performance.56
It had been reported that patients with (severe) diffuse infiltration in MRI were more likely to have genetic high-risk aberrations.44,47,55 As severe diffuse infiltration in MRI is connected to increased PCI,39,44–47,55 we investigated whether the predicted PCI is connected to presence of high-risk cytogenetic status or certain cytogenetic aberrations. Indeed, we found that patients with cytogenetic high-risk status according to classification 1 had a significantly higher predicted PCI than patients without cytogenetic high-risk status according to classification 1. Furthermore, patients with t(4;14) had a significantly higher predicted PCI than patients without t(4;14). The fact that the cytogenetic risk status/presence of t(4;14) was connected to the predicted PCI supports the hypothesis that certain MRI patterns are at least in part related to genetic properties of the tumor cells. However, in our large study with external test data, direct prediction of the cytogenetic risk status or presence of individual cytogenetic aberrations was not possible with reasonable accuracy.
Although our current work correlated the radiomics features from both hip bones with results from unguided BM biopsies at the posterior iliac crest, in the future, our approach for prediction of genetic results should be transferred to correlating radiomics features from specific focal lesions with targeted biopsies from the respective focal lesions, as connections between imaging findings of focal lesions and genetic properties of local clones have been described.18,57 However, this will only be possible once a sufficiently large quantity of targeted biopsies and correlating, high-quality MRIs is available to reasonably train and test machine learning algorithms.
The algorithms in this study were trained and tested on several data sets, which had been acquired with heterogeneous scan parameters and scanners. Given that, in vivo, the reproducibility of radiomics features at other scanners is worse than their repeatability,26 this heterogeneity probably limited the accuracy of the predictions, which is also in line with our finding that the performance of the predictive models declined in the external data sets. Therefore, further standardization of MRI scanners and protocols, as currently ongoing,58 in general standardization of radiomics pipelines,29 and application of advanced data harmonization methods,59–61 should be pursued to improve the performance of our approach in the future. A further limitation is that when biopsy was performed before MRI, this had caused postbioptic BM changes and thereby influenced the images. However, these limitations had to be accepted to create a data set with a reasonable size to apply machine learning, compromising between quality and quantity of the data to train and test machine learning algorithms. The current model is based on T1w sequences only. We expect that an addition of information from additional MRI sequences, especially (semi)quantitative sequences such as diffusion weighted imaging or Dixon, which have proven high value in imaging of MM,49,62–64 will further improve the results.
CONCLUSIONS AND OUTLOOK
This study proves the feasibility of using machine learning algorithms to predict local BM PCI automatically from MRI, even in independent external test sets. Although 2 significant connections between radiomics profiles and cytogenetic risk status/cytogenetic aberrations were observed, cytogenetic risk status or aberrations could not be predicted with reasonable accuracy in independent test sets. Based on these findings, we do not suggest that the current algorithm should replace conventional BM biopsies, but we conclude that the predicted PCI from MRI might serve as an additional parameter to evaluate and monitor the tumor load. In contrast to invasive BM biopsies, which cause significant discomfort to patients and come with risks such as bleeding, infection, or nerve injury, the predicted PCI can be assessed noninvasively and frequently and is not prone to random sampling errors. Its assessment is fast, as only one T1w MRI block over the pelvis needs to be acquired, with an MRI scan time of less than 2 minutes. Beyond the direct application of the predicted PCI as an additional tumor load parameter, on a more general level, our work proves that local radiomics signatures are linked to local tumor tissue characteristics in MM. This supports the further development of automated image analysis algorithms to analyze complex whole-body imaging data sets automatically and in depth. Given the established link between local radiomics signatures and local tumor tissue characteristics, such machine learning models have the potential to inform on local tumor tissue characteristics multifocally across the complete BM, and thereby to capture the spatially heterogeneous tumor manifestations and complex biological patterns observed in MM.
Individual, detailed monitoring of all local processes is urgently needed in MM, given the recent insights into the spatial heterogeneity and the spatiotemporal evolution of the disease,18–20 as well as the recent evidence that functional whole-body imaging delivers information complementary to BM biopsies in minimal residual disease assessment of MM patients.65–67
ACKNOWLEDGMENTS
The authors thank the German-Speaking Myeloma Multicenter Group (GMMG) for the provision of data from the GMMG-HD7 trial (EudraCT: 2017-004768-37). They would also like to thank Prof Dr Yon-Dschun Ko from the Johanniter-Clinics Bonn and Dr Jörg-Thomas Bittenbring from the University Hospital Saarland for the provision of data from centers 4 and 7.
The authors would like to thank Dr Ekaterina Menis, Dr Oyunbileg von Stackelberg, and Richard Meier for their valuable administrative support.
REFERENCES
1. Kumar SK, Rajkumar V, Kyle RA, et al. Multiple myeloma. Nat Rev Dis Prim
2. Waxman AJ, Mick R, Garfall AL, et al. Classifying ultra-high risk smoldering myeloma. Leukemia
3. Rajkumar SV, Larson D, Kyle RA. Diagnosis of smoldering multiple myeloma. N Engl J Med
4. Kastritis E, Terpos E, Moulopoulos L, et al. Extensive bone marrow infiltration and abnormal free light chain ratio identifies patients with asymptomatic myeloma at high risk for progression to symptomatic disease. Leukemia
5. Paiva B, Vidriales M-B, Pérez JJ, et al. Multiparameter flow cytometry quantification of bone marrow plasma cells at diagnosis provides more prognostic information than morphological assessment in myeloma patients. Haematologica
6. Chakraborty R, Muchtar E, Kumar SK, et al. Impact of pre-transplant bone marrow plasma cell percentage on post-transplant response and survival in newly diagnosed multiple myeloma. Leuk Lymphoma
7. Al Saleh AS, Parmar HV, Visram A, et al. Increased bone marrow plasma-cell percentage predicts outcomes in newly diagnosed multiple myeloma patients. Clin Lymphoma Myeloma Leuk
8. Neben K, Jauch A, Hielscher T, et al. Progression in smoldering myeloma is independently determined by the chromosomal abnormalities del(17p), t(4;14), gain 1q, hyperdiploidy, and tumor load. J Clin Oncol
9. Lakshman A, Rajkumar SV, Buadi FK, et al. Risk stratification of smoldering multiple myeloma incorporating revised IMWG diagnostic criteria. Blood Cancer J
10. Palumbo A, Avet-Loiseau H, Oliva S, et al. Revised international staging system for multiple myeloma: a report from International Myeloma Working Group. J Clin Oncol
11. Weinhold N, Salwender HJ, Cairns DA, et al. Chromosome 1q21 abnormalities refine outcome prediction in patients with multiple myeloma—a meta-analysis of 2,596 trial patients. Haematologica
12. Mateos M-V, Kumar S, Dimopoulos MA, et al. International Myeloma Working Group risk stratification model for smoldering multiple myeloma (SMM). Blood Cancer J
13. Sonneveld P, Avet-Loiseau H, Lonial S, et al. Treatment of multiple myeloma with high-risk cytogenetics: a consensus of the International Myeloma Working Group. Blood
14. Rajkumar SV, Dimopoulos MA, Palumbo A, et al. International Myeloma Working Group updated criteria for the diagnosis of multiple myeloma. Lancet Oncol
15. Kumar S, Paiva B, Anderson KC, et al. International Myeloma Working Group consensus criteria for response and minimal residual disease assessment in multiple myeloma. Lancet Oncol
16. Latifoltojar A, Boyd K, Riddell A, et al. Characterising spatial heterogeneity of multiple myeloma in high resolution by whole body magnetic resonance imaging: towards macro-phenotype driven patient management. Magn Reson Imaging
17. Hillengass J, Ellert E, Spira D, et al. Comparison of plasma cell infiltration in random samples of the bone marrow and osteolyses acquired by CT-guided biopsy in patients with symptomatic multiple myeloma. J Clin Oncol
18. Rasche L, Chavan SS, Stephens OW, et al. Spatial genomic heterogeneity in multiple myeloma revealed by multi-region sequencing. Nat Commun
19. Merz M, Merz AMA, Wang J, et al. Deciphering spatial genomic heterogeneity at a single cell resolution in multiple myeloma. Nat Commun
20. Rasche L, Schinke C, Maura F, et al. The spatio-temporal evolution of multiple myeloma from baseline to relapse-refractory states. Nat Commun
21. Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer
22. Aerts HJ, Velazquez ER, Leijenaar RT, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun
23. Aerts HJ. The potential of radiomic-based phenotyping in precision medicine: a review. JAMA Oncol
24. Isensee F, Jaeger PF, Kohl SAA, et al. nnU-net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods
25. Wennmann M, Klein A, Bauer F, et al. Combining deep learning and radiomics for automated, objective, comprehensive bone marrow characterization from whole-body MRI. Invest Radiol
26. Wennmann M, Bauer F, Klein A, et al. In vivo repeatability and multiscanner reproducibility of MRI radiomics features in patients with monoclonal plasma cell disorders: a prospective bi-institutional study. Invest Radiol
27. Götz M, Nolden M, Maier-Hein K. MITK phenotyping: an open-source toolchain for image-based personalized medicine with radiomics. Radiother Oncol
28. Wennmann M, Thierjung H, Bauer F, et al. Repeatability and reproducibility of ADC measurements and MRI signal intensity measurements of bone marrow in monoclonal plasma cell disorders. Invest Radiol
29. Zwanenburg A, Vallieres M, Abdalah MA, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology
30. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res
31. D'Agostino M, Cairns DA, Lahuerta JJ, et al. Second revision of the international staging system (R2-ISS) for overall survival in multiple myeloma: a European myeloma network (EMN) report within the HARMONY project. J Clin Oncol
32. Mai EK, Hielscher T, Kloth JK, et al. A magnetic resonance imaging-based prognostic scoring system to predict outcome in transplant-eligible patients with multiple myeloma. Haematologica
33. Walker R, Barlogie B, Haessler J, et al. Magnetic resonance imaging in multiple myeloma: diagnostic and clinical implications. J Clin Oncol
34. Kastritis E, Moulopoulos LA, Terpos E, et al. The prognostic importance of the presence of more than one focal lesion in spine MRI of patients with asymptomatic (smoldering) multiple myeloma. Leukemia
35. Dhodapkar MV, Sexton R, Waheed S, et al. Clinical, genomic, and imaging predictors of myeloma progression from asymptomatic monoclonal gammopathies (SWOG S0120). Blood
36. Hillengass J, Fechtner K, Weber M-A, et al. Prognostic significance of focal lesions in whole-body magnetic resonance imaging in patients with asymptomatic multiple myeloma. J Clin Oncol
37. Hillengass J, Weber MA, Kilk K, et al. Prognostic significance of whole-body MRI in patients with monoclonal gammopathy of undetermined significance. Leukemia
38. Merz M, Hielscher T, Wagner B, et al. Predictive value of longitudinal whole-body magnetic resonance imaging in patients with smoldering multiple myeloma. Leukemia
39. Lecouvet FE, Vande Berg BC, Michaux L, et al. Stage III multiple myeloma: clinical and prognostic value of spinal bone marrow MR imaging. Radiology
40. Almeida SD, Santinha J, Oliveira FPM, et al. Quantification of tumor burden in multiple myeloma by atlas-based semi-automatic segmentation of WB-DWI. Cancer Imaging
41. Arabi H, Zaidi H. Whole-body bone segmentation from MRI for PET/MRI attenuation correction using shape-based averaging. Med Phys
42. Lavdas I, Glocker B, Kamnitsas K, et al. Fully automatic, multiorgan segmentation in normal whole body magnetic resonance imaging (MRI), using classification forests (CFs), convolutional neural networks (CNNs), and a multi-atlas (MA) approach. Med Phys
43. Wennmann M, Neher P, Stanczyk N, et al. Deep learning for automatic bone marrow apparent diffusion coefficient measurements from whole-body magnetic resonance imaging in patients with multiple myeloma: a retrospective multicenter study. Invest Radiol
44. Mai EK, Hielscher T, Kloth JK, et al. Association between magnetic resonance imaging patterns and baseline disease features in multiple myeloma: analyzing surrogates of tumour mass and biology. Eur Radiol
45. Kloth JK, Hillengass J, Listl K, et al. Appearance of monoclonal plasma cell diseases in whole-body magnetic resonance imaging and correlation with parameters of disease activity. Int J Cancer
46. Moulopoulos LA, Gika D, Anagnostopoulos A, et al. Prognostic significance of magnetic resonance imaging of bone marrow in previously untreated patients with multiple myeloma. Ann Oncol
47. Messiou C, Porta N, Sharma B, et al. Prospective evaluation of whole-body MRI versus FDG PET/CT for lesion detection in participants with myeloma. Radiol Imaging Cancer
48. Dutoit JC, Vanderkerken MA, Anthonissen J, et al. The diagnostic value of SE MRI and DWI of the spine in patients with monoclonal gammopathy of undetermined significance, smouldering myeloma and multiple myeloma. Eur Radiol
49. Hillengass J, Bauerle T, Bartl R, et al. Diffusion-weighted imaging for non-invasive and quantitative monitoring of bone marrow infiltration in patients with monoclonal plasma cell disease: a comparative study with histology. Br J Haematol
50. Lambin P, Leijenaar RTH, Deist TM, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol
51. Poulton TB, Murphy WD, Duerk JL, et al. Bone marrow reconversion in adults who are smokers: MR imaging findings. AJR Am J Roentgenol
52. Lavdas I, Rockall AG, Castelli F, et al. Apparent diffusion coefficient of normal abdominal organs and bone marrow from whole-body DWI at 1.5 T: the effect of sex and age. AJR Am J Roentgenol
53. Joshi R, Horncastle D, Elderfield K, et al. Bone marrow trephine combined with immunohistochemistry is superior to bone marrow aspirate in follow-up of myeloma patients. J Clin Pathol
54. Waheed S, Mitchell A, Usmani S, et al. Standard and novel imaging methods for multiple myeloma: correlates with prognostic laboratory variables including gene expression profiling data. Haematologica
55. Moulopoulos LA, Dimopoulos MA, Kastritis E, et al. Diffuse pattern of bone marrow involvement on magnetic resonance imaging is associated with high risk cytogenetics and poor outcome in newly diagnosed, symptomatic patients with multiple myeloma: a single center experience on 228 patients. Am J Hematol
56. Liu J, Zeng P, Guo W, et al. Prediction of high-risk cytogenetic status in multiple myeloma based on magnetic resonance imaging: utility of radiomics and comparison of machine learning methods. J Magn Reson Imaging
57. Rasche L, Angtuaco E, McDonald JE, et al. Low expression of hexokinase-2 is associated with false-negative FDG-positron emission tomography in multiple myeloma. Blood
58. Rata M, Blackledge M, Scurr E, et al. Implementation of whole-body MRI (MY-RADS) within the OPTIMUM/MUKnine multi-centre clinical trial for patients with myeloma. Insights Imaging
59. Orlhac F, Frouin F, Nioche C, et al. Validation of a method to compensate multicenter effects affecting CT radiomics. Radiology
60. Leithner D, Nevin RB, Gibbs P, et al. ComBat harmonization for MRI radiomics: impact on nonbinary tissue classification by machine learning. Invest Radiol. 2023. Online ahead of print. doi:10.1097/RLI.0000000000000970.
61. Gatidis S, Kart T, Fischer M, et al. Better together: data harmonization and cross-study analysis of abdominal MRI data from UK biobank and the German National Cohort. Invest Radiol. 2022. Online ahead of print. doi:10.1097/RLI.0000000000000941.
62. Giles SL, Messiou C, Collins DJ, et al. Whole-body diffusion-weighted MR imaging for assessment of treatment response in myeloma. Radiology
63. Latifoltojar A, Hall-Craggs M, Bainbridge A, et al. Whole-body MRI quantitative biomarkers are associated significantly with treatment response in patients with newly diagnosed symptomatic multiple myeloma following bortezomib induction. Eur Radiol
64. Chiabai O, Van Nieuwenhove S, Vekemans M-C, et al. Whole-body MRI in oncology: can a single anatomic T2 Dixon sequence replace the combination of T1 and STIR sequences to detect skeletal metastasis and myeloma? Eur Radiol
65. Rasche L, Alapat D, Kumar M, et al. Combination of flow cytometry and functional imaging for monitoring of residual disease in myeloma. Leukemia
66. Böckle D, Tabares P, Zhou X, et al. Minimal residual disease and imaging-guided consolidation strategies in newly diagnosed and relapsed refractory multiple myeloma. Br J Haematol
67. Alonso R, Cedena MT, Gómez-Grande A, et al. Imaging and bone marrow assessments improve minimal residual disease prediction in multiple myeloma. Am J Hematol