Multiple myeloma (MM) is a heterogeneous disease with regard to symptoms, tumor genetics, and outcome.1 Precise diagnostics to classify disease subtypes and to predict the individual course of disease are of utmost importance to guide treatment accordingly, a principle known as precision oncology. In addition to serological and urine parameters, imaging and bone marrow (BM) biopsy results play decisive roles in the diagnosis of monoclonal plasma cell disorders. Parameters obtained from unguided BM biopsy at the posterior iliac crest, such as plasma cell infiltration (PCI)2–7 or cytogenetic aberrations,8–11 have also proven valuable as biomarkers and are consequently now used for staging, risk stratification, and response assessment.9,10,12–15 However, tumor load distribution16,17 and genomic aberrations18–20 can be spatially heterogeneous. Invasive biopsies have the disadvantage that they cannot be performed both multifocally and in frequent repetition, which would be necessary to precisely evaluate and monitor the tumor load and to capture the complete genomic landscape in each patient. Here, whole-body magnetic resonance imaging (wb-MRI) is advantageous, as it allows noninvasive investigation of a patient's complete BM and captures information on the spatial distribution and local characteristics of tumor manifestations. Consequently, a method to predict local biopsy results from MRI without multiple invasive biopsies would be of great value, provided that a link between local imaging findings and local histology or local genetic findings can be established.
Radiomics is a new image analysis approach that characterizes a structure in an image by calculating hundreds of mathematically defined radiomics features, which quantify the signal intensity, shape, and texture of the structure.21,22 For various oncologic entities, it has been reported that radiomics can predict tumor tissue characteristics such as histologic or genetic results.23 For patients with MM, it was recently demonstrated that the combination of an nnU-Net24 for segmentation and a subsequent radiomics analysis21,22 makes it possible to automatically obtain an objective, in-depth characterization of the whole BM from wb-MRI.25
Given this clinical, biological, and technical background, the purpose of this study was to develop and test an automated deep learning and radiomics image analysis framework, which analyzes the pelvic BM from whole-body MRIs to predict the local tumor tissue characteristics PCI and cytogenetic aberrations from routine BM biopsy at the iliac crest.
METHODS
Study Design and Algorithmic Concept
This retrospective study used multicentric data sets to establish and test a methodological concept for automatic tissue analysis from routine clinical MRIs and prediction of BM biopsy results in patients with monoclonal plasma cell disorders. To achieve this goal, an automated multistep pipeline was established, which included automated pelvic BM segmentation by a deep learning algorithm,24 image normalization26 and resampling, radiomics feature calculation,27 and parameter prediction based on a machine learning model (Fig. 1). This study was approved by the institutional review board of the Medical Faculty, University of Heidelberg, Germany (S-537/2020), with a waiver of informed consent. The acquisition of imaging and clinical data was performed between 2008 and 2021. Specific planning, data annotation, establishing algorithms, and data analysis for this project were performed between 2019 and 2022.
FIGURE 1: Multistep workflow for automated image analysis and prediction of biopsy results. Step 0: A coronal T1-weighted turbo spin echo sequence is used as input for the processing pipeline. Step 1: The nnU-Net, a deep learning-based segmentation algorithm,24 performs automatic segmentation of the right hip bone and the left hip bone and marks each side individually. In addition, the medial parts of the piriformis muscles are also automatically segmented (both sides receive the same label). Step 2: Images are normalized to the mean signal intensity of the piriformis muscle26 and resampled to a uniform geometry. Step 3: 260 radiomics features (91 first order and 169 texture features) are calculated by the MITK phenotyping27 radiomics toolbox. Step 4: Machine-learning models predict the target parameters bone marrow plasma cell infiltration, cytogenetic risk status, and cytogenetic aberrations from the radiomics features. T1w TSE, T1-weighted turbo spin echo sequence.
Patient Cohorts and Data Sets
There was no prespecified sample size for this study. As recommended, we aimed to assemble an overall data set that contains enough samples for machine learning methods to be reasonably applied, has a balanced distribution of patients with low and high PCI, and allows for independent, external, multicentric testing of the final algorithms. Data from center 1 were derived from 5 different subsets, which had in part been included in other studies. Data from center 1 were used for training of the algorithms, and a hold-out subset was reserved for independent, internal testing (internal test set). Data from center 2 were used for external testing: once in a subset of MRIs with a homogenized MRI protocol and high imaging quality (center 2, high-quality test set), and once in a subset that did not fulfill these criteria (center 2, other test set). Finally, a multicentric data set was used to evaluate the performance in a very heterogeneous data set (multicenter test set). Details on the inclusion and exclusion process including the respective flow-charts are reported in Supplemental Digital Content 1, https://links.lww.com/RLI/A806. An overview of all data sets is shown in Figure 2.
FIGURE 2: Overview of study cohorts and data sets. Data from center 1 were used to train the automatic segmentation algorithm and the radiomics algorithms, and an independent subset of 59 patients was preserved for internal testing of the resulting pipeline. aPatients without concomitant bone marrow biopsy were assigned to the training set for the nnU-Net only. Patients with concomitant bone marrow biopsy were split by date into training and test sets as described in detail in the supplements. bThis training part was then used both for training of the nnU-Net and the radiomics algorithms. Then, the resulting pipeline was tested on the internal test set. cBefore external testing, the radiomics algorithms were retrained on all internal data. Then, the radiomics algorithms were tested on 3 external test sets: (1) the high-quality test set from center 2 that had been acquired with a harmonized MRI protocol at the newest MRI scanner and was free of imaging artifacts; (2) all data from center 2 that did not meet these criteria (center 2, other test set); and (3) the multicenter test set that included data from a wide variety of scanners from 6 different centers.
Imaging
Coronal T1-weighted (T1w) turbo spin echo images that had been acquired with different MRI scanners and different MRI sequence parameters were included. Detailed information on all scanners and sequence parameters is reported in Supplemental Digital Content 2, https://links.lww.com/RLI/A807.
Image Segmentation and Training and Testing of nnU-Nets
The BM of the right and left hip bone and the medial part of the piriformis muscle were segmented on coronal T1w images. An initial subset of 127 MRIs was annotated manually to train a first nnU-Net,24 which was then used to automatically presegment the rest of the training data. After manual improvement by 1 experienced rater (7 years of experience in MRI segmentation), the final nnU-Net was trained on all training data (470 cases). Testing of the nnU-Net was performed in 3 independent test sets with overall 37 manually segmented cases as reference. Further details are reported in Supplemental Digital Content 3, https://links.lww.com/RLI/A808.
Radiomics Analysis
Before feature extraction, all images were resampled to a uniform voxel spacing and normalized to the mean signal intensity of the piriformis muscle to minimize heterogeneity between data sets caused by technical variations in image acquisition.26,28 The IBSI-conform29 and validated software MITK Phenotyping27 was used for radiomics feature calculation. A total of 260 radiomics features were calculated, with 91 first-order features and 169 texture features. Volume and shape features were omitted because these are not expected to carry disease-specific information in this setting, as whole BM spaces are analyzed. Four different radiomics models were trained for the PCI prediction task, covering each combination of 2 choices: with or without inclusion of a data set from an older scanner, and with or without addition of clinical features. Random forest regression-based models were trained to predict PCI from the radiomics (and clinical) features using the scikit-learn (sklearn) Python package.30 Random forest classifiers based on radiomics (and clinical) features were trained for the predictions of binary variables. All machine learning modeling was performed with Python (Version 3.8.10, Python Software Foundation, Delaware). The models were initially trained on the training set of center 1 only (168 cases) and tested on the center 1 test set (59 cases). Then, the models were retrained on all data from center 1 (227 cases), and finally tested on the external test sets (including a total of 143 cases). Further details on the radiomics analysis are reported in Supplemental Digital Content 4, https://links.lww.com/RLI/A809.
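The normalization step described above can be illustrated with a minimal sketch. It assumes that "normalized to the mean signal intensity of the piriformis muscle" means dividing every voxel by the mean intensity inside the muscle mask; the function name, the label value 3 for muscle, and the toy arrays are hypothetical and are not taken from the study's pipeline.

```python
import numpy as np


def normalize_to_muscle(image, labels, muscle_label=3):
    """Divide all voxel intensities by the mean intensity inside the muscle mask.

    Assumption: normalization is a simple division by the muscle mean.
    The label value is hypothetical.
    """
    image = np.asarray(image, dtype=float)
    muscle = np.asarray(labels) == muscle_label
    if not muscle.any():
        raise ValueError("no muscle voxels found in the label map")
    return image / image[muscle].mean()


# Toy 2 x 3 "image"; labels 1/2 = right/left hip bone marrow, 3 = piriformis muscle.
image = np.array([[200.0, 400.0, 100.0],
                  [300.0, 500.0, 100.0]])
labels = np.array([[1, 2, 3],
                   [1, 2, 3]])
normalized = normalize_to_muscle(image, labels)
```

After this step, bone marrow intensities are expressed relative to muscle, so a value near 1 corresponds to marrow that is about as hypointense as muscle.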
Histological, Cytological, and Cytogenetic Data
Bone marrow biopsies were performed at the posterior iliac crest without image guidance. When assessing PCI from BM biopsy, in line with the International Myeloma Working Group recommendations,14 the higher value from the histological and cytological PCI was used. The 5 cytogenetic aberrations gain(1q), del(13q), del(17p), t(4;14), and t(14;16), which are currently used to form the respective risk stratifications in smoldering MM12 and in MM,10,31 were investigated. In addition to the prediction of the presence of each cytogenetic aberration individually, it was also predicted whether a high-risk cytogenetic status was present. Three different definitions for cytogenetic high-risk were used: high-risk cytogenetic status according to definition 1 (abbreviated “HR-C def 1”), based on the definition of cytogenetic high-risk aberrations in R-ISS [presence of any aberration of the following: del(17p), t(4;14), and t(14;16)]10; “HR-C def 2,” based on the definition of cytogenetic high-risk aberrations in R2-ISS [presence of any aberration of the following: gain(1q), del(17p), t(4;14), and t(14;16)]31; and “HR-C def 3,” based on the definition proposed for smoldering MM [presence of any aberration of the following: gain(1q), del(13q), del(17p), t(4;14), and t(14;16)].12
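The three high-risk definitions can be written down as a small lookup, sketched here for illustration. The aberration lists follow the text above; the function name, the input format, and the handling of untested aberrations (status left undetermined unless a high-risk aberration is present) are assumptions, as the text does not describe how missing tests were treated.

```python
# Aberrations that define a high-risk cytogenetic status under each definition.
HR_C_DEFINITIONS = {
    "HR-C def 1": ("del(17p)", "t(4;14)", "t(14;16)"),
    "HR-C def 2": ("gain(1q)", "del(17p)", "t(4;14)", "t(14;16)"),
    "HR-C def 3": ("gain(1q)", "del(13q)", "del(17p)", "t(4;14)", "t(14;16)"),
}


def high_risk_status(findings):
    """Map aberration findings to a high-risk status per definition.

    findings: dict of aberration name -> True (present), False (absent),
    or None/missing (not tested). Returns True, False, or None (undetermined).
    """
    status = {}
    for name, aberrations in HR_C_DEFINITIONS.items():
        values = [findings.get(a) for a in aberrations]
        if any(v is True for v in values):
            status[name] = True    # any listed aberration present
        elif all(v is False for v in values):
            status[name] = False   # all listed aberrations tested and absent
        else:
            status[name] = None    # assumption: untested values leave it open
    return status


# Hypothetical patient with an isolated gain(1q).
example = high_risk_status({
    "gain(1q)": True, "del(13q)": False, "del(17p)": False,
    "t(4;14)": False, "t(14;16)": False,
})
```

In this example the patient is standard risk under definition 1 but high risk under definitions 2 and 3, which shows how the choice of definition changes the class balance.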
Statistical Analysis
Dice scores were calculated to quantify the agreement between automatic and manual segmentations. Pearson correlation was used to evaluate the correlation between predicted and actual PCI values. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the prediction of cytogenetic risk status and cytogenetic aberrations. The Spearman correlation coefficient was used to evaluate the correlation between individual radiomics features and PCI. The Wilcoxon test was used to assess the difference in predicted PCI values between the cytogenetic high-risk and standard-risk groups, or between the groups with and without the respective cytogenetic aberration. The Gini feature importance was used to report the relative influence of a feature for the prediction model and was calculated as implemented in scikit-learn.30 The 95% confidence intervals (CIs) for Pearson correlation coefficients and AUROCs were calculated. P values <0.05 were considered statistically significant. The statistical analysis was performed with Python (Version 3.8.10; Python Software Foundation, Delaware) and R (Version 4.0.1; R Foundation for Statistical Computing, Vienna, Austria).
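For readers unfamiliar with the Dice score, the following minimal sketch shows the standard definition, 2|A∩B| / (|A| + |B|), on flattened binary masks. The function name and the toy masks are illustrative, and returning 1.0 when both masks are empty is an assumed convention rather than something stated in the study.

```python
def dice_score(mask_a, mask_b):
    """Dice similarity coefficient of two equally long binary masks."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy example: flattened automatic vs manual segmentation of six voxels.
automatic = [0, 1, 1, 1, 0, 0]
manual = [0, 1, 1, 0, 0, 0]
dice = dice_score(automatic, manual)  # 2 * 2 / (3 + 2) = 0.8
```

A score of 1 means identical masks and 0 means no overlap.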
RESULTS
Study Cohort and Data Sets
A total of 672 MRIs from 512 patients (median age, 61 years; interquartile range, 53–67 years; 307 men) from 8 centers and 370 corresponding BM biopsies were included in this study. An overview of the data sets is displayed in Figure 2. Details on inclusion and exclusion at each stage are reported in the methods and in the supplements (Supplemental Digital Content 1, https://links.lww.com/RLI/A806). Table 1 reports descriptive information for each data set.
TABLE 1 - Description of Study Cohorts
A. Data for Segmentation Experiments | Training Set for nnU-Net | Internal Test Set/Interrater Variability | Center 2 Test Set | Multicenter Test Set |
n MRIs (n patients) | 470 MRIs (from 310 patients) | 8 wb-MRIs (from 8 patients) | 15 wb-MRIs (from 15 patients) | 14 wb-MRIs (from 14 patients) |
Patient characteristic | | | | |
Male sex, n (%) | 186 (60%) | 4 (50%) | 8 (53%) | 8 (57%) |
Age in yearsb | 61 (53–67) | 59 (48–67) | 51 (47–66) | 60 (54–64) |
Disease stage | | | | |
MGUS | 26 | 2 | 0 | 0 |
SMM | 131 | 2 | 4 | 0 |
NDMM | 182 | 4 | 11 | 14 |
ISS I/II/III (n.a.) | 114/25/22 (21) | 0/2/1 (1) | 4/3/4 | 5/4/5 |
On/after therapy or n.a. | 131 | 0 | 0 | 0 |
Tumor load surrogates | | | | |
PCI in %b | 23 (12–50; 257) | 27 (20–63) | 43 (19–59) | 48 (15–71) |
M-protein in g/Lb | 20 (11–35; 130) | 30 (15–50) | 26 (17–35) | 36 (28–44) |
B. Data for Radiomics Experiments | Internal Training Seta Radiomics | Internal Test Seta | Center 2, High-Quality Test Set | Center 2, Other Test Set | Multicenter Test Set |
n MRIs (n patients) | 168 wb-MRIs (from 166 patients) | 59 wb-MRIs (from 59 patients) | 32 wb-MRIs (from 32 patients) | 75 wb-MRIs (from 75 patients) | 36 wb-MRIs (from 36 patients) |
Patient characteristic | | | | | |
Male sex, n (%) | 101 (61%) | 30 (51%) | 16 (50%) | 56 (75%) | 19 (53%) |
Age in yearsb | 61 (52–67) | 60 (51–69) | 64 (53–70) | 61 (53–68) | 60 (51–65) |
Disease stage | | | | | |
MGUS | 6 | 4 | 0 | 2 | 0 |
SMM | 47 | 17 | 8 | 11 | 0 |
NDMM | 115 | 38 | 24 | 62 | 36 |
ISS I/II/III (n.a.) | 76/18/14 (7) | 23/9/2 (4) | 13/3/8 (0) | 33/13/12 (4) | 11/15/10 (0) |
On/after therapy or n.a. | 0 | 0 | 0 | 0 | 0 |
Tumor load surrogates | | | | | |
PCI in %b | 23 (12–46) | 20 (15–55) | 43 (18–68) | 26 (10–48) | 51 (30–80) |
M-protein in g/Lb | 24 (12–41; 28) | 24 (14–43; 13) | 24 (10–36; 7) | 32 (15–40; 21) | 37 (21–46; 16) |
Cytogenetics | | | | | |
HR-C def 1c | 37% (22/60) | 27% (4/15) | 23% (7/31) | 23% (15/65) | 17% (6/35) |
HR-C def 2c | 56% (40/71) | 53% (8/15) | 45% (14/31) | 47% (31/66) | 43% (15/35) |
HR-C def 3c | 78% (62/80) | 74% (14/19) | 64% (18/28) | 69% (45/65) | 49% (17/35) |
gain(1q)d | 35% (35/101) | 33% (7/21) | 37% (10/27) | 40% (26/65) | 35% (12/34) |
del(13q)d | 48% (49/102) | 57% (13/23) | 54% (15/28) | 58% (38/65) | 35% (12/34) |
del(17p)d | 10% (10/102) | 4% (1/24) | 7% (2/27) | 12% (8/65) | 9% (3/35) |
t(4;14)d | 24% (14/59) | 27% (4/15) | 19% (5/27) | 14% (9/65) | 9% (3/35) |
t(14;16)d | 2% (1/54) | 0% (0/16) | 4% (1/27) | 2% (1/65) | 3% (1/35) |
MRI biopsy interval (days)b | 5 (0–29) | 17 (2–61) | 6 (1–24) | 4 (1–13) | 5 (2–15) |
Table A reports descriptive information for each data set included in the segmentation experiments, and Table B reports descriptive information for each data set included in the radiomics experiments.
aNote: Initially, algorithms were trained on only the training set from center 1 and tested on the internal test set. Then, radiomics algorithms were retrained on all data from center 1, and the resulting models were tested on the external test sets.
bMedian (interquartile range; n missing [only reported in case there were any cases with missing information]).
cPercentage with high risk (number high risk/number all).
dPercentage with cytogenetic aberration (number cytogenetic aberration present/number cytogenetic aberration tested). HR-C def 1, high-risk cytogenetic status according to definition 1 based on the definition of cytogenetic high-risk aberrations in R-ISS [presence of any aberration of the following: del(17p), t(4;14), and t(14;16)]10; “HR-C def 2,” based on the definition of cytogenetic high-risk aberrations in R2-ISS [presence of any aberration of the following: gain(1q), del(17p), t(4;14), and t(14;16)]31; and “HR-C def 3,” based on the definition proposed for smoldering MM [presence of any aberration of the following: gain(1q), del(13q), del(17p), t(4;14), and t(14;16)].12 n.a., not available; MGUS, monoclonal gammopathy of undetermined significance; SMM, smoldering multiple myeloma; %, percentage of this cohort; PCI, plasma cell infiltration in the bone marrow in %.
Quality of Automated Pelvic Bone Marrow Segmentation
The Dice scores for the automated pelvic BM segmentations and for the interrater variability between 2 radiologists are reported in Table 2. Figure 3 displays automated segmentations in 5 examples, including cases with severest pathologies.
TABLE 2 - Quality of Automatic Segmentation and Interrater Variability
Test Set | n | nnU-Net vs Radiologist: Right Pelvis | nnU-Net vs Radiologist: Left Pelvis | Interrater Variabilitya: Right Pelvis | Interrater Variabilitya: Left Pelvis |
Internal test setb | 8 | 0.96 ± 0.03 | 0.96 ± 0.03 | 0.88 ± 0.02 | 0.87 ± 0.02 |
Center 2 test setc | 15 | 0.93 ± 0.01 | 0.92 ± 0.01 | | |
Multicenter test setd | 14 | 0.89 ± 0.03 | 0.90 ± 0.03 | | |
Mean Dice scores (± standard deviation) are reported to assess quality of automatic bone marrow segmentation, and to compare it with the interrater variability of segmentations between 2 radiologists.
aBetween manual segmentations from 2 different radiologists.
bLast 2 by date for each data set I–IV.
cNewest 15 by date.
dNewest 3 by date per center. From center 4 and center 7, only one processable data set was available. This data set comprises 14 MRIs from 6 centers, acquired with 5 different scanner models from 3 different vendors.
FIGURE 3: Exemplary automated segmentations in different pathologies and different external centers. On the left, one slice of the MRI scan for each case is displayed. The automatic segmentation (bone marrow of the right pelvis: red, bone marrow of the left pelvis: yellow, medial part of piriformis muscle: blue) is superimposed on the respective image in the middle, and the 3-dimensional model of the automatic segmentations is shown on the right. A, A 68-year-old male patient with normal-appearing, T1w-hyperintense, “fatty” bone marrow. B, A 55-year-old male patient with over 30 T1w-hypointense focal lesions in the right and left pelvis, ranging from 7 mm to 6.2 cm. All focal lesions were correctly included in the segmentation. C, A 66-year-old female patient with severely T1w-hypointense bone marrow. The signal intensity of the bone marrow approximates the signal intensity of muscle, representing a severe diffuse infiltration pattern. This patient showed 70% plasma cell infiltration in the bone marrow biopsy. Despite the severe pathology, the bone marrow was segmented correctly. A–C, Images were all acquired at center 2. D, A 67-year-old female patient with large, secondary extramedullary lesion at the right posterior iliac crest. The scan was acquired at center 5 with a scanner from a different vendor, and consequently, the image appears somewhat different compared with examples A–C. Despite the large focal lesion (6.1 cm by 4.2 cm) with a paramedullary component and the fact that the image was acquired at a scanner from a different vendor, the segmentation was precise and even contained parts of the lesions, which extend beyond the regular shape of the bone. E, A 67-year-old male patient with a large (6.5cm × 4.8 cm), secondary extramedullary focal lesion at the right posterior iliac crest and several focal lesions in the left iliac crest. 
This scan was acquired at center 6 with an older MRI scanner model, and the image appears somewhat different compared with examples A–C. Despite the considerable paramedullary part of the focal lesion at the right side and the different image characteristic, the automated segmentation was precise and included even the paramedullary parts.
Quantitative Profiling of Bone Marrow Phenotypes Using Radiomics
A wide variety of different morphologic BM patterns can be observed in MRI in patients with monoclonal plasma cell disorders. Figure 4 displays several exemplary cases of these varying MRI BM patterns and the resulting radiomics signature from the pelvic BM for each case. These cases exemplify how differences in morphologic MRI patterns lead to differences in the extracted quantitative, objective radiomics profiles.
FIGURE 4: Visual bone marrow patterns and resulting radiomics profiles. Panel A displays different visual bone marrow patterns as observed in T1w images, and panel B shows the corresponding radiomics signatures extracted from the pelvic bone marrow for each case. Patient 1 (P1) and P2 show hyperintense, relatively homogeneous bone marrow, which is the physiological pattern observed in elder patients. P3 and P4 show a pattern with T1w-hypointense focal lesions. P3 shows a patient with over 30 focal lesions in the right and left pelvis, sizes ranging from 7 mm to 6.2 cm, representing a pattern of multiple focal lesions with varying size, combined with a homogeneous, intermediate diffuse infiltration. P4 shows a patient with multiple (20) focal lesions in the pelvis, ranging from 5 mm to 1.6 cm, and a patchy diffuse infiltration. P5 and P6 show 2 cases with severely T1w hypointense bone marrow, which approximates the signal intensity of muscle tissue, representing a severe diffuse infiltration. In line with this imaging finding, patients showed 70% (P5) and 85% (P6) plasma cell infiltration in the bone marrow. P7 to P10 represent subtypes of intermediate diffuse infiltration, which would be neither classified as normal, focal, nor severe diffuse infiltration. P7 shows a pattern with very small nodular components in which most nodules have relatively similar size, and a relatively low contrast between the hypointense and hyperintense nodules. P8, in contrast, shows a pattern in which fewer nodules are present, and nodules are on average larger and show greater variation regarding size. In addition, in contrast to P7, at least some of the nodules are more hypointense (approximating signal intensity of muscle), and the pattern is characterized by a higher contrast between the hypointense nodules and the fatty bone marrow. 
P9 shows a case with a relatively homogeneous, intermediate T1w signal intensity, which lies in between the signal intensity in normal fatty bone marrow and severe diffuse infiltration. P10 shows a case in which homogeneous fatty plates and interspersed, confluent hypointense insulae are observed, in which the insulae are not as hypointense as muscle.
The differences between MRI morphologically normal-appearing BM, focal lesion pattern, and severe diffuse infiltration are unequivocal (Fig. 4, P1–P6). Besides those, there is a large group of patterns that might be classified as intermediate diffuse infiltration (Fig. 4, P7–P10). These are clearly heterogeneous and therefore might represent different MM disease subtypes. However, the complexity of such patterns can hardly be reported in a structured, reproducible manner based on a visual assessment by radiologists. Detailed, objective analysis of these complex patterns and their systematic correlation with tumor tissue characteristics (assessed by BM biopsy at the iliac crest) is an optimal use case for machine learning–based image analysis approaches such as deep learning and radiomics.
Automatic Prediction of Plasma Cell Infiltration
Four different models were trained and tested. In the first scenario, only data sets I to III of the internal data sets were used, whereas data set IV, which was acquired with an older scanner and had markedly lower image quality, was omitted. In the second scenario, all data sets I to IV from the internal data set were included. In both scenarios, one model was trained on radiomics features only, and an additional model was trained on both the radiomics features and the clinical features age and body mass index. The correlations between the predicted PCI values and the actual PCI values for the different prediction models are reported in Table 3. The model based on data sets I–III using only radiomics features without clinical parameters showed the highest correlation coefficient between predicted PCI and actual PCI (r = 0.71, P < 0.001) on the internal test set. The correlation between predicted PCI and actual PCI on the external data sets was weaker than on the internal test set. However, on all external data sets, the model trained on radiomics features from data sets I–III without clinical parameters predicted PCI values that were significantly correlated with the actual PCI values (all P values ≤ 0.01), with correlation coefficients between 0.30 and 0.57 (Table 3). The model including additional clinical features performed quite similarly to the model without clinical features, with the main difference that it performed somewhat better in the data set from center 2 with variable imaging quality (r = 0.38 vs r = 0.30). Adding data set IV to enlarge the training data set did not markedly change the performance of the PCI prediction models on the external test sets, neither when using only radiomics features nor when additionally including the clinical features age and body mass index. 
As a benchmark for the interpretation of the correlation coefficients between predicted PCI and actual PCI, we investigated the correlation between PCI from histological and cytological assessment, and found the correlation coefficient to be 0.53 (P < 0.001).
TABLE 3 - Accuracy of the Prediction of Plasma Cell Infiltration of Different Models
Test Set | Model 1: Trained on Data Set I–III; Radiomics Features | Model 2: Trained on Data Set I–III; Radiomics and Clinical Features* | Model 3: Trained on Data Set I–IV; Radiomics Features | Model 4: Trained on Data Set I–IV; Radiomics and Clinical Features* |
Internal | | | | |
r | 0.71 [0.51, 0.83] | 0.66 [0.44, 0.80] | 0.56 [0.35, 0.71] | 0.56 [0.35, 0.72] |
p | <0.001 | <0.001 | <0.001 | <0.001 |
Center 2, high-quality subset | | | | |
r | 0.45 [0.12, 0.69] | 0.42 [0.08, 0.67] | 0.42 [0.09, 0.67] | 0.38 [0.04, 0.64] |
p | 0.009 | 0.02 | 0.02 | 0.03 |
Center 2, other subset | | | | |
r | 0.30 [0.07, 0.49] | 0.38 [0.13, 0.59] | 0.22 [−0.01, 0.43] | 0.39 [0.15, 0.60] |
p | 0.01 | 0.004 | 0.06 | 0.003 |
Multicenter, test set | | | | |
r | 0.57 [0.30, 0.76] | 0.58 [0.07, 0.85] | 0.45 [0.15, 0.68] | 0.58 [0.08, 0.85] |
p | <0.001 | 0.03 | 0.006 | 0.03 |
The correlation coefficient r and the P values from Pearson correlation are reported for each prediction model on each test set.
*Composition of test sets center 2, other and multicenter deviated from the other analyses, as body mass index was not available for all patients in these data sets.
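The text does not state how the 95% CIs for the Pearson coefficients were computed. One common choice is the Fisher z-transformation, sketched below as an assumption rather than as the authors' method; the function name is illustrative.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r from n pairs (Fisher z-transformation)."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)               # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# r = 0.45 with n = 32 (size of the center 2, high-quality test set in Table 1).
low, high = pearson_ci(0.45, 32)
```

For r = 0.45 and n = 32 this gives roughly [0.12, 0.69], which agrees with the corresponding entry in Table 3. That agreement does not establish which method was actually used. The wide interval illustrates how much uncertainty remains with test sets of this size.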
Figure 5 visualizes the 15 most important radiomics features (according to the PCI prediction model based on the internal training data set using radiomics features only) for the internal training set and the internal test set and provides a quantitative analysis of how each of the features is correlated with the PCI in each data set. When investigating visual patterns in the radiomics heat map of the internal training set, no general, continuous trend of each radiomics feature from patients with low PCI toward patients with high PCI can be observed. Rather, especially in patients with low to intermediate PCI, different patterns in the radiomics signatures can be found. The fact that quite heterogeneous quantitative imaging features are found in patients with low to intermediate PCI is very much in line with the observation that there are very heterogeneous visual patterns in the BM in such patients, as demonstrated in the exemplary cases in Figure 4. However, especially in patients with (very) high PCI, a rather distinct pattern in the radiomics heat map becomes apparent (Fig. 5). This comprises lower feature values for features such as “first-order numeric mode value”/“first-order histogram mode value,” “first-order numeric 30th percentile,” or “first-order numeric minimum,” representing a rather hypointense BM signal.
FIGURE 5: Radiomic heat map for internal training and internal test data sets. Panel A displays the radiomic heat map for the top 15 features of the PCI prediction model based on radiomics features only and trained on data sets I–III of the internal training data set. The patients are displayed as columns and are ordered by increasing plasma cell infiltration from left to right, with the dot plot on top visualizing the actual PCI of each patient. Panel B shows the same visualization for the internal test set. Panel C reports the names of the top 15 radiomics features and their correlation with PCI, both in the internal training and internal test set. Trends of radiomics features increasing/decreasing with rising PCI can be observed visually in the radiomic heat maps (A and B) and are confirmed by the quantitative correlation analysis (C). However, there is no gradual, continuous development of each feature from low to high PCI. Rather, in patients with low to intermediate PCI, very different radiomics signatures can be present. The fact that patients with low to intermediate PCI show a great heterogeneity in their quantitative radiomics profiles is very much in line with the fact that patients with low to intermediate PCI can show very heterogeneous bone marrow patterns in the MRIs, as illustrated in Figure 4, P1, P2, and P7–P10. In patients with very high PCI (>60%), a distinct pattern can be observed in the radiomics heat map of both the training data set and the test data set, and this pattern is at least in part explainable: first-order features that strongly depend on the signal intensity show markedly reduced values, for example, the first-order numeric minimum (RF #9), first-order numeric 30th percentile (RF #13), first-order numeric mode value (RF #14), or first-order histogram mode value (RF #15). This is well explainable by the fact that in such patients the bone marrow is known to decrease in signal intensity in T1w (compare Fig. 4, P5 and P6). PCI, plasma cell infiltration; RF, radiomics feature; FON, first-order numeric; FOH, first-order histogram; RL, run length; Comb, combined; COBF, co-occurrence-based feature; SD, standard deviation.
Prediction of Cytogenetic Risk Status and Cytogenetic Aberrations
First, we investigated whether cytogenetic aberrations or cytogenetic high-risk status can be predicted by training individual models for this task. Three prediction models were trained to predict the presence or absence of a high-risk cytogenetic status according to 3 different established definitions (abbreviated HR-C def 1–3, as defined in the methods). Four prediction models were trained to predict the presence or absence of individual cytogenetic aberrations (Table 4). Although the models showed some discriminative ability in the internal test set with AUROCs ranging between 0.57 and 0.76, these models did not generalize to the multiple external test sets.
TABLE 4 - Prediction of Cytogenetic Risk Status and Cytogenetic Aberrations
AUROC [95% Confidence Interval] | Internal Test Set | Center 2, High-Quality Test Set | Center 2, Other Test Set | Multicenter Test Set |
HR-C def 1 | 0.57 [0.25, 0.89] | 0.41 [0.14, 0.69] | 0.67 [0.51, 0.83] | 0.62 [0.41, 0.84] |
HR-C def 2 | 0.73 [0.46, 1.00] | 0.57 [0.33, 0.80] | 0.48 [0.33, 0.62] | 0.50 [0.30, 0.71] |
HR-C def 3 | 0.67 [0.39, 0.95] | 0.53 [0.29, 0.76] | 0.36 [0.20, 0.53] | 0.56 [0.36, 0.76] |
gain(1q) present | 0.66 [0.39, 0.94] | 0.61 [0.39, 0.83] | 0.37 [0.22, 0.51] | 0.45 [0.23, 0.67] |
del(13q) present | 0.65 [0.42, 0.89] | 0.85 [0.68, 1.00] | 0.53 [0.39, 0.68] | 0.30 [0.12, 0.48] |
del(17p) present | 0.63 [0.43, 0.83] | 0.43 [0, 0.97] | 0.49 [0.29, 0.70] | 0.49 [0.10, 0.88] |
t(4;14) present | 0.76 [0.39, 1.00] | 0.45 [0.11, 0.77] | 0.57 [0.37, 0.77] | 0.57 [0.39, 0.74] |
As only 1 patient had shown a t(14;16) in the training set, we did not try to train a machine learning model to predict the presence of a t(14;16) due to the imbalance in the training set.
HR-C def 1, high-risk cytogenetic status according to definition 1 [presence of any of the following aberrations: del(17p), t(4;14), and t(14;16)]; HR-C def 2, presence of any of the following aberrations: gain(1q), del(17p), t(4;14), and t(14;16); HR-C def 3, presence of any of the following aberrations: gain(1q), del(13q), del(17p), t(4;14), and t(14;16).
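As an illustration of how an AUROC with a 95% confidence interval of the kind reported in Table 4 can be obtained, the following Python sketch computes the AUROC from pairwise comparisons and a percentile bootstrap interval. This section does not state how the intervals were derived, so the percentile bootstrap is an assumption; the function names, the number of resamples, and the example labels and scores are hypothetical.

```python
import random


def auroc(labels, scores):
    """Probability that a positive case scores higher than a negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each positive/negative pair counts 1 if ordered correctly, 0.5 if tied.
    wins = sum((1.0 if p > n else (0.5 if p == n else 0.0))
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:  # a resample must contain both classes
            stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical test set: 1 = aberration present, 0 = absent.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.45, 0.90]

auc = auroc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
print(f"AUROC {auc:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

With small test sets, such intervals become wide, which is in line with the broad intervals seen for several entries of Table 4.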
Second, we investigated whether there is a connection between the predicted PCI and the presence of cytogenetic high-risk status/presence of cytogenetic aberrations (Fig. 6). All test sets were merged for this analysis. We found that patients with a cytogenetic high-risk status according to definition 1 showed a significantly higher predicted PCI than patients with cytogenetic standard risk status according to definition 1 (median predicted PCI 46% vs 38%, P = 0.01). Patients with a t(4;14) showed a significantly higher predicted PCI than patients without t(4;14) (median predicted PCI 47% vs 39%, P = 0.04). For the other cytogenetic risk definitions and cytogenetic aberrations, findings were not statistically significant in this data set.
FIGURE 6: Connection between predicted plasma cell infiltration and cytogenetic high-risk status/cytogenetic aberrations. HR-C def 1, high-risk cytogenetic status according to definition 1 [presence of any of the following aberrations: del(17p), t(4;14), and t(14;16)]; HR-C def 2, presence of any of the following aberrations: gain(1q), del(17p), t(4;14), and t(14;16); HR-C def 3, presence of any of the following aberrations: gain(1q), del(13q), del(17p), t(4;14), and t(14;16). HR, high-risk; SR, standard risk; Y, yes, cytogenetic aberration present; N, no, cytogenetic aberration not present.
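A group comparison of predicted PCI such as the one shown in Figure 6 can be sketched as follows. This section does not name the statistical test behind the reported P values; a two-sided Mann-Whitney U test with normal approximation and without tie correction is assumed here purely for illustration, and all predicted PCI values are hypothetical.

```python
import math
from statistics import median


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value (ties: 0.5).
    u = sum((1.0 if a > b else (0.5 if a == b else 0.0)) for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal curve
    return u, p


# Hypothetical predicted PCI values (%), not study data.
high_risk = [52, 46, 61, 44, 38, 49, 57, 46]
standard_risk = [35, 41, 38, 29, 44, 33, 40, 36]

u_stat, p_value = mann_whitney_u(high_risk, standard_risk)
print(f"median HR {median(high_risk)}% vs SR {median(standard_risk)}%")
print(f"U = {u_stat}, P = {p_value:.4f}")
```

For small groups or many ties, an exact test or a tie-corrected variance would be preferable to this approximation.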
DISCUSSION
Monoclonal plasma cell disorders can present with a wide variety of complex patterns in the BM (Fig. 4). However, beyond the fact that patients with focal lesions have adverse outcomes,32–39 it is not well understood which exact subpattern might point to specific disease subtypes, such as groups with certain genetic alterations or a certain outcome. We hypothesized that machine learning image analysis algorithms could help to close this gap. In the present study, we applied the recently presented concept for automatic, objective, quantitative BM profiling from wb-MRI scans25 to a large data set to learn about associations between local imaging patterns and local tumor tissue characteristics, with the ultimate goal of predicting local BM biopsy results noninvasively from MRI.
Automated Bone Marrow Segmentation
Establishing precise, automatic BM segmentation is an indispensable prerequisite to bring radiomics analysis for MM into clinical practice. Earlier approaches to automatic BM segmentation40–42 reported results that were markedly worse than the benchmark for manual segmentation set by interrater experiments. Recently, the first BM segmentation algorithms were presented that allowed BM segmentation from T1w images25 and ADC maps43 with a quality similar to manual segmentations by a radiologist and that performed relatively robustly even in external multicentric test sets. In the current study, we trained an nnU-Net on 470 cases, with a wide variety of pathologies and several different MRI protocols and scanners represented in the training data sets, to perform individual segmentation of the right and left hip bone from T1w images. We found that the algorithm performed segmentations with very high quality, surpassing the benchmark set by an interrater experiment and performing very robustly even in cases with severe pathologies, including paramedullary lesions, and in multicentric data.
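The comparison of an automatic segmentation against an interrater benchmark can be illustrated with an overlap metric. This section does not name the metric used; the Dice similarity coefficient is assumed here as a common choice, and the function name `dice_score` and the tiny two-dimensional masks are hypothetical.

```python
def dice_score(mask_a, mask_b):
    """Dice similarity coefficient of two masks given as sets of voxel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    overlap = len(mask_a & mask_b)
    return 2 * overlap / (len(mask_a) + len(mask_b))


# Hypothetical masks (row, column), not study data.
rater_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
rater_2 = {(0, 1), (1, 0), (1, 1), (1, 2)}
model = {(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)}

interrater = dice_score(rater_1, rater_2)    # benchmark between two raters
model_vs_rater = dice_score(model, rater_1)  # automatic vs manual
print(f"interrater {interrater:.2f}, model vs rater {model_vs_rater:.2f}")
```

In such a comparison, an automatic segmentation would be said to surpass the benchmark if its agreement with a rater exceeds the agreement between raters.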
Prediction of Plasma Cell Infiltration
A connection between stages of diffuse infiltration severity in MRI and PCI,39,44–47 as well as between signal intensities/ADC-values and PCI,43,48,49 has been described. However, to the best of our knowledge, these have not yet been used to predict PCI from MRI. The models established in this study predicted PCI values that are significantly correlated (r between 0.66 and 0.71) to the actual PCI values on the internal data set. As expected and commonly observed in machine learning, the accuracy of the prediction models declined in the external test sets (r between 0.30 and 0.58). The best model showed a significant correlation between predicted and actual PCI values in all external test sets (all P's ≤ 0.01), demonstrating the external generalizability of the PCI prediction model.
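The correlation analysis described above can be sketched in a few lines of Python. The sketch assumes that r denotes the Pearson correlation coefficient and uses a permutation test for significance; neither choice is confirmed by this section, and the PCI values, function names, and number of permutations are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=5000, seed=0):
    """Two-sided permutation P value for the null hypothesis of no association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical PCI values (%), not study data.
actual_pci = [5, 10, 15, 20, 30, 40, 50, 60, 70, 80]
predicted_pci = [12, 8, 22, 18, 35, 30, 55, 48, 62, 75]

r = pearson_r(predicted_pci, actual_pci)
p = permutation_p(predicted_pci, actual_pci)
print(f"r = {r:.2f}, P = {p:.4f}")
```

The toy data are far cleaner than real biopsy data, so the resulting r is much higher than the values reported in this study.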
Experts have recommended the addition of clinical features to the radiomics features to improve the predictive performance.50 As age and BMI are connected to the morphology of BM in MRI,51,52 we trained models that account for these factors. This did not markedly change the performance in our study. The addition of more training data that had been obtained with an older scanner and had lower imaging quality did not markedly change the prediction performance either. When interpreting the performance of the presented algorithms, it needs to be considered that the PCI value from biopsy itself has a certain level of uncertainty. It is well known that a biopsy taken at one position without image guidance is not necessarily representative of the tumor load of the whole patient.16 The PCI value also depends on the technique of the biopsy and its evaluation: Joshi and colleagues53 reported that the mean PCI was 13.1% from BM aspirates, while being 31.8% when assessed from trephine. In line with their study, we observed a correlation coefficient of 0.53 between histologically and cytologically assessed PCI in our data set. The fact that the result from the single-site biopsy is not necessarily representative, together with the moderate correlation between the PCI values obtained by different techniques when analyzing BM samples, puts the r values of the predictions by our algorithms, ranging from 0.30 to 0.71, into perspective. Although it must be assumed that cases with nonrepresentative PCI values from biopsies in the training set impeded our modeling, we assume that the high number of training cases somewhat limited the influence of such outliers.
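The argument that an imperfect reference standard limits the attainable correlation can be illustrated with a small simulation. In the sketch below, a hypothetical "true" PCI is observed once through a noisy biopsy and once through an equally noisy prediction. All settings (uniform distribution of the true PCI, Gaussian noise with a standard deviation of 20 percentage points, 2000 simulated patients) are arbitrary assumptions and not estimates from the study data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
true_pci = [rng.uniform(0, 100) for _ in range(2000)]
# Noisy reference standard: single-site biopsy with sampling error.
biopsy_pci = [t + rng.gauss(0, 20) for t in true_pci]
# Imperfect model output with independent error of the same size.
predicted_pci = [t + rng.gauss(0, 20) for t in true_pci]

r_vs_truth = pearson_r(predicted_pci, true_pci)
r_vs_biopsy = pearson_r(predicted_pci, biopsy_pci)
print(f"prediction vs truth:  r = {r_vs_truth:.2f}")
print(f"prediction vs biopsy: r = {r_vs_biopsy:.2f}")
```

Under these assumptions, the correlation with the noisy biopsy value is lower than the correlation with the underlying truth, so a moderate r against biopsy does not by itself imply an equally moderate relationship with the actual tumor load.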
Prediction of Cytogenetic Aberrations
Few earlier works had reported connections between morphologic BM MRI patterns and cytogenetic aberrations/gene expression profiling data from biopsies at the iliac crest.44,47,54–56 The models established in this study to predict the cytogenetic results showed low to moderate discriminative ability in the internal test set. However, none of the models generalized to all 3 external test sets. Thereby, our results challenge the report from 1 earlier publication: in a radiomics study based on 89 patients without external test set, the authors had concluded that high-risk cytogenetic status at the iliac crest could be predicted by radiomics from spinal MRI with good performance.56
It has been reported that patients with (severe) diffuse infiltration in MRI were more likely to have genetic high-risk aberrations.44,47,55 As severe diffuse infiltration in MRI is connected to increased PCI,39,44–47,55 we investigated whether the predicted PCI is connected to the presence of high-risk cytogenetic status or certain cytogenetic aberrations. Indeed, we found that patients with cytogenetic high-risk status according to definition 1 had a significantly higher predicted PCI than patients without cytogenetic high-risk status according to definition 1. Furthermore, patients with t(4;14) had a significantly higher predicted PCI than patients without t(4;14). The fact that the cytogenetic risk status/presence of t(4;14) was connected to the predicted PCI supports the hypothesis that certain MRI patterns are at least in part related to genetic properties of the tumor cells. However, in our large study with external test data, direct prediction of the cytogenetic risk status or presence of individual cytogenetic aberrations was not possible with reasonable accuracy.
Although our current work correlated the radiomics features from both hip bones with results from unguided BM biopsies at the posterior iliac crest, in the future, our approach for prediction of genetic results should be transferred to correlating radiomics features from specific focal lesions with targeted biopsies from the respective focal lesions, as connections between imaging findings of focal lesions and genetic properties of local clones have been described.18,57 However, this will only be possible once a sufficiently large quantity of targeted biopsies and correlating, high-quality MRIs is available to reasonably train and test machine learning algorithms.
Limitations
The algorithms in this study were trained and tested on several data sets, which had been acquired with heterogeneous scan parameters and scanners. Given that, in vivo, the reproducibility of radiomics features across scanners is worse than their repeatability,26 this heterogeneity probably limited the accuracy of the predictions, which is also in line with our finding that the performance of the predictive models declined in the external data sets. Therefore, further standardization of MRI scanners and protocols, as currently ongoing,58 general standardization of radiomics pipelines,29 and application of advanced data harmonization methods59–61 should be pursued to improve the performance of our approach in the future. A further limitation is that, when the biopsy was performed before MRI, it caused postbioptic BM changes and thereby influenced the images. However, these limitations had to be accepted to create a data set of reasonable size for machine learning, compromising between quality and quantity of the data to train and test the algorithms. The current model is based on T1w sequences only. We expect that adding information from further MRI sequences, especially (semi)quantitative sequences such as diffusion-weighted imaging or Dixon, which have proven high value in imaging of MM,49,62–64 will further improve the results.
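The idea behind feature harmonization across scanners can be sketched with a simple location-scale adjustment. The example below centers and scales one feature within each scanner; it is a deliberately simplified stand-in and not the ComBat method cited above, which additionally uses empirical Bayes estimation and can preserve covariates of interest. The function name `harmonize_by_scanner` and all values are hypothetical.

```python
from statistics import mean, pstdev


def harmonize_by_scanner(values, scanners):
    """Center and scale one feature within each scanner (simplified sketch)."""
    groups = {}
    for value, scanner in zip(values, scanners):
        groups.setdefault(scanner, []).append(value)
    # Per-scanner location (mean) and scale (population standard deviation).
    stats = {s: (mean(g), pstdev(g)) for s, g in groups.items()}
    harmonized = []
    for value, scanner in zip(values, scanners):
        center, scale = stats[scanner]
        harmonized.append((value - center) / scale if scale > 0 else 0.0)
    return harmonized


# Hypothetical feature values: scanner B reports systematically higher intensities.
feature = [100, 110, 120, 200, 220, 240]
scanner = ["A", "A", "A", "B", "B", "B"]

harmonized = harmonize_by_scanner(feature, scanner)
print([round(v, 2) for v in harmonized])
```

A caveat of this simplified approach is that it also removes genuine differences between the patient populations of different centers, which is one reason why more advanced harmonization methods are of interest.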
CONCLUSIONS AND OUTLOOK
This study proves the feasibility of using machine learning algorithms to predict local BM PCI automatically from MRI, even in independent external test sets. Although 2 significant connections between radiomics profiles and cytogenetic risk status/cytogenetic aberrations were observed, cytogenetic risk status or aberrations could not be predicted with reasonable accuracy in independent test sets. Based on these findings, we do not suggest that the current algorithm should replace conventional BM biopsies, but we conclude that the predicted PCI from MRI might serve as an additional parameter to evaluate and monitor the tumor load. In contrast to invasive BM biopsies, which cause significant discomfort to patients and come with risks such as bleeding, infection, or nerve injury, the predicted PCI can be assessed noninvasively and frequently and is not prone to random sampling errors. Its assessment is fast, as only one T1w MRI block over the pelvis needs to be acquired, with an MRI scan time of less than 2 minutes. Beyond the direct application of the predicted PCI as an additional tumor load parameter, on a more general level, our work proves that local radiomics signatures are linked to local tumor tissue characteristics in MM. This supports the further development of automated image analysis algorithms to analyze complex whole-body imaging data sets automatically and in depth. Given the established link between local radiomics signatures and local tumor tissue characteristics, such machine learning models have the potential to inform on local tumor tissue characteristics multifocally across the complete BM, and thereby capture the spatially heterogeneous tumor manifestations and complex biological patterns observed in MM.
Enabling individual, detailed monitoring of all local processes is urgently needed in MM, given the recent insights about the spatial heterogeneity and the spatiotemporal evolution of the disease,18–20 as well as the recent evidence that functional whole-body imaging delivers complementary information to BM biopsies in minimal residual disease assessment of MM patients.65–67
ACKNOWLEDGMENTS
The authors thank the German-Speaking Myeloma Multicenter Group (GMMG) for the provision of data from the GMMG-HD7 trial (EudraCT: 2017-004768-37). They would also like to thank Prof Dr Yon-Dschun Ko from the Johanniter-Clinics Bonn and Dr Jörg-Thomas Bittenbring from the University Hospital Saarland for the provision of data from centers 4 and 7.
The authors would like to thank Dr Ekaterina Menis, Dr Oyunbileg von Stackelberg, and Richard Meier for their valuable administrative support.
REFERENCES
1. Kumar SK, Rajkumar V, Kyle RA, et al. Multiple myeloma.
Nat Rev Dis Prim. 2017;3:17046.
2. Waxman AJ, Mick R, Garfall AL, et al. Classifying ultra-high risk smoldering myeloma.
Leukemia. 2015;29:751–753.
3. Rajkumar SV, Larson D, Kyle RA. Diagnosis of smoldering multiple myeloma.
N Engl J Med. 2011;365:474–475.
4. Kastritis E, Terpos E, Moulopoulos L, et al. Extensive bone marrow infiltration and abnormal free light chain ratio identifies patients with asymptomatic myeloma at high risk for progression to symptomatic disease.
Leukemia. 2013;27:947–953.
5. Paiva B, Vidriales M-B, Pérez JJ, et al. Multiparameter flow cytometry quantification of bone marrow plasma cells at diagnosis provides more prognostic information than morphological assessment in myeloma patients.
Haematologica. 2009;94:1599–1602.
6. Chakraborty R, Muchtar E, Kumar SK, et al. Impact of pre-transplant bone marrow plasma cell percentage on post-transplant response and survival in newly diagnosed multiple myeloma.
Leuk Lymphoma. 2017;58:308–315.
7. Al Saleh AS, Parmar HV, Visram A, et al. Increased bone marrow plasma-cell percentage predicts outcomes in newly diagnosed multiple myeloma patients.
Clin Lymphoma Myeloma Leuk. 2020;20:596–601.
8. Neben K, Jauch A, Hielscher T, et al. Progression in smoldering myeloma is independently determined by the chromosomal abnormalities del(17p), t(4;14), gain 1q, hyperdiploidy, and tumor load.
J Clin Oncol. 2013;31:4325–4332.
9. Lakshman A, Rajkumar SV, Buadi FK, et al. Risk stratification of smoldering multiple myeloma incorporating revised IMWG diagnostic criteria.
Blood Cancer J. 2018;8:59.
10. Palumbo A, Avet-Loiseau H, Oliva S, et al. Revised international staging system for multiple myeloma: a report from International Myeloma Working Group.
J Clin Oncol. 2015;33:2863–2869.
11. Weinhold N, Salwender HJ, Cairns DA, et al. Chromosome 1q21 abnormalities refine outcome prediction in patients with multiple myeloma—a meta-analysis of 2,596 trial patients.
Haematologica. 2021;106:2754–2758.
12. Mateos M-V, Kumar S, Dimopoulos MA, et al. International Myeloma Working Group risk stratification model for smoldering multiple myeloma (SMM).
Blood Cancer J. 2020;10:102.
13. Sonneveld P, Avet-Loiseau H, Lonial S, et al. Treatment of multiple myeloma with high-risk cytogenetics: a consensus of the International Myeloma Working Group.
Blood. 2016;127:2955–2962.
14. Rajkumar SV, Dimopoulos MA, Palumbo A, et al. International Myeloma Working Group updated criteria for the diagnosis of multiple myeloma.
Lancet Oncol. 2014;15:e538–e548.
15. Kumar S, Paiva B, Anderson KC, et al. International Myeloma Working Group consensus criteria for response and minimal residual disease assessment in multiple myeloma.
Lancet Oncol. 2016;17:e328–e346.
16. Latifoltojar A, Boyd K, Riddell A, et al. Characterising spatial heterogeneity of multiple myeloma in high resolution by whole body magnetic resonance imaging: towards macro-phenotype driven patient management.
Magn Reson Imaging. 2021;75:60–64.
17. Hillengass J, Ellert E, Spira D, et al. Comparison of plasma cell infiltration in random samples of the bone marrow and osteolyses acquired by CT-guided biopsy in patients with symptomatic multiple myeloma.
J Clin Oncol. 2016;34(15_suppl):8040.
18. Rasche L, Chavan SS, Stephens OW, et al. Spatial genomic heterogeneity in multiple myeloma revealed by multi-region sequencing.
Nat Commun. 2017;8:268.
19. Merz M, Merz AMA, Wang J, et al. Deciphering spatial genomic heterogeneity at a single cell resolution in multiple myeloma.
Nat Commun. 2022;13:807.
20. Rasche L, Schinke C, Maura F, et al. The spatio-temporal evolution of multiple myeloma from baseline to relapse-refractory states.
Nat Commun. 2022;13:4517.
21. Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: extracting more information from medical images using advanced feature analysis.
Eur J Cancer. 2012;48:441–446.
22. Aerts HJ, Velazquez ER, Leijenaar RT, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach.
Nat Commun. 2014;5:4006.
23. Aerts HJ. The potential of radiomic-based phenotyping in precision medicine: a review.
JAMA Oncol. 2016;2:1636–1642.
24. Isensee F, Jaeger PF, Kohl SAA, et al. nnU-net: a self-configuring method for deep learning-based biomedical image segmentation.
Nat Methods. 2021;18:203–211.
25. Wennmann M, Klein A, Bauer F, et al. Combining deep learning and radiomics for automated, objective, comprehensive bone marrow characterization from whole-body MRI.
Invest Radiol. 2022;57:752–763.
26. Wennmann M, Bauer F, Klein A, et al. In vivo repeatability and multiscanner reproducibility of MRI radiomics features in patients with monoclonal plasma cell disorders: a prospective bi-institutional study.
Invest Radiol. 2023;58:253–264.
27. Götz M, Nolden M, Maier-Hein K. MITK phenotyping: an open-source toolchain for image-based personalized medicine with radiomics.
Radiother Oncol. 2019;131:108–111.
28. Wennmann M, Thierjung H, Bauer F, et al. Repeatability and reproducibility of ADC measurements and MRI signal intensity measurements of bone marrow in monoclonal plasma cell disorders.
Invest Radiol. 2022;57:272–281.
29. Zwanenburg A, Vallieres M, Abdalah MA, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping.
Radiology. 2020;295:328–338.
30. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python.
J Mach Learn Res. 2011;12:2825–2830.
31. D'Agostino M, Cairns DA, Lahuerta JJ, et al. Second revision of the international staging system (R2-ISS) for overall survival in multiple myeloma: a European myeloma network (EMN) report within the HARMONY project.
J Clin Oncol. 2022;40:3406–3418.
32. Mai EK, Hielscher T, Kloth JK, et al. A magnetic resonance imaging-based prognostic scoring system to predict outcome in transplant-eligible patients with multiple myeloma.
Haematologica. 2015;100:818–825.
33. Walker R, Barlogie B, Haessler J, et al. Magnetic resonance imaging in multiple myeloma: diagnostic and clinical implications.
J Clin Oncol. 2007;25:1121–1128.
34. Kastritis E, Moulopoulos LA, Terpos E, et al. The prognostic importance of the presence of more than one focal lesion in spine MRI of patients with asymptomatic (smoldering) multiple myeloma.
Leukemia. 2014;28:2402–2403.
35. Dhodapkar MV, Sexton R, Waheed S, et al. Clinical, genomic, and imaging predictors of myeloma progression from asymptomatic monoclonal gammopathies (SWOG S0120).
Blood. 2014;123:78–85.
36. Hillengass J, Fechtner K, Weber M-A, et al. Prognostic significance of focal lesions in whole-body magnetic resonance imaging in patients with asymptomatic multiple myeloma.
J Clin Oncol. 2010;28:1606–1610.
37. Hillengass J, Weber MA, Kilk K, et al. Prognostic significance of whole-body MRI in patients with monoclonal gammopathy of undetermined significance.
Leukemia. 2014;28:174–178.
38. Merz M, Hielscher T, Wagner B, et al. Predictive value of longitudinal whole-body magnetic resonance imaging in patients with smoldering multiple myeloma.
Leukemia. 2014;28:1902–1908.
39. Lecouvet FE, Vande Berg BC, Michaux L, et al. Stage III multiple myeloma: clinical and prognostic value of spinal bone marrow MR imaging.
Radiology. 1998;209:653–660.
40. Almeida SD, Santinha J, Oliveira FPM, et al. Quantification of tumor burden in multiple myeloma by atlas-based semi-automatic segmentation of WB-DWI.
Cancer Imaging. 2020;20:6.
41. Arabi H, Zaidi H. Whole-body bone segmentation from MRI for PET/MRI attenuation correction using shape-based averaging.
Med Phys. 2016;43:5848–5861.
42. Lavdas I, Glocker B, Kamnitsas K, et al. Fully automatic, multiorgan segmentation in normal whole body magnetic resonance imaging (MRI), using classification forests (CFs), convolutional neural networks (CNNs), and a multi-atlas (MA) approach.
Med Phys. 2017;44:5210–5220.
43. Wennmann M, Neher P, Stanczyk N, et al. Deep learning for automatic bone marrow apparent diffusion coefficient measurements from whole-body magnetic resonance imaging in patients with multiple myeloma: a retrospective multicenter study.
Invest Radiol. 2023;58:273–282.
44. Mai EK, Hielscher T, Kloth JK, et al. Association between magnetic resonance imaging patterns and baseline disease features in multiple myeloma: analyzing surrogates of tumour mass and biology.
Eur Radiol. 2016;26:3939–3948.
45. Kloth JK, Hillengass J, Listl K, et al. Appearance of monoclonal plasma cell diseases in whole-body magnetic resonance imaging and correlation with parameters of disease activity.
Int J Cancer. 2014;135:2380–2386.
46. Moulopoulos LA, Gika D, Anagnostopoulos A, et al. Prognostic significance of magnetic resonance imaging of bone marrow in previously untreated patients with multiple myeloma.
Ann Oncol. 2005;16:1824–1828.
47. Messiou C, Porta N, Sharma B, et al. Prospective evaluation of whole-body MRI versus FDG PET/CT for lesion detection in participants with myeloma.
Radiol Imaging Cancer. 2021;3:e210048.
48. Dutoit JC, Vanderkerken MA, Anthonissen J, et al. The diagnostic value of SE MRI and DWI of the spine in patients with monoclonal gammopathy of undetermined significance, smouldering myeloma and multiple myeloma.
Eur Radiol. 2014;24:2754–2765.
49. Hillengass J, Bauerle T, Bartl R, et al. Diffusion-weighted imaging for non-invasive and quantitative monitoring of bone marrow infiltration in patients with monoclonal plasma cell disease: a comparative study with histology.
Br J Haematol. 2011;153:721–728.
50. Lambin P, Leijenaar RTH, Deist TM, et al. Radiomics: the bridge between medical imaging and personalized medicine.
Nat Rev Clin Oncol. 2017;14:749–762.
51. Poulton TB, Murphy WD, Duerk JL, et al. Bone marrow reconversion in adults who are smokers: MR imaging findings.
AJR Am J Roentgenol. 1993;161:1217–1221.
52. Lavdas I, Rockall AG, Castelli F, et al. Apparent diffusion coefficient of normal abdominal organs and bone marrow from whole-body DWI at 1.5 T: the effect of sex and age.
AJR Am J Roentgenol. 2015;205:242–250.
53. Joshi R, Horncastle D, Elderfield K, et al. Bone marrow trephine combined with immunohistochemistry is superior to bone marrow aspirate in follow-up of myeloma patients.
J Clin Pathol. 2008;61:213–216.
54. Waheed S, Mitchell A, Usmani S, et al. Standard and novel imaging methods for multiple myeloma: correlates with prognostic laboratory variables including gene expression profiling data.
Haematologica. 2013;98:71–78.
55. Moulopoulos LA, Dimopoulos MA, Kastritis E, et al. Diffuse pattern of bone marrow involvement on magnetic resonance imaging is associated with high risk cytogenetics and poor outcome in newly diagnosed, symptomatic patients with multiple myeloma: a single center experience on 228 patients.
Am J Hematol. 2012;87:861–864.
56. Liu J, Zeng P, Guo W, et al. Prediction of high-risk cytogenetic status in multiple myeloma based on magnetic resonance imaging: utility of radiomics and comparison of machine learning methods.
J Magn Reson Imaging. 2021;54:1303–1311.
57. Rasche L, Angtuaco E, McDonald JE, et al. Low expression of hexokinase-2 is associated with false-negative FDG-positron emission tomography in multiple myeloma.
Blood. 2017;130:30–34.
58. Rata M, Blackledge M, Scurr E, et al. Implementation of whole-body MRI (MY-RADS) within the OPTIMUM/MUKnine multi-centre clinical trial for patients with myeloma.
Insights Imaging. 2022;13:123.
59. Orlhac F, Frouin F, Nioche C, et al. Validation of a method to compensate multicenter effects affecting CT radiomics.
Radiology. 2019;291:53–59.
60. Leithner D, Nevin RB, Gibbs P, et al. ComBat harmonization for MRI radiomics: impact on nonbinary tissue classification by machine learning.
Invest Radiol. 2023. Online ahead of print. doi:10.1097/RLI.0000000000000970.
61. Gatidis S, Kart T, Fischer M, et al. Better together: data harmonization and cross-study analysis of abdominal MRI data from UK biobank and the German National Cohort.
Invest Radiol. 2022. Online ahead of print. doi:10.1097/RLI.0000000000000941.
62. Giles SL, Messiou C, Collins DJ, et al. Whole-body diffusion-weighted MR imaging for assessment of treatment response in myeloma.
Radiology. 2014;271:785–794.
63. Latifoltojar A, Hall-Craggs M, Bainbridge A, et al. Whole-body MRI quantitative biomarkers are associated significantly with treatment response in patients with newly diagnosed symptomatic multiple myeloma following bortezomib induction.
Eur Radiol. 2017;27:5325–5336.
64. Chiabai O, Van Nieuwenhove S, Vekemans M-C, et al. Whole-body MRI in oncology: can a single anatomic T2 Dixon sequence replace the combination of T1 and STIR sequences to detect skeletal metastasis and myeloma?
Eur Radiol. 2023;33:244–257.
65. Rasche L, Alapat D, Kumar M, et al. Combination of flow cytometry and functional imaging for monitoring of residual disease in myeloma.
Leukemia. 2019;33:1713–1722.
66. Böckle D, Tabares P, Zhou X, et al. Minimal residual disease and imaging-guided consolidation strategies in newly diagnosed and relapsed refractory multiple myeloma.
Br J Haematol. 2022;198:515–522.
67. Alonso R, Cedena MT, Gómez-Grande A, et al. Imaging and bone marrow assessments improve minimal residual disease prediction in multiple myeloma.
Am J Hematol. 2019;94:853–861.