Armato, Samuel G. III PhD*; Nowak, Anna K. MD, PhD†‡; Francis, Roslyn J. MD, PhD†§; Kocherginsky, Masha PhD‖; Byrne, Michael J. MD†‡
Linear measurement of tumor diameter on computed tomography (CT) scans remains the standard clinical metric for the evaluation of tumor growth or response to therapy. The Response Evaluation Criteria in Solid Tumors (RECIST) guidelines1 specify a tumor measurement approach (a single unidimensional measurement of the tumor’s longest diameter) and a set of thresholds to convert numeric change in tumor measurements across temporally sequential CT scans into categories of tumor response (complete response, partial response, stable disease, and progressive disease). The modified RECIST guidelines2 changed the tumor measurement approach, specifically for mesothelioma, from longest tumor diameter to tumor thickness perpendicular to the chest wall (or mediastinum) to accommodate the unique morphology of this disease.
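The conversion from numeric change to response category can be illustrated with a minimal sketch. It assumes the commonly cited RECIST thresholds of a 30% decrease for partial response and a 20% increase for progressive disease. The function name `recist_category` and the single `reference_mm` argument are hypothetical simplifications: the guidelines themselves compare against baseline for response and against the smallest recorded sum for progression.

```python
def recist_category(reference_mm: float, followup_mm: float) -> str:
    """Map a change in summed tumor measurements to a response category.

    Simplifying assumption: one reference value is used for both the
    response and the progression comparison.
    """
    if reference_mm <= 0:
        raise ValueError("reference measurement must be positive")
    if followup_mm == 0:
        # No measurable tumor remains at the measured sites.
        return "complete response"
    change = (followup_mm - reference_mm) / reference_mm
    if change <= -0.30:
        return "partial response"
    if change >= 0.20:
        return "progressive disease"
    return "stable disease"
```

Under these assumptions, a summed measurement that grows from 10 mm to 13 mm would be categorized as progressive disease, whereas a change from 10 mm to 10.5 mm would remain stable disease.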
Also contained within the RECIST guidelines is the specification of “measurable disease” as a tumor with a minimum diameter of 10 mm, which, for geometric and CT partial-volume–effect considerations, is a threshold that represents twice the then-state-of-the-art 5-mm thickness of CT section images. Modified RECIST did not change this threshold, which has not been challenged in the intervening years, even as CT technology has evolved. RECIST was conceptualized under assumptions of spherical tumor morphology. A 10-mm-diameter (“just-measurable”) spherical tumor has a volume (i.e., tumor burden) of 523 mm³; however, one possible morphological representation of mesothelioma tumor with a “just-measurable” 10-mm in-plane thickness on a single 5-mm CT section encompasses a volume of 7672 mm³ (the equivalent of a 24.5-mm-diameter spherical tumor)3 (Fig. 1). Given that the anatomical extent of mesothelioma is rarely (if ever) constrained to a single CT section, the actual volume of a tumor with 10-mm in-plane thickness will likely be much greater than the equivalent of a 24.5-mm-diameter spherical tumor. Consequently, clinical trials that require “measurable disease” under RECIST as a criterion for enrollment may disadvantage subjects and the success of the trial through a greater baseline tumor burden. Following the rationale of RECIST that “measurable disease” should be defined as at least twice the thickness of CT section images, advances in CT technology may justify a revised definition, because state-of-the-art scanners are capable of section thicknesses less than 1 mm, and section thicknesses less than 3 mm have become more common.
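The sphere arithmetic quoted above can be checked with a short sketch. The function names are illustrative only, and the 7672-mm³ figure is taken as given from the cited model rather than rederived here.

```python
import math


def sphere_volume_mm3(diameter_mm: float) -> float:
    """Volume of a sphere from its diameter: V = pi * d^3 / 6."""
    return math.pi * diameter_mm ** 3 / 6.0


def equivalent_sphere_diameter_mm(volume_mm3: float) -> float:
    """Diameter of the sphere that has the given volume."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


# A "just-measurable" 10-mm sphere: about 523.6 mm^3.
print(round(sphere_volume_mm3(10.0), 1))
# Sphere equivalent of the 7672-mm^3 mesothelioma model: about 24.5 mm.
print(round(equivalent_sphere_diameter_mm(7672.0), 1))
```

The two volumes differ by a factor of nearly 15, which is the disparity in baseline tumor burden that the argument turns on.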
Another factor that should be considered when defining “measurable disease,” however, is observer measurement variability, a concept alluded to in the RECIST guidelines.1,4 Measurements, to be a reliable quantitative tumor assessment metric on which patient management decisions are made and clinical trial efficacy is evaluated, must demonstrate an acceptable level of variability across the observers who acquire those measurements. The increase in measurement variability with decreased size of the object being measured is a well-known trend,5 which lends credibility to the notion that some minimum tumor size should be defined below which inherent measurement variability would limit the practical utility of the acquired measurements. Although variability in mesothelioma tumor thickness measurements has been reported previously,6 the impact of physical tumor characteristics on measurement variability has not been investigated.
The purpose of this study was to determine the dependence of mesothelioma tumor thickness measurement variability on tumor thickness, lesion morphology, and anatomical location, with the aim of informing a mesothelioma-specific definition of “measurable disease” and optimal measurement site selection.
MATERIALS AND METHODS
A retrospective database of 105 thoracic CT scans from 50 patients with mesothelioma was collected from Sir Charles Gairdner Hospital in Perth, Western Australia. Images were intentionally selected from heterogeneous time points throughout the disease course to obtain a range of tumor thickness and location. Scans had been performed on a GE Medical Systems (Milwaukee, WI) Hi Speed (n = 72), GE LightSpeed (n = 16), or Philips (Highland Heights, OH) Brilliance 64 (n = 17) CT scanner. Peak voltage was 120 kVp for all scans, pixel size ranged from 0.57 to 0.91 mm, and section thickness was 0.625 mm (n = 2), 1 mm (n = 1), 1.25 mm (n = 2), 2.5 mm (n = 1), 5 mm (n = 96), 7 mm (n = 2), or 10 mm (n = 1). All images had been reconstructed as 512 × 512-pixel images.
With approval from the local Human Research Ethics Committee, each scan was reviewed by a medical oncologist (AKN) (the “primary observer”), who used an in-house image visualization and manipulation software package (Abras, version 1.6) to identify 170 sites of mesothelioma tumor that represented a range of thicknesses, lesion morphologies, and anatomical locations across all scans. Through Abras, the primary observer identified a specific outer tumor margin point along the chest wall or mediastinal structures at each measurement site and created a line segment that spanned the tumor from the outer tumor margin point to an appropriate location along the inner tumor margin, in accordance with the modified RECIST tumor measurement approach.2 The primary observer then categorized local tumor morphology (concave rind, convex rind, convex mass, or fusiform mass) (Fig. 2) and anatomical location (chest wall, mediastinum, anterior angle, or posterior angle; upper, middle, or lower zone of the thorax in the craniocaudal direction according to uniformly specified boundaries; outer tumor margin point along bone or soft tissue; and laterality). It is important to note that these 170 measurement sites were not selected to capture foci of clinical relevance but rather to represent a range of tumor thicknesses and morphologies, with anatomical location a secondary consideration.
An observer study was conducted in which Abras was used to present each of five other physicians with the specific CT section and the same preselected fixed location of the outer tumor margin point at each of the 170 primary-observer–defined tumor measurement sites. Each observer independently used Abras to create at each measurement site a line segment that spanned the tumor from the annotated predefined outer margin point to an appropriate location along the inner tumor margin (Fig. 3); the length of each observer’s line segment became the tumor thickness measurement for that observer. This process was exactly the same as for the primary observer, except that the outer tumor margin point identified and recorded by the primary observer became the common fixed starting point for the measurements of the other observers; no data regarding lesion morphology or anatomical location were captured from these other observers.
Interobserver measurement variability was calculated as a function of mean tumor thickness measurement, lesion morphology, and anatomical location to identify the smallest thickness at which linear measurements could be made reliably. Comparisons were made with the RECIST tumor response criterion of 20% for progression. Relative differences among the tumor thickness measurements of observers were estimated using a random-effects analysis of variance model as previously described.6 The following model was fitted using PROC MIXED in the SAS software (SAS Institute Inc., Cary, NC): $\log(y_{ijkm}) = \alpha + a_i + b_{j(i)} + c_{k(ij)} + r_m + \varepsilon_{ijkm}$, where $y_{ijkm}$ is the length of each constructed line segment, and random effects are $a_i$ (the patient effect), $b_{j(i)}$ (the CT scan effect), $c_{k(ij)}$ (the measurement site effect), $r_m$ (the rater effect), and $\varepsilon_{ijkm}$ (the residual error). To estimate the relative differences for each morphology type, location, and size category, separate random-effects models were fitted within each relevant subgroup.
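The nesting implied by this model (sites within scans within patients, crossed with raters) may be easier to see as a data-generating sketch. The following is not the SAS analysis. It simulates measurements from the stated model under a balanced design, and every numeric value (the standard deviations, the 10-mm geometric mean, the counts) is a hypothetical placeholder, not an estimate from the study.

```python
import math
import random


def simulate_measurements(n_patients=4, scans_per_patient=2,
                          sites_per_scan=3, n_raters=6, seed=0):
    """Draw thickness values from log(y) = alpha + a + b + c + r + eps."""
    rng = random.Random(seed)
    alpha = math.log(10.0)  # hypothetical overall mean on the log scale
    sd = {"patient": 0.30, "scan": 0.10, "site": 0.40,
          "rater": 0.05, "residual": 0.08}  # hypothetical values
    # Rater effects are crossed: one draw per rater, shared by all sites.
    rater_effect = [rng.gauss(0.0, sd["rater"]) for _ in range(n_raters)]
    rows = []
    for i in range(n_patients):
        a_i = rng.gauss(0.0, sd["patient"])        # patient effect
        for j in range(scans_per_patient):
            b_ji = rng.gauss(0.0, sd["scan"])      # scan within patient
            for k in range(sites_per_scan):
                c_kij = rng.gauss(0.0, sd["site"])  # site within scan
                for m in range(n_raters):
                    eps = rng.gauss(0.0, sd["residual"])
                    log_y = alpha + a_i + b_ji + c_kij + rater_effect[m] + eps
                    rows.append((i, j, k, m, math.exp(log_y)))
    return rows
```

Because the model is additive on the log scale, rater and residual effects act multiplicatively on thickness, which is why interobserver differences are naturally expressed as percentages.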
The process by which an observer obtains a measurement involves the angle that a constructed line segment makes with respect to the predefined outer margin point and the length of that line segment.6 Two observers might construct line segments that extend from the outer margin point in the same direction but with different interpretations of inner tumor margin location; alternatively, observers might interpret inner tumor margin similarly but construct line segments that extend in different directions. To capture this second effect, the angular spread of the line segments created by observers to capture tumor thickness measurements at each site was recorded; this analysis was possible because all line segments at a given measurement site shared a common fixed starting point. Summary statistics for the angular distribution data were computed.
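One way to compute such an angular spread is sketched below. The text does not give the exact definition used, so this sketch assumes that the spread at a site is the largest angle between any two observers' line segments, all drawn from the shared outer margin point. Coordinates are in-plane (x, y) positions, and the function names are hypothetical.

```python
import math
from itertools import combinations


def angle_between_deg(anchor, end_a, end_b):
    """Unsigned angle (degrees) between two segments sharing a start point."""
    ax, ay = end_a[0] - anchor[0], end_a[1] - anchor[1]
    bx, by = end_b[0] - anchor[0], end_b[1] - anchor[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    # atan2 of cross and dot is stable for both small and large angles.
    return abs(math.degrees(math.atan2(cross, dot)))


def angular_spread_deg(anchor, endpoints):
    """Largest pairwise angle among segments from anchor to each endpoint."""
    if len(endpoints) < 2:
        raise ValueError("at least two line segments are required")
    return max(angle_between_deg(anchor, p, q)
               for p, q in combinations(endpoints, 2))
```

This quantity depends only on segment direction, so it isolates disagreement about orientation from disagreement about where the inner tumor margin lies.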
RESULTS
Patient demographic and disease characteristics are summarized in Table 1. Table 2 presents the distribution of tumor measurement site morphology and location as recorded by the primary observer. Approximately half (42.9%) of the measurement sites were categorized as concave rind in local morphology; the remaining sites were more equally distributed across convex rind, convex mass, and fusiform mass morphologies. The majority (62.9%) of measurement sites were located along the chest wall. The middle craniocaudal zone of the thorax contained 77.1% of the measurement sites, and 63.5% of the sites were defined by outer tumor margin points placed along bone rather than soft tissue. The measurement sites were nearly equally divided between right and left hemithoraces. Again, the sites were not selected to capture foci of clinical relevance but rather to represent a range of tumor thicknesses and morphologies, with anatomical location a secondary consideration; accordingly, these data do not suggest generalizable distributions of tumor morphology or location in patients with mesothelioma, and any statistically significant differences that may exist among these data would not be relevant. These data do, however, indicate a clear preference for the primary observer to select tumor measurement sites along the chest wall and anchored by bone, and they indicate a tendency for the primary observer to avoid the placement of tumor measurement sites near the lung apices and lung bases, as recommended by the modified RECIST guidelines for mesothelioma.
Figure 4 presents a histogram of the per-site mean tumor thickness measurements. The mean tumor thickness measurement at each of the 170 measurement sites was computed as the arithmetic mean of the six observers’ tumor thickness measurements at that site. The average of the mean measurements across all 170 sites was 11.61 mm (range, 2.60–54.35 mm) with an SD of 8.19 mm. The median mean measurement across all sites was 9.68 mm (i.e., half of the 170 measurement sites had a mean measurement below 9.68 mm), which reflects the study’s aim to investigate measurement variability as a function of tumor thickness given that the current minimum measurable tumor thickness has been set at 10 mm; specifically, 87 of the 170 measurement sites had a mean measurement less than 10.0 mm. Each of the six scatter plots of Figure 5 displays, for each of the 170 measurement sites, the relative difference between the measurement of one observer and the average across all other observers as a function of the mean measurement (of all six observers) at the site. A total of 154 (90.6%) of the measurements from Observer 0 (the primary observer) are less than the per-site average measurement across all other observers, whereas 136 (80.0%) and 143 (84.1%) of the measurements from Observers 2 and 5, respectively, are greater than the per-site average measurement across all other observers.
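The per-observer quantity plotted in Figure 5 can be sketched as a leave-one-out comparison. The text does not state the denominator, so this sketch assumes that the difference is normalized by the average of the other observers. The function name is hypothetical.

```python
def relative_difference_vs_others(measurements_mm, observer_index):
    """Relative difference between one observer and the mean of the rest."""
    others = [value for idx, value in enumerate(measurements_mm)
              if idx != observer_index]
    if not others:
        raise ValueError("at least two observers are required")
    mean_others = sum(others) / len(others)
    # Assumed normalization: divide by the mean of the other observers.
    return (measurements_mm[observer_index] - mean_others) / mean_others
```

A negative value indicates an observer who measured shorter than the consensus of the remaining observers at that site, as was typical for Observer 0 in this study.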
Figure 6 presents a histogram of the range of tumor thickness measurements at each site as a percent of the mean length at that site (reflecting interobserver variability). The range of tumor thickness measurements at a site was defined as the difference between the largest and smallest of the six observers’ measurements divided by their average. The mean range across all 170 sites was 15.1% of the mean per-site measurement with an SD of 9.1%, and the median range as a percent of the mean per-site measurement across all sites was 13.1%.
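The range statistic defined above reduces to a one-line computation, sketched here with a hypothetical function name and illustrative input values.

```python
def range_percent_of_mean(measurements_mm):
    """Range of the observers' measurements as a percent of their mean."""
    mean = sum(measurements_mm) / len(measurements_mm)
    return 100.0 * (max(measurements_mm) - min(measurements_mm)) / mean
```

For example, three illustrative measurements of 9, 10, and 11 mm have a range of 2 mm and a mean of 10 mm, which gives a range of 20% of the mean.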
Figure 7 presents the range distribution data from Figure 6 separated by local lesion morphology. The median range of measurements acquired for the convex mass lesion morphology was less than the 25th percentile range of the three other lesion morphologies, although this finding could be explained by the fact that line segments associated with convex mass lesions were statistically significantly longer than those associated with the other lesion morphologies (Mann–Whitney U test). No statistically significant differences among the distributions of mean measurements were observed across the lesion anatomical location categories (Kruskal–Wallis test). Figure 8 presents the relative differences in tumor measurements (i.e., the data in Figure 5 aggregated across observers) as box-and-whiskers plots over mean tumor thickness categories. The interquartile range and whisker span initially decrease with increasing mean tumor thickness measurement and then become more consistent for measurements acquired at sites with mean tumor thickness measurement greater than 7.5 mm.
On the basis of the random-effects analysis of variance model, the 95% confidence interval for interobserver tumor thickness measurement differences was (−16.8%, 20.1%) across all measurements. The 95% confidence intervals for relative interobserver differences within the different lesion morphology, lesion location, and mean measurement length categories are presented in Table 3. The 95% confidence intervals are relatively constant across lesion morphology and measurement site location, except that the confidence interval is smaller for convex masses, which, as mentioned above, tend to have longer tumor thickness measurements. The 95% confidence interval for relative interobserver measurement differences obtained for mean measurement lengths in the range 5 to 7.5 mm was −14.6% to 17.1%. Because the RECIST tumor response thresholds (−30%, +20%) fall outside this interval, assessments of response or progression in tumors of this size or larger are likely to be real and not simply the result of interobserver variability.
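The asymmetry of these percentage intervals is what would be expected when a symmetric interval on the log scale is back-transformed. The sketch below illustrates this. The half-width of 0.1835 is inferred here from the reported overall interval and is not a value stated in the analysis, and the function name is hypothetical.

```python
import math


def back_transform_interval(log_half_width):
    """Convert a symmetric log-scale interval (-h, +h) to percent differences."""
    lower_pct = 100.0 * (math.exp(-log_half_width) - 1.0)
    upper_pct = 100.0 * (math.exp(log_half_width) - 1.0)
    return lower_pct, upper_pct


# Both reported limits map to nearly the same log-scale magnitude.
print(round(math.log(1 - 0.168), 3), round(math.log(1 + 0.201), 3))
# Back-transforming an inferred half-width recovers the reported interval.
lower, upper = back_transform_interval(0.1835)
print(round(lower, 1), round(upper, 1))
```

The same check applied to the 5-to-7.5-mm interval (−14.6%, 17.1%) gives log-scale limits of nearly equal magnitude, consistent with a model fitted to log-transformed measurements.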
The mean angular spread among observers’ measurement line segments was 10.4 ± 5.6 degrees (range, 3.4–37.0 degrees) across all measurement sites. Angular spread did not demonstrate a linear relation with mean tumor thickness at a measurement site (r² = 0.012). When the relation between angular spread and mean tumor thickness was considered by lesion morphology (concave rind, convex rind, convex mass, or fusiform mass), anatomical location (chest wall, mediastinum, or anterior/posterior angle), or outer tumor margin point (bone or soft tissue), the r² values ranged from 0.00015 to 0.21. Tumor thickness range as a percent of mean measurement at a site exhibited an r² value of 0.011 when correlated with angular spread at the site.
DISCUSSION
Single time-point unidimensional tumor thickness measurements define measurable disease for clinical trial inclusion. The modified RECIST guidelines did not alter the recommendation of 10 mm for a minimally measurable lesion, but these guidelines also did not validate the use of this specific tumor thickness threshold in mesothelioma. Given that the notion of “minimally measurable lesion” derives, in part, from considerations of observer variability and the correspondingly greater precision of measurements that accompanies reduced variability, the present study quantified observer variability in the acquisition of modified RECIST–based linear measurements of mesothelioma tumor thickness in an effort to empirically determine a definition of measurable lesion unique to mesothelioma. The statistical findings of this study suggest that a reduction in minimally measurable lesion thickness from the 10 mm stipulated by RECIST for all tumors to a mesothelioma-specific value of 5 or 7.5 mm might be warranted.
The RECIST tumor response thresholds were not derived from interobserver variability considerations and were not intended to define acceptable limits of such variability (although the 20% threshold between “stable disease” and “progressive disease” was defined with an element of interobserver variability in mind). The RECIST thresholds, however, provide a practical, clinical context in which interobserver variability may be evaluated. If variability among observers who use a particular measurement technique to independently measure the same lesion on the same CT scan at the same time point is so large that the measurements acquired by these different observers, when compared, are likely to extend beyond the range of measurement differences categorized as “stable disease,” then that measurement technique would clearly lack clinical utility.
The findings of this study are directed at single time-point measurements on what would be considered a baseline CT scan, that is, the setting in which the concept of “minimally measurable lesion” applies. Once a lesion measurement site is established on the baseline scan, that site is meant to be measured on all subsequent scans. Furthermore, previous results demonstrate that interobserver variability in the acquisition of mesothelioma measurements on follow-up scans is lower than for baseline scan measurements, because observers acquiring measurements on a follow-up scan, in practice, will visualize the baseline scan measurements7; the bias introduced by this visual reference necessarily reduces expected interobserver variability in the follow-up scan setting.
The underlying notion of modified RECIST is that linear measurements can be used as a surrogate for tumor burden; the validity of this approach was not challenged by the present study. Although the use of area-based measurements for mesothelioma response assessment has been shown to suffer from exaggerated levels of observer variability,7 the investigation of volumetric measurements has gained increased attention with promising results.8–11 The practical utility of volumetric measurements, however, remains an open topic of investigation. Linear measurements for response assessment remain simple, inexpensive, and widely used.
The consistent bias to longer or shorter measurements by some observers reinforces the recommendation that measurements should be performed by the same observer at all time points during a clinical trial. However, although interobserver variability may be ameliorated by using the same observer for sequential measurements, the question of a minimum reproducibly measurable tumor in mesothelioma may assume additional importance in a revision of mesothelioma staging. The International Association for the Study of Lung Cancer is currently collecting prospective data to inform potential revision of the staging of pleural mesothelioma. Exploratory information that is currently being collected includes the maximum tumor thickness at three craniocaudal sites on the axial CT scan. It will be critical to understand the reproducibility of tumor thickness measurements among observers before any categorization and analysis for staging purposes.
Importantly, this study removed many sources of variability by preselecting and electronically presenting the parietal anchor for the measurement site to the observer. Even under these controlled conditions and with the ability to magnify the measurement window, interobserver variability remained high at smaller tumor thicknesses. This finding further validates the practical need for some defined tumor thickness threshold.
This study has important implications for the definition of “minimally measurable lesion” adopted by mesothelioma clinical trials. The findings suggest that a reduction in minimally measurable lesion thickness from 10 mm to 5 or 7.5 mm might be warranted; however, real-world sources of variability that were not present in this study, including differences in patient positioning, differences in imaging technical parameters, and the acquisition of measurements by different readers across the temporal sequence of patient scans, may impose a larger margin to this minimum. Ultimately, clinical factors may dictate the most appropriate definition of “minimally measurable lesion” in the mesothelioma setting.
The authors thank Adam Starkey for development of and support with the Abras software package. The authors also thank Drs. Michael Millward, Jeanette Soon, and Arman Hasani for their assistance in performing tumor measurements.
Supported, in part, by the Raine Medical Research Foundation and the Cancer Council Western Australia.
REFERENCES
1. Therasse P, Arbuck SG, Eisenhauer EA, et al. New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada. J Natl Cancer Inst. 2000;92:205–216
2. Byrne MJ, Nowak AK. Modified RECIST criteria for assessment of response in malignant pleural mesothelioma. Ann Oncol. 2004;15:257–260
3. Oxnard GR, Armato SG 3rd, Kindler HL. Modeling of mesothelioma growth demonstrates weaknesses of current response criteria. Lung Cancer. 2006;52:141–148
4. Ford R, Schwartz L, Dancey J, et al. Lessons learned from independent central review. Eur J Cancer. 2009;45:268–274
5. Gavrielides MA, Kinnard LM, Myers KJ, Petrick N. Noncalcified lung nodules: volumetric assessment with thoracic CT. Radiology. 2009;251:26–37
6. Armato SG 3rd, Oxnard GR, MacMahon H, et al. Measurement of mesothelioma on thoracic CT scans: a comparison of manual and computer-assisted techniques. Med Phys. 2004;31:1105–1115
7. Labby ZE, Straus C, Caligiuri P, et al. Variability of tumor area measurements for response assessment in malignant pleural mesothelioma. Med Phys. 2013;40:081916
8. Frauenfelder T, Tutic M, Weder W, et al. Volumetry: an alternative to assess therapy response for malignant pleural mesothelioma? Eur Respir J. 2011;38:162–168
9. Labby ZE, Nowak AK, Dignam JJ, Straus C, Kindler HL, Armato SG 3rd. Disease volumes as a marker for patient response in malignant pleural mesothelioma. Ann Oncol. 2013;24:999–1005
10. Liu F, Zhao B, Krug LM, et al. Assessment of therapy responses and prediction of survival in malignant pleural mesothelioma through computer-aided volumetric measurement on computed tomography scans. J Thorac Oncol. 2010;5:879–884
11. Sensakovic WF, Armato SG 3rd, Straus C, et al. Computerized segmentation and measurement of malignant pleural mesothelioma. Med Phys. 2011;38:238–244
Key Words: Malignant pleural mesothelioma; Response assessment; Staging; Thoracic computed tomography; Interobserver variability
Copyright © 2014 by the European Lung Cancer Conference and the International Association for the Study of Lung Cancer.