Objectives
We systematically analyzed factors contributing to variability of response categorization under the Response Evaluation Criteria in Solid Tumors (RECIST) 1.1: manual versus automated size assessment, unidimensional versus 3-dimensional size assessment, and the choice of target lesions.
Patients and Methods
A total of 41 female patients (58.1 ± 13.2 years old) with metastatic breast cancer underwent contrast-enhanced thoracoabdominal computed tomography for initial staging and first follow-up after systemic chemotherapy. Data were independently interpreted by 3 radiologists with 5 to 9 years of experience. In addition, response was evaluated by a computer-assisted diagnosis system that allowed automated unidimensional and 3-dimensional assessment of target lesions.
Results
Overall, between-reader agreement was moderate (κ = 0.53), with diverging response classification observed in 19 of 41 patients (46%). Readers had chosen the same target lesions in 25 patients and different target lesions in 16. Selection of the same target lesions was associated with a 76% rate of agreement (19/25) with regard to response classification; selection of different target lesions was associated with an 81% rate of disagreement (13/16) (P < 0.001). After response classes were dichotomized according to their therapeutic implication into progressive versus nonprogressive, disagreement was observed in 11 of 41 patients (27%) (κ = 0.57). In 9 of these 11 patients, readers had chosen different target lesions. Disagreement attributable to manual versus automated size measurement (11/41, 27%) or to unidimensional versus volumetric size measurement (6/41, 15%) was less important.
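The agreement rates reported above can be rechecked from the stated counts. The following is a minimal sketch, not the authors' analysis: the abstract does not name the statistical test behind the reported P value, so the use of a two-sided Fisher exact test is an assumption, and the function and variable names are illustrative. The counts 6 and 3 are derived from the reported totals (25 − 19 and 16 − 13).

```python
from math import comb

# Counts from the results; 6 and 3 are derived as 25 - 19 and 16 - 13.
same_agree, same_disagree = 19, 6    # readers chose the same target lesions
diff_agree, diff_disagree = 3, 13    # readers chose different target lesions


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumption: the source does not state which test was used.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x first-column patients in row 1.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are at most as likely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


agree_same_rate = same_agree / (same_agree + same_disagree)
disagree_diff_rate = diff_disagree / (diff_agree + diff_disagree)
p_value = fisher_exact_two_sided(same_agree, same_disagree,
                                 diff_agree, diff_disagree)

print(f"Agreement, same target lesions: {agree_same_rate:.0%}")
print(f"Disagreement, different target lesions: {disagree_diff_rate:.0%}")
print(f"Two-sided Fisher exact p-value: {p_value:.2g}")
```

Under this assumed test, the p-value falls below 0.001, which is consistent with the reported significance level.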
Conclusions
A major source of variability is the differing choice of target lesions between readers rather than manual or unidimensional measurement. Computer-assisted diagnosis–based analysis or tumor volumetry can reduce only the variability due to manual or unidimensional measurement; it will not solve the problem of target lesion selection.
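To illustrate why target lesion selection alone can change the response category, the sketch below applies the RECIST 1.1 target lesion thresholds (at least a 30% decrease in the sum of diameters for partial response; at least a 20% increase and at least 5 mm absolute increase for progressive disease) to a single hypothetical patient. All lesion names and diameters are invented for illustration and are not study data. The function is a deliberate simplification: it ignores non-target lesions, new lesions, and the short-axis rule for lymph nodes, and it treats baseline as the nadir, which holds at first follow-up.

```python
def recist_target_response(baseline_mm, followup_mm):
    """Simplified RECIST 1.1 target lesion response at first follow-up.

    Assumption: baseline is also the nadir; non-target lesions, new lesions,
    and lymph node short-axis rules are ignored.
    """
    base_sum = sum(baseline_mm)
    follow_sum = sum(followup_mm)
    if follow_sum == 0:
        return "CR"  # all target lesions disappeared
    change = (follow_sum - base_sum) / base_sum
    if change <= -0.30:
        return "PR"  # at least 30% decrease in sum of diameters
    if change >= 0.20 and follow_sum - base_sum >= 5:
        return "PD"  # at least 20% and at least 5 mm increase
    return "SD"


# Hypothetical lesions: (baseline diameter, follow-up diameter) in mm.
lesions = {"A": (40, 28), "B": (20, 12), "C": (15, 24), "D": (12, 16)}

# Hypothetical target lesion choices by three readers for the same patient.
reader_choices = {
    "reader_1": ["A", "B"],
    "reader_2": ["A", "C"],
    "reader_3": ["C", "D"],
}

results = {}
for reader, chosen in reader_choices.items():
    baseline = [lesions[name][0] for name in chosen]
    followup = [lesions[name][1] for name in chosen]
    results[reader] = recist_target_response(baseline, followup)

print(results)
```

In this invented example, identical and perfectly accurate measurements still yield three different categories, because each reader sums a different set of lesions.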