Tiss, Ali PhD*; Timms, John F. PhD†; Smith, Celia BSc*; Devetyarov, Dmitry PhD‡; Gentry-Maharaj, Aleksandra PhD†; Camuzeaux, Stephane BSc†; Burford, Brian PhD‡; Nouretdinov, Ilia PhD‡; Ford, Jeremy BSc†; Luo, Zhiyuan PhD‡; Jacobs, Ian MD†; Menon, Usha MD†; Gammerman, Alex PhD‡; Cramer, Rainer PhD*
Ovarian cancer is the leading cause of death from gynecologic malignancy in the western world, which is mainly attributable to its diagnosis at an advanced stage.1,2 This suggests that detecting ovarian cancer at an earlier stage may improve survival. Crucial to early detection is the identification of accurate biomarkers. Serum CA125 is the most extensively assessed biomarker for ovarian cancer with elevated levels of CA125 found in more than 90% of patients with advanced disease. However, CA125 has been shown to lack sensitivity (50%-60%) for early-stage disease detection,1,3-8 and its expression is not specific to malignant ovarian cancers.9 Indeed, CA125 can be elevated in women with benign gynecologic conditions such as ovarian cysts, endometriosis, and uterine fibroids as well as in other cancers (breast, bladder, pancreatic, liver, and lung).10 Efforts have therefore been made to identify additional biomarkers to complement CA125.
Over the last 2 decades, dozens of new biomarkers of ovarian carcinomas have been proposed, with combinations of these biomarkers with or without CA125 reported to significantly increase the accuracy in detecting ovarian cancer at both early and late stages.6,8,11-16 Some of the multiple marker panels achieved the important benchmark value of more than 99.6% specificity that is required to achieve a positive predictive value of 10% (for an incidence rate of 40 per 100,000 women). However, this was accompanied by a decrease in sensitivity values to less than 60% for early-stage and less than 77% for late-stage cancer.16
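The link between specificity, incidence, and positive predictive value can be illustrated with a short calculation. The sketch below applies Bayes' rule and, as a simplifying assumption not made in the text, treats the quoted incidence of 40 per 100,000 as the prevalence in the screened population. The function name and the sensitivity values passed to it are illustrative only.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Positive predictive value of a binary test, from Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Incidence quoted in the text, used here as prevalence (an assumption).
prevalence = 40 / 100_000

# Illustrative sensitivities at the 99.6% specificity benchmark.
ppv_perfect = positive_predictive_value(1.00, 0.996, prevalence)
ppv_at_60 = positive_predictive_value(0.60, 0.996, prevalence)

print(f"PPV at 100% sensitivity: {ppv_perfect:.3f}")
print(f"PPV at 60% sensitivity:  {ppv_at_60:.3f}")
```

Under these assumptions, even a perfectly sensitive test reaches a positive predictive value of only about 9% at 99.6% specificity, which is why any loss of sensitivity at this benchmark matters.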
In the context of ovarian cancer screening, CA125 interpreted using a risk of ovarian cancer algorithm has a high sensitivity and specificity for detecting primary invasive ovarian and tubal malignancies. For multimodal screening using annual CA125 screening with transvaginal ultrasound scan as a second-line test, the sensitivity for primary ovarian and tubal malignancies was 89.4% at a specificity of 99.8%.17 Although the performance of screening strategies has greatly improved in recent years, the need for additional screening modalities providing both high sensitivity and specificity remains. Likewise, the differential diagnosis of symptomatic patients would also benefit from improved and simpler tests. With this in mind, the objective of this study was to evaluate whether combinations of serum CA125 and mass spectrometry (MS) profiling data could enhance the identification of ovarian cancer patients from benign cases and healthy controls compared with the use of CA125 values alone.
MATERIALS AND METHODS
Subjects, Sample Collection, and Handling
The study was approved by the local ethics committee (MREC 05/Q0505/58), and written informed consent was obtained from all donors. Women were recruited to the UK Ovarian Cancer Population Study from 10 National Health Service Trusts across the United Kingdom. Patients were recruited at gynecologic oncology departments and healthy volunteers were recruited from women attending annual screening in the UK Collaborative Trial of Ovarian Cancer Screening.17,18 Supplemental Data 1 (see Supplemental Digital Content 1, http://links.lww.com/IGC/A26) provides details on the initial set of subjects, sample collection, transport, and storage. For the combined MS profiling and CA125 assay analyses, we used the same sample set as previously reported.19 After excluding samples with missing CA125 values and those from borderline ovarian cancer cases, the data from 321 women were used for identifying the best classification models when comparing malignant versus healthy and malignant versus benign. Sixty-seven samples were from individuals newly diagnosed with invasive epithelial ovarian cancer, 84 were from women diagnosed with benign ovarian neoplasm, and 170 were from age-matched healthy controls. For model generation and validation, samples were divided into 2 training and test sets (Table 1). Figure 1A shows CA125 assay values across the groups. For the extended CA125 analysis, CA125 serum levels were evaluated from 2719 women. Supplemental Data 2 (see Supplemental Digital Content 2, http://links.lww.com/IGC/A27) shows the division of this set into the 3 classes (malignant, benign, and healthy), International Federation of Gynecology and Obstetrics stage distribution and average and median age in each class and stage group.
Samples collected, processed, and frozen at the regional centers were transported on dry ice to the University College London laboratory and thawed. After thawing, samples were mixed by gentle inversion and CA125 analysis was performed using an electrochemiluminescence immunoassay on a Roche Elecsys 2010 analyzer (Roche Diagnostics, Burgess Hill, UK). The assay uses monoclonal antibodies OC125 as the detection antibody and M11 as the capture antibody (Fujirebio Diagnostics; Oxford Biosystems, Oxford, UK).
Matrix-Assisted Laser Desorption/Ionization Time-of-Flight MS-Based Profiling
Samples were processed and analyzed in 2 batches. Samples in each batch were randomized at the University College London laboratory, thawed and aliquoted into 96-well plates, then transported on dry ice to the BioCentre at the University of Reading and stored at −80°C. For MS serum polypeptide profiling, samples were prepared according to previously published methods19,20 and profiled using an Ultraflex II matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF)/TOF MS (Bruker Daltonics, Coventry, UK). Various spectral quality control criteria were implemented with adequate quality assurance for the entire sample preparation process, data collection, and analysis.19,20 Supplemental Data 3 (see Supplemental Digital Content 3, http://links.lww.com/IGC/A28) provides details on sample preparation, data acquisition, and preprocessing.
Data Processing and Classification
Raw spectral data were processed using algorithms developed in house.19 Data from the 2 batch analyses were combined with corresponding CA125 values and used to construct 2 training and blinded test sets for classification (Table 1). Prediction models were constructed independently for 2 types of discrimination, malignant versus healthy and malignant versus benign, and compared with simple classification using a CA125 cutoff value of 30 U/mL. Multiple models using the weighted k-nearest neighbors (kNN) algorithm, logical combinations of cutoff rules, cutoff rules for linear combinations, and support vector machines with various kernels were applied to subsets of peaks of certain cardinality (usually a small number). Cross validation was performed by randomizing sample labels in 1000 iterations and calculating P values (Monte-Carlo method) for the randomly permuted and correctly labeled samples. The models performing best on the training sets (all weighted kNN models) were then validated on the blinded test sets. To calculate the significance of the improvement obtained by adding MS profiling data, a Monte-Carlo test was applied that measured the chance of obtaining an accuracy greater than or equal to the best accuracy achieved on the test set if peak intensities were reshuffled across samples at the given CA125 accuracy. The P value was calculated as the proportion of iterations with an accuracy greater than or equal to the accuracy achieved with the best model. For analysis of the extended sample set using CA125, various cutoff values were tested and receiver operating characteristic analysis was applied using GraphPad Prism v5.0 software (GraphPad Software, Inc., La Jolla, CA).
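The Monte-Carlo significance test described above can be sketched as follows. This is a minimal illustration, not the in-house implementation: it assumes a distance-weighted kNN vote on features that have already been scaled, keeps the CA125 value (column 0) in place, and reshuffles the peak intensities (remaining columns) across the test samples only. All function names and the synthetic data are hypothetical.

```python
import math
import random


def weighted_knn_predict(train_x, train_y, query, k=3):
    """Distance-weighted kNN vote; labels are 0 (control) or 1 (malignant)."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = {0: 0.0, 1: 0.0}
    for distance, label in nearest:
        votes[label] += 1.0 / (distance + 1e-9)  # closer neighbours weigh more
    return 1 if votes[1] > votes[0] else 0


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test samples classified correctly."""
    hits = sum(
        weighted_knn_predict(train_x, train_y, x, k) == y
        for x, y in zip(test_x, test_y)
    )
    return hits / len(test_y)


def monte_carlo_p_value(train_x, train_y, test_x, test_y,
                        n_iter=1000, k=3, seed=0):
    """Proportion of reshuffles with accuracy >= the observed accuracy."""
    rng = random.Random(seed)
    observed = accuracy(train_x, train_y, test_x, test_y, k)
    ca125 = [x[0] for x in test_x]          # kept in place
    peaks = [tuple(x[1:]) for x in test_x]  # reshuffled across samples
    at_least_as_good = 0
    for _ in range(n_iter):
        shuffled = peaks[:]
        rng.shuffle(shuffled)
        permuted = [(c,) + p for c, p in zip(ca125, shuffled)]
        if accuracy(train_x, train_y, permuted, test_y, k) >= observed:
            at_least_as_good += 1
    return observed, at_least_as_good / n_iter


# Synthetic, standardized toy features: (CA125, one peak intensity).
toy_rng = random.Random(1)


def toy_sample(label):
    shift = 2.0 if label else 0.0
    return (toy_rng.gauss(shift, 1.0), toy_rng.gauss(0.5 * shift, 1.0)), label


train_x, train_y = zip(*[toy_sample(i % 2) for i in range(40)])
test_x, test_y = zip(*[toy_sample(i % 2) for i in range(20)])

observed, p_value = monte_carlo_p_value(
    train_x, train_y, test_x, test_y, n_iter=200
)
print(f"observed accuracy = {observed:.2f}, Monte-Carlo P = {p_value:.3f}")
```

A large P value in such a test indicates that the peak intensities add little beyond what CA125 already provides, because reshuffled peaks perform as well as the real ones.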
Peak Identification by MALDI-Quadrupole Time-of-Flight MS/MS
Peak identifications in this study were obtained by analyzing a pool of serum samples on a Premier Q-TOF MS (Waters, Manchester, UK) in the MALDI mode using a comparable MALDI sample preparation method that was only changed to account for the different types of MALDI target used.
RESULTS
Classification Performance Using MS Profiling Data and CA125 Values
As shown in our previous study,19 MALDI-TOF MS profiling was robust and reproducible with interassay coefficients of variation less than 15%. Nonetheless, MALDI MS profiling data alone had only limited diagnostic value for ovarian cancer, particularly when compared with recent reports using multibiomarker panels.6,8,11-17,21 Consequently, we have evaluated the performance of classification models derived from the combination of MS profiling and CA125 immunoassay data.
Our first analysis used randomly selected sets for training and testing as detailed in the upper part of Table 1. Modeling for the separation of women with invasive epithelial ovarian cancer from healthy controls revealed that weighted kNN algorithms performed best, with several models outperforming simple CA125 cutoff classification (at a 30-U/mL threshold) in the training set. Three models were chosen for validation in the blinded test set, one each for highest sensitivity, specificity, and quality. One of these models performed better than CA125 alone in the test set with an accuracy of 100% (Table 2; upper part), although this improvement was not statistically significant (P = 0.24). Comparison of invasive epithelial ovarian cancer (malignant) and benign cases showed inferior performance on the basis of overall accuracy and quality in the test set, although several models outperformed the CA125 cutoff classification in the training set (Table 2; lower part).
This analysis demonstrates that a classification model based on CA125 values alone, with a cutoff level of 30 U/mL, performs extremely well in this sample set. In this specific case, all malignant samples in the test set had CA125 values of more than 30 U/mL, making it impossible to improve on sensitivity. Similarly, only 1 healthy sample had a CA125 value of more than 30 U/mL, leaving little room for improvement in specificity. As a consequence, we reshuffled the training and test sets according to 2 conditions. First, the ratio between the training and test set was set at approximately 2:1 for all classes. Second, in both sets, each class had the same ratio of samples more than and less than 30 U/mL of CA125 (lower part of Table 1). Modeling analysis using these new training and test sets showed that more models were now able to improve on specificity in comparison with CA125 alone. Nonetheless, the improvement in discriminating malignant from healthy controls was still limited to a maximum of 2 additional correctly classified healthy samples in the test set, whereas sensitivity could not be further improved. Likewise, for classification of malignant versus benign cases, there was an improvement in specificity for many models in the test set, but none matched the sensitivity obtained with CA125 alone, limiting overall accuracy (Table 3). Improvement in overall accuracy with the best model, from 76.9% (CA125 cutoff model) to 78.9% (5 of 10 model), was not significant (P = 0.72). Furthermore, neither of the discriminatory peaks (m/z 2755 and 2094) in this model was found in any of the best models obtained from the initial sample sets. These 2 peaks were identified as fragments of serum albumin (25-48; Swiss-Prot entry P02768) and fibrinogen α-chain (605-624; Swiss-Prot entry P02671). Only 1 peak (m/z 4787) from the models in Table 3 was also used in the best models from the first analysis (cf. Table 2).
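The 2 reshuffling conditions amount to a stratified split. The sketch below is one possible way to implement it, assuming each sample is a (class label, CA125 value) pair. It groups samples by class and by whether CA125 exceeds 30 U/mL, then assigns about two thirds of each group to training. The function name, the seed, and the toy records are hypothetical and do not correspond to the study data.

```python
import random
from collections import defaultdict

CUTOFF_U_PER_ML = 30.0


def stratified_split(samples, train_fraction=2 / 3, seed=0):
    """Split (label, ca125) records roughly 2:1 within each stratum.

    A stratum is a class label combined with the side of the CA125 cutoff,
    so both sets keep a similar above/below ratio in every class.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample in samples:
        label, ca125 = sample
        strata[(label, ca125 > CUTOFF_U_PER_ML)].append(sample)

    train, test = [], []
    for key in sorted(strata):
        group = strata[key]
        rng.shuffle(group)
        n_train = round(len(group) * train_fraction)
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test


# Toy records only; the values are invented for illustration.
toy = (
    [("malignant", 120.0)] * 9 + [("malignant", 12.0)] * 3
    + [("healthy", 10.0)] * 27 + [("healthy", 45.0)] * 3
)
train, test = stratified_split(toy)
print(len(train), len(test))
```

Splitting within strata rather than over the whole set prevents the situation seen in the first analysis, where the test set happened to contain no malignant samples below the cutoff.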
Classification Performance Using Simple CA125 Cutoff Models
As a consequence of the good performance of CA125 cutoff classification, we further investigated an extended set of United Kingdom Ovarian Cancer Population Study samples looking at CA125 alone for classification. This extended set comprised 2236 healthy controls (median age, 64.31 years), 290 benign cases (median age, 57.96 years), and 193 invasive ovarian cancers (median age, 63.88 years), of which 48.2% were International Federation of Gynecology and Obstetrics stage I (n = 74) or stage II (n = 19; see Supplemental Data 2). In the comparison of malignant and healthy samples, using a 65-U/mL cutoff level, only 10 of 2236 healthy women had elevated CA125, giving a specificity of 99.6% (95% confidence interval [CI], 99.1%-99.8%) and a sensitivity of 83.9% (95% CI, 78.0%-88.8%). At a 30-U/mL cutoff level, a sensitivity of 94.8% (95% CI, 90.7%-97.5%) and a specificity of 96.6% (95% CI, 95.5%-97.3%) were obtained. For malignant versus benign, the 65-U/mL cutoff gave a specificity of 76.2% (95% CI, 70.9%-81.0%) at a sensitivity of 83.9%, whereas a 30-U/mL cutoff gave 53.4% specificity (95% CI, 47.5%-59.3%). The area under the receiver operating characteristic curve for this classification was 0.877 (P < 0.0001; 95% CI, 0.846-0.908; Fig. 2A). Analysis of early-stage cancer versus benign cases revealed that at the 65-U/mL CA125 threshold, the sensitivity was 77.4% (95% CI, 67.6%-85.5%), and at 30 U/mL, the sensitivity was 92.5% (95% CI, 85.1%-96.9%). Because of the relatively low number of stage II samples (n = 19), these sensitivity values changed only marginally when only stage I samples (n = 74) were used; the area under the receiver operating characteristic curve for stage I versus benign cases was 0.842 (P < 0.0001; 95% CI, 0.794-0.891; Fig. 2B), with sensitivity values of 54.1%, 33.8%, and 27.0% for specificity values of 90%, 95%, and 98%, respectively.
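As a worked example of how such figures are derived from counts, the sketch below computes the specificity at the 65-U/mL cutoff from the stated 10 elevated values among 2236 healthy women, together with an approximate 95% CI. The interval method used in the study is not stated, so the Wilson score interval here is an assumption and its bounds may differ slightly from the reported ones.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = successes / total
    denom = 1.0 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


# Counts stated in the text for healthy controls at the 65-U/mL cutoff.
healthy_total = 2236
healthy_elevated = 10  # false positives

true_negatives = healthy_total - healthy_elevated
specificity = true_negatives / healthy_total
ci_low, ci_high = wilson_interval(true_negatives, healthy_total)

print(f"specificity = {specificity:.1%} "
      f"(approx. 95% CI, {ci_low:.1%}-{ci_high:.1%})")
```

The same function applies to sensitivity by passing the number of malignant samples above the cutoff and the total number of malignant samples.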
DISCUSSION
We have further evaluated our earlier reported MALDI MS profiling study by combining profiling data with preoperative CA125 serum levels. The rationale was to explore whether this combination could improve the discrimination of healthy women or those with benign masses from women with invasive epithelial ovarian cancer. Two different training and test sets were used, one using representative sampling with respect to CA125 value distribution more than and less than 30 U/mL. Although improvements in classification performance for discriminating healthy or benign samples from malignant samples were apparent in the training sets (compared with a standard 30 U/mL CA125 cutoff classification), only marginal and statistically insignificant improvement in performance was achieved in the test sets. This is in keeping with our earlier observation that MS profiling alone is limited in its ability to discriminate malignant ovarian cancer samples from benign or healthy controls.19 However, the unexpectedly good performance of the CA125 immunoassay on its own made it virtually impossible to improve on performance.
We next used an extended set of more than 2700 samples to further investigate this better-than-expected CA125 performance. At a threshold of 65 U/mL CA125, only 10 of 2236 healthy controls were misclassified, providing a specificity of 99.6%. At a 30-U/mL cutoff, the specificity was 96.6%, whereas the sensitivity for correctly identifying malignant samples was 94.8%. For early-stage disease (stages I and II), the sensitivity was still 92.5% at 30 U/mL and 90.3% at 35 U/mL (above-reported values). It is also noteworthy that our CA125 classification of early-stage cancer versus healthy performed as well as, or better than, classification models based on multiple biomarkers.8,11,16 CA125 also showed improved accuracy for discriminating malignant versus benign cases compared with recent literature. For example, the pooled sensitivity of CA125 in a meta-analysis on diagnostic strategies for distinguishing adnexal masses was 78% at a threshold of 35 U/mL, with individual study sensitivities ranging from 45% to 100%.22 For stage I cases alone, comparison with a recent study12 showed that our sensitivity values are more than twice as high for 90% specificity (54.1%) and 95% specificity (33.8%) and more than 3 times as high for 98% specificity (27.0%).
The high sensitivity of CA125 in this study may reflect the fact that samples were obtained from women referred to specialist gynecologic cancer centers who may in part have been referred on the basis of elevated CA125. This is in keeping with a recent report that overrepresentation of operative cases, especially from academic facilities, exaggerates the performance of CA125 with regard to sensitivity and positive predictive value.22 The good performance may also in part be explained by the exclusion of premenopausal women from our cohort because both sensitivity and specificity of CA125 are consistently higher in postmenopausal women. This is the rationale for restricting participation in ovarian cancer screening trials such as the UK Collaborative Trial of Ovarian Cancer Screening17,18 to postmenopausal women. The definition of malignancy is another factor that can influence test accuracy. In this study, samples were restricted to those from cases of primary invasive epithelial cancer, the most common ovarian cancers and the main contributor to the high case fatality ratio associated with the disease. This further increased our accuracy as we excluded both nonepithelial ovarian malignancies and borderline or low malignant potential ovarian cancers, both of which are less likely to produce CA125. The staging of ovarian cancer and the CA125 immunoassay are other sources of potential bias, but both procedures are relatively standardized and, therefore, less likely to have contributed to the observed higher accuracies.
In conclusion, we report the unexpectedly good performance of simple serum CA125 threshold classification in discriminating healthy and benign from malignant samples for the detection of ovarian cancer. Compared with the data on CA125 assays published thus far, a substantially increased accuracy was obtained. Reasons for this increase include recruitment bias in the specialist gynecologic oncology centers participating in sample collection, restriction of the study to postmenopausal women, and restricting the definition of ovarian malignancy to primary invasive epithelial cancer. The performance characteristics of the CA125 immunoassay in our study highlight its dependence on the study population and the crucial need for authors to provide sufficient detail on relevant characteristics of study populations to allow comparisons.
The collection of serum samples and their subsequent handling followed a strict protocol designed for optimal proteomic profiling with the aim of minimizing postsampling differences caused by proteolysis. However, the combination of CA125 with MS profiling data provided only marginal improvement. Unfortunately, because of the good performance of CA125 as a discriminatory biomarker, the benefit of MS profiling in providing additional classification power is difficult to judge. In this context, the additional benefit of MS profiling should be evaluated in combination with other biomarkers and/or using study groups in which sensitivity values can be improved upon. Here, the MS identification of proteins of low specificity (serum albumin and fibrinogen α-chain) as the source of potentially discriminatory peaks further supports a more careful approach to MALDI MS profiling for clinical diagnostics.
This work was supported by the Medical Research Council through grant nos. G0301107 and G0401619. Part of this work was undertaken at UCLH/UCL, which received a proportion of funding from the Department of Health's NIHR Biomedical Research Centres funding scheme.
1. Jacobs IJ, Menon U. Progress and challenges in screening for early detection of ovarian cancer. Mol Cell Proteomics
2. Schwartz PE. Current diagnosis and treatment modalities for ovarian cancer. Cancer Treat Res
3. Fritsche HA, Bast RC. CA 125 in ovarian cancer: advances and controversy. Clin Chem
4. Nossov V, Amneus M, Su F, et al. The early detection of ovarian cancer: from traditional methods to proteomics. Can we really do better than serum CA-125? Am J Obstet Gynecol
5. Jacobs I, Bast RC Jr. The CA 125 tumour-associated antigen: a review of the literature. Hum Reprod
6. Visintin I, Feng Z, Longton G, et al. Diagnostic markers for early detection of ovarian cancer. Clin Cancer Res
7. Skates SJ, Horick N, Yu Y, et al. Preoperative sensitivity and specificity for early-stage ovarian cancer when combining cancer antigen CA-125II, CA 15-3, CA 72-4, and macrophage colony-stimulating factor using mixtures of multivariate normal distributions. J Clin Oncol
8. Gorelik E, Landsittel DP, Marrangoni AM, et al. Multiplexed immunobead-based cytokine profiling for early detection of ovarian cancer. Cancer Epidemiol Biomarkers Prev
9. Kabawat SE, Bast RC Jr, Bhan AK, et al. Tissue distribution of a coelomic-epithelium-related antigen recognized by the monoclonal antibody OC125. Int J Gynecol Pathol
10. Sjovall K, Nilsson B, Einhorn N. The significance of serum CA 125 elevation in malignant and nonmalignant diseases. Gynecol Oncol
11. Zhang Z, Yu Y, Xu F, et al. Combining multiple serum tumor markers improves detection of stage I epithelial ovarian cancer. Gynecol Oncol
12. Moore RG, Brown AK, Miller MC, et al. The use of multiple novel tumor biomarkers for the detection of ovarian carcinoma in patients with a pelvic mass. Gynecol Oncol
13. Woolas RP, Xu FJ, Jacobs IJ, et al. Elevation of multiple serum markers in patients with stage I ovarian cancer. J Natl Cancer Inst
14. Mor G, Visintin I, Lai Y, et al. Serum protein markers for early detection of ovarian cancer. Proc Natl Acad Sci U S A
15. Kozak KR, Su F, Whitelegge JP, et al. Characterization of serum biomarkers for detection of early stage ovarian cancer. Proteomics
16. Havrilesky LJ, Whitehead CM, Rubatt JM, et al. Evaluation of biomarker panels for early stage ovarian cancer detection and monitoring for disease recurrence. Gynecol Oncol
17. Menon U, Gentry-Maharaj A, Hallett R, et al. Sensitivity and specificity of multimodal and ultrasound screening for ovarian cancer, and stage distribution of detected cancers: results of the prevalence screen of the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS). Lancet Oncol
18. Menon U, Gentry-Maharaj A, Ryan A, et al. Recruitment to multicentre trials-lessons from UKCTOCS: descriptive study. BMJ
19. Timms JF, Cramer R, Camuzeaux S, et al. Peptides generated ex vivo from serum proteins by tumor-specific exopeptidases are not useful biomarkers in ovarian cancer. Clin Chem
20. Tiss A, Smith C, Camuzeaux S, et al. Serum peptide profiling using MALDI mass spectrometry: avoiding the pitfalls of coated magnetic beads using well-established ZipTip technology. Proteomics. 2007;7(suppl 1):77-89.
21. Menon U, Skates SJ, Lewis S, et al. Prospective study using the risk of ovarian cancer algorithm to screen for ovarian cancer. J Clin Oncol
22. Myers ER, Bastian LA, Havrilesky LJ, et al. Management of adnexal mass. Evid Rep Technol Assess (Full Rep)