Adnexal masses have a reported prevalence of 7–12% among asymptomatic women.1–3 The high prevalence of benign masses, low prevalence of ovarian cancer, and overlap between benign and malignant ultrasound characteristics help explain the lack of observed benefit of ovarian cancer screening.4,5 However, ultrasound detection of adnexal masses is common, often leading to concern regarding ovarian cancer and subsequent surgical removal or observation by serial ultrasonography.6,7 The challenge is to identify women at high risk so they can be directed to prompt surgical evaluation while avoiding unnecessary surgery and morbidity for women at low risk.8–12 Systems that standardize assessment of cancer risk have been adopted for other abnormal imaging findings: the Breast Imaging Reporting and Data System for mammography, the Fleischner system for lung nodules, and the Bosniak system for renal masses are examples.13–15 Although algorithms have been proposed to standardize the assessment of adnexal masses, none have been widely adopted, due in part to reliance of some algorithms on “expert” ultrasound interpretation, which is difficult to define and not consistently available in all settings.16–20
To address this problem, we created a system designed to standardize ovarian cancer risk assessment for adnexal masses on ultrasonography and be usable by radiologists with varying levels of ultrasound expertise. We previously described the development and implementation of the system.21 We report the ability of the system to differentiate ovarian cancer risk among average-risk women reported to have an adnexal mass in 2016.
MATERIALS AND METHODS
This community-based prospective cohort study identified women aged 18 years and older who underwent a nonobstetric pelvic or abdominal ultrasonogram that included assessment of the adnexa(e) between January 1 and December 31, 2016. Participants were members of Kaiser Permanente Northern California, a closed, integrated health care delivery system, including 21 hospitals and more than 8,000 physicians, that provides care for more than 4.1 million members annually. The members’ ethnic and racial diversity mirrors that of the geographic areas served, with 47% white, 7% black, 22% Hispanic, 20% Asian or Pacific Islander, and 3% multiracial. The study period was selected as the first full year after adoption of standardized ultrasound assessment and reporting of adnexal masses throughout the system. Approval for the study was obtained from the Kaiser Permanente Northern California institutional review board.
Women were excluded for the following factors that would be expected to affect a priori risk of ovarian cancer, assessed using electronic medical record and tumor registry data, if known at the time of index ultrasonography: history of ovarian cancer, history of bilateral salpingo-oophorectomy, or increased genetic risk for ovarian cancer based on deleterious mutation or genetic syndrome (as indicated by International Classification of Diseases, 10th Revision diagnosis codes Q85.8, Z14.8, Z15.01, Z15.09, Z31.5, Z80.0, Z84.81). To minimize the chance that the index ultrasonogram in 2016 represented a follow-up study for a previously detected mass, we excluded women who had an ultrasonogram in 2015 that reported an abnormal adnexal mass as well as women who were not health plan members for at least 12 consecutive preceding months or had a gap in coverage of greater than 3 months.
Patient age and race or ethnicity were obtained from membership databases. We assessed comorbidity using the Charlson comorbidity index22 and measured outpatient use of gynecology and other clinical departments for the 12 months before the index ultrasonogram. We modified the Charlson index to remove the condition “metastatic cancer” because this condition was already represented among the covariates as an ultrasound indication. Ever use of oral contraceptives and hormone therapy was determined using pharmacy dispensing data.
Ultrasound reports are submitted by radiology using a computer program that serves as a user interface linked to electronic medical records. In the organization, reports are read by approximately 300 radiologists, of whom 15–20% have completed fellowship training in either women's imaging or body imaging and 1% in pelvic ultrasonography specifically. The categorization scheme used in 2016 is shown in Table 1. Briefly, ultrasound studies with no abnormal adnexal findings could be reported as either “normal” or category 0. Category 0 also included small simple cysts and cysts with classic ultrasound features of a benign endometrioma, mature teratoma, or hydrosalpinx. Otherwise, all adnexal masses were required to be categorized as 1, 2, 3, or X based on standardized ultrasound criteria.
Ultrasound reports were identified using Current Procedural Terminology codes, and abnormal adnexal findings were identified using embedded unique hash tags linked to the template as well as text string searches for unique phrases with the keyword “category.” For women with multiple abnormal examination findings in 2016, the first abnormal examination finding was considered the index ultrasonogram.
To classify the indication for ultrasonography, we developed a natural language processing algorithm to query the unstructured history section of reports. A chart review of 250 women identified 13 possible indications and the following order was used to assign a “primary” indication if more than one was found: 1) pain, 2) mass, 3) bloating or ascites, 4) postmenopausal bleeding, 5) urinary symptoms, 6) evaluation of other cancer, 7) family history of cancer, 8) premenopausal bleeding, 9) follow-up other imaging, 10) follow-up fibroids, 11) contraception or infertility, 12) past ovarian cyst or benign ovarian tumor, and 13) other. The first author (E.S.-B.) validated the algorithm with a second set of 250 randomly selected women with 90% agreement with chart review.
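The priority rule for assigning a “primary” indication can be illustrated with a minimal sketch. It covers only the final selection step, not the natural language processing itself; the function name and the fallback to “other” when nothing is recognized are assumptions made for illustration, and only the ordering is taken from the list above.

```python
# Priority order as listed in the text (first entry = highest priority).
INDICATION_PRIORITY = [
    "pain",
    "mass",
    "bloating or ascites",
    "postmenopausal bleeding",
    "urinary symptoms",
    "evaluation of other cancer",
    "family history of cancer",
    "premenopausal bleeding",
    "follow-up other imaging",
    "follow-up fibroids",
    "contraception or infertility",
    "past ovarian cyst or benign ovarian tumor",
    "other",
]


def primary_indication(found):
    """Return the highest-priority indication among those found in a report."""
    found_set = set(found)
    for indication in INDICATION_PRIORITY:
        if indication in found_set:
            return indication
    # Assumption: a report with no recognized indication is labeled "other".
    return "other"
```

For example, a report mentioning both “premenopausal bleeding” and “mass” would be assigned the primary indication “mass” under this ordering.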
For each patient, outcome data were collected from the date of index ultrasonogram until surgical evaluation of the adnexa(e), histopathologic or cytologic diagnosis of a malignant process involving the adnexa, discontinuation of health plan membership, death, or the end of the follow-up period (December 31, 2017). The primary outcome was the diagnosis of invasive cancer arising from the adnexa. Secondary outcomes included epithelial or stromal tumors considered to have low malignant potential and cancers metastatic to the adnexa(e). In this report, the term “ovarian cancer” refers to primary invasive cancers of either ovarian or fallopian tube origin; “borderline tumor” to ovarian epithelial as well as stromal tumors considered to have low malignant potential such as granulosa cell tumor; “secondary cancer” to cancers metastatic to the adnexa; and “peritoneal” cancers to cancers specifically diagnosed of primary peritoneal origin. Cancers diagnosed by cytology or biopsy only were classified as primary ovarian if results indicated adenocarcinoma of likely gynecologic origin and there was no evidence of another primary site. Malignant or borderline histopathology or cytology results were identified from pathology databases as well as data from the Kaiser Permanente Northern California Tumor Registry confirmed by chart review.
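The follow-up window described above ends at whichever qualifying event comes first. A minimal sketch of that rule follows; the function and parameter names are hypothetical, and only the administrative end date (December 31, 2017) is taken from the text.

```python
from datetime import date

STUDY_END = date(2017, 12, 31)  # end of the follow-up period stated in the text


def follow_up_end(surgery=None, malignant_diagnosis=None,
                  membership_end=None, death=None):
    """Return the earliest applicable end-of-follow-up date for one patient.

    Each argument is a datetime.date, or None if the event did not occur.
    """
    candidates = [d for d in (surgery, malignant_diagnosis, membership_end, death)
                  if d is not None]
    candidates.append(STUDY_END)  # administrative end of follow-up always applies
    return min(candidates)
```

A woman with no surgery, diagnosis, disenrollment, or death is followed to the end of the study period.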
Age-standardized risks of ovarian cancer diagnosis were calculated using weights based on the 2000 U.S. standard population from the Census P25-1130 estimates. Age, race or ethnicity, body mass index (BMI, calculated as weight (kg)/[height (m)]²), ever use of oral contraceptive pills or hormone therapy, history of hysterectomy or breast cancer, Charlson comorbidity index, and indication for ultrasonography were assessed in bivariate analyses using χ² tests for categorical variables and the Student t test or Wilcoxon rank-sum test for continuous variables. Multivariable logistic regression was performed to evaluate the relationship between variables and cancer outcomes. The number of cancer and borderline tumor outcomes was adequate to support a fully saturated model that included all variables that had either clinical or statistical significance.
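Direct age standardization can be sketched as a weighted sum of age-specific risks. The counts and weights below are hypothetical placeholders chosen for illustration; they are not study data and not the actual 2000 U.S. standard population weights, and the function name is an assumption.

```python
def age_standardized_risk(events, totals, weights):
    """Direct standardization: weighted sum of age-specific risks.

    events, totals, and weights each hold one entry per age group.
    Weights are normalized so they sum to 1.
    """
    if not (len(events) == len(totals) == len(weights)):
        raise ValueError("inputs must have one entry per age group")
    weight_sum = sum(weights)
    return sum(w / weight_sum * (e / n)
               for e, n, w in zip(events, totals, weights))


# Hypothetical example with three age groups (18-39, 40-49, 50 and older).
events = [2, 5, 20]           # hypothetical cancer diagnoses per group
totals = [2000, 2500, 2000]   # hypothetical women per group
weights = [0.40, 0.20, 0.40]  # placeholder weights, NOT the 2000 U.S. standard

crude = sum(events) / sum(totals)
standardized = age_standardized_risk(events, totals, weights)
```

The crude and standardized risks differ whenever the age distribution of the cohort differs from that of the standard population.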
Analyses were performed using SAS 9.3. A P value of <.05 was used for statistical significance.
RESULTS
Among 57,812 women undergoing nonobstetric ultrasonography in 2016, 14,206 were excluded from analysis for one or more factors known at the time of the index ultrasonogram, the majority being excluded for having less than 12 months of continuous membership before ultrasonography (Fig. 1).
Among the 43,606 women remaining, normal or benign adnexal findings, indicated by “normal” or “category 0,” were reported for 36,768 women (84%); abnormal masses (category 1, 2, 3, or X) were reported for 6,838 women (16%). The characteristics of women with normal compared with abnormal ultrasound findings were similar except that women with adnexal masses more frequently had ultrasonography for the indication of “mass” (Table 2). Among the 6,838 women with abnormal masses reported, the distribution of categories assigned at the index ultrasonogram was 70% category 1, 21% category 2, 3.7% category 3, and 5.4% category X. During a median follow-up of 18 months (range 12–24 months), 89 women (1.3%) were diagnosed with ovarian cancer, 59 (0.9%) with borderline tumors, and 11 (0.16%) with secondary cancers (including two cases of peritoneal cancer), for an overall risk of either cancer or borderline tumor diagnosis of 2.3%. As expected, high-grade serous cancers were more common among women aged older than 50 years compared with younger women (33% vs 19%), whereas borderline tumors were relatively less common (24% vs 36%). Among the 36,768 women with ultrasonograms reported as having normal or benign adnexal findings, ovarian cancer diagnoses occurred for 38 women, including 29 of 28,073 (0.1%) women with “normal” reports and 9 of 8,695 (0.1%) women with “category 0” ultrasound reports.
Categories 1, 2, 3, and X were associated with increasing levels of risk of ovarian cancer diagnosis with a risk of 0.2% (95% CI 0.05–0.3%) for category 1, 1.3% (95% CI 0.7–1.9%) for category 2, 6.0% (95% CI 3.0–8.9%) for category 3, and 13.0% (95% CI 9.5–16.4%) for category X. Normal or category 0 studies, henceforth referred to together as category 0, were associated with a risk of 0.1% (95% CI 0.07–0.14%) (Table 3; Fig. 2A). Similar associations were observed in an analysis that included borderline tumors with a risk of 0.1% (95% CI 0.08–0.15%) for category 0 and risks of 0.4% (95% CI 0.2–0.6%), 2.3% (95% CI 1.6–3.1%), 10.4% (95% CI 6.6–14.1%), and 18.9% (95% CI 14.9–23.0%) for categories 1, 2, 3, and X, respectively (Table 3; Fig. 2B). Age-standardized risks were similar. Expressed in terms of numbers needed to examine, 967 (95% CI 705–1,367) women with category 0, 500 (95% CI 306–1,393) with category 1, 77 (95% CI 50–131) with category 2, 17 (95% CI 10–30) with category 3, and eight (95% CI 6–10) with category X reports would need to undergo evaluation to detect one case of invasive ovarian cancer within each respective category. To detect a case of either ovarian cancer or borderline tumor, evaluation would be needed for 875 (95% CI 648–1,215) women with category 0, 253 (95% CI 162–420) with category 1, 43 (95% CI 30–62) with category 2, 10 (95% CI 7–15) with category 3, and five (95% CI 4–7) with category X reports (Table 3).
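The relationship between a category's risk and its number needed to examine (the reciprocal of the risk) can be illustrated with the category 0 counts reported above (38 ovarian cancer diagnoses among 36,768 women). This is a minimal sketch: the normal-approximation (Wald) interval is an assumption, because the interval method is not stated in this report, so the sketch is not expected to reproduce every published interval. The function names are hypothetical.

```python
import math


def risk_with_wald_ci(events, n, z=1.96):
    """Return (risk, lower, upper) using a normal-approximation (Wald) interval.

    Assumption: the Wald method is used here only for illustration.
    """
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def number_needed_to_examine(risk):
    """Number of women examined per case detected: the reciprocal of the risk."""
    return 1.0 / risk


# Category 0 counts taken from the text: 38 diagnoses among 36,768 women.
risk, low, high = risk_with_wald_ci(38, 36768)
nne = number_needed_to_examine(risk)
print(f"risk {risk:.2%} (95% CI {low:.2%}-{high:.2%}), NNE about {nne:.0f}")
```

Under these assumptions the risk rounds to 0.1% and the number needed to examine is approximately 968, close to the reported value of 967.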
To evaluate the effect of loss to follow-up, we performed a sensitivity analysis restricted to the 40,918 women who had reached at least 12 months of follow-up or the target clinical end point of surgery or cancer diagnosis; results were similar.
Figure 3A and B shows the risk of cancer or borderline tumor stratified by age groups 18–39 years, 40–49 years, and 50 years and older. Although absolute risk was lower for younger women, category 3 and X masses were associated with higher risk, compared with category 1 and 2 masses, of both ovarian cancer diagnosis (Fig. 3A) as well as ovarian cancer or borderline tumor diagnosis (Fig. 3B) for women in all three age groups.
In the multivariable model, in addition to older age, all abnormal category scores (1, 2, 3, X) were associated with the combined outcome of cancer or borderline tumor relative to category 0, with adjusted odds ratios ranging from 3.3 (95% CI 1.9–5.8, P<.001) for category 1 masses to 130.3 (95% CI 83.3–203.9, P<.001) for category X. Also associated were BMI higher than 30 and the ultrasound indication of “mass.” A significant association for “premenopausal bleeding” was attributable to cases of granulosa cell tumor and “evaluation of other cancer” to cases of dual primary endometrial cancer, but both had wide CIs as a result of low absolute numbers (Table 4).
DISCUSSION
Lack of standardization of how ultrasonographers assess and communicate ovarian cancer risk increases potential harms from both unnecessary surgery for benign abnormalities (overdiagnosis) and delays in diagnosis of cancer (underdiagnosis). Although generally accepted knowledge exists regarding the ultrasound characteristics associated with malignancy, which is reflected in the overlap between our system and other ultrasound-based risk assessment strategies,9,16–20 few if any prior studies have described applying that knowledge to a community-based population using structured reports. In our view, the ability to optimally care for women with adnexal masses suffers less from a lack of knowledge about the ultrasound characteristics associated with malignancy than a failure to systematically apply that knowledge to every woman’s case.
Our system enabled identification of a higher risk subset (9–10%) of women with category 3 or X masses who are more likely to benefit from surgical referral while identifying 70% of women as having category 1 masses, which were associated with a cancer risk similar to that of women with normal studies. The goal of management for these women should be avoidance of harm.
The International Ovarian Tumor Analysis group has previously described several algorithms for predicting malignancy using ultrasonography and clinical criteria.18–20 However, 25% of masses are found to be indeterminate requiring further evaluation by an expert ultrasonographer. Our strategy differs in that it differentiates all masses into risk groups based on assessments by general radiologists.
A major strength of our study is the community-based nature of the cohort. Studies evaluating other ultrasound-based algorithms have been limited to populations of women already planning to undergo surgery in referral-based settings.17,23–25 In contrast, the associations between our standardized reporting categories and subsequent cancer diagnoses are not enhanced by elevated prevalence of disease. Furthermore, we evaluated categories assigned at first detection of a mass. Because cancers typically change over time, risk assessments based on preoperative imaging do not necessarily inform initial management. Other strengths are the well-characterized nature of the cohort and minimal loss to follow-up. Given that early detection of borderline tumors is desirable but unlikely to affect survival,26,27 we assessed the association of our categories with cancers separately from borderline tumors. Finally, the demonstrated usability of our system by a large group of general radiologists, without additional training, and the racial and ethnic diversity of our population increase generalizability.
The system used in 2016 allowed category 0 to include masses judged to have classic features of an endometrioma, dermoid, or hydrosalpinx. The system was revised in 2017 to move these presumed benign masses from category 0 to category 1 (see Appendix 1, available online at http://links.lww.com/AOG/B172). However, despite their potential inclusion, the risk of cancer was similar for category 0 (0.1%) and “normal” studies (0.1%). Other considerations include the following: 1) although some patients may have had surgery at outside institutions, the closed nature of the health care setting reduces the likelihood of undetected surgical procedures. Cancer diagnoses made at outside institutions within California would still have been captured through the organization’s participation in the California Cancer Registry; 2) not all cancer outcomes may have been captured in the follow-up period (median 18 months, range 12–24 months). However, high-grade ovarian cancers, which are responsible for 80–90% of ovarian cancer deaths, are generally aggressive28,29 and would be expected to become clinically evident within short time intervals. Screening trial data support a time to diagnosis of less than 12 months for cancers diagnosed after abnormal ultrasound findings.30 Although we expect cancers that have not become apparent to be both rare and unlikely to be high grade, we are continuing to follow women who have not had surgery to evaluate long-term risk; 3) we are unable to apply consistent criteria to identify women who had an abnormal adnexal mass on ultrasonography before 2015. However, the number of women with a previously imaged malignant mass who did not have any imaging in 2015 and whose cancer diagnosis was delayed until 2016 is likely to be small; 4) we did not subject ultrasound categorization to expert review. Given the inherent subjectivity involved in ultrasound interpretation, expert review would likely disagree with the category assignments for some masses.31 Our report reflects the actual, not idealized, performance of a system in the hands of a large diverse group of radiologists.
In summary, in our large community-based population, our structured ultrasound reporting system differentiated adnexal masses into four categories associated with distinct levels of ovarian cancer risk. The system supports risk-based management and establishes a framework that enables ongoing quality assurance and data-driven improvement over time.
REFERENCES
1. Greenlee RT, Kessel B, Williams CR, Riley TL, Ragard LR, Hartge P, et al. Prevalence, incidence, and natural history of simple ovarian cysts among women >55 years old in a large cancer screening trial. Am J Obstet Gynecol 2010;202:373.e1–9.
2. Valentin L, Skoog L, Epstein E. Frequency and type of adnexal lesions in autopsy material from postmenopausal women: ultrasound study with histological correlation. Ultrasound Obstet Gynecol 2003;22:284–9.
3. Dørum A, Blom GP, Ekerhovd E, Granberg S. Prevalence and histologic diagnosis of adnexal cysts in postmenopausal women: an autopsy study. Am J Obstet Gynecol 2005;192:48–54.
4. Buys SS, Partridge E, Black A, Johnson CC, Lamerato L, Isaacs C, et al. Effect of screening on ovarian cancer mortality: the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Randomized Controlled Trial. JAMA 2011;305:2295–303.
5. Reade CJ, Riva JJ, Busse JW, Goldsmith CH, Elit L. Risks and benefits of screening asymptomatic women for ovarian cancer: a systematic review and meta-analysis. Gynecol Oncol 2013;130:674–81.
6. Johnson PT, Horton KM, Megibow AJ, Jeffrey RB, Fishman EK. Common incidental findings on MDCT: survey of radiologist recommendations for patient management. J Am Coll Radiol 2011;8:762–7.
7. Solnik MJ, Alexander C. Ovarian incidentaloma. Best Pract Res Clin Endocrinol Metab 2012;26:105–16.
8. Luxman D, Bergman A, Sagi J, David MP. The postmenopausal adnexal mass: correlation between ultrasonic and pathologic findings. Obstet Gynecol 1991;77:726–8.
9. Levine D, Brown DL, Andreotti RF, Benacerraf B, Benson CB, Brewster WR, et al. Management of asymptomatic ovarian and other adnexal cysts imaged at US: Society of Radiologists in Ultrasound Consensus Conference Statement. Radiology 2010;256:943–54.
10. Suh-Burgmann E, Kinney W. Potential harms outweigh benefits of indefinite monitoring of stable adnexal masses. Am J Obstet Gynecol 2015;213:816.e1–4.
11. Evaluation and management of adnexal masses. Practice Bulletin No. 174. American College of Obstetricians and Gynecologists. Obstet Gynecol 2016;128:e210–26.
12. Im SS, Gordon AN, Buttin BM, Leath CA III, Gostout BS, Shah C, et al. Validation of referral guidelines for women with pelvic masses. Obstet Gynecol 2005;105:35–41.
13. Burnside ES, Sickles EA, Bassett LW, Rubin DL, Lee CH, Ikeda DM, et al. The ACR BI-RADS experience: learning from history. J Am Coll Radiol 2009;6:851–60.
14. MacMahon H, Naidich DP, Goo JM, Lee KS, Leung ANC, Mayo JR, et al. Guidelines for management of incidental pulmonary nodules detected on CT images: from the Fleischner Society 2017. Radiology 2017;284:228–43.
15. Bosniak MA. The Bosniak renal cyst classification: 25 years later. Radiology 2012;262:781–5.
16. Jacobs I, Oram D, Fairbanks J, Turner J, Frost C, Grudzinskas JG. A risk of malignancy index incorporating CA 125, ultrasound and menopausal status for the accurate preoperative diagnosis of ovarian cancer. Br J Obstet Gynaecol 1990;97:922–9.
17. Sayasneh A, Ferrara L, De Cock B, Saso S, Al-Memar M, Johnson S, et al. Evaluating the risk of ovarian cancer before surgery using the ADNEX model: a multicentre external validation study. Br J Cancer 2016;115:542–8.
18. Geomini P, Kruitwagen R, Bremer GL, Cnossen J, Mol BW. The accuracy of risk scores in predicting ovarian malignancy: a systematic review. Obstet Gynecol 2009;113:384–94.
19. Meys E, Rutten I, Kruitwagen R, Slangen B, Lambrechts S, Mertens H, et al. Simple rules, not so simple: the use of International Ovarian Tumor Analysis (IOTA) terminology and simple rules in inexperienced hands in a prospective multicenter cohort study. Ultraschall Med 2017;38:633–41.
20. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9–20.
21. Suh-Burgmann EJ, Flanagan T, Lee N, Osinski T, Sweet C, Lynch M, et al. Large-scale implementation of structured reporting of adnexal masses on ultrasound. J Am Coll Radiol 2018;15:755–61.
22. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis 1987;40:373–83.
23. Meys EM, Rutten IJ, Kruitwagen RF, Slangen BF, Bergmans MG, Mertens HJ, et al. Investigating the performance and cost-effectiveness of the simple ultrasound-based rules compared to the risk of malignancy index in the diagnosis of ovarian cancer (SUBSONiC-study): protocol of a prospective multicenter cohort study in the Netherlands. BMC Cancer 2015;15:482.
24. Timmerman D, Van Calster B, Testa A, Savelli L, Fischerova D, Froyman W, et al. Predicting the risk of malignancy in adnexal masses based on the Simple Rules from the International Ovarian Tumor Analysis Group. Am J Obstet Gynecol 2016;214:424–37.
25. Amor F, Alcázar JL, Vaccaro H, León M, Iturra A. GI-RADS reporting system for ultrasound evaluation of adnexal masses in clinical practice: a prospective multicenter study. Ultrasound Obstet Gynecol 2011;38:450–5.
26. Suh-Burgmann E. Long-term outcomes following conservative surgery for borderline tumor of the ovary: a large population-based study. Gynecol Oncol 2006;103:841–7.
27. Vasconcelos I, de Sousa Mendes M. Conservative surgery in ovarian borderline tumours: a meta-analysis with emphasis on recurrence risk. Eur J Cancer 2015;51:620–31.
28. Kurman RJ. Origin and molecular pathogenesis of ovarian high-grade serous carcinoma. Ann Oncol 2013;24(suppl 10):x16–21.
29. Bowtell DD, Böhm S, Ahmed AA, Aspuria PJ, Bast RC Jr, Beral V, et al. Rethinking ovarian cancer II: reducing mortality from high-grade serous ovarian cancer. Nat Rev Cancer 2015;15:668–79.
30. Sharma A, Apostolidou S, Burnell M, Campbell S, Habib M, Gentry-Maharaj A, et al. Risk of epithelial ovarian cancer in asymptomatic women with ultrasound-detected ovarian masses: a prospective cohort study within the UK collaborative trial of ovarian cancer screening (UKCTOCS). Ultrasound Obstet Gynecol 2012;40:338–44.
31. Van Gorp T, Veldman J, Van Calster B, Cadron I, Leunen K, Amant F, et al. Subjective assessment by ultrasound is superior to the risk of malignancy index (RMI) or the risk of ovarian malignancy algorithm (ROMA) in discriminating benign from malignant adnexal masses. Eur J Cancer 2012;48:1649–56.