CPR Results and p16 IHC Status
The associations between the p16 IHC status (positive, negative) and the 2 CPR reference diagnoses (CPRH&E and CPRH&E+p16) are shown in Table 4. p16 IHC positivity rates increased with higher CIN grades (P<0.0001). Using CPRH&E+p16 (followed by CPRH&E in parentheses) as the reference diagnosis and the 1080 of 1100 cases (98.2%) with p16 IHC results available, 1.1% (7.5%) of No-CIN, 50.6% (58.3%) of CIN1, 100% (94.5%) of CIN2, and 100% (98.6%) of CIN3 cases were p16 IHC positive, as was the single invasive carcinoma case in the study.
CPRH&E+p16 yielded more CIN diagnoses than CPRH&E across all 3 CIN grades: the number of cases with a CPR reference diagnosis of CIN1 increased from 163 (CPRH&E) to 170 (CPRH&E+p16), CIN2 from 91 to 137, and CIN3 from 70 to 77, whereas the number of cases diagnosed as No CIN decreased from 755 to 696 (Table 4).
Agreement of ISPs Reading Results With Reference Diagnoses
Seventy ISPs from different geographic regions within the United States collectively performed a total of 19,250 slide readings per method, with 17 or 18 pathologists each reading 1 of the 4 reading subsets of 275 cases. Table 5 shows the frequency distribution of the ISPs’ diagnoses using H&E (ISPH&E) and using H&E+p16 (ISPH&E+p16), dichotomized at the CIN2+ (ie, CIN2, CIN3, ACIS, invasive carcinoma)/CIN1− (ie, No CIN, CIN1) threshold, for both reference diagnoses CPRH&E and CPRH&E+p16. The same analysis dichotomized at the CIN3+ (ie, CIN3, ACIS, invasive carcinoma)/CIN2− (ie, No CIN, CIN1, CIN2) threshold is provided as Supplemental Table S1 (Supplemental Digital Content 1, http://links.lww.com/PAS/A617).
The number of ISPs’ diagnoses in exact agreement with the respective CPRH&E reference diagnoses of No CIN, CIN1, CIN2, CIN3, and ACIS significantly increased when H&E+p16 was used (ISPH&E+p16) compared with H&E only (ISPH&E; difference between ISPH&E+p16 and ISPH&E of 2.8%; 95% CI, 1.8, 3.7; P<0.0001) (Supplemental Table S2, Supplemental Digital Content 2, http://links.lww.com/PAS/A618). This improvement in ISP agreement on the exact diagnostic categories more than doubled when CPRH&E+p16 was used as the reference diagnosis (difference of 6.1%; 95% CI, 5.1, 7.0; P<0.0001) (Supplemental Table S3, Supplemental Digital Content 3, http://links.lww.com/PAS/A619).
Agreement rates of ISPs’ diagnoses using H&E only versus H&E+p16 with both CPR-derived reference diagnoses (CPRH&E and CPRH&E+p16) dichotomized at the CIN2+ threshold were calculated for overall agreement (OPA) across all cases, as well as for agreement with CIN2+ cases (PPA) and CIN1− cases (NPA). As shown in Table 6, all agreement rates (overall, positive, and negative) significantly improved when ISPs used H&E+p16-stained slides. Using CPRH&E as the reference diagnosis, the improvement was 2.2% (95% CI, 1.3, 3.0; P<0.0001) for OPA, 6.8% (95% CI, 4.7, 9.0; P<0.0001) for PPA, and 1.3% (95% CI, 0.5, 2.3; P=0.0032) for NPA. These differences in agreement rates between ISPs’ diagnoses for the 2 methods were substantially higher when using the CPR-derived diagnoses using H&E+p16-stained slides (CPRH&E+p16) as the reference (Table 6): the improvement for OPA was 4.7% (95% CI, 3.9, 5.4; P<0.0001), for PPA was 11.5% (95% CI, 9.3, 13.5; P<0.0001), and for NPA was 3.0% (95% CI, 2.2, 3.7; P<0.0001).
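The three agreement rates defined above can be illustrated with a minimal Python sketch. It assumes each reading has already been dichotomized at the CIN2+/CIN1− threshold; the function name `agreement_rates` and the toy readings are hypothetical and are not study data, and the sketch does not reproduce the study's confidence intervals or significance tests.

```python
from typing import Dict, Sequence


def agreement_rates(reader: Sequence[bool], reference: Sequence[bool]) -> Dict[str, float]:
    """Return OPA, PPA, and NPA for dichotomized diagnoses.

    Each element is True for CIN2+ and False for CIN1-.
    """
    if len(reader) != len(reference) or len(reader) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(reader, reference))
    tp = sum(1 for r, ref in pairs if r and ref)          # reader CIN2+, reference CIN2+
    tn = sum(1 for r, ref in pairs if not r and not ref)  # reader CIN1-, reference CIN1-
    fp = sum(1 for r, ref in pairs if r and not ref)      # reader CIN2+, reference CIN1-
    fn = sum(1 for r, ref in pairs if not r and ref)      # reader CIN1-, reference CIN2+
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both CIN2+ and CIN1- cases")
    return {
        "OPA": (tp + tn) / len(pairs),  # agreement across all cases
        "PPA": tp / (tp + fn),          # agreement on reference CIN2+ cases
        "NPA": tn / (tn + fp),          # agreement on reference CIN1- cases
    }


# Hypothetical toy data: 4 reference CIN2+ cases followed by 6 reference CIN1- cases.
reference = [True] * 4 + [False] * 6
reader = [True, True, True, False, False, False, False, False, False, True]
rates = agreement_rates(reader, reference)
```

In this toy example the reader matches the reference on 8 of 10 cases, on 3 of the 4 CIN2+ cases, and on 5 of the 6 CIN1− cases.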
The effect of adjunctive p16 IHC use on the change in each ISP’s performance using H&E only versus H&E+p16 relative to CPRH&E+p16 is shown graphically in Figure 2: the graph shows a plot of PPA (sensitivity) against 1-NPA (1-“specificity”) for each of the 70 ISPs using H&E only (blue circles) and using H&E+p16 (red circles). Agreement rates shown in this figure (PPA; 1-NPA) are for the dichotomous comparison at CIN2+ versus CIN1− threshold. Supplemental Figure S1 (Supplemental Digital Content 4, http://links.lww.com/PAS/A620) shows the same plotted graph for ISP performance on H&E versus H&E+p16 using the CPRH&E reference diagnosis.
Additional analyses were performed to assess the effect of adjunctive p16 IHC on the reader-averaged positive and negative likelihood ratios (PLR and NLR) and positive and negative predictive values (PPV and NPV). As shown in Supplemental Table S4 (Supplemental Digital Content 5, http://links.lww.com/PAS/A621), all 4 measures improved when ISPs read H&E-stained plus p16-stained slides rather than H&E-stained slides only, and all improvements were highly statistically significant.
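For readers less familiar with these 4 measures, the following Python sketch shows their standard definitions from a single reader's 2×2 table against the reference diagnosis. The function name `derived_measures` and the counts are hypothetical; averaging across readers, as done in the study, would be an additional step that is not shown here.

```python
from typing import Dict


def derived_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Likelihood ratios and predictive values from 2x2 counts.

    tp/fn: reference CIN2+ cases called CIN2+/CIN1- by the reader.
    fp/tn: reference CIN1- cases called CIN2+/CIN1- by the reader.
    Assumes no marginal total is zero and specificity is neither 0 nor 1.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "PLR": sensitivity / (1 - specificity),  # positive likelihood ratio
        "NLR": (1 - sensitivity) / specificity,  # negative likelihood ratio
        "PPV": tp / (tp + fp),                   # positive predictive value
        "NPV": tn / (tn + fn),                   # negative predictive value
    }


# Hypothetical counts for one reader (not study data).
measures = derived_measures(tp=80, fp=10, fn=20, tn=90)
```

Note that PPV and NPV depend on the proportion of CIN2+ cases in the case set, whereas PLR and NLR do not.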
ISP inter-rater reliability using H&E only and H&E+p16 was evaluated for each ISP reader cohort (ie, 17 or 18 pathologists reading the same cases), and for all ISP readers pooled. Pooled across all ISP readers, the Fleiss kappa coefficient significantly improved from 0.58 (95% CI, 0.57, 0.59) for H&E only (moderate agreement) to 0.73 (95% CI, 0.72, 0.74; P<0.0001) for H&E+p16-stained slides (substantial agreement).
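As an illustration of the statistic reported above, the following Python sketch implements the textbook Fleiss kappa for a table of per-case category counts, together with the Landis and Koch descriptive labels. It assumes every case is read by the same number of raters; how the study pooled cohorts of 17 and 18 readers is not described here, so this is not a reconstruction of the study's analysis. The function names and toy tables are hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss kappa; counts[i][j] = number of raters assigning case i to category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Overall proportion of ratings falling in each category.
    p_cat = [sum(row[j] for row in counts) / (n_cases * n_raters)
             for j in range(n_categories)]
    # Observed pairwise agreement per case, then averaged over cases.
    p_case = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa: float) -> str:
    """Descriptive label for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy tables: 2 cases, 3 raters, 2 categories (CIN2+, CIN1-).
perfect = fleiss_kappa([[3, 0], [0, 3]])  # all raters agree on every case
split = fleiss_kappa([[2, 1], [1, 2]])    # agreement below chance level
```

Applied to the values reported in the text, `landis_koch_label(0.58)` returns "moderate" and `landis_koch_label(0.73)` returns "substantial".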
CERTAIN is a large US-based study designed to evaluate the adjunctive use of p16 IHC when interpreting cervical biopsies. Our results confirm previous findings that the adjunctive use of p16 IHC significantly increases diagnostic agreement with an adjudicated consensus diagnosis (CPR) as a reference.7–9 CERTAIN included 70 US-based ISPs who provided a total of 19,250 diagnoses per method, using either H&E slides or H&E+p16-stained slides from 1100 cervical biopsies. OPA, PPA, and NPA (equivalent to diagnostic accuracy, sensitivity, and specificity, respectively, when assuming the reference diagnoses as clinical truth) were all significantly increased for ISPH&E+p16 compared with ISPH&E using either a CPR reference diagnosis of CPRH&E or CPRH&E+p16. Although NPA (specificity) showed a statistically significant improvement (difference of 3.0% when using CPRH&E+p16), the biggest difference was observed for PPA (sensitivity) for CIN2+ (difference of 11.5% when using CPRH&E+p16; P<0.0001). This corresponds to a 43.1% reduction of false-negative HSIL (CIN2, CIN3) results using CPRH&E+p16. An almost identical reduction in false-negative results for CIN2+ was observed in a similarly designed large European study (see discussion below): the false-negative rate for CIN2+ dropped by 43.4% when comparing H&E+p16 to H&E only. In the current study, similar sensitivity gains were observed using CIN3+ as the diagnostic threshold. This improvement in the correct identification of CIN3 lesions has considerable clinical relevance, as CIN3 lesions are considered direct precursors to invasive squamous cervical carcinoma.
Concerns have been raised that the routine use of p16 IHC may result in an increase in CIN2+ diagnoses, which could lead to an increase in the number of women undergoing treatment.12 However, in CERTAIN the absolute number of ISP CIN2+ diagnoses did not increase when adjunctive p16 was used. There were 3996 ISP CIN2+ diagnoses using H&E compared with 3982 using H&E+p16 (Tables 5A, B). In addition, regardless of which CPR reference diagnosis was used, the number of ISP diagnoses matching the CPR reference diagnoses at the CIN2+/CIN1− threshold increased using H&E+p16 compared with H&E only (Tables 5A, B). Just as important, the number of ISP diagnoses of CIN2+ on cases where the CPR reference diagnosis was CIN1− decreased, regardless of which reference diagnosis was used. Of note, whereas the total number of ISP CIN2+ diagnoses was essentially the same using H&E and H&E+p16, the number of ISP CIN3+ diagnoses was ∼12% higher when H&E+p16 was used compared with H&E alone (Supplemental Table S1, Supplemental Digital Content 1, http://links.lww.com/PAS/A617). However, approximately half (96/178) of the additional ISPH&E+p16 CIN3+ diagnoses were classified as CIN3+ by the CPRH&E, and all except 7 were classified as CIN3+ by the CPRH&E+p16. These are important findings, as they show that the use of p16 as a diagnostic adjunct to the interpretation of H&E-stained cervical biopsies does not lead to an increase in the overall number of women who may receive treatment under current US treatment guidelines.18 Nevertheless, the authors believe that it is important to remind pathologists that:
- a positive p16 IHC result does not necessarily indicate the presence of a HSIL (CIN2, CIN3) lesion; instead, a diffuse p16-staining pattern in cervical tissue specimens suggests the presence of a transforming HPV infection. Of note, in this study, 50.6% of biopsies classified as LSIL (CIN1) using CPRH&E+p16 and 58.3% using CPRH&E were p16 positive, a rate similar to what was observed in previous studies.7,8 LSIL (CIN1) lesions may show a diffuse p16-staining pattern (considered p16 IHC positive) (Figs. 1D, F), or a focal staining pattern (Fig. 1H), or no immunoreactivity (Fig. 1J), the latter 2 being considered p16 IHC negative;
- the histologic grading of squamous intraepithelial lesions into LSIL (CIN1), HSIL (CIN2), and HSIL (CIN3) subsets has to be performed based on the interpretation of usual morphologic features on the H&E-stained slide(s) used together with the respective p16-stained slide(s);
- the extent of diffuse p16 staining within the squamous epithelium does not necessarily correlate with the grade of the lesion: for example, CIN1 lesions may show one third, one half to two thirds (Fig. 1F), or even full thickness (Fig. 1D) p16 staining within the squamous tissue.
Results similar to those found in CERTAIN were previously reported by a European study that evaluated adjunctive p16 IHC use on all cervical biopsy specimens.7 The European study involved 12 community pathologists from 4 different European countries and 500 cervical biopsies.7 As in CERTAIN, the diagnoses of the community pathologists were compared with adjudicated consensus diagnoses established by 3 European expert gynecologic pathologists on H&E-stained slides. However, in the European study the expert gynecologic pathologists did not establish a second consensus diagnosis using H&E+p16. In the European study a sensitivity increase of 10% (from 77% for H&E to 87% for H&E+p16) was associated with a loss in specificity of 1.0% (from 89% for H&E to 88% for H&E+p16). Furthermore, almost identical observations were made in both CERTAIN and the European study when analyzing the effect of adjunctive p16 IHC use on the agreement between the community pathologists. Kappa values, as a measure of reader agreement corrected for chance, showed moderate agreement between pathologists for the interpretation of H&E-stained slides (κ=0.58 in the CERTAIN study, κ=0.57 in the European study), and a significant increase to substantial agreement for the interpretation of H&E+p16-stained slides (κ=0.73 in the CERTAIN study and κ=0.75 in the European study; P=0.0001). Thus, besides improving the agreement of surgical pathologists with the consensus diagnoses of HSIL (CIN2, CIN3) derived by expert gynecologic pathologists (ie, improved “accuracy”), the adjunctive use of p16 IHC also consistently improves the agreement between readers (ie, improved precision). Similar results have been observed in several other studies evaluating the effect of adjunctive p16 IHC on inter-rater reliability.19–22
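The link between a sensitivity gain and the relative reduction in false-negative results can be made explicit with a short Python sketch. It uses only the rounded European sensitivities quoted above (77% and 87%); the function name `relative_fn_reduction` is hypothetical. Because the inputs are rounded, the result (about 43.5%) differs slightly from the 43.4% quoted earlier, which was presumably derived from unrounded data.

```python
def relative_fn_reduction(sens_before: float, sens_after: float) -> float:
    """Relative reduction in the false-negative rate implied by a sensitivity gain.

    The false-negative rate is 1 - sensitivity, so the relative reduction is
    (FNR_before - FNR_after) / FNR_before.
    """
    if not (0.0 <= sens_before < 1.0 and 0.0 <= sens_after <= 1.0):
        raise ValueError("sensitivities must be proportions, with sens_before < 1")
    fnr_before = 1.0 - sens_before
    fnr_after = 1.0 - sens_after
    return (fnr_before - fnr_after) / fnr_before


# Rounded European-study sensitivities quoted in the text: 77% (H&E), 87% (H&E+p16).
reduction = relative_fn_reduction(0.77, 0.87)
```

The same function applies to any pair of PPA values; a 10-point sensitivity gain removes a larger share of false negatives when the starting false-negative rate is small.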
It is worth noting that the data described in this phase of CERTAIN characterize the maximal potential impact of adjunctive p16 IHC testing on patient outcomes, in that all biopsies had p16 staining. The data also show that individual pathologists vary in their p16 interpretations and conjunctive interpretations. Yet, as already noted, interpretive variability is reduced substantially compared with H&E alone. One of the major concerns during the development of the LAST recommendations was potential p16 IHC overuse. The impact of restricting p16 usage within CERTAIN will be detailed in a subsequent manuscript (Wright et al, in preparation). Given the perception of fairly wide adoption of LAST, the data to come will provide insight into how the ISP group chooses to use p16, and into the impact of restricting p16 usage on the sensitivity, specificity, and reproducibility of cervical biopsy diagnosis. Thus, the 2 data sets will provide a sense of the boundaries on the benefits of p16 IHC in accurately identifying patients who may need treatment versus the potential harms of overdiagnosis. However, as shown in the present analysis, even if every cervical biopsy were stained adjunctively with p16, this would not lead to excess treatment, while it would maximize the identification of cases classified as CIN3 by CPR. The cost implications of any expanded usage relative to current practice are, in our opinion, secondary to the clinical importance of doing what is best for our patients, and we hope these data inform the deliberations of professional society practice guideline groups. Furthermore, an economic model has been developed to fully explore the complex impact of p16 usage in clinical practice (Stoler et al, in preparation).
In summary, improvement in both accuracy and precision when diagnosing cervical biopsies is clinically important. False-negative diagnoses of HSIL (CIN2, CIN3) lesions can result in significant patient morbidity, and potentially even mortality, if patients requiring treatment are categorized as not requiring treatment and are subsequently lost to follow-up. Similarly, false-positive HSIL (CIN2, CIN3) diagnoses can lead to unnecessary treatments, including loop electrosurgical excision procedures and cone biopsies, that can have a significant negative impact on subsequent pregnancies, including preterm delivery.23,24 Therefore, it is reassuring that CERTAIN demonstrated an improvement in both accuracy (agreement with reference diagnoses) and precision with the adjunctive use of p16 IHC while not increasing the overall number of diagnoses of HSIL (CIN2, CIN3).
1. Stoler MH, Schiffman M. Atypical Squamous Cells of Undetermined Significance-Low-grade Squamous Intraepithelial Lesion Triage Study (ALTS) Group. Interobserver reproducibility of cervical cytologic and histologic interpretations: realistic estimates from the ASCUS-LSIL Triage Study. JAMA. 2001;285:1500–1505.
2. Robertson AJ, Anderson JM, Beck JS, et al. Observer variability in histopathological reporting of cervical biopsy specimens. J Clin Pathol. 1989;42:231–238.
3. Ismail SM, Colclough AB, Dinnen JS, et al. Reporting cervical intra-epithelial neoplasia (CIN): intra- and interpathologist variation and factors associated with disagreement. Histopathology. 1990;16:371–376.
4. Malpica A, Matisic JP, Niekirk DV, et al. Kappa statistics to measure interrater and intrarater agreement for 1790 cervical biopsy specimens among twelve pathologists: qualitative histopathologic analysis and methodologic issues. Gynecol Oncol. 2005;99(suppl 1):S38–S52.
5. Gage JC, Schiffman M, Hunt WC, et al. Cervical histopathology variability among laboratories: a population-based statewide investigation. Am J Clin Pathol. 2013;139:330–335.
6. Carreon JD, Sherman ME, Guillén D, et al. CIN2 is a much less reproducible and less valid diagnosis than CIN3: results from a histological review of population-based cervical samples. Int J Gynecol Pathol. 2007;26:441–446.
7. Bergeron C, Ordi J, Schmidt D, et al. Conjunctive p16INK4a testing significantly increases accuracy in diagnosing high-grade cervical intraepithelial neoplasia. Am J Clin Pathol. 2010;133:395–406.
8. Galgano MT, Castle PE, Atkins KA, et al. Using biomarkers as objective standards in the diagnosis of cervical biopsies. Am J Surg Pathol. 2010;34:1077–1087.
9. Dijkstra MG, Heideman DA, de Roy SC, et al. p16(INK4a) immunostaining as an alternative to histology review for reliable grading of cervical intraepithelial lesions. J Clin Pathol. 2010;63:972–977.
10. Sarian LO, Derchain SF, Yoshida A, et al. Expression of cyclooxygenase-2 (COX-2) and Ki67 as related to disease severity and HPV detection in squamous lesions of the cervix. Gynecol Oncol. 2006;102:537–541.
11. Meserve E, Berlin M, Mori T, et al. Reducing misclassification bias in cervical dysplasia risk factor analysis with p16-based diagnoses. J Low Genit Tract Dis. 2014;18:266–272.
12. Darragh TM, Colgan TJ, Cox JT, et al. The Lower Anogenital Squamous Terminology Standardization Project for HPV-Associated Lesions: background and consensus recommendations from the College of American Pathologists and the American Society for Colposcopy and Cervical Pathology. J Low Genit Tract Dis. 2012;16:205–242; Erratum in: J Low Genit Tract Dis
13. Wright TC Jr, Stoler MH, Behrens CM, et al. The ATHENA human papillomavirus study: design, methods, and baseline results. Am J Obstet Gynecol. 2012;206:46.e1–46.e11.
14. Stoler M, Bergeron C, Colgan TJ, et al. Tumours of the uterine cervix. In: Kurman RJ, Carcangiu ML, Herrington CS, Young RH, eds. WHO Classification of Tumours of Female Reproductive Organs, 4th ed. Lyon, France: IARC; 2014:169–206.
15. US Food and Drug Administration. Establishing the Performance Characteristics of In Vitro Diagnostic Devices for the Detection or Detection and Differentiation of Human Papillomaviruses; Guidance for Industry and Food and Drug Administration Staff. Washington, D.C.: US Department of Health and Human Services; 2011. Document issued on November 28, 2011.
16. Shrout P, Fleiss JL. Intraclass correlation: uses in assessing rater reliability. Psychol Bull. 1979;86:420–428.
17. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.
18. Massad LS, Einstein MH, Huh WK, et al. 2012 updated consensus guidelines for the management of abnormal cervical cancer screening tests and cancer precursors. Obstet Gynecol. 2013;121:829–846.
19. Horn LC, Reichert A, Oster A, et al. Immunostaining for p16INK4a used as a conjunctive tool improves interobserver agreement of the histological diagnosis of cervical intraepithelial neoplasia. Am J Surg Pathol. 2008;32:502–512.
20. Sayed K, Korourian S, Ellison DA, et al. Diagnosing cervical biopsies in adolescents: the use of p16 immunohistochemistry to improve reliability and reproducibility. J Low Genit Tract Dis. 2007;11:141–146.
21. Gurrol-Díaz CM, Suárez-Rincón AE, Vázquez-Camacho G, et al. p16INK4a immunohistochemistry improves the reproducibility of the histological diagnosis of cervical intraepithelial neoplasia in cone biopsies. Gynecol Oncol. 2008;111:120–124.
22. Reuschenbach M, Wentzensen N, Dijkstra MG, et al. p16INK4a immunohistochemistry in cervical biopsy specimens: a systematic review and meta-analysis of the interobserver agreement. Am J Clin Pathol. 2014;142:767–772.
23. Simoens C, Goffin F, Simon P, et al. Adverse obstetrical outcomes after treatment of precancerous cervical lesions: a Belgian multicentre study. BJOG. 2012;119:1247–1255.
24. Kyrgiou M, Athanasiou A, Kalliala IEJ, et al. Obstetric outcomes after conservative treatment for cervical intraepithelial lesions and early invasive disease. Cochrane Database Syst Rev. 2017;11:CD012847. DOI: 10.1002/14651858.CD012847.
cervical biopsy; squamous intraepithelial lesion; immunohistochemistry; p16; diagnostic agreement
Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved.