The diagnostic interpretation of hematoxylin and eosin (H&E)–stained cervical biopsies is subject to substantial variability between readers, leading to potential under-treatment of women with high-grade precancerous lesions (high-grade squamous intraepithelial lesions [HSIL]) or greater, or overtreatment in case of false-positive diagnostic interpretations.1–4 In particular, low diagnostic agreement rates between readers have been reported within the HSIL (cervical intraepithelial neoplasia of grade 2 [CIN2]) category, which is the diagnostic threshold for ablative treatment in many cervical cancer screening programs.1,5,6 Numerous studies have reported on the potential use of biomarkers to improve the reliability of the interpretation of cervical biopsy specimens.7–11 The Lower Anogenital Squamous Terminology (LAST) project by the American Society for Colposcopy and Cervical Pathology (ASCCP) and the College of American Pathologists (CAP) performed a systematic review of the published literature regarding the use of biomarkers to improve diagnostic agreement.12 The authors of LAST concluded that p16INK4a immunohistochemistry (p16 IHC) is the only biomarker with sufficient clinical evidence to be recommended for routine clinical use. They also provided specific guidance on when to utilize p16 IHC when interpreting cervical biopsies.12
Only a few studies have systematically analyzed the effect of adjunctive p16 IHC testing on the diagnostic performance and reader reliability of surgical pathologists in the United States.8,11 Here, we report the study design and results of a large clinical validation study, the CERvical Tissue AdjunctIve aNalysis (CERTAIN) study, which was conducted as a US Food and Drug Administration registration study to evaluate the impact of adjunctive p16 IHC on the interpretation of H&E-stained cervical punch biopsies by surgical pathologists in the United States. The study was designed to evaluate the impact of using adjunctive p16 IHC staining on the overall percent agreement (OPA, equivalent to accuracy when assuming reference diagnoses reflect clinical truth), positive percent agreement (PPA, equivalent to sensitivity), negative percent agreement (NPA, equivalent to specificity), inter-reader reproducibility, and the performance of individual surgical pathologists (ISPs). Further analyses of the impact of adjunctive p16 staining when the ISPs’ use of p16-stained slides was limited to a subset of cases per LAST recommendations12 will be reported separately (Wright et al, manuscript in preparation).
MATERIALS AND METHODS
Study Cohort and Procedures
The study reading set comprised a total of 1100 cervical biopsies with diagnoses of No-CIN, CIN1, CIN2, CIN3/adenocarcinoma in situ (ACIS), or invasive carcinoma. Of these specimens, 897 were obtained from archived specimens of the ATHENA screening trial in the United States,13 and 203 were obtained from the specimen repository of an independent anatomic pathology laboratory (Aurora Diagnostics, Greensboro, NC). ATHENA-trial sourced specimens were selected by stratified randomization to achieve a distribution of histologic diagnoses comparable to that expected in a colposcopy referral population in a clinical practice setting. The additional 203 tissue blocks were sequentially enrolled from the Aurora Diagnostics specimen archive and represent a colposcopy referral population. All specimens were required to meet both of the following inclusion criteria: (i) being a formalin-fixed, paraffin-embedded diagnostic cervical biopsy, and (ii) having the associated patient age and cervical cytology result available. Human papillomavirus (HPV) test results were available from the original patient enrollment visit for all ATHENA study cases, and HPV status information was retrieved from patient records at Aurora Diagnostics as available.
Histology and IHC
Four serial sections were prepared from each tissue block and mounted on glass slides. Slides #1 and #4 were used for H&E staining, whereas slide #2 was used for p16 IHC staining using the CINtec Histology product (Ventana Medical Systems Inc., Tucson, AZ), which is based on primary p16-specific mouse monoclonal antibody clone E6H4 and the OptiView DAB IHC Detection kit (Ventana Medical Systems Inc.) on a BenchMark ULTRA system (Ventana Medical Systems Inc.) according to the specifications of the manufacturer. Slide #3 was subjected to IHC using a Negative Ig Control Antibody (Negative Control [Monoclonal], #760-2014; Ventana Medical Systems Inc.). All slides used for IHC included on-slide tissue controls (2-in-1 control block sections prepared using cervical tissue of known p16 positivity and negativity, respectively). A total of 5 external laboratory sites in the US participated in the H&E and p16 IHC staining of the study cases.
p16 IHC Stain Interpretation
For cases deemed adequate for interpretation, a p16 IHC-staining pattern of “diffuse,” “focal,” or “no staining” was recorded on the slide evaluation case report forms based on the specific staining criteria described in Table 1. The criteria used are in agreement with the criteria defined by LAST.12 Cases with diffuse, nuclear, and/or cytoplasmic p16 IHC staining are considered positive for p16 IHC (Figs. 1B, D, F), whereas cases showing either a focal p16 staining (Fig. 1H) or no staining (Fig. 1J) are considered negative for p16 IHC.
Readers were trained on the interpretation of p16-staining results and how to use the p16 IHC test result in conjunction with the diagnostic interpretation of H&E-stained slides using an online training module. Subsequently, a qualification test was performed using electronic images to ensure that readers appropriately assessed IHC stain adequacy, p16-staining pattern (diffuse, focal, or no staining), and p16 IHC status (positive/negative).
Central Pathology Review for Reference Diagnoses
For establishing reference diagnoses for the study cases, a Central Pathology Review (CPR) was performed. Biopsies were reviewed independently and in random order by 2 US pathologists specializing in gynecologic pathology (M.H.S. and T.C.W.). For histology, the LAST and 2014 World Health Organization terminology, which incorporates Bethesda terminology, was simplified to the CIN grade. Thus, CIN1 indicates low-grade squamous intraepithelial lesions (LSIL), CIN2 indicates HSIL (CIN2), and CIN3 indicates HSIL (CIN3).12,14 Each case was initially presented as the 2 bracketing H&E-stained slides (slides #1 and #4), together with the case’s corresponding patient age, cervical cytology result, and HPV test result (if available). If both pathologists agreed on the histologic diagnosis for a case, the consensus result was used as the H&E CPR reference diagnosis (CPRH&E). Cases with disagreement between the 2 CPR readers as to histologic diagnosis were subjected to an independent review by a third gynecologic pathologist (A.F.). If 2 of the 3 diagnoses agreed, a CPRH&E diagnosis was achieved. If all 3 pathologists disagreed, cases were subjected to a joint adjudication review including all 3 pathologists, and a final CPRH&E diagnosis was established.
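The review cascade described above can be sketched as a small decision function. This is an illustrative reading of the procedure, not the study's software; the function name `cpr_consensus` and the status strings are hypothetical.

```python
from typing import Optional, Tuple


def cpr_consensus(
    reader1: str, reader2: str, reader3: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Return (reference diagnosis or None, status) for one case.

    Hypothetical helper: mirrors the 2-reader / third-reader /
    joint-adjudication cascade described in the text.
    """
    # Step 1: the two primary reviewers agree.
    if reader1 == reader2:
        return reader1, "primary agreement"
    # Step 2: disagreement, so a third independent review is needed.
    if reader3 is None:
        return None, "third review required"
    # Step 3: any 2 of the 3 diagnoses agree.
    if reader3 in (reader1, reader2):
        return reader3, "2-of-3 majority"
    # Step 4: all three differ; the case goes to joint adjudication.
    return None, "joint adjudication required"
```

For example, `cpr_consensus("CIN1", "CIN2", "CIN2")` yields the majority diagnosis, whereas three different diagnoses return no diagnosis and flag the case for joint adjudication.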
After a wash-out period of at least 1 month, the same complete process was repeated using both H&E-stained and p16 IHC-stained slides to establish an additional reference diagnosis for each case (CPRH&E+p16).
ISP Recruitment and Slide Reading
Recruitment of ISPs was performed by sending an invitation and survey link to >2,300 US-based surgical pathologists included in a medical society mailing list. Of the 135 pathologists completing the survey, 115 were considered qualified to participate in the study. To meet the acceptance criteria, pathologists had to (i) be board-certified in anatomic pathology, (ii) be licensed and currently practicing medicine within the United States, and (iii) evaluate cervical histology specimens as part of their routine clinical practice. On the basis of the responses from the survey, efforts were made to involve pathologists from different geographic regions in the United States, and representing a range of experience levels, practice volumes, and employment types (academic institution, hospital, private laboratory, or reference laboratory). Of the 115 pathologists, a total of 70 individual pathologists were finally recruited by the study sponsor and divided into 4 separate reader cohorts of 17 or 18 readers each; a stratified random selection process was used to ensure mostly comparable reader demographics (experience level, practice volume) for the 17 to 18 readers of each reader cohort.
The total study set of 1100 cervical punch biopsy cases was divided into 4 reading sets of 275 cases each, following a stratified randomization approach using the consensus diagnostic results of the CPRH&E. The number of CIN2+ cases included in the 4 individual reading sets varied from 41 (14.9%) to 42 (15.3%). During the first round of review, each of the 70 ISPs independently diagnosed the 275 cases in the reading subset randomly assigned to them using H&E-stained slides only (ISPH&E) and the same diagnostic categories used for the CPR. Information about the patient’s age at the time of biopsy collection, the preceding cervical cytology result, and (if available) the HPV test result was provided for each case. After a 4-week wash-out period, each ISP reader independently re-evaluated all cases in the same reading subset as in round 1. During round 2, H&E-stained slides together with case-matched p16 and negative reagent control-stained slides were assessed by the individual pathologists (ISPH&E+p16). Readers were blinded to round 1 reading results.
Results were entered into the study database using an electronic case report form. After database lock, results were analyzed according to the predefined statistical analysis plan.
Study Objectives and Statistical Methods
The primary study objective was to demonstrate (i) a statistically significant increase in the OPA (equivalent to diagnostic accuracy when assuming an expert consensus diagnosis as a surrogate of clinical truth), together with (ii) noninferior PPA (equivalent to sensitivity) of the H&E+p16 method compared with H&E alone at the CIN2+/CIN1− diagnostic threshold, conditional on adjudicated expert-derived reference diagnoses. Additional study objectives were to evaluate the impact of the adjunctive p16 stain on inter-reader reproducibility, and on the performance of ISPs when results were compared with an expert-derived reference diagnosis established by adjudication on H&E-stained slides and p16-stained slides.
The statistical association between p16 IHC status (positive, negative) and the reference diagnoses established by CPR, either as CPRH&E or CPRH&E+p16, was assessed using cross tabulations. P-values were calculated using the Fisher exact test.
For the agreement analyses comparing ISPs’ reading results to the reference diagnosis, PPA, NPA, and OPA, as well as positive and negative likelihood ratios (PLR and NLR) and positive and negative predictive values (PPV and NPV), were calculated to measure each ISP’s diagnostic performance. Two-sided 95% confidence intervals (CIs) for each ISP’s OPA, PPA, and NPA were calculated using the Wilson score method. Two-sided 95% CIs for each ISP’s PLR, NLR, PPV, and NPV were calculated using the method recommended by the US Food and Drug Administration.15 In addition to individual ISP diagnostic performance, reader-averaged OPA, PPA, NPA, PLR, NLR, PPV, and NPV across all ISPs were calculated. For these reader-averaged diagnostic performance measures, 2-sided 95% CIs were calculated using the percentile bootstrap method. Because each case contributed repeated measures from 17 or 18 ISPs, each case was treated as the sampling unit, and all observations belonging to that case were kept together when constructing resamples for the bootstrap CIs. Wald test P-values comparing the reader-averaged diagnostic performance between ISPH&E and ISPH&E+p16 were also calculated, with the standard errors estimated from the bootstrap samples.
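As an illustration of the per-reader agreement measures, the following minimal sketch dichotomizes diagnoses at the CIN2+/CIN1− threshold, tallies PPA, NPA, and OPA for one reader against a reference, and computes a Wilson score interval. The function names and diagnosis labels are assumptions made for the example; the likelihood-ratio and predictive-value intervals are not covered here.

```python
import math
from typing import Dict, Sequence, Tuple

# Diagnoses counted as positive at the CIN2+/CIN1- threshold (labels assumed).
CIN2_PLUS = {"CIN2", "CIN3", "ACIS", "invasive carcinoma"}


def wilson_ci(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Two-sided Wilson score interval for a proportion (z=1.96 gives 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def agreement_counts(
    isp_dx: Sequence[str], ref_dx: Sequence[str]
) -> Dict[str, Tuple[int, int]]:
    """Return (agreements, denominator) for PPA, NPA, and OPA."""
    tp = fn = tn = fp = 0
    for isp, ref in zip(isp_dx, ref_dx):
        isp_pos, ref_pos = isp in CIN2_PLUS, ref in CIN2_PLUS
        if ref_pos and isp_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif isp_pos:
            fp += 1
        else:
            tn += 1
    return {
        "PPA": (tp, tp + fn),            # agreement on reference CIN2+ cases
        "NPA": (tn, tn + fp),            # agreement on reference CIN1- cases
        "OPA": (tp + tn, tp + fn + tn + fp),
    }
```

Each `(agreements, denominator)` pair can be passed directly to `wilson_ci` to obtain the interval for that measure.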
For assessing agreement between ISPs, inter-rater reliability was calculated for ISPH&E and ISPH&E+p16 separately using the Fleiss kappa coefficient.16 Kappa coefficients were judged per the guidance developed by Landis and Koch.17
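For readers unfamiliar with the statistic, a minimal sketch of the Fleiss kappa computation and the Landis and Koch descriptive bands follows. It assumes every case is rated by the same number of readers (as within a single reader cohort) and takes a table of per-case category counts; the function names are illustrative.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss kappa from a cases x categories table of rating counts.

    counts[i][j] = number of readers assigning case i to category j.
    Assumes the same number of readers (>= 2) for every case.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_cases == 0 or n_raters < 2:
        raise ValueError("need at least one case and two readers")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must have the same number of readers")
    # Observed agreement: mean over cases of the pairwise agreement proportion.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_cases
    # Chance agreement from the overall category proportions.
    n_cats = len(counts[0])
    props = [sum(row[j] for row in counts) / (n_cases * n_raters) for j in range(n_cats)]
    p_exp = sum(p * p for p in props)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch(kappa: float) -> str:
    """Descriptive agreement band per Landis and Koch."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

Under these bands, a kappa of 0.58 falls in the moderate range and 0.73 in the substantial range.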
RESULTS
Study Cohort Characteristics
The baseline characteristics of the total cohort of 1100 cervical punch biopsy specimens used in this study are shown by case source in Table 2. The ATHENA-sourced (ATHENA cohort) and the Aurora Diagnostics-sourced (Aurora cohort) study cases were similar in patient age (mean, 35.3 versus 36.6-y old; median, 34.0 versus 33.0-y old for the ATHENA versus Aurora cohorts, respectively), as well as with respect to HPV positivity rates for cases with available HPV test results (91.9% vs. 91.7%). The prevalence of cases with previous negative for intraepithelial lesion or malignancy cytology results was substantially lower in the Aurora cohort than in the ATHENA-sourced material. In ATHENA all HPV-positive women, irrespective of cervical cytology results, were referred to colposcopy. In contrast, the Aurora cohort consists of biopsies from women referred to colposcopy based on standard clinical guidelines. Table 3 shows baseline characteristics of the total study cohort by the reference diagnoses established by CPR of H&E-stained slides (CPRH&E). The total number of CIN2+ cases by CPRH&E was 167, corresponding to 15.2% (167/1100) of the study cases.
CPR Results and p16 IHC Status
The associations between the p16 IHC status (positive, negative) and the 2 CPR reference diagnoses (CPRH&E and CPRH&E+p16) are shown in Table 4. p16 IHC positivity rates increased with higher CIN grades (P<0.0001). Using CPRH&E+p16 (followed by CPRH&E in parentheses) as the reference diagnosis and the 1080 of 1100 cases (98.2%) with p16 IHC results available, 1.1% (7.5%) of No-CIN, 50.6% (58.3%) of CIN1, 100% (94.5%) of CIN2, and 100% (98.6%) of CIN3 cases were p16 IHC positive, as was the single invasive carcinoma case in the study.
CPRH&E+p16 showed a higher number of CIN diagnoses across all 3 CIN grades as compared with CPRH&E: the number of cases with a CPR reference diagnosis of CIN1 increased from 163 (CPRH&E) to 170 (CPRH&E+p16), from 91 to 137 for CIN2, and from 70 to 77 for CIN3, respectively, whereas the number of cases diagnosed as No CIN decreased from 755 to 696 (Table 4).
Agreement of ISPs Reading Results With Reference Diagnoses
Seventy ISPs from different geographic regions within the United States collectively performed a total of 19,250 slide readings per method, with 17 or 18 pathologists each reading 1 of the 4 reading subsets of 275 cases. Table 5 shows the frequency distribution of the ISPs’ diagnoses using H&E (ISPH&E) and using H&E+p16 (ISPH&E+p16), dichotomized at the CIN2+ (ie, CIN2, CIN3, ACIS, invasive carcinoma)/CIN1− (ie, No CIN, CIN1) threshold, for both reference diagnoses CPRH&E and CPRH&E+p16. The same analysis but dichotomized at the CIN3+ (ie, CIN3, ACIS, invasive carcinoma)/CIN2− (ie, No CIN, CIN1, CIN2) threshold is provided as Supplemental Table S1 (Supplemental Digital Content 1, http://links.lww.com/PAS/A617).
The number of ISPs’ diagnoses in exact agreement with the respective CPRH&E reference diagnoses of No CIN, CIN1, CIN2, CIN3, and ACIS significantly increased when H&E+p16 was used (ISPH&E+p16) compared with H&E only (ISPH&E; difference between ISPH&E+p16 and ISPH&E of 2.8%; 95% CI, 1.8, 3.7; P<0.0001) (Supplemental Table S2, Supplemental Digital Content 2, http://links.lww.com/PAS/A618). This improvement in ISP agreement on the exact diagnostic categories more than doubled when CPRH&E+p16 was used as the reference diagnosis (difference of 6.1%; 95% CI, 5.1, 7.0; P<0.0001) (Supplemental Table S3, Supplemental Digital Content 3, http://links.lww.com/PAS/A619).
Agreement rates of ISPs’ diagnoses using H&E only versus H&E+p16 with both CPR-derived reference diagnoses (CPRH&E and CPRH&E+p16) dichotomized at the CIN2+ threshold were calculated for overall agreement (OPA) across all cases, as well as for agreement with CIN2+ cases (PPA) and CIN1− cases (NPA). As shown in Table 6, all agreement rates (overall, positive, and negative) significantly improved when ISPs used H&E+p16-stained slides. Using CPRH&E as the reference diagnosis, the improvement was 2.2% (95% CI, 1.3, 3.0; P<0.0001) for OPA, 6.8% (95% CI, 4.7, 9.0; P<0.0001) for PPA, and 1.3% (95% CI, 0.5, 2.3; P=0.0032) for NPA. These differences in agreement rates between ISPs’ diagnoses for the 2 methods were substantially higher when using the CPR-derived diagnoses using H&E+p16-stained slides (CPRH&E+p16) as the reference (Table 6): the improvement for OPA was 4.7% (95% CI, 3.9, 5.4; P<0.0001), for PPA was 11.5% (95% CI, 9.3, 13.5; P<0.0001), and for NPA was 3.0% (95% CI, 2.2, 3.7; P<0.0001).
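The CIs for these reader-averaged differences rest on a percentile bootstrap in which the case, with all of its readings, is the resampling unit. A minimal sketch of that idea for the OPA difference is given below; the data layout (one dictionary per case mapping a reader identifier to a pair of agreement flags), the function names, and the default of 2000 resamples are assumptions for illustration, not the study's analysis code.

```python
import random
from typing import Dict, List, Sequence, Tuple

# One case: reader id -> (agrees with reference on H&E, agrees on H&E+p16).
Case = Dict[str, Tuple[bool, bool]]


def reader_averaged_diff(cases: Sequence[Case]) -> float:
    """Mean over readers of (agreement rate with p16 - agreement rate H&E only)."""
    tallies: Dict[str, List[int]] = {}
    for case in cases:
        for reader, (he_ok, p16_ok) in case.items():
            t = tallies.setdefault(reader, [0, 0, 0])
            t[0] += int(he_ok)
            t[1] += int(p16_ok)
            t[2] += 1
    diffs = [(p16 - he) / n for he, p16, n in tallies.values()]
    return sum(diffs) / len(diffs)


def case_bootstrap_ci(
    cases: Sequence[Case], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling whole cases with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        # All readings of a sampled case stay together in the resample.
        reader_averaged_diff(rng.choices(cases, k=len(cases)))
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

Keeping a case's readings together preserves the within-case correlation among the 17 or 18 readers, which resampling individual readings would ignore.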
The effect of adjunctive p16 IHC use on the change in each ISP’s performance using H&E only versus H&E+p16 relative to CPRH&E+p16 is shown graphically in Figure 2: the graph shows a plot of PPA (sensitivity) against 1-NPA (1-“specificity”) for each of the 70 ISPs using H&E only (blue circles) and using H&E+p16 (red circles). Agreement rates shown in this figure (PPA; 1-NPA) are for the dichotomous comparison at CIN2+ versus CIN1− threshold. Supplemental Figure S1 (Supplemental Digital Content 4, http://links.lww.com/PAS/A620) shows the same plotted graph for ISP performance on H&E versus H&E+p16 using the CPRH&E reference diagnosis.
Additional analyses were performed to assess the effect of adjunctive p16 IHC on reader-averaged PLR and NLR, respectively, and PPV and NPV, respectively. As shown in Supplemental Table S4 (Supplemental Digital Content 5, http://links.lww.com/PAS/A621), all 4 measures improved from ISPs’ reading H&E only to H&E-stained plus p16-stained slides, and all improvements were highly statistically significant.
ISP inter-rater reliability using H&E only and H&E+p16 was evaluated for each ISP reader cohort (ie, 17 or 18 pathologists, reading the same cases), and for all ISP readers pooled. The Fleiss kappa coefficient significantly improved from 0.58 (95% CI, 0.57, 0.59) for H&E only (moderate agreement) to 0.73 (95% CI, 0.72, 0.74; P<0.0001) for H&E+p16-stained slides (substantial agreement) pooled across all ISP readers.
DISCUSSION
CERTAIN is a large US-based study designed to evaluate the adjunctive use of p16 IHC when interpreting cervical biopsies. Our results confirm previous findings that the adjunctive use of p16 IHC significantly increases diagnostic agreement with an adjudicated consensus diagnosis (CPR) as a reference.7–9 CERTAIN included 70 US-based ISPs who provided a total of 19,250 diagnoses per method, using either H&E slides or H&E+p16-stained slides from 1100 cervical biopsies. OPA, PPA, and NPA (equivalent to diagnostic accuracy, sensitivity, and specificity, respectively, when assuming the reference diagnoses as clinical truth) were all significantly increased for ISPH&E+p16 compared with ISPH&E using either a CPR reference diagnosis of CPRH&E or CPRH&E+p16. Although NPA (specificity) showed a statistically significant improvement (difference of 3.0% when using CPRH&E+p16), the biggest difference was observed for PPA (sensitivity) for CIN2+ (difference of 11.5% when using CPRH&E+p16; P<0.0001). This corresponds to a 43.1% reduction of false-negative HSIL (CIN2, CIN3) results using CPRH&E+p16. An almost identical reduction in false-negative results for CIN2+ was observed in a similarly designed large European study (see discussion below): the false-negative rate for CIN2+ dropped by 43.4% when comparing H&E+p16 to H&E only. In the current study, similar sensitivity gains were observed using CIN3+ as the diagnostic threshold. This improvement in the correct identification of CIN3 lesions has considerable clinical relevance, as CIN3 lesions are considered direct precursors to invasive squamous cervical carcinoma.
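The relative reduction in false-negative results quoted above follows from the PPA values by simple arithmetic, taking the false-negative rate as 1 − PPA:

```latex
\[
\text{relative FN reduction}
  = \frac{(1-\mathrm{PPA}_{\mathrm{H\&E}}) - (1-\mathrm{PPA}_{\mathrm{H\&E+p16}})}
         {1-\mathrm{PPA}_{\mathrm{H\&E}}}
  = \frac{\Delta\mathrm{PPA}}{1-\mathrm{PPA}_{\mathrm{H\&E}}}
\]
```

As a back-calculation only (the underlying PPA values are those reported in Table 6), a PPA gain of 11.5 percentage points together with a 43.1% relative reduction implies a false-negative rate on H&E alone of about 0.115/0.431 ≈ 26.7%.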
Concerns have been raised that the routine use of p16 IHC may result in an increase of CIN2+ diagnoses which could lead to an increase in the number of women undergoing treatment.12 However, in CERTAIN the absolute numbers of ISP CIN2+ diagnoses did not increase when adjunctive p16 was used. There were 3996 ISP CIN2+ diagnoses using H&E compared with 3982 using H&E+p16 (Tables 5A, B). In addition, regardless of which CPR reference diagnosis was used, the number of ISP diagnoses matching the CPR reference diagnoses at the CIN2+/CIN1− threshold increased using H&E+p16 compared with H&E only (Tables 5A, B). Just as important, the number of ISP diagnoses of CIN2+ on cases where the CPR reference diagnosis was CIN1− decreased, regardless of which reference diagnosis was used. Of note, whereas the total number of ISP CIN2+ diagnoses was essentially the same using H&E and H&E+p16, the number of ISP CIN3+ diagnoses was ∼12% higher when H&E+p16 was used compared with H&E alone (Supplemental Table S1, Supplemental Digital Content 1, http://links.lww.com/PAS/A617). However, approximately half (96/178) of the additional ISPH&E+p16 CIN3+ diagnoses were classified as CIN3+ by the CPRH&E and all except 7 were classified as CIN3+ by the CPRH&E+p16. These are important findings, as they show that the use of p16 as a diagnostic adjunct to the interpretation of H&E-stained cervical biopsies does not lead to an increase in the overall number of women that may receive treatment using current US treatment guidelines.18 Nevertheless, the authors believe that it is important to remind pathologists that:
- a positive p16 IHC result does not necessarily indicate the presence of a HSIL (CIN2, CIN3) lesion; instead, a diffuse p16-staining pattern in cervical tissue specimens suggests the presence of a transforming HPV infection. Of note, in this study, 50.6% of biopsies classified as LSIL (CIN1) using CPRH&E+p16 and 58.3% using CPRH&E were p16 positive, a rate similar to what was observed in previous studies.7,8 LSIL (CIN1) lesions may show a diffuse p16-staining pattern (considered p16 IHC positive) (Figs. 1D, F), or a focal staining pattern (Fig. 1H), or no immunoreactivity (Fig. 1J), the latter 2 being considered p16 IHC negative;
- the histologic grading of squamous intraepithelial lesions into LSIL (CIN1), HSIL (CIN2), and HSIL (CIN3) subsets has to be performed based on the interpretation of usual morphologic features on the H&E-stained slide(s) used together with the respective p16-stained slide(s);
- the extent of diffuse p16 staining within the squamous epithelium does not necessarily correlate with the grade of the lesion: for example, CIN1 lesions may show one third, one half to two thirds (Fig. 1F), or even full thickness (Fig. 1D) p16 staining within the squamous tissue.
Similar results to those found in CERTAIN were previously reported by a European study that evaluated adjunctive p16 IHC use on all cervical biopsy specimens.7 The European study involved 12 community pathologists from 4 different European countries and 500 cervical biopsies.7 As in CERTAIN, the diagnoses of the community pathologists were compared with adjudicated consensus diagnoses established by 3 European expert gynecologic pathologists on H&E-stained slides. However, in the European study the expert gynecologic pathologists did not establish a second consensus diagnosis using H&E+p16. In the European study a sensitivity increase of 10% (from 77% for H&E to 87% for H&E+p16) was associated with a loss in specificity of 1.0% (from 89% for H&E to 88% for H&E+p16). Furthermore, almost identical observations were made in both CERTAIN and the European study when analyzing the effect of adjunctive p16 IHC use on the agreement between the community pathologists. Kappa values as a measure of reader agreement corrected for chance showed moderate agreement between pathologists for the interpretation of H&E-stained slides (κ=0.58 in CERTAIN study, κ=0.57 in European study), and a significant increase to substantial agreement for the interpretation of H&E+p16-stained slides (κ=0.73 in CERTAIN study and κ=0.75 in European study; P=0.0001). Thus, besides the improved agreement of surgical pathologists with expert gynecologic pathologist-derived consensus diagnoses of HSIL (CIN2, CIN3) (ie, improved “accuracy”), the adjunctive use of p16 IHC also consistently improves the agreement between readers (ie, improved precision). Similar results have been observed in several other studies evaluating the effect of adjunctive p16 IHC on inter-rater reliability.19–22
It is worth noting that the data described in this phase of CERTAIN characterize the maximal potential impact of adjunctive p16 IHC testing on patient outcomes in that all biopsies had p16 staining. And certainly, the data show that individual pathologists vary in their p16 interpretations and conjunctive interpretations. Yet as already noted interpretive variability is reduced substantially compared with H&E alone. One of the major concerns during the development of the LAST recommendations was for potential p16 IHC overuse. The impact of restricting p16 usage within CERTAIN will be detailed in a subsequent manuscript (Wright et al, in preparation). Given the perception of fairly wide adoption of LAST, the data to come will provide insight on how the ISP group makes choices to use p16, and the impact of restricting p16 usage on the sensitivity and specificity and reproducibility of cervical biopsy diagnosis. Thus, the 2 data sets will provide a sense of the boundaries on the benefits of p16 IHC in accurately identifying patients who may need treatment versus the potential harms of overdiagnosis. However, as shown in the present analysis, even if every cervical biopsy was stained adjunctively with p16 the outcome does not lead to excess treatment, while it maximizes the identification of cases identified as CIN3 by CPR. The cost implications of any expanded usage relative to current practice in our opinion are secondary to the clinical importance of doing what is best for our patients, and we hope these data inform the deliberations of professional society practice guideline groups. Furthermore, an economic model has been developed to fully explore the complex impact of p16 usage in clinical practice (Stoler et al, in preparation).
In summary, improvement in both accuracy and precision when diagnosing cervical biopsies is important clinically. False-negative diagnoses of HSIL (CIN2, CIN3) lesions can result in significant patient morbidity, and even potentially mortality, if patients requiring treatment are categorized as not requiring treatment and are subsequently lost to follow-up. Similarly, false-positive HSIL (CIN2, CIN3) diagnoses can lead to unnecessary treatments, including loop electrosurgical excision procedures and cone biopsies, that can have a significant negative impact on subsequent pregnancies, including preterm delivery.23,24 Therefore, it is reassuring that CERTAIN demonstrated an improvement in both accuracy (agreement with reference diagnoses) and precision with the adjunctive use of p16 IHC while not increasing the overall number of diagnoses of HSIL (CIN2, CIN3).
REFERENCES
1. Stoler MH, Schiffman M. Atypical Squamous Cells of Undetermined Significance-Low-grade Squamous Intraepithelial Lesion Triage Study (ALTS) Group. Interobserver reproducibility of cervical cytologic and histologic interpretations: realistic estimates from the ASCUS-LSIL Triage Study. JAMA. 2001;285:1500–1505.
2. Robertson AJ, Anderson JM, Beck JS, et al. Observer variability in histopathological reporting of cervical biopsy specimens. J Clin Pathol. 1989;42:231–238.
3. Ismail SM, Colclough AB, Dinnen JS, et al. Reporting cervical intra-epithelial neoplasia (CIN): intra- and interpathologist variation and factors associated with disagreement. Histopathology. 1990;16:371–376.
4. Malpica A, Matisic JP, Niekirk DV, et al. Kappa statistics to measure interrater and intrarater agreement for 1790 cervical biopsy specimens among twelve pathologists: qualitative histopathologic analysis and methodologic issues. Gynecol Oncol. 2005;99(suppl 1):S38–S52.
5. Gage JC, Schiffman M, Hunt WC, et al. Cervical histopathology variability among laboratories: a population-based statewide investigation. Am J Clin Pathol. 2013;139:330–335.
6. Carreon JD, Sherman ME, Guillén D, et al. CIN2 is a much less reproducible and less valid diagnosis than CIN3: results from a histological review of population-based cervical samples. Int J Gynecol Pathol. 2007;26:441–446.
7. Bergeron C, Ordi J, Schmidt D, et al. Conjunctive p16INK4a testing significantly increases accuracy in diagnosing high-grade cervical intraepithelial neoplasia. Am J Clin Pathol. 2010;133:395–406.
8. Galgano MT, Castle PE, Atkins KA, et al. Using biomarkers as objective standards in the diagnosis of cervical biopsies. Am J Surg Pathol. 2010;34:1077–1087.
9. Dijkstra MG, Heideman DA, de Roy SC, et al. p16(INK4a) immunostaining as an alternative to histology review for reliable grading of cervical intraepithelial lesions. J Clin Pathol. 2010;63:972–977.
10. Sarian LO, Derchain SF, Yoshida A, et al. Expression of cyclooxygenase-2 (COX-2) and Ki67 as related to disease severity and HPV detection in squamous lesions of the cervix. Gynecol Oncol. 2006;102:537–541.
11. Meserve E, Berlin M, Mori T, et al. Reducing misclassification bias in cervical dysplasia risk factor analysis with p16-based diagnoses. J Low Genit Tract Dis. 2014;18:266–272.
12. Darragh TM, Colgan TJ, Cox JT, et al. The Lower Anogenital Squamous Terminology Standardization Project for HPV-Associated Lesions: background and consensus recommendations from the College of American Pathologists and the American Society for Colposcopy and Cervical Pathology. J Low Genit Tract Dis. 2012;16:205–242; Erratum in: J Low Genit Tract Dis
13. Wright TC Jr, Stoler MH, Behrens CM, et al. The ATHENA human papillomavirus study: design, methods, and baseline results. Am J Obstet Gynecol. 2012;206:46.e1–46.e11.
14. Stoler M, Bergeron C, Colgan TJ, et al. Tumours of the uterine cervix. In: Kurman RJ, Carcangiu ML, Herrington CS, Young RH, eds. WHO Classification of Tumours of Female Reproductive Organs, 4th ed. Lyon, France: IARC; 2014:169–206.
15. US Food and Drug Administration. Establishing the Performance Characteristics of In Vitro Diagnostic Devices for the Detection or Detection and Differentiation of Human Papillomaviruses; Guidance for Industry and Food and Drug Administration Staff. Washington, D.C.: US Department of Health and Human Services; 2011. Document issued on November 28, 2011.
16. Shrout P, Fleiss JL. Intraclass correlation: uses in assessing rater reliability. Psychol Bull. 1979;86:420–428.
17. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.
18. Massad LS, Einstein MH, Huh WK, et al. 2012 updated consensus guidelines for the management of abnormal cervical cancer screening tests and cancer precursors. Obstet Gynecol. 2013;121:829–846.
19. Horn LC, Reichert A, Oster A, et al. Immunostaining for p16INK4a used as a conjunctive tool improves interobserver agreement of the histological diagnosis of cervical intraepithelial neoplasia. Am J Surg Pathol. 2008;32:502–512.
20. Sayed K, Korourian S, Ellison DA, et al. Diagnosing cervical biopsies in adolescents: the use of p16 immunohistochemistry to improve reliability and reproducibility. J Low Genit Tract Dis. 2007;11:141–146.
21. Gurrol-Díaz CM, Suárez-Rincón AE, Vázquez-Camacho G, et al. p16INK4a immunohistochemistry improves the reproducibility of the histological diagnosis of cervical intraepithelial neoplasia in cone biopsies. Gynecol Oncol. 2008;111:120–124.
22. Reuschenbach M, Wentzensen N, Dijkstra MG, et al. p16INK4a immunohistochemistry in cervical biopsy specimens: a systematic review and meta-analysis of the interobserver agreement. Am J Clin Pathol. 2014;142:767–772.
23. Simoens C, Goffin F, Simon P, et al. Adverse obstetrical outcomes after treatment of precancerous cervical lesions: a Belgian multicentre study. BJOG. 2012;119:1247–1255.
24. Kyrgiou M, Athanasiou A, Kalliala IEJ, et al. Obstetric outcomes after conservative treatment for cervical intraepithelial lesions and early invasive disease. Cochrane Database Syst Rev. 2017;11:CD012847. DOI: 10.1002/14651858.CD012847.