OBJECTIVE: To estimate the agreement among multiple expert colposcopists evaluating high-resolution digitized cervigrams taken from patients with a variety of human papillomavirus (HPV) infection states and previous cervigram interpretations.
METHODS: Twenty expert colposcopists evaluated 939 digitized images of the uterine cervix obtained after the application of 5% acetic acid during the ASCUS-LSIL Triage Study. Twenty images, selected to represent a broad range, were graded by all the colposcopists. The remaining 919 images were distributed by stratified random sampling such that each image was evaluated by two colposcopists and each expert evaluated 112 images with similar distributions of cervigram diagnoses and HPV DNA test results. We evaluated interrater agreement among the pairs of colposcopists and confirmed the conclusions using the 20 images that all of them graded.
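The per-reader workload stated above follows from the design counts, and a short check makes the arithmetic explicit. This is only an illustrative sketch: the function and variable names are hypothetical, and it assumes the paired images were spread evenly across readers, which the stratified sampling would satisfy only approximately.

```python
# Design counts taken from the METHODS text.
N_RATERS = 20            # expert colposcopists
N_TOTAL = 939            # digitized images in the study
N_COMMON = 20            # images graded by every colposcopist
RATINGS_PER_IMAGE = 2    # each remaining image was read by two colposcopists


def images_per_rater(n_total, n_common, n_raters, ratings_per_image):
    """Return (paired image count, average images read per rater).

    Assumes an even split of paired readings across raters (an assumption,
    not something the abstract states exactly).
    """
    n_paired = n_total - n_common
    paired_load = n_paired * ratings_per_image / n_raters
    return n_paired, paired_load + n_common


n_paired, load = images_per_rater(N_TOTAL, N_COMMON, N_RATERS, RATINGS_PER_IMAGE)
print(n_paired)      # 919
print(round(load))   # 112
```

The average comes to 111.9 readings per colposcopist, which matches the reported 112 after rounding.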
RESULTS: Pairs of colposcopists agreed on the diagnosis for only 56.8% of images. Similar agreement was seen regarding number of visible lesions (of low-grade or greater). This variability in ratings remained when the images were stratified by final histologic diagnosis or HPV status. The results were confirmed by the presence of large variability in ratings (ranging in some cases from normal to cancer) for the 20 images graded by all colposcopists.
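The headline figure is a raw percent agreement across image pairs. As a minimal sketch of how such a figure could be computed, the example below uses made-up ratings and hypothetical category labels; it is not the study's data or necessarily its exact analysis.

```python
def percent_agreement(pairs):
    """Percentage of images on which both raters gave the same diagnosis.

    `pairs` is a list of (rating_a, rating_b) tuples, one per image.
    """
    if not pairs:
        raise ValueError("at least one rated image is required")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical ratings for five images (illustrative only, not study data).
example_pairs = [
    ("normal", "normal"),
    ("low-grade", "high-grade"),
    ("high-grade", "high-grade"),
    ("normal", "low-grade"),
    ("cancer", "cancer"),
]

print(percent_agreement(example_pairs))  # 60.0
```

Raw agreement of this kind does not correct for agreement expected by chance, so it is best read alongside the distribution of diagnoses in the sample.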
CONCLUSION: Colposcopic diagnosis using static images is poorly reproducible, a finding that might reflect similar problems in clinical practice. Researchers should question the use of colposcopic images as a reference standard for teaching and for evaluating the presence or severity of disease.
LEVEL OF EVIDENCE: II