Factors affecting the reproducibility and validity of colposcopy for product development: review of current literature

Ballagh, Susan MD

JAIDS Journal of Acquired Immune Deficiency Syndromes: October 2004 - Volume 37 - Issue - p S152–S155

Colposcopy has been adapted from cervical cytology screening as a tool to assess the tolerance of vaginal products in development for anti-infective and contraceptive indications. As the vagina is poorly innervated, symptoms do not correlate well with visual findings, which increases the importance of secondary assessment. Dysplasia screening has used biopsy liberally to verify the colposcopic diagnostic impression. The few studies that include correlative biopsy with colposcopy performed for product development are reviewed. A recent study of the reproducibility of two observations at a single visit by a single or two different physicians is discussed. The review points to magnification and experience of the observer as factors in improving the agreement between observations. Observers are more likely to agree on areas greater than 1 cm in diameter. Variation is reduced when fewer observers are used and they report findings as dichotomous or categorical data.

CONRAD Clinical Research Center, Eastern Virginia Medical School

Correspondence to Susan A. Ballagh, MD, CONRAD Clinical Research Center, Eastern Virginia Medical School, 601 Colley Avenue, Norfolk, VA 23507-1627, USA. Tel: +1 757 446 8471; fax: +1 757 446 8998; e-mail:

The colposcopy technique was initially described by Hans Hinselmann1 in 1925 as a routine screening test to detect cervical cancer. It was widely adopted several decades later in the United States as a secondary screening tool to complement routine cytology screening.2 Colposcopy guides biopsy and is important to identify overt carcinoma that may be missed on cytology as a result of focal ulceration, bleeding and associated inflammation that may obscure or miss the overlying epithelium.3 Other colposcope adaptations include operative microsurgery, the evaluation and documentation of trauma,4 and the evaluation of vaginal adenosis accompanying in-utero diethylstilboestrol exposure.5 Given these earlier applications, it was logical to apply colposcopy to the assessment of irritation for vaginal product development in the 1990s.6 It was also clear that symptoms alone were poor predictors of physical findings in the vagina and cervix.7,8

In other uses, the colposcopic impression is routinely validated with accompanying histological analyses. Milla Villeda et al.9 reported a sensitivity of 87% and a specificity of 86%, with a 97% negative predictive value of colposcopy for dysplasia compared with standardized transformation zone biopsy. Few studies to date have correlated visual findings with biopsy evidence of irritation for product development. Bounds et al.8 called attention to vaginal changes after an average of 4 months' use of a levonorgestrel-releasing contraceptive ring. These changes included colposcopic erythema (in 35%), raised circular or striated ridges and intracellular edema. All but one erythematous area with acetowhitening (13 of 48) was biopsied. Findings included congestion with widely dilated vessels, chronic inflammatory infiltrate (cell types not specified) and intercellular edema. The epithelium was thinner than normal. Human papillomavirus changes (four) and cervical intraepithelial neoplasia grade 1 (one) were also noted in the biopsies. Given the limited application of biopsy, definitive conclusions could not be reached.

Stafford et al.10 found mild or moderate inflammatory infiltrate of lymphocytes and macrophages in the lamina propria, just beneath the epithelium in seven out of 20 nonoxynol-9 users, one of whom had mild inflammation at baseline. Only two out of 20 women using placebo had mild (one) or moderate (one) inflammation. The overlying epithelium was intact in samples with inflammatory infiltrate, consistent with the colposcopic impression. A small number of inflammatory cells permeated the superficial layer. Simultaneous colposcopy demonstrated erythema, most evident on the cervix, in nine nonoxynol-9 users versus two controls. In that study, the specificity of colposcopy was 81% whereas the sensitivity was 56% for predicting inflammation on the vaginal biopsy.
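
The reported accuracy figures can be checked against the counts given above. The following is a minimal sketch, not the authors' analysis: it assumes all 40 women had both a biopsy and a colposcopic assessment, so that nine biopsies showed inflammation (seven nonoxynol-9, two placebo) and 11 women had colposcopic erythema (nine nonoxynol-9, two controls). The split into 5 true positives, 6 false positives, 4 false negatives and 25 true negatives is an inference chosen because it reproduces the published 56% and 81%; the source does not report the table itself.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return standard accuracy measures for a 2x2 table.

    tp/fp/fn/tn count colposcopy results against biopsy as the reference.
    """
    return {
        "sensitivity": tp / (tp + fn),  # biopsy-positive women flagged by colposcopy
        "specificity": tn / (tn + fp),  # biopsy-negative women with normal colposcopy
        "ppv": tp / (tp + fp),          # flagged women who were biopsy-positive
        "npv": tn / (tn + fn),          # unflagged women who were biopsy-negative
    }


# Inferred (not reported) cell counts; see the assumptions stated above.
metrics = diagnostic_accuracy(tp=5, fp=6, fn=4, tn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Under these assumed counts the sensitivity is 5/9 and the specificity 25/31, which round to the 56% and 81% quoted in the text.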

Another study evaluating biopsy and colposcopy in a dextrin sulfate trial7 used pre- and post-exposure biopsies of the vaginal fornices. No inflammatory infiltrate was seen in any of the biopsies. These biopsies did not, however, specifically target erythema or vascular changes, and the report does not specify whether the biopsies sampled areas identified as ‘abnormal’ with the colposcope. When biopsy and colposcopy findings were correlated in the PRO 2000 UK cohort, Van Damme et al.11 did not see as clear a correlation of inflammatory infiltrate with colposcopically detected erythema. In that study there were as many subjects with inflammatory infiltrate at baseline (5/36) as after product exposure (5/35; one biopsy proved unsuitable for analysis). Increased inflammation was seen in as many placebo gel users (two) as 4% PRO 2000 gel users (two). Again, the study was limited because biopsies did not target colposcopically abnormal tissue.

Although biopsy with standard hematoxylin and eosin staining appears promising as a marker of inflammation for nonoxynol-9 exposure, it may not be as useful in other settings. It is certainly possible that dextrin sulfate and PRO 2000 did not induce inflammation at all. However, other histological inflammatory markers should be explored, and further biopsy studies should include colposcopically targeted tissue plus positive and negative control treatments, if possible, to provide more convincing evidence of a lack of inflammatory effect.

Baseline findings

To visualize changes caused by a vaginal product, it is important to be aware of the initial findings that are present by performing baseline colposcopy. The incidence of baseline findings has ranged from 18%12 to 58%13 of women, depending on the population and the definition of ‘findings’. It is clear that recent intercourse increases the number of findings, especially when observed within 24 h.11,14 The use of diaphragms at intercourse has not been reported to cause a further increase in findings,15 but other vaginal products such as tampons have been implicated in microulcerations and ‘layering’ or severe dryness.16 No study to date has fully collated the number of baseline findings with the many possible sources of variation. The myriad factors that contribute to initial variation make it difficult, if not impossible, to establish norms. To reduce ‘noise’ in the colposcopic assessment of irritation for product development, baseline colposcopy is imperative.

Colposcopy does increase the number of observed findings compared with naked eye assessment alone. Over half the findings noted in a comparative trial of colposcopy were missed on the preceding naked eye examination.17

As biopsy is not performed routinely, the reproducibility of the procedure becomes critical, especially when several observers or different study centers participate in a single trial.

Image review

One alternative way to verify single observer impression is the option of image review after the examination. This has been evaluated in one trial by Stafford et al.7, but was limited in that few findings were identified in the cohort. Only five findings of vascular disruption (petechia or ecchymosis) were noted in three out of 36 women and five mild erythematous areas were noted. The blinded reviewer, using 35 mm colpo-photographs, was unable to identify any of the erythematous areas but found one new area. The vascular changes, however, were all verified on review. Similarly, in a more recent study designed to compare the colposcopic impression of two observers evaluating the same cervicovaginal findings in vivo,17 the most reproducible findings were petechiae and ecchymoses, which were most likely to match exactly in size, color and diagnostic assessment compared with other types of findings.

As realistic color is critical to irritation assessment, the use of computer images generated at the time of the examination is likely to improve the image review. A comparison of 35 mm photos recorded without the benefit of direct observation to computer-generated images showed improved focus and exposure with digital images. (A Comparison of Colposcopic Techniques and Evaluation of the Variability Between Two Observers.) White balance, aperture adjustment and illumination intensity can all be varied to make the digital image more realistic before it is recorded. The computer enhancement of images or color standardization with colored disks or other tools might further enhance image reproducibility.18 Colposcopy for irritation depends on red/green color detection that can potentially be optimized with computer enhancements.19 These image-processing techniques will have to be developed specifically for irritation assessment because light/dark contrast is valued for dysplasia detection.

Another potential limitation of image review is the loss of the three-dimensional binocular view obtained with direct colposcopic observation. In dysplasia colposcopy, whereas two reviewers of cervicography may readily agree (interobserver kappa = 0.40 to 0.62 when looking at the same photographic image20,21), an in-vivo colposcopic assessment has been shown to approximate histology more closely when compared with image review (cervicography) in the setting of dysplasia diagnosis.22 Ferris et al.23 similarly found that telecolposcopists, particularly when reviewing static delayed images, agreed less with an on-site expert observer (65%) than if viewed in real time (73%), and that the on-site expert agreed even more often (82%) with the standard observer. Mitchell et al.24 found that the receiver operator curve for dysplasia diagnosis favored colposcopy (0.95 area under the curve; AUC) over delayed expert image review, cervicography (0.90 AUC), or fluorescent spectroscopy (0.76 AUC). The value of the three-dimensional binocular view has not been documented in the setting of irritation assessment, and several speakers described successful two-dimensional techniques later in these proceedings.

Interobserver variation

A CONRAD-sponsored study carried out 26 paired colposcopic examinations in 13 women using the same two physicians with the order of examination randomized between observers.17 The observers did not discuss their findings, and independently recorded any observed color change, epithelial disruption, or blood vessel damage. Finding size was graded as less than 5 mm, 5–9 mm or greater than 9 mm. Location was marked by quadrant, and the depth of the epithelial disruption was graded as superficial or deep according to standard procedures.25 All assessments were made after a speculum was placed and secretions removed with irrigation. No additional manipulation, including the use of swabs, was permitted.

The number of findings reported by each examiner was assessed in the study for naked eye and colposcopic evaluations. Although a statistically significant difference was noted in the number of naked eye findings that each examiner reported (Wilcoxon signed ranks test, P = 0.03), the number of findings between examiners showed no statistically significant difference when assessed with the colposcope. Magnification improved the agreement between observers.
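
The paired comparison described above can be illustrated with an exact Wilcoxon signed-rank test on per-woman finding counts. This is a sketch only: the counts below are hypothetical placeholders, not data from the CONRAD study, and the function name and its handling of zeros and ties (zero differences dropped, tied absolute differences given average ranks) are assumptions rather than the procedure the authors used.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Returns (W+, p). Zero differences are dropped; tied absolute
    differences share their average rank. Enumerates all sign patterns,
    so it is only practical for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    observed = abs(w_plus - total / 2)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical naked-eye finding counts per woman for two examiners.
examiner_a = [3, 1, 4, 2, 5, 2, 3, 1, 4, 2, 3, 2, 1]
examiner_b = [1, 1, 2, 2, 3, 0, 2, 1, 1, 2, 2, 0, 1]
w_plus, p_value = wilcoxon_signed_rank_exact(examiner_a, examiner_b)
print(f"W+ = {w_plus}, two-sided exact p = {p_value:.3f}")
```

The test uses only the sign and rank of each within-woman difference, which suits small paired samples of counts where a normal distribution cannot be assumed.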

Half of the findings of blood vessel changes matched exactly between observers, compared with erythema that matched exactly in only one case and matched by diagnosis but not location/size details in 21% of cases. Superficial epithelial loss (peeling) matched on diagnosis between observers (20/43) but the description of the exact location or extent varied in all but one case. Only two deep findings were identified overall and only by one examiner. Both were less than 5 mm in size. When a kappa analysis of worst category agreement was carried out, an uncorrected kappa between observers was 0.18, which increased to 0.32 when corrected for the severity of lesion type. Overall, 41% of the findings matched, 8% of them exactly. The colposcopic assessments were more likely to match than naked eye assessment, which was statistically different on a number of findings alone.
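
For readers less familiar with the kappa statistic, the sketch below shows how an unweighted Cohen's kappa is computed from a cross-tabulation of two observers' worst-category ratings. The table is hypothetical and is not the CONRAD data. The source does not describe how the correction for severity was made, so only the uncorrected statistic is illustrated.

```python
def cohens_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j] is the number of examinations that observer A placed in
    category i and observer B placed in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Proportion of examinations on which the observers agree exactly.
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from each observer's marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical worst-category ratings for 26 examinations
# (rows: observer A; columns: observer B; three illustrative categories).
hypothetical_table = [
    [6, 4, 1],
    [3, 8, 2],
    [0, 1, 1],
]
print(f"kappa = {cohens_kappa(hypothetical_table):.2f}")
```

Kappa discounts the agreement that two observers would reach by chance alone, which is why raw percentage agreement can look considerably better than the kappa value for the same data.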

After reviewing relevant literature regarding the technique of colposcopy for irritation assessment, it is clear that interobserver variability is inherent to the procedure given the dependence on subjective assessment. Certain approaches such as computer image processing and standardization may make the technique somewhat more reproducible, but a more objective approach to assess irritation via biopsy or cervicovaginal lavage should be pursued. Some clear findings from the CONRAD study that are supported by previous experience of colposcopic dysplasia assessment include:

  • Magnification improves. Not only does magnification result in the identification of twice as many findings, but it also improves the agreement between observers in paired examinations.
  • Experience counts. Fewer changes were seen between naked eye impressions and colposcopic impressions with a more experienced observer in the CONRAD study. Ferris et al.23 found more satisfactory examinations for dysplasia with experience. De Sutter et al.26 reported that senior assessors detected cervical intraepithelial neoplasia (CIN) better with cervicography compared with juniors, and Morrison et al.22 found a more consistent assessment of cervical ectropion with experience. Image interpretation training will probably improve results, as suggested by improved performance with cervicography, but results after training suggest clear limits of visual screening.27
  • Size matters. Deep epithelial disruption smaller than 1 cm in diameter is difficult to distinguish from more superficial involvement unless bleeding is present. Similarly, the diagnosis of dysplastic epithelium is more likely as more quadrants are involved.28 Many findings are so small that they are missed on naked eye examination.
  • Better agreement with dichotomous variables and fewer observers. The more variables that are described, the more likely is interobserver variation. The numbers of lesions or diagnoses are more likely to match between observers than specifics regarding location or size. Similarly, with dysplasia colposcopy, interobserver agreement on the presence or absence of the transformation zone was better than agreement on area and border characteristics.29 When observers are limited, the data will be more comparable across visits for a given individual or study.

Colposcopy appears to be the best available option for irritation assessment when compared with naked eye imaging or hand-held magnification.17 As a result of limitations in interobserver agreement, it is best applied to early clinical trials using a few experienced investigators with conditions that limit all other vaginal activity before the assessment of product effect.

Support was provided under a cooperative agreement with the United States Agency for International Development (USAID). The views expressed by the author do not necessarily reflect the views of USAID.

1. Hinselmann H. Verbesserung der Inspektionsmöglichkeiten von Vulva, Vagina und Portio. Münch Med Wochenschr, 1925;73:1733.
2. Townsend DE, Ostergard DR, Mishell DR Jr, et al. Abnormal Papanicolaou smears. Evaluation by colposcopy, biopsies, and endocervical curettage. Am J Obstet Gynecol, 1970;108:429–434.
3. Morrow CP, Townsend DE. Premalignant and related disorders of the lower genital tract. In: Synopsis of Gynecologic Oncology. New York: Churchill Livingstone Inc.; 1987:1–43.
4. Soderstrom RM. Colposcopic documentation. An objective approach to assessing sexual abuse of girls. J Reprod Med, 1994;39:6–8.
5. Aldrich JO, Henderson BE, Townsend DE. Diagnostic procedures for the stilbestrol-adenosis-carcinoma syndrome. N Engl J Med, 1972;287:934.
6. World Health Organization. Manual for the standardization of colposcopy for the evaluation of vaginally administered products. Global Programme on AIDS. Geneva: WHO; 1995.
7. Stafford MK, Cain D, Rosenstein I, et al. A placebo-controlled, double-blind prospective study in healthy female volunteers of dextrin sulphate gel: a novel potential intravaginal virucide. J Acquir Immune Defic Syndr Hum Retrovirol, 1997;14:213–218.
8. Bounds W, Szarewski A, Lowe D, et al. Preliminary report of unexpected local reactions to a progestogen-releasing contraceptive vaginal ring. Eur J Obstet Gynecol Reprod Biol, 1993;48:123–125.
9. Milla Villeda RH, Alvarado Zaldivar G, Sanchez Anguiano LF, et al. Colposcopy and cervical biopsy in patients with routine Papanicolaou smear. Ginecol Obstet Mex, 1997;65:235–238.
10. Stafford MK, Ward H, Flanagan A, et al. Safety study of nonoxynol-9 as a vaginal microbicide: evidence of adverse effects. J Acquir Immune Defic Syndr Hum Retrovirol, 1998;17:327–331.
11. Van Damme L, Wright A, Depraetere K, et al. A phase I study of a novel potential intravaginal microbicide, PRO 2000, in healthy sexually inactive women. Sex Transm Infect, 2000;76:126–130.
12. Fraser IS, Lahteenmaki P, Elomaa K, et al. Variations in vaginal epithelial surface appearance determined by colposcopic inspection in healthy, sexually active women. Hum Reprod, 1999;14:1974–1978.
13. van De Wijgert J, Fullem A, Kelly C, et al. Phase 1 trial of the topical microbicide BufferGel: safety results from four international sites. J Acquir Immune Defic Syndr, 2001;26:21–27.
14. Norvell MK, Benrubi GI, Thompson RJ. Investigation of microtrauma after sexual intercourse. J Reprod Med, 1984;29:269–271.
15. Soper DE, Brockwell NJ, Dalton HP. Evaluation of the effects of a female condom on the female lower genital tract. Contraception, 1991;44:21–29.
16. Raudrant D, Landrivon G, Frappart L, et al. Comparison of the effects of different menstrual tampons on the vaginal epithelium: a randomised clinical trial. Eur J Obstet Gynecol Reprod Biol, 1995;58:41##-16.
17. Ballagh SA, Mauck CK, Henry D, et al. A comparison of techniques to assess cervicovaginal irritation and evaluation of the variability between two observers. Contraception, 2004;70:241–249.
18. Crisp WE, Craine BL, Craine EA. The computerized digital imaging colposcope: future directions. Am J Obstet Gynecol, 1990;162:1491–1497; discussion 1497–1498.
19. Craine BL, Craine ER, O'Toole CJ, et al. Digital imaging colposcopy: corrected area measurements using shape-from-shading. IEEE Trans Med Imaging, 1998;17:1003–1010.
20. Sellors JW, Nieminen P, Vesterinen E, et al. Observer variability in the scoring of colpophotographs. Obstet Gynecol, 1990;76:1006–1008.
21. Cecchini S, Iossa A, Bonardi R, et al. Evaluation of the sensitivity of cervicography in a consecutive colposcopic series. Tumori, 1992;78:211–213.
22. Morrison CS, Bright P, Blumenthal PD, et al. Computerized planimetry versus clinical assessment for the measurement of cervical ectopia. Am J Obstet Gynecol, 2001;184:1170–1176.
23. Ferris DG, Macfee MS, Miller JA, et al. The efficacy of telecolposcopy compared with traditional colposcopy. Obstet Gynecol, 2002;99:248–254.
24. Mitchell MF, Cantor SB, Brookner C, et al. Screening for squamous intraepithelial lesions with fluorescence spectroscopy. Obstet Gynecol, 1999;94:889–896.
25. CONRAD/World Health Organization. Manual for the standardization of colposcopy for the evaluation of vaginal products: Update 2000. Arlington, VA: CONRAD/WHO; 2000.
26. De Sutter P, Coibion M, Vosse M, et al. A multicentre study comparing cervicography and cytology in the detection of cervical intraepithelial neoplasia. Br J Obstet Gynaecol, 1998;105:613–620.
27. Schneider DL, Burke L, Wright TC, et al. Can cervicography be improved? An evaluation with arbitrated cervicography interpretations. Am J Obstet Gynecol, 2002;187:15–23.
28. Pretorius RG, Belinson JL, Zhang WH, et al. The colposcopic impression. Is it influenced by the colposcopist's knowledge of the findings on the referral Papanicolaou smear? J Reprod Med, 2001;46:724–728.
29. Hopman EH, Voorhorst FJ, Kenemans P, et al. Observer agreement on interpreting colposcopic images of CIN. Gynecol Oncol, 1995;58:206–209.

Colposcopy; methodology; reproducibility; irritation; vagina

© 2004 Lippincott Williams & Wilkins, Inc.