An estimated 7% of women will meet the diagnostic criteria for vulvodynia, and those affected commonly experience significant psychosocial problems, including sexual dysfunction, anxiety, infertility, and divorce.1–4 Although vulvodynia is recognized as a relatively common condition, evidence-based treatment options are few, largely because of the dearth of randomized clinical trials (RCTs). As is evident from the clinicaltrials.gov Web site, efforts to identify effective treatments for vulvodynia through RCTs have been limited to date. Future expansion of RCTs for vulvodynia will require clear and widely accepted definitions of disease, inclusion and exclusion criteria, and outcome measures.5 An expert panel, the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT), has defined what constitutes evidence of successful outcomes, known as outcome domains, for pain trials and has recommended standard measurement tools for these domains.6,7
The Tampon Test, a standardized tampon insertion and removal test, provides an alternative to intercourse pain as an outcome measure for vulvodynia research. Although most women with vulvodynia seek treatment for insertional dyspareunia, using intercourse pain as a primary outcome measure raises practical and methodologic difficulties. In severe cases, pain may be so intense that affected patients abstain from intercourse entirely. As a result, intercourse pain as a primary outcome measure may be problematic for recruitment, data analysis, and generalization of results. A recent analysis of a large population-based sample found pain with tampon insertion to be one of the strongest risk factors for the development of vulvodynia.8 The Tampon Test reflects a common, real-life experience that is well understood by patients and clinicians. Following IMMPACT recommendations, the Tampon Test incorporates important aspects of disease-specific, patient-reported outcomes using a numeric rating scale.6,9 We examined the reliability, construct and discriminant validity, responsiveness to change, and feasibility of the Tampon Test as an outcome measure for vulvodynia clinical trials, and we compared the Tampon Test with individual and composite measures of pain intensity and quality recommended by the IMMPACT group.
MATERIALS AND METHODS
The Vulvar Vestibulitis Clinical Trial, funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health, was a randomized, placebo-controlled, double-blind clinical trial of the clinical efficacy of four treatment arms for vulvar vestibulitis syndrome (localized vulvodynia): 1) topical lidocaine, 2) oral desipramine, 3) combined lidocaine and desipramine, and 4) placebo cream and tablets. The trial was conducted at Strong Memorial Hospital of the University of Rochester between August 2002 and July 2007, and the protocol was reviewed and approved by the University of Rochester Research Subjects Review Board (RSRB #8677). A blocked randomization scheme, using a uniform random number generator and a block size of eight, ensured that the four treatment combinations were assigned equally often within each block, so that no treatment group received more than two assignments per block. Study drugs were administered for 12 weeks, with postintervention follow-up at 16, 26, and 52 weeks.
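The permuted-block allocation described above can be sketched as follows. This is a minimal illustration, not the trial's actual randomization program: the arm labels, the seed handling, and the use of Python's `random` module (a Mersenne Twister uniform generator) are assumptions. Each block of eight contains every arm exactly twice, so at any point in enrollment the counts of any two arms differ by at most two.

```python
import random

# Arm labels are illustrative names for the four treatment combinations in the text.
ARMS = ["lidocaine", "desipramine", "lidocaine+desipramine", "placebo"]
BLOCK_SIZE = 8  # stated block size; each arm appears BLOCK_SIZE // len(ARMS) = 2 times


def blocked_schedule(n_participants, seed=None):
    """Return a permuted-block allocation list of length n_participants.

    Hypothetical sketch: every block of 8 holds each arm exactly twice,
    in an order shuffled by a seeded uniform random number generator.
    """
    rng = random.Random(seed)
    per_arm = BLOCK_SIZE // len(ARMS)
    schedule = []
    while len(schedule) < n_participants:
        block = [arm for arm in ARMS for _ in range(per_arm)]
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule[:n_participants]


if __name__ == "__main__":
    allocation = blocked_schedule(132, seed=2002)
    print({arm: allocation.count(arm) for arm in ARMS})
```

Permuting within fixed blocks, rather than drawing each assignment independently, is what bounds the imbalance between arms regardless of when enrollment stops.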
Clinical response from randomization to 12 weeks (the end of the randomized, blinded phase of the trial) was assessed as the change in weekly Tampon Test pain on a numeric rating scale, which was compared with several measures that had preexisting reliability and validity data or prior published use in vulvodynia clinical trials: change in overall daily pain intensity (24-hour numeric rating scale),7 frequency of sexual intercourse (insertional attempts per week),10 change in intercourse pain on a numeric rating scale,10 vulvar algesiometer score,11 and cotton swab test pain level on a verbal rating scale.12 In addition, at each study visit participants completed a battery of pain and health-related quality-of-life measures recommended by IMMPACT, including the Brief Pain Inventory, Short Form-McGill Pain Questionnaire (SF-MPQ), Profile of Mood States, and Beck Depression Inventory.7 For the primary outcome analysis of the clinical trial (to be published later), we hypothesized response rates of 20% for the double placebo group, 50% for each treatment used alone, and 80% for the two treatments used together. The therapeutic response to desipramine and lidocaine was estimated from preliminary data reported by our group (Foster DC, Duguid KM. Open label study of oral desipramine and topical lidocaine for the treatment of vulvar vestibulitis. Abstract, International Conference on Mechanism and Treatment of Neuropathic Pain. Rochester, New York, 1998). With Bonferroni correction, 80% power for a two-tailed test with alpha=0.05 required a total of 104 participants to complete the trial. Assuming a 25% dropout rate, we estimated that 130 participants needed to be randomly assigned.
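The power calculation can be illustrated with the standard normal-approximation sample size for comparing two proportions, together with the dropout inflation. This is a generic sketch: the specific comparisons, the number of Bonferroni-adjusted tests, and the exact formula the investigators used are not stated, so the function below is not claimed to reproduce the figure of 104. The dropout step follows the arithmetic implied by the text (104 × 1.25 = 130).

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha, power):
    """Per-group sample size for a two-sided comparison of two proportions
    (standard normal-approximation formula; a generic illustration)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)          # quantile for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def inflate_for_dropout(n_completers, dropout):
    """Inflate the completer target as the text's figures imply: n * (1 + dropout)."""
    return math.ceil(n_completers * (1 + dropout))


if __name__ == "__main__":
    k_tests = 3  # hypothetical number of Bonferroni-adjusted comparisons
    print(n_per_group(0.20, 0.50, alpha=0.05 / k_tests, power=0.80))
    print(inflate_for_dropout(104, 0.25))
```

Dividing by (1 − dropout) instead, 104 / 0.75 ≈ 139, is a more conservative convention for the same adjustment.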
Our present objective is to report data from prerandomization (Baseline) through the first postrandomization visit (week 8) to demonstrate the utility of the Tampon Test as an outcome measure for vulvodynia clinical trials. For baseline cross-sectional comparisons, Baseline was defined as the mean of each outcome variable over the three prerandomization time points (weeks –2, –1, and 0). Longitudinal comparisons used the same Baseline mean and calculated the change from Baseline to the mean of the outcome variable over the three time points ending with week 8 (weeks 6, 7, and 8).
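The Baseline and change definitions above amount to two three-week means and their difference. A minimal sketch for one participant and one outcome variable follows; the dictionary layout is hypothetical, and the sign convention (endpoint minus Baseline, so a negative change means less pain) is an assumption, since the text does not state it. Missing weeks raise a `KeyError` rather than being imputed.

```python
from statistics import mean

BASELINE_WEEKS = (-2, -1, 0)  # three prerandomization weeks
ENDPOINT_WEEKS = (6, 7, 8)    # three weeks ending with week 8


def baseline_and_change(scores_by_week):
    """Return (Baseline mean, change) for one participant.

    scores_by_week: hypothetical dict mapping study week -> outcome score.
    Change is computed as endpoint mean minus Baseline mean (assumed sign).
    """
    baseline = mean(scores_by_week[w] for w in BASELINE_WEEKS)
    endpoint = mean(scores_by_week[w] for w in ENDPOINT_WEEKS)
    return baseline, endpoint - baseline


if __name__ == "__main__":
    weekly_tampon_test = {-2: 6, -1: 5, 0: 7, 6: 3, 7: 4, 8: 2}
    print(baseline_and_change(weekly_tampon_test))
```

Averaging three weekly values at each end dampens week-to-week noise in a single rating.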
Women were invited to participate if they were aged 18 to 50 years and reported more than 3 continuous months of vulvar symptoms of insertional dyspareunia, pain with tampon insertion, or both. After informed consent, all study candidates completed a standard history and physical examination. To be included in the trial, participants had to fulfill Friedrich's criteria for the diagnosis of vulvodynia, including tenderness localized within the vestibule, confirmed by a cotton swab test modified from the technique of Bergeron et al.12 The cotton swab test was performed on defined points of the labia majora, labia minora, and lower vagina. A positive cotton swab test was operationally defined as a mean score of 4 of 10 or greater on a verbal rating scale at four defined points (1:00, 5:00, 7:00, and 11:00) within the vulvar vestibule. This modified the criteria of Bergeron et al12 by excluding testing at 12:00 and 6:00 of the vulvar vestibule, with the intent of reducing the chance of including painful conditions, such as skenitis and vaginal fourchette fissures, that might evoke a pain response at those sites. The localized nature of pain was confirmed by finding all remaining cotton swab test points in the lower vagina, labia majora, and labia minora to be nonpainful, defined as a mean score of 2 of 10 or less on a verbal rating scale. A second clinician examiner performed an independent examination of each candidate and had to concur with the diagnosis of vulvar vestibulitis syndrome. Additionally, eligible candidates had no other specific neuropathology, atrophic vaginitis, dermatitis such as vulvar dystrophy, or pathogens such as culture- or smear-proven Candida spp. or Herpes simplex.
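The cotton swab eligibility rule combines a vestibular threshold with a localization check. The sketch below encodes it under stated assumptions: the input layout is hypothetical, and "all remaining points nonpainful" is interpreted as each non-vestibular site having a mean rating of 2 or less, which the text does not spell out.

```python
from statistics import mean

VESTIBULE_POINTS = (1, 5, 7, 11)  # clock positions tested within the vestibule


def cotton_swab_eligible(vestibule_scores, other_site_scores):
    """Apply the modified cotton swab criteria (hypothetical encoding).

    vestibule_scores: dict clock position -> 0-10 verbal rating.
    other_site_scores: dict site name -> list of 0-10 ratings at that site
        (labia majora, labia minora, lower vagina).
    Assumption: localization requires each site's mean rating to be <= 2.
    """
    vestibular_mean = mean(vestibule_scores[p] for p in VESTIBULE_POINTS)
    localized = all(mean(ratings) <= 2 for ratings in other_site_scores.values())
    return vestibular_mean >= 4 and localized


if __name__ == "__main__":
    print(cotton_swab_eligible(
        {1: 5, 5: 4, 7: 6, 11: 3},
        {"labia majora": [0, 1], "labia minora": [2, 2], "lower vagina": [1, 0]},
    ))
```

The 12:00 and 6:00 positions are deliberately absent from `VESTIBULE_POINTS`, mirroring the modification described in the text.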
Participants were provided with Original Regular Tampax Tampons (Procter & Gamble Corp., Cincinnati, OH) in the standard cardboard applicator. Original Regular Tampax Tampons are 5.5 cm long and 1.5 cm in diameter when contained in the cardboard applicator, which is 12.8 cm long. The tampons are made of a combination of cotton and rayon (the exact fiber proportions are proprietary to Procter & Gamble Corp.), and the string is made of 100% cotton.
Detailed instructions on performing and documenting the weekly Tampon Test, the daily 24-hour pain measure, and the intercourse pain measure were given to each participant by the Research Nurse/Coordinator at the first prerandomization visit (Week –2). Each participant was verbally instructed to 1) deposit the tampon fully into the vagina above the level of the hymeneal ring with the cardboard applicator, 2) remove the applicator from the vagina, and 3) immediately remove the tampon from the vagina by traction on the tampon string. Participants were instructed not to lubricate the tampon before insertion and to insert the tampon using only the supplied cardboard applicator. Once weekly and in a consistent manner, each participant was instructed to insert and immediately remove the tampon and to rate her pain during the entire insertion and removal experience on a 0–10 numeric rating scale, with 0 meaning no pain and 10 meaning the worst possible pain. She then recorded her pain level by marking the corresponding number on a linear pain scale printed on the back of the first page for each week in her Vulvar Vestibulitis Clinical Trial logbook. The Research Nurse/Coordinator reviewed and recorded all information during a weekly telephone call and later confirmed it against the returned logbook at scheduled study visits. During the prerandomization (Baseline) phase, participants were required to demonstrate an adequate baseline level of pain (average 4 of 10 or greater) on the Tampon Test to proceed to randomization, because lower baseline pain would limit the ability of the RCT to demonstrate greater improvement with treatment than with placebo.
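The randomization gate on baseline Tampon Test pain can be expressed as a single check. In this sketch, "average" is assumed to mean the mean of the weekly prerandomization ratings supplied; the text does not specify how many ratings entered the average, and the function name is hypothetical.

```python
from statistics import mean

MIN_BASELINE_PAIN = 4  # average Tampon Test pain (0-10 scale) required to randomize


def eligible_for_randomization(baseline_scores):
    """Return True if the mean baseline Tampon Test rating is 4 or greater.

    baseline_scores: weekly 0-10 ratings from the prerandomization phase
    (assumed to be the values averaged for the gate).
    """
    if not baseline_scores:
        raise ValueError("at least one baseline rating is required")
    if any(not 0 <= s <= 10 for s in baseline_scores):
        raise ValueError("Tampon Test ratings must lie on the 0-10 scale")
    return mean(baseline_scores) >= MIN_BASELINE_PAIN
```

A floor of this kind preserves room for a treatment effect to appear; a participant starting near 0 cannot show much improvement.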
On a daily basis during the trial, participants reported whether they had experienced sexual intercourse in the last 24 hours. The possible responses were 1) “No, too painful,” indicating that the participant could not accept an approach to physical intimacy because of pain; 2) “No, not interested,” indicating that she was not in the mood for sexual intimacy; 3) “No, no opportunity,” indicating that her partner was not available; and 4) “Yes,” indicating that sexual intercourse was attempted. If intercourse was attempted, the participant was asked to rate her pain during intercourse on a 0–10 scale, with 0 meaning no pain and 10 meaning the worst possible pain, and to record it by marking the corresponding number on a linear pain scale printed on the front of the daily diary page.
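The daily diary coding can be represented as a small enumeration with a consistency check. The class and function names are hypothetical; the rule enforced (a 0–10 pain rating accompanies response 4 and only response 4) follows the instructions described above.

```python
from enum import IntEnum


class IntercourseResponse(IntEnum):
    """Daily diary response codes, numbered as in the text."""
    TOO_PAINFUL = 1
    NOT_INTERESTED = 2
    NO_OPPORTUNITY = 3
    YES = 4


def validate_entry(response, pain=None):
    """Check one daily diary entry and return its response code.

    A 0-10 pain rating is required when intercourse was attempted (YES)
    and must be absent otherwise. Unknown codes raise ValueError.
    """
    response = IntercourseResponse(response)
    if response is IntercourseResponse.YES:
        if pain is None or not 0 <= pain <= 10:
            raise ValueError("an attempt requires a 0-10 pain rating")
    elif pain is not None:
        raise ValueError("pain is recorded only when intercourse was attempted")
    return response
```

Keeping the reason codes separate from the pain rating makes it straightforward to tabulate why intercourse was not attempted.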
Apart from the initial visit (Week –2), when two examiners confirmed the clinical diagnosis of localized vulvodynia, participants were evaluated at subsequent visits by the same research clinician (D.C.F.) with quantitative sensory tests (cotton swab test and algesiometer), selective palpation of pelvic muscles for pain, and a battery of psychometric tests. At each study visit, all components of the examination were performed by a single examiner, identically to the first prerandomization (Week –2) visit. The algesiometer, generously supplied by Curnow and Morrison, Plymouth, UK, consisted of a mechanical pulse generator that drove a probe against the mucocutaneous surface of the vulva over a calibrated distance, with force ranging from 176 mN to 1,868 mN in eight increments.13 A standard four-site test of the vestibule was used, as described by Eva et al.11 We used a method of limits, with the pain threshold determined as the first consistent verbal report of stimulus pain.14 A consistent report required positive responses at two consecutive, increasing stimulus intensities. The algesiometer score was the sum of the pain thresholds from the four anatomic sites (score range 0 to 28, with higher scores corresponding to less vestibular pain). During the pelvic examination at each study visit, selective muscle palpation included digital palpation of the levator ani, obturator internus, and piriformis muscle groups. Pain was recorded for each muscle group and anatomic site on a 0 to 4 scale corresponding to none, mild, moderate, and severe pain, respectively.
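The method-of-limits scoring can be sketched as follows. The text does not state how a site threshold is converted to a per-site score; the mapping below (the 0-based index of the first of two consecutive painful force levels, or 7 when that criterion is never met) is a hypothetical choice made only so that four sites span the stated 0 to 28 range with higher scores meaning less pain.

```python
N_LEVELS = 8  # eight calibrated force increments (176 mN to 1,868 mN)


def site_threshold_score(responses):
    """Score one vestibular site from ascending-force responses.

    responses: list of 8 booleans (True = reported painful), lowest force first.
    Hypothetical mapping: the threshold is the first level followed by another
    painful level; the score is its 0-based index, or 7 if never reached.
    """
    if len(responses) != N_LEVELS:
        raise ValueError("expected one response per force increment")
    for i in range(N_LEVELS - 1):
        if responses[i] and responses[i + 1]:  # two consecutive positives
            return i
    return N_LEVELS - 1


def algesiometer_score(site_responses):
    """Sum of the four vestibular site scores (0-28; higher = less pain)."""
    if len(site_responses) != 4:
        raise ValueError("expected four vestibular sites")
    return sum(site_threshold_score(r) for r in site_responses)
```

Requiring two consecutive positives guards against a single spurious report at a low force setting.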
In addition, the Brief Pain Inventory, Short Form-McGill Pain Questionnaire (SF-MPQ), Neuropathic Pain Scale, Profile of Mood States, Beck Depression Inventory, Sexual and Physical Abuse History, Multidimensional Pain Inventory, Dyadic Adjustment Scale, Communication Pattern Questionnaire, and Index of Sexual Satisfaction were administered, and participants were asked to answer psychometric questions according to their overall pain state.
This report focuses on the Brief Pain Inventory, SF-MPQ, Neuropathic Pain Scale, Profile of Mood States, and Beck Depression Inventory for the purpose of validating the Tampon Test against psychometric measures recommended by IMMPACT for evaluating treatment efficacy and effectiveness.6,7 Outcome domains and recommended measures include 1) pain intensity: pain over each 24-hour period, pain with intercourse (if attempted), cotton swab test, and algesiometer score; 2) pain quality: SF-MPQ and Neuropathic Pain Scale; 3) physical functioning: Brief Pain Inventory Interference Scale score; and 4) emotional functioning: Beck Depression Inventory and Profile of Mood States.
Test–retest reliability across the three prerandomization (Baseline) Tampon Test assessments was assessed with the kappa statistic, the weighted kappa statistic, and the Shrout-Fleiss intraclass correlation.15 To evaluate construct validity, we calculated Pearson and Spearman correlations between Tampon Test scores and the other outcome measures. The Tampon Test and the other outcome measures were analyzed in two ways, as cross-sectional baseline values and as longitudinal change over time, without reference to treatment group allocation. Participant acceptance of the Tampon Test was evaluated by comparing adherence to it with adherence to the intercourse pain measure. Correlations of the Tampon Test with cotton swab test vaginal pain and with pelvic muscle pain on palpation were included to assess the specificity of the Tampon Test for pain localized to the vestibule, as opposed to superficial vaginal and deep pelvic pain, respectively.
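Weighted kappa for a pair of weekly ratings can be computed from the observed and chance-expected cross-tabulations. The text does not state the weighting scheme, so this sketch defaults to linear disagreement weights |i − j| / (K − 1) with quadratic weights as an option; K = 11 categories corresponds to the 0–10 scale. With only two categories, linear weights reduce to unweighted kappa.

```python
def weighted_kappa(a, b, n_categories=11, weights="linear"):
    """Cohen's weighted kappa for paired integer ratings on 0..n_categories-1.

    kappa_w = 1 - sum(w * observed) / sum(w * expected), where w is a
    disagreement weight (linear by default; weighting scheme is an assumption).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    k, n = n_categories, len(a)
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d ** 2

    obs_dis = sum(w(i, j) * observed[i][j]
                  for i in range(k) for j in range(k)) / n
    exp_dis = sum(w(i, j) * row_tot[i] * col_tot[j]
                  for i in range(k) for j in range(k)) / n ** 2
    if exp_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - obs_dis / exp_dis
```

Weighting credits near-misses on an ordinal scale (a 5 followed by a 6) rather than treating them as complete disagreement, which suits week-to-week pain ratings.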
Of the 150 women consented for the Vulvar Vestibulitis Clinical Trial, 132 participants were randomly assigned, and 118 participants returned through the first postrandomization visit (week 8). Table 1 summarizes characteristics of the 118 participants who completed the trial from Baseline week –2 to the first postrandomization visit, week 8. Of the 18 consented candidates who were excluded or dropped out before randomization, 10 decided not to participate in the trial, five did not receive diagnostic agreement by examiners, and three did not demonstrate adequate levels of pain (4 of 10 or greater) on the initial Tampon Test. Of the 15 randomly assigned participants who did not complete the trial, two became pregnant, four were removed by research staff (three because of concern for adverse effects, namely hypertension/tachycardia, elevated liver enzymes, and symptomatic palpitations, one each, and one for poor record keeping), and nine elected to drop out of the study. Among participants completing week 8 (Table 1), mean age was 30.4±7.6 years, the racial/ethnic mix was predominantly non-Hispanic white, mean education was 16.0±3.0 years, 69.5% reported being currently sexually active, 55.1% reported a history of pain with first sexual activity, and 63.6% reported a history of pain with first tampon insertion. Adherence to the weekly Tampon Test was excellent, with 1,136 tests completed of 1,180 participant weeks (96.3%), compared with 586 of 1,180 participant weeks (49.7%) for the intercourse pain measure. Adherence to the Tampon Test was thus approximately twice that of the intercourse pain measure, despite encouragement of both activities by the Research Nurse. Participants were asked to explain in the Vulvar Vestibulitis Clinical Trial logbook why they did not attempt intercourse. The reasons given were “no partner” (55.2% of unattempted participant weeks), “not interested” (37.2%), and “too painful” (7.6%).
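The adherence figures reported above follow from simple proportions of participant weeks. The short calculation below uses only the counts given in the text; the helper name is illustrative.

```python
def adherence(completed, participant_weeks):
    """Percentage of participant weeks with a completed measurement."""
    return 100 * completed / participant_weeks


tampon = adherence(1136, 1180)      # weekly Tampon Test
intercourse = adherence(586, 1180)  # intercourse pain measure
ratio = tampon / intercourse        # equals 1136 / 586 since denominators match

print(f"{tampon:.1f}% vs {intercourse:.1f}% (ratio {ratio:.2f})")
# prints: 96.3% vs 49.7% (ratio 1.94)
```

The ratio of about 1.94 is the basis for describing Tampon Test adherence as roughly twofold that of the intercourse pain measure.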
Test–retest reliability was estimated from the week-to-week Tampon Test pain recorded by each participant during prerandomization (Baseline) weeks –2, –1, and 0. Over these three weekly assessments, mean Tampon Test pain on the 0 to 10 numeric rating scale was 4.6±2.6 (Week –2), 4.6±2.7 (Week –1), and 4.7±2.8 (Week 0). Weighted kappa values were 0.52 for Weeks –2 and –1, 0.52 for Weeks –1 and 0, and 0.38 for Weeks –2 and 0, reflecting moderate week-to-week agreement between adjacent weeks and fair agreement between Weeks –2 and 0. The Shrout-Fleiss intraclass correlation was 0.48 for single Tampon Test assessments across the three baseline weeks and 0.74 for the average of the three baseline assessments.
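The single-measure and average-measure intraclass correlations can be computed from two-way ANOVA mean squares, treating the three baseline weeks as "raters." The text does not say which Shrout-Fleiss form was used; this sketch assumes ICC(2,1) and ICC(2,k) (two-way random effects, absolute agreement). For this pair, the average-measure value equals the Spearman-Brown step-up of the single-measure value. Applying that formula to the reported 0.48 with k = 3 gives about 0.73, close to the reported 0.74; the small difference plausibly reflects rounding of 0.48.

```python
from statistics import mean


def icc2(data):
    """Shrout-Fleiss ICC(2,1) and ICC(2,k) for an n-subjects x k-occasions table.

    Assumed form: two-way random effects, absolute agreement.
    Returns (single_measure, average_measure).
    """
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    average = (bms - ems) / (bms + (jms - ems) / n)
    return single, average


def spearman_brown(r, k):
    """Reliability of the mean of k parallel measurements, each with reliability r."""
    return k * r / (1 + (k - 1) * r)


if __name__ == "__main__":
    print(round(spearman_brown(0.48, 3), 2))
```

Averaging three weekly ratings therefore yields a considerably more reliable Baseline than any single week, which is one rationale for the three-week Baseline mean.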
For the cross-sectional assessment of construct validity, baseline Tampon Test pain was significantly correlated with daily 24-hour pain rating (r=0.38, P<.001), intercourse pain (r=0.22, P=.04), the Brief Pain Inventory (r=0.34, P=.001), and the Neuropathic Pain Scale total score (r=0.19, P=.03). Spearman coefficients were similar to Pearson coefficients for these correlations, and visual review of the scatterplots showed a linear relationship for each (data not shown).
For the longitudinal assessment of construct validity and responsiveness to change, change in Tampon Test scores was significantly correlated with change in daily 24-hour pain (r=0.42, P<.001), intercourse pain (r=0.35, P=.003), cotton swab test vestibular pain (r=0.38, P<.001), algesiometer score (r=–0.33, P<.001), SF-MPQ sensory subscale score (r=0.30, P=.005), Brief Pain Inventory Interference scale score (r=0.49, P<.001), and Neuropathic Pain Scale total score (r=0.33, P<.001). Spearman coefficients were similar to Pearson coefficients for these correlations, and visual review of the scatterplots showed a linear relationship for each (data not shown).
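Checking Pearson coefficients against Spearman coefficients, as described above, can be sketched in plain Python: the Spearman coefficient is the Pearson coefficient computed on ranks, with tied values given the average of the ranks they span. This is an illustration of the two statistics only; P values are not computed here.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

When the two coefficients agree closely, the association is unlikely to be driven by a few extreme values or a strongly nonlinear pattern, which is consistent with the linear scatterplots described.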
Table 3 displays a correlation matrix of pain intensity/quality measures and psychometric measures, complementing the Tampon Test correlations in Table 2. Of particular note, the highest correlation was between the Baseline to week 8 change in 24-hour pain and the change in the Brief Pain Inventory (r=0.55). Additionally, changes in cotton swab test–evoked vestibular pain and algesiometer-evoked pain were entirely uncorrelated with changes in intercourse pain (r=0.01 and r=0.00, respectively). Comparing the correlation matrix in Table 3 with the corresponding Tampon Test correlations in Table 2 shows that no single outcome measure surpasses the Tampon Test in breadth and strength of association.
We studied the potential effect of selected comorbid conditions on Tampon Test pain. Unpaired t tests were used to compare Tampon Test pain across categories of selected historical variables. No significant effect on Tampon Test pain was found for endometriosis, irritable bowel syndrome, interstitial cystitis, history of rape/sexual abuse, or a report of “never using tampons before.” Tampon Test pain differed significantly when fibromyalgia was present (t=2.30, P=.02). A linear regression model was fitted with Tampon Test pain as the dependent variable and fibromyalgia and overall 24-hour pain as independent variables. Overall 24-hour pain remained highly predictive of Tampon Test pain after adjustment for fibromyalgia (t=3.76, P<.001), whereas fibromyalgia no longer significantly predicted Tampon Test pain after adjustment for overall 24-hour pain (t=1.68, not significant). Tampon Test pain did not differ significantly when the test was performed within 7 days of the onset of menses (5.3±1.7) compared with other times (4.2±2.2; t=1.47, not significant). As shown in Table 2, Tampon Test scores were not significantly correlated with levator, obturator, or piriformis muscle pain on palpation, nor with the cotton swab assessment of vaginal pain. Tampon Test scores also did not significantly correlate with variation in mood or affect, as reflected by the Beck Depression Inventory, SF-MPQ affective subscale, and Profile of Mood States.
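The unpaired comparisons above can be illustrated with Student's pooled-variance t statistic. The text says only "unpaired," so the equal-variance form is an assumption (Welch's unequal-variance test would be an alternative), and this sketch does not reproduce the regression adjustment for 24-hour pain.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's unpaired t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal population variances.
    """
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, df
```

A positive t indicates higher mean Tampon Test pain in the first group (for example, participants reporting a comorbid condition) than in the second.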
A consensus group (IMMPACT) has published recommendations for the conduct of clinical trials in chronic pain and describes the ideal primary outcome measure as having appropriate content, reliability, validity, responsiveness, and limited participant burden.7 The Tampon Test is a readily understandable, real-life outcome measure that demonstrated good week-to-week reliability by weighted kappa and intraclass correlation coefficients. The consistent means and variability of the Tampon Test over Weeks –2, –1, and 0 indicate the absence of a practice effect on pain intensity. The Tampon Test was significantly associated with several IMMPACT core outcome domains and specifically recommended measures,6,7 including 1) pain intensity: pain over each 24-hour period, pain with intercourse (if attempted), and cotton swab test and algesiometer scores; 2) pain quality: SF-MPQ; and 3) physical functioning: Brief Pain Inventory scores. The agreement between Pearson and Spearman correlations supports the statistical robustness of these findings. Among all outcome measures examined in the present analyses, no other measure correlated as highly or as frequently with the other outcome measures as the Tampon Test.
With respect to construct validity, we evaluated two dimensions: first, baseline Tampon Test values were compared cross-sectionally with baseline values of other outcome measures; and second, longitudinal changes in the Tampon Test were compared with changes in other outcome measures. The cross-sectional comparisons were intended to evaluate the ability of the Tampon Test to measure pain severity, and the longitudinal comparisons its ability to measure response to treatment. Table 2 indicates that the Tampon Test displays broader and stronger associations with changes in outcome measures over time than with cross-sectional values at baseline. This suggests that the Tampon Test is more valid for measuring response to treatment over time than for measuring pain severity at a single time point. The ability to measure change over time, termed responsiveness, is a critical requirement for clinical trial outcome measures, which must reflect improvement (or worsening) and distinguish efficacy among treatment groups.
Discriminant validity was supported by several observations: first, the Tampon Test was not influenced by comorbid conditions such as endometriosis and interstitial cystitis; second, the Tampon Test did not correlate with evoked pain measures outside the vestibule; and third, the Tampon Test did not correlate with psychometric measures of affect. Specifically, change in the Tampon Test correlated well with change in cotton swab and algesiometer assessments of vestibular pain, a pivotal consideration for future studies of response to treatment. In contrast, assessments of other anatomic regions of the genital tract, including cotton swab-evoked vaginal pain and palpation-evoked levator muscle pain, failed to correlate with the Tampon Test. With respect to the emotional functioning outcome domain, the Tampon Test did not significantly correlate with the Beck Depression Inventory, Profile of Mood States, or SF-MPQ affective scores. The Tampon Test may therefore be less influenced by the participant's affect or short-term emotional variation, strengthening its discriminant validity over the duration of a clinical trial.
In vulvodynia outcomes research to date, primary outcome variables have fallen into three major categories, characterized by both strengths and weaknesses: 1) composite pain scores (commonly a combination of psychometric tests, personal pain assessment, and practitioner pain assessment); 2) individually designed participant questionnaires and clinical assessment instruments; and 3) quantitative sensory testing.10,16–18 Composite pain scores commonly consist of one or several psychometric tests combined with other measures of patient-reported outcomes and clinician-reported outcomes, which may have variable reliability and validity. Many composite measures lack specificity with regard to vulvodynia and could be influenced by comorbid pain and mood disorders. The complexity of composite scores may also hinder interpretation. Individually designed questionnaires and examination assessments used in single studies may be quite specific to vulvodynia but commonly overlook reliability and validity testing. In contrast to composite scores and individually designed assessment tools, quantitative sensory testing provides reliable measures that can be specifically designed for vulvar pain assessment. Several research groups have developed algesiometers for quantitative sensory testing assessments that produce calibrated mechanical stimuli for pain testing of the vulva.13,19 Unfortunately, quantitative sensory testing assessments are instrument dependent, making replication difficult when the instruments are not commercially available. Quantitative sensory testing measures also lack direct clinical relevance, which may limit their use as a primary endpoint, although quantitative sensory testing may be quite valuable as a surrogate or secondary endpoint.
The present study has several limitations, as does the Tampon Test as an outcome measure for vulvodynia clinical trials. Some candidates may report baseline Tampon Test pain too low to permit effective analysis of outcomes in an RCT. The proportion excluded for this reason in the present study was small (3%) but does reduce the pool of eligible participants. Some individuals with vulvodynia reported much higher intercourse pain than tampon insertion pain, highlighting that the Tampon Test does not fully replace intercourse pain as an outcome measure. The Tampon Test by nature evokes self-inflicted pain, whereas intercourse pain is partner-inflicted, and this difference may lead to a distinctly different experience and perception of pain. Intercourse pain also carries a psychosexual dimension that cannot be equated with the pain of tampon insertion, nor can intercourse pain be equated with pain evoked in the vestibule by the cotton swab or algesiometer, as is evident in Table 3. Intercourse pain outcome measures will therefore remain a major, albeit problematic, focus of vulvodynia RCTs.
A crucial facet of pain outcomes research is the development of well-defined, understandable, reliable, and valid outcome variables. Pain with tampon insertion is a common symptom in women with vulvodynia that in many cases precedes the development of intercourse pain.8 Among the criteria developed for evaluating the quality of chronic pain outcome measures, the greatest weight has been given to appropriateness of content, reliability, validity, responsiveness, and limited respondent burden.7 In addition, for assessments of pain intensity, patient-reported outcomes on a numeric rating scale are the preferred study endpoints. The Tampon Test therefore fulfills key attributes of a core outcome measure for vulvodynia pain. Rather than simply being a surrogate for intercourse pain, the Tampon Test reflects another real-life behavior. We have shown in this report that the Tampon Test is a reliable and valid outcome measure that is associated with a wide range of other pain outcome measures in both cross-sectional and longitudinal assessments. Importantly, its excellent adherence rate of more than 95% indicates that patients find it an acceptable and feasible approach to evaluating their pain.