Accurate diagnosis is a critical part of the clinical process because it allows optimal management strategies to be employed. Incorrect diagnoses may put patients at risk and waste limited resources.1,2 Medical diagnosis is usually based on information acquired from a variety of components, which make up the clinical process (ie, history, examination, and tests). The need to assess rigorously the accuracy of the clinical history and examination has been highlighted,3 but in our view, the need is to integrate this part of the clinical process with diagnostic tests. The whole clinical process should be borne in mind when conducting research to evaluate a diagnostic strategy and when implementing new tests into clinical practice. Diagnostic accuracy in studies evaluating tests in isolation from the rest of the clinical context does not necessarily indicate how useful the test will be in practice.4–6 To determine this, evaluation within the context of contemporary clinical practice is essential. This is because the information generated by a diagnostic test may have already been obtained from the patient's history and physical examination, which have taken place earlier in the clinical process. The true clinical value of a test lies in the added information over and above what was already known from the history and examination.
A stepwise multivariable approach in the evaluation of diagnostic tests7 allows the clinical context, in which the test will be used, to be accounted for. This paper develops such an approach and explores its application in the diagnosis of endometrial disease (hyperplasia or cancer) in the context of an outpatient rapid access clinic. We recently conducted systematic reviews8 addressing the diagnostic accuracy of tests (ultrasound, hysteroscopy, and endometrial biopsy) used in predicting endometrial disease in this setting. There were over 100 primary test accuracy studies, none of which used stepwise multivariable approaches. Our aim is to develop an analytic strategy, which may be used to make a recommendation for clinical practice, after validation in an independent data set.
MATERIALS AND METHODS
In an outpatient rapid access ambulatory diagnostic clinic for investigation of abnormal uterine bleeding, patients' age at presentation, menopausal status, and use of hormone replacement therapy are recorded. Patients then undergo further investigation with a combination of pelvic ultrasound scan and outpatient hysteroscopy. Data collected from 248 consecutive patients investigated in our one-stop clinic for abnormal uterine bleeding from November 1996 to December 1997 were used to develop a model to determine the added value of ultrasonography and hysteroscopy.
Univariate analyses of the tests employed in this data set have been published previously.9,10 Pelvic transvaginal ultrasonography was performed using an endovaginal 6.5-MHz transducer of a portable scanner (Hitachi Sumi, Tokyo, Japan). Double-layer endometrial thickness was measured in millimeters, and endometrial thickness of 5 mm was used as a cutoff, based on the findings of a recent review.11 Minihysteroscopy was performed using a 1.2-mm microhysteroscope with a 2.5-mm rigid sheath (Karl Storz, Tuttlingen, Germany). By applying simple pressure with the hysteroscope, the depth of mucosa was assessed; hysteroscopic features of smooth nonvascular endometrium were considered as normal hysteroscopy, whereas features of increased endometrial thickness, abnormal vascularization, and irregular friable polypoid formations with necrosis or bleeding were considered suspicious lesions.12–14 The gold standard was based on histologic diagnoses. These diagnoses were classified as either negative (secretory/proliferative endometrium and benign endometrial polyps) or positive (endometrial hyperplasia and carcinoma) because the main aim of investigations for abnormal uterine bleeding is to exclude serious intrauterine pathology, primarily endometrial cancer and potentially premalignant hyperplasia. Histologic samples were provided by outpatient endometrial biopsy (Laboratoire CCD, Paris, France), dilatation and curettage, and hysterectomy specimens. In patients in whom outpatient biopsy was inadequate, inpatient sampling was performed. All patients were followed up for a minimum of 6 months, and recurrent symptoms were used as an indication for reassessment. Although outpatient biopsy is a simple and inexpensive diagnostic test, there are concerns surrounding the nonrepresentative nature of blind endometrial biopsy,15,16 and this is the rationale for employing hysteroscopic or ultrasound imaging. However, in this study, no erroneous diagnoses on outpatient biopsy were identified from hysterectomy specimens, dilatation and curettage, or after reinvestigation of symptoms at 6 months follow-up, and so its use as the diagnostic reference standard appears to be justified.
To delineate the predictive values of the historical features (age, menopausal status, and hormone replacement therapy use) and the results from ultrasound and hysteroscopy (independent variables) for endometrial hyperplasia or cancer (binary dependent variable), we used logistic regression analysis.7,17 We initially performed univariable analyses followed by multivariable modeling. Four diagnostic models were produced by multivariable analysis: historical features alone, historical features plus ultrasonography, historical features plus hysteroscopy, and historical features plus ultrasonography and hysteroscopy. Postmenopausal status, use of hormone replacement therapy, ultrasonic endometrial thickness greater than 5 mm, and suspicious findings on hysteroscopy were all considered positive results. Age was split into six categories: less than or equal to 30 years, 31–40 years, 41–50 years, 51–60 years, 61–70 years, and over 70 years. Increasing age was considered more likely to be associated with endometrial pathology.
Our objective was to develop an analytic approach. Because we had only a small data set, we combined endometrial cancer and hyperplasia to increase the number of outcome events and so allow multivariable models to be built; any missing data were imputed (see Results).17 Using a univariate approach for each diagnostic (independent) variable, we calculated the diagnostic odds ratio (dOR), where values greater than 1.0 indicated discriminatory ability and higher values indicated greater accuracy. The dOR is the ratio of the positive likelihood ratio to the negative likelihood ratio, and it can be mathematically summarized as: dOR = [sensitivity/(1 − specificity)]/[(1 − sensitivity)/specificity].
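As a worked restatement, the dOR can be written in terms of the four cells of a 2 × 2 table of test result against the reference standard. The symbols TP, FP, FN, and TN (true-positive, false-positive, false-negative, and true-negative counts) are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\mathrm{LR}^{+} &= \frac{\text{sensitivity}}{1-\text{specificity}}
                 = \frac{TP/(TP+FN)}{FP/(FP+TN)}, \\[4pt]
\mathrm{LR}^{-} &= \frac{1-\text{sensitivity}}{\text{specificity}}
                 = \frac{FN/(TP+FN)}{TN/(FP+TN)}, \\[4pt]
\mathrm{dOR}    &= \frac{\mathrm{LR}^{+}}{\mathrm{LR}^{-}}
                 = \frac{TP \times TN}{FP \times FN}.
\end{align*}
```

Written this way, a dOR of 1.0 corresponds to a test whose result is unrelated to disease status, which is why values above 1.0 are read as evidence of discriminatory ability.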
After the univariable analysis, we built four multivariable logistic regression7,18 models using BMDP statistical software, release 7 (BMDP Statistical Software Inc., Los Angeles, CA). These models were built to allow the statistical approach to mimic the clinical process. The first model was built to provide a valid estimate of the combined predictive value of the historical variables (ie, age, menopausal state, and use of hormone replacement therapy). In the other three models, we determined the combined predictive value of history plus tests (ultrasonography alone, hysteroscopy alone, and ultrasonography and hysteroscopy combined). This mirrors the real clinical situation where historical information is acquired before undertaking investigation with ultrasound scan or hysteroscopy.
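The stepwise structure of the four models can be sketched in a few lines of Python. This is a minimal illustration only: the study used BMDP, not this code, and the cell counts below are hypothetical values invented for the example, not the study data. For brevity, history is reduced to a single binary variable (postmenopausal status), and the helper names `solve` and `fit_logistic` are assumptions of this sketch.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(X, y, iterations=25):
    """Maximum likelihood logistic regression by Newton-Raphson.
    An intercept is added automatically. Returns (coefficients, log-likelihood)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for r, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, r))))
        loglik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return beta, loglik

# HYPOTHETICAL counts per covariate pattern, invented for illustration:
# ((postmenopausal, ultrasound positive, hysteroscopy positive), diseased, non-diseased)
CELLS = [((0, 0, 0), 1, 40), ((0, 1, 0), 2, 30), ((1, 0, 0), 2, 30), ((1, 1, 0), 5, 25),
         ((0, 1, 1), 2, 3), ((1, 1, 1), 6, 4), ((1, 0, 1), 2, 3), ((0, 0, 1), 1, 4)]
rows, y = [], []
for pattern, n_diseased, n_healthy in CELLS:
    rows += [pattern] * (n_diseased + n_healthy)
    y += [1] * n_diseased + [0] * n_healthy

# The four models mirror the order in which information becomes available in clinic.
MODELS = {"history": [0], "history+ultrasound": [0, 1],
          "history+hysteroscopy": [0, 2], "history+both": [0, 1, 2]}
loglik = {name: fit_logistic([[r[i] for i in cols] for r in rows], y)[1]
          for name, cols in MODELS.items()}

# Improvement chi-square for each step: twice the gain in log-likelihood.
chi2 = {
    "ultrasound added to history": 2 * (loglik["history+ultrasound"] - loglik["history"]),
    "hysteroscopy added to history": 2 * (loglik["history+hysteroscopy"] - loglik["history"]),
    "hysteroscopy added to history+ultrasound": 2 * (loglik["history+both"] - loglik["history+ultrasound"]),
    "ultrasound added to history+hysteroscopy": 2 * (loglik["history+both"] - loglik["history+hysteroscopy"]),
}
for step, value in chi2.items():
    print(f"{step}: chi-square for improvement = {value:.2f}")
```

Because each larger model contains the smaller one, the log-likelihood cannot decrease when a test is added; the question the chi-square answers is whether the gain is larger than chance would explain.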
The predictive value of each model was summarized by plotting the estimates of sensitivity (true-positive rates) against 1 − specificity (false-positive rates) to develop a receiver operating characteristic (ROC) curve, which characterizes the performance of the test.19,20 This is a commonly used approach in both primary and secondary research for evaluating diagnostic accuracy of tests.20–23 The ROC curves can be readily generated from logistic regression modeling, and this type of approach has been used in tests with multiple covariates.24 The area under the curve (ROC area) summarizes the accuracy with which the test diagnoses the condition of interest. A ROC area greater than 0.5 suggests some degree of test accuracy, with higher accuracy suggested by a ROC area closer to 1.0 (representing perfect test accuracy). The ROC areas for each model were compared. The hypothesis we wanted to test concerned the improvement in prediction achieved by adding tests to the history. The χ² test was used to test for statistically significant improvement between models and was computed from the log of the ratio of the current versus the previous likelihood function values. A small P value indicated a significant change in prediction.
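To make the ROC area concrete, the sketch below computes it in two equivalent ways from model-predicted probabilities: by the trapezoidal rule over the plotted (1 − specificity, sensitivity) points, and as the proportion of diseased/non-diseased pairs in which the diseased woman has the higher predicted probability (ties counted as one half). The predicted probabilities are hypothetical values chosen for illustration, and the function names are assumptions of this sketch rather than part of the original analysis.

```python
def roc_points(pos, neg):
    """ROC points (false-positive rate, true-positive rate), one per distinct threshold.
    A case is called test-positive when its score is >= the threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(s >= t for s in pos) / len(pos)   # sensitivity
        fpr = sum(s >= t for s in neg) / len(neg)   # 1 - specificity
        points.append((fpr, tpr))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def roc_area(pos, neg):
    """Pairwise (Mann-Whitney) form of the ROC area; ties count one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# HYPOTHETICAL predicted probabilities of disease from some fitted model.
diseased = [0.9, 0.6, 0.4, 0.3]
non_diseased = [0.7, 0.4, 0.2, 0.1, 0.1]

area_curve = trapezoid_area(roc_points(diseased, non_diseased))
area_pairs = roc_area(diseased, non_diseased)
print(f"ROC area (trapezoidal) = {area_curve:.3f}")   # 0.775
print(f"ROC area (pairwise)    = {area_pairs:.3f}")   # 0.775
```

The pairwise form explains the reference values in the text: 0.5 means the model ranks a diseased woman above a non-diseased one no more often than chance, and 1.0 means it always does.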
RESULTS
The mean age of the 248 women included in this study was 50 years (range 23–94 years), and 111 women (45%) were postmenopausal. Forty-five women (18%) were under 40 years, and 73 women (30%) were taking hormone replacement therapy. Twenty-three women (9%) had endometrial disease (13 cases of hyperplasia and ten cases of cancer) on histopathologic assessment. All cases of cancer detected were in women over 40 years as were the majority of endometrial hyperplasias, 11 of 13 (85%). Data for hysteroscopy and outpatient biopsy were complete, but 10% of the total data were missing for the other variables, and these data were imputed.17 There were ten false-positive and 18 false-negative diagnoses with hysteroscopy and 121 false-positive and seven false-negative diagnoses with ultrasound measurement of endometrial thickness.
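The error counts above are enough to reconstruct an approximate 2 × 2 table for each test. The sketch below does this arithmetic under two stated assumptions: that all 248 women (23 with and 225 without endometrial disease) have a classified result for each test, and that the ultrasound counts include the imputed values. The figures it prints are therefore derived illustrations and may not reproduce the estimates in Table 1; the function names are hypothetical.

```python
def two_by_two(diseased, non_diseased, false_pos, false_neg):
    """Rebuild the four cells from group sizes and the reported error counts."""
    return {"tp": diseased - false_neg, "fp": false_pos,
            "fn": false_neg, "tn": non_diseased - false_pos}

def accuracy_summary(t):
    """Return (sensitivity, specificity, diagnostic odds ratio) for a 2 x 2 table."""
    sensitivity = t["tp"] / (t["tp"] + t["fn"])
    specificity = t["tn"] / (t["tn"] + t["fp"])
    d_or = (t["tp"] * t["tn"]) / (t["fp"] * t["fn"])
    return sensitivity, specificity, d_or

DISEASED = 23                 # hyperplasia (13) plus cancer (10), as reported
NON_DISEASED = 248 - DISEASED # assumes every woman is classified by each test

hysteroscopy = two_by_two(DISEASED, NON_DISEASED, false_pos=10, false_neg=18)
ultrasound = two_by_two(DISEASED, NON_DISEASED, false_pos=121, false_neg=7)

for name, table in (("hysteroscopy", hysteroscopy), ("ultrasound", ultrasound)):
    sens, spec, d_or = accuracy_summary(table)
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f} dOR={d_or:.2f}")
# hysteroscopy: sensitivity=0.22 specificity=0.96 dOR=5.97
# ultrasound: sensitivity=0.70 specificity=0.46 dOR=1.96
```

Under these assumptions the two tests err in opposite directions: hysteroscopy misses most disease but rarely raises a false alarm, whereas the 5-mm ultrasound cutoff detects more disease at the cost of many false positives.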
Table 1 shows the results of the univariable and multivariable approaches to summarize the diagnostic value of history and tests. The historical features (age, postmenopausal bleeding, and use of hormone replacement therapy) were all significant predictors of endometrial hyperplasia or cancer using a univariable approach. Similarly, ultrasonography and hysteroscopy were also significant predictors. These univariable evaluations were then combined in a stepwise fashion in keeping with the clinical context described above.
The predictive ability of the historical features combined, measured by the ROC area, was 0.78. The addition of ultrasonography to historical features significantly increased the ROC area (0.78 versus 0.82, χ² for improvement = 5.6, P = .02) (Figure 1). Similarly, the addition of hysteroscopy to historical features significantly increased the ROC area (0.78 versus 0.81, χ² for improvement = 7.1, P = .008) (Figure 2). The addition of both ultrasonography and hysteroscopy increased the ROC area to 0.84. This represented significantly improved predictive ability compared with the clinical history plus ultrasonography model (0.82 versus 0.84, χ² for improvement = 6.9, P = .009) and with the clinical history plus hysteroscopy model (0.81 versus 0.84, χ² for improvement = 5.4, P = .02) (Figures 1 and 2).
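The reported P values can be checked against the reported χ² statistics. The sketch below assumes each comparison has 1 degree of freedom, on the grounds that each step adds a single binary test result to the previous model; this is an inference from the model descriptions, not something stated in the text. For 1 degree of freedom the upper tail of the χ² distribution has a closed form using the complementary error function, so only the Python standard library is needed.

```python
import math

def chi2_p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.
    Uses the identity P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-square(1)."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Chi-square values for improvement as reported in the text.
REPORTED = {
    "ultrasonography added to history": 5.6,
    "hysteroscopy added to history": 7.1,
    "hysteroscopy added to history + ultrasonography": 6.9,
    "ultrasonography added to history + hysteroscopy": 5.4,
}

p_values = {step: chi2_p_value_1df(x) for step, x in REPORTED.items()}
for step, p in p_values.items():
    print(f"{step}: chi-square = {REPORTED[step]}, P = {p:.3f}")
```

Under the 1 degree of freedom assumption, the computed values round to the P values given in the text (.02, .008, .009, and .02).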
DISCUSSION
Diagnostic tests are evaluated for accuracy in relation to a reference standard, often in isolation from the clinical context in which they will be used. Estimates of diagnostic accuracy derived in this way can lead to erroneous inferences and may artificially inflate the value of diagnostic tests.4,5 Our study shows that to avoid misleading clinical inferences, multivariable regression models can be constructed to reflect clinical practice. Univariable analyses of the various diagnostic variables had suggested potential value in history and tests for predicting serious endometrial disease (hyperplasia or cancer) in women with abnormal uterine bleeding. In this study, multivariable analyses showed that both ultrasonography and hysteroscopy increased the prediction of serious endometrial pathology above that predicted from clinical history alone. The use of both ultrasonography and hysteroscopy together marginally (but statistically significantly) increased the predictive ability further. This study shows that use of this stepwise, multivariable approach is feasible in evaluating the diagnostic value of tests in a clinical context.
This paper illustrates the use of multivariable regression analysis in determining the value of different diagnostic workup strategies. The results for endometrial thickness measurement by ultrasound scan and hysteroscopy should, however, be interpreted in the context of hypothesis generation. This is because our approach in this paper was limited to assessing the feasibility of developing an analytic strategy using standard software. The results from our data set of 248 women with abnormal uterine bleeding should not, therefore, be seen as assessing the diagnostic value of the various tests. In addition, because of the relative paucity of outcome events, our estimates of accuracy may be unstable, a situation in which exact methods for performing logistic regression may be more appropriate.25 Further studies applying the analytic techniques outlined in this paper to larger data sets would be more suitable for determining the contribution of the various tests with stable regression coefficients. In this way, the clinical significance of changes in predictive ability associated with particular diagnostic strategies can be determined for women with abnormal uterine bleeding.
This type of multivariable analysis to delineate the significance of diagnostic variables is important in facilitating meaningful clinical interpretation of tests. This is because it allows the added value of tests to be determined in light of information already available to the clinician from the history, thereby reflecting the real clinical situation. If analysis of diagnostic interventions in this way can be shown to add value to the overall diagnostic process, then their effectiveness and cost-effectiveness in improving clinical outcomes may require evaluation in clinical trials.26 If, however, multivariable regression analysis does not reveal any additional value of a diagnostic intervention in predicting a particular disease, then it should not be used in routine clinical practice.
REFERENCES
1. Fineberg HV, Hiatt HH. Evaluation of medical practices: The case for technology assessment. N Engl J Med 1979;301:1086–91.
2. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic tests research. Getting better but still not good. JAMA 1995;274:645–51.
3. McAlister FA, Straus SE, Sackett DL. Why we need large, simple studies of the clinical examination: The problem and a proposed solution. Lancet 1999;354:1721–4.
4. Begg CB. Experimental design of medical imaging trials: Issues and opinions. Invest Radiol 1989;24:934–6.
5. Moons CKGM. Hazards of a univariable approach in diagnostic test evaluation in diagnostic research theory and application (PhD thesis), 1996:37–52.
6. Khan KS, Dinnes J, Kleijnen J. Systematic reviews to evaluate diagnostic tests. Eur J Obstet Gynecol Reprod Biol 2001;95:6–11.
7. Khan KS, Chien PFW, Dwarakanath LS. Logistic regression models in obstetrics and gynaecology literature. Obstet Gynecol 1999;93:1014–20.
8. Clark TJ, Mann CH, Shah N, Song F, Khan KS, Gupta JK. Accuracy of outpatient endometrial biopsy in the diagnosis of endometrial hyperplasia: A systematic quantitative review. Acta Obstet Gynecol Scand 2001;80:784–93.
9. Bakour SH, Dwarakanath LS, Khan KS, Newton JR, Gupta JK. The diagnostic accuracy of ultrasound scan in predicting endometrial hyperplasia and cancer in post-menopausal bleeding. Acta Obstet Gynecol Scand 1999;78:447–51.
10. Bakour SH, Dwarakanath LS, Khan KS, Newton JR. The diagnostic accuracy of outpatient miniature hysteroscopy in predicting premalignant and malignant endometrial lesions. Gynecol Endosc 1999;8:143–8.
11. Smith-Bindman R, Kerlikowske K, Felstein V, Subak L, Scheidler J, Segal M, et al. Endovaginal ultrasound to exclude endometrial cancer and other endometrial abnormalities. JAMA 1998;280:1510–7.
12. Ben-Yehuda OM, Kim YB, Leuchter RS. Does hysteroscopy improve upon the sensitivity of dilatation and curettage in the diagnosis of endometrial hyperplasia or carcinoma? Gynecol Oncol 1998;68:4–7.
13. Cicinelli E, Comi N, Scorcia P, Petrizzo O, Epifani S. Hysteroscopy for diagnosis and treatment of endometrial adenocarcinoma precursors. Eur J Gynecol Oncol 1993;5:425–56.
14. Uno LH, Sugimoto O, Carvalho FM, Bagnoli VR, Fonseca AM, Pinotti JA. Morphologic hysteroscopic criteria suggestive of endometrial hyperplasia. Int J Gynecol Obstet 1995;49:35–40.
15. Rodriguez GC, Yaqub N, King ME. A comparison of the pipelle device and the Vabra aspirator as measured by endometrial denudation in hysterectomy specimens: The Pipelle device samples significantly less of the endometrial surface than the Vabra aspirator. Am J Obstet Gynecol 1993;168:55–9.
16. Guido RS, Kanbour-Shakir A, Rulin MC, Christopherson WA. Pipelle endometrial sampling: Sensitivity in the detection of endometrial cancer. J Reprod Med 1995;40:553–5.
17. Norman GR, Streiner DL. Screwups, oddballs, and other vagaries of science. Locating outliers, handling missing data, and transformations. In: Norman GR, Streiner DL, eds. Biostatistics: The bare essentials. St. Louis, MO: Mosby, 1994:202–10.
18. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann Intern Med 1993;118:201–10.
19. Swets J. Measuring the accuracy of diagnostic systems. Science 1988;240:1285–93.
20. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
21. Irwig L, Macaskill P, Glasziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995;48:119–30.
22. Walter SD, Jadad AR. Meta-analysis of screening data: A survey of the literature. Stat Med 1999;18:3409–24.
23. Linnet K. A review on the methodology for assessing diagnostic tests. Clin Chem 1988;34:1379–86.
24. Tosteson ANA, Weinstein MC, Wittenberg J, Begg CB. ROC curve regression analysis: The use of ordinal regression models for diagnostic test assessment. Environ Health Perspec 1994;102(Suppl 8):73–8.
25. Mehta CR, Patel NR. Exact logistic regression: Theory and examples. Stat Med 1995;14:2143–60.
26. Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making 1991;11:88–94.
© 2002 The American College of Obstetricians and Gynecologists