Measuring Surgical Outcomes in Subaxial Degenerative Cervical Spine Disease Patients: Minimum Clinically Important Difference as a Tool for Determining Meaningful Clinical Improvement
Auffinger, Brenda MD*; Lam, Sandi MD, MBA‡; Shen, Jingjing MD*; Roitberg, Ben Z. MD*
*The University of Chicago, Section of Neurosurgery, Chicago, Illinois;
‡Baylor College of Medicine, Texas Children's Hospital, Houston, Texas
Correspondence: Brenda Auffinger, MD, The University of Chicago, Section of Neurosurgery, 5841 S Maryland Ave, MC3026, J301, Chicago, IL 60637. E-mail: firstname.lastname@example.org
Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's Web site (www.neurosurgery-online.com).
Received February 24, 2013
Accepted October 31, 2013
BACKGROUND: Although the concept of minimum clinically important difference (MCID) as a measurement of surgical outcome has been extensively studied, there is a lack of consensus on the most valid or clinically relevant MCID calculation approach.
OBJECTIVE: To compare the range of MCID threshold values obtained by different anchor-based and distribution-based approaches to determine the best clinically meaningful and statistically significant MCID for our studied group.
METHODS: Eighty-eight consecutive patients undergoing surgery for subaxial degenerative cervical spine disease were analyzed from a prospective blinded database. Preoperative, 3-, and 6-month postoperative patient reported outcome (PRO) scores and blinded surgeon ratings were collected. Four calculation methods were used to calculate MCID threshold values: average change, change difference, minimum detectable change, and receiver operating characteristic (ROC) curve. Three anchors were used to evaluate meaningful improvement postsurgery: health transition item, patient overall status, and surgeon ratings.
RESULTS: On average, all patients had a statistically significant improvement (P < .001) postoperatively for neck disability index (score 27.42 preoperatively to 19.42 postoperatively), physical component of the Short Form of the Medical Outcomes Study (SF-36) (33.02-42.23), mental component of the SF-36 (44-50.74), and visual analog scale (2.85-1.93). The 4 MCID approaches yielded a range of values for each PRO: 2.23 to 16.59 for physical component of the SF-36, 0.11 to 16.27 for mental component of the SF-36, and 2.72 to 12.08 for neck disability index. In comparison with health transition item and patient overall status anchors, the area under the ROC curve was consistently greater for surgeon ratings for all 4 PROs.
CONCLUSION: Minimum detectable change together with surgeon ratings anchor appears to be the most appropriate MCID method. Based on our findings, this combination offers the greatest area under the ROC curve (threshold above the 95% confidence interval). The choice of the anchor did not significantly affect this result.
ABBREVIATIONS: AUC, area under the ROC curve
HTI, health transition item
PRO, patient-reported outcome questionnaire
MCID, minimum clinically important difference
MCS, mental component summary of the Short Form of the Medical Outcomes Study
MDC, minimum detectable change
NDI, Neck Disability Index
PCS, physical component summary of the Short Form of the Medical Outcomes Study
ROC, receiver operating characteristic
SF-36, Short Form of the Medical Outcomes Study
VAS, Visual Analog Scale
Patient-reported outcome questionnaires (PROs) are commonly used in both clinical and research practices to evaluate patient improvement after a specific therapeutic intervention. They are considered a reliable measure of a patient's perceived health status and an important indicator of treatment effectiveness. Four important PROs frequently used in cervical spine surgery studies are the Visual Analog Scale (VAS),1 the Neck Disability Index (NDI),2 and the physical (PCS) and mental (MCS) component summaries of the Short Form of the Medical Outcomes Study (SF-36).3 Although PROs are generated by and representative of the patient's perception, the significance of the numerical changes given by these surveys is not intuitively apparent. The appropriate interpretation of these changes is debated. The statistical significance of changes in PRO scores may not necessarily correlate with clinical relevance. The concept of minimum clinically important difference (MCID) endeavors to address this issue.4 It denotes the smallest change in a patient's self-reported scores that represents meaningful therapeutic efficacy.5,6 Therapeutic outcomes that reach this threshold are believed to impart clinical significance. This translation to clinical significance makes the concept attractive to apply in clinical research practices.
Anchor-based and distribution-based approaches are commonly used for calculation of MCID threshold values. While anchor-based methods compare the change in PRO scores with a different measure of change (an external criterion or anchor), which can be the improvement post-therapy or the patient's satisfaction with the intervention, distribution-based approaches compare the change in PRO scores with selected variability measures. To date, many PRO questionnaires and different anchors have been used for MCID calculation. A major limitation for anchor-based and distribution-based approaches is that diverse calculation methods yield different MCID values.7 There is thus no agreement on the optimal MCID calculation method, and no definite MCID threshold value has been established for PRO surveys evaluating patients who undergo cervical spine surgery.
Previous studies have evaluated and compared different MCID calculation techniques for commonly used PROs, such as VAS, Oswestry Disability Index, and SF-36 in mixed8-10 or homogeneous11 patient populations. They have historically used a subjective patient assessment as an anchor for MCID calculation instead of an objective external criterion. An external assessment would not be subject to recall bias and would not be influenced by other patient-specific comorbidities that are not related to the therapeutic intervention under evaluation. So far, just 2 reports have compared multiple anchor-based and distribution-based approaches for patients undergoing cervical spine surgery.12,13 Here, we compare different anchor-based and distribution-based approaches for MCID calculation in patients who have degenerative cervical spine disease, and we introduce a new external and more objective anchor, “surgeon ratings.” Our goal is to establish a threshold value in which a better association between clinical relevance and statistical significance can be met (see list of the terms used with definitions in Supplemental Digital Content 1, http://links.lww.com/NEU/A598).
MATERIALS AND METHODS
Our prospectively collected spine surgery registry was retrospectively examined. Here, we focus on a subset of patients who had surgery at C3 to C7 levels. Eighty-eight consecutive adult patients undergoing surgery for degenerative cervical spine disease at our institution from August 2009 to January 2012 were identified from the prospective spine outcome database. All 88 patients had complete preoperative, 3-month, and 6-month follow-up SF-36, NDI, VAS, patient overall status, and surgeon ratings scores. The inclusion criteria were magnetic resonance imaging (MRI) confirmation of C3 to C7 degenerative cervical spine disease and age over 18 years. Patients with trauma, infection, or intracranial tumors, peripheral nerve disease as a cause of symptoms, or involvement in litigation were excluded from enrollment in the spine outcomes database. Indications for surgery in this sample were the following disease processes with corroborative imaging findings: (1) cervical radiculopathy: symptoms persistent for more than 6 weeks despite conservative management, progressive symptoms, or progressive neurological deficit regardless of symptom duration; (2) cervical myelopathy: symptomatic regardless of duration; or (3) cervical instability. All questionnaires were completed by the patients either at the doctor's office or at home and returned by mail. Surgeons completed the surgeon ratings questionnaire after the patient visit. Here, the 3- and 6-month scores are reported. Because this was a blinded study, neither surgeons nor patients had access to each other's survey results. Two neurosurgeons participated in this study. In all cases, early mobilization and return to work were actively encouraged. Institutional review board approval was received from the University of Chicago Institutional Review Board.
Four PRO questionnaires were filled out by patients preoperatively and at 3 and 6 months postsurgery: NDI,2 PCS and MCS from the SF-36,3 and VAS for neck pain.14 Investigators not clinically involved with the patients assessed the patient outcomes questionnaires. We define “change scores” as the difference between scores at 6 months postoperative follow-up and scores at baseline. The NDI is a 10-item patient survey that quantifies disability in patients who are experiencing neck pain. Every item is scored from 0 to 5, for a maximum score of 50. A lower score indicates less disability.15,16 The SF-36 is a 36-item health questionnaire. Based on the reported values, 2 main scores can be calculated: physical component summary (PCS) and mental component summary (MCS). The SF-36 primarily evaluates a patient's social and physical function, general health, vitality, and body pain. The VAS relies on a self-assessment numerical scale that ranges from 0 to 10 for pain.17 Zero means no pain, whereas 10 means intolerable pain. Decreasing scores for NDI and VAS, and increasing values for the PCS and MCS components of the SF-36, indicate better patient functional status.
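The scoring conventions described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, and the direction sets simply encode the statement that lower NDI and VAS scores and higher PCS and MCS scores indicate better status.

```python
from typing import Sequence

# Direction of improvement per instrument, as stated in the text.
LOWER_IS_BETTER = {"NDI", "VAS"}
HIGHER_IS_BETTER = {"PCS", "MCS"}


def ndi_total(item_scores: Sequence[int]) -> int:
    """Sum the 10 NDI items (each scored 0-5) into a raw total out of 50."""
    if len(item_scores) != 10:
        raise ValueError("the NDI has exactly 10 items")
    if any(not 0 <= s <= 5 for s in item_scores):
        raise ValueError("each NDI item is scored from 0 to 5")
    return sum(item_scores)


def change_score(baseline: float, six_month: float) -> float:
    """Change score as defined in the text: 6-month score minus baseline."""
    return six_month - baseline


def improvement_magnitude(instrument: str, change: float) -> float:
    """Re-sign a change score so that a positive value always means improvement."""
    if instrument in LOWER_IS_BETTER:
        return -change
    if instrument in HIGHER_IS_BETTER:
        return change
    raise ValueError(f"unknown instrument: {instrument}")
```

Applied to the mean NDI scores reported in the abstract (27.42 preoperatively, 19.42 postoperatively), `change_score` returns approximately -8.0, which `improvement_magnitude("NDI", ...)` turns into an 8-point improvement.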
This study used 2 previously reported anchors (health transition item [HTI] and patient overall status)8,11,18 and a new independent anchor (surgeon ratings) for derivations of MCID. The HTI anchor is from the health transition item of the SF-36, which refers to how the patient feels at the time of the questionnaire compared with 1 year before. This is considered an appropriate independent anchor because it is not used in the scoring of the MCS or PCS of the SF-36.
The surgeon ratings and patient overall status anchors were based on a 7-point Likert scale in which the attending surgeon and patient evaluate patient improvement following surgery. Both range from 1 to 7, where 1 means “very much improved,” 2 means “much improved,” 3 means “minimally improved,” 4 means “no change,” 5 means “minimally worse,” 6 means “much worse,” and 7 means “very much worse.” Surgeon ratings are suitable as an anchor because, although associated with patient perception of surgical outcomes, they are independent and objective ratings not used in other PRO calculations. In this study, the 6-month postsurgery values (HTI, surgeon ratings, and patient overall status) were the ones used as anchors for the MCID calculation.
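The 7-point anchor scale can be represented as a simple lookup, as in the sketch below. The labels are taken from the text; the helper names are hypothetical, and the default cutoff that treats ratings 1 to 3 as "improved" is an assumption made for illustration, since the text does not state how the anchors were dichotomized.

```python
# Labels of the 7-point Likert anchor, exactly as listed in the text.
LIKERT_LABELS = {
    1: "very much improved",
    2: "much improved",
    3: "minimally improved",
    4: "no change",
    5: "minimally worse",
    6: "much worse",
    7: "very much worse",
}


def anchor_label(rating: int) -> str:
    """Return the verbal label for a surgeon or patient overall status rating."""
    if rating not in LIKERT_LABELS:
        raise ValueError("rating must be an integer from 1 to 7")
    return LIKERT_LABELS[rating]


def is_improved(rating: int, cutoff: int = 3) -> bool:
    """Dichotomize a rating into improved vs. not improved.

    Assumption (not stated in the text): ratings at or below `cutoff`
    (1-3 by default) count as improved.
    """
    anchor_label(rating)  # validates the rating
    return rating <= cutoff
```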
Anchor-Based and Distribution-Based Approaches
For an accurate MCID assessment, we chose 1 well-described distribution-based approach and 3 previously reported anchor-based approaches: minimum detectable change (MDC), mean change, change difference, and receiver operating characteristic (ROC) curve. “Mean change” stands for an MCID value that correlates with the average change in the patient cohort that exhibits small PRO variations. In this approach, the selection of groups of patients in different scales for MCID calculation is subjective. It depends on the number of levels in the original scale.19
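A minimal sketch of the "mean change" approach follows. It assumes that the cohort with small PRO variation is the group rated "minimally improved" (rating 3) on the chosen anchor; that choice, like the function name, is an illustrative assumption and reflects the subjectivity of group selection noted above.

```python
from statistics import mean
from typing import Sequence


def mcid_mean_change(
    change_scores: Sequence[float],
    anchor_ratings: Sequence[int],
    target_rating: int = 3,
) -> float:
    """Mean change score of the patients whose anchor rating equals target_rating.

    Assumption: target_rating=3 ("minimally improved") defines the
    small-change cohort; another level could be substituted.
    """
    if len(change_scores) != len(anchor_ratings):
        raise ValueError("change_scores and anchor_ratings must align")
    selected = [c for c, r in zip(change_scores, anchor_ratings) if r == target_rating]
    if not selected:
        raise ValueError("no patients at the requested anchor level")
    return mean(selected)
```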
The “change difference” MCID approach aims to compare PRO score changes between 2 adjacent levels of a given scale.9 In our case, it compares the difference in change scores of the patients that feel “minimally improved” and “minimally worse” for all 3 anchors that are used in our MCID calculation. MDC is the smallest value that is above the measurement error within a 95% confidence interval (CI). It uses the standard error of measurement for the calculation of an MCID with a 95% CI.20,21
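The two approaches in this paragraph can be sketched as follows. The change-difference function follows the description above (mean change of "minimally improved" minus mean change of "minimally worse" patients). The MDC function uses the commonly cited formulation MDC95 = 1.96 × √2 × SEM with SEM = SD × √(1 − r); the text does not give its exact formula, so this formulation, the reliability coefficient r as an input, and all names here are assumptions.

```python
import math
from statistics import mean
from typing import Sequence


def mcid_change_difference(
    change_scores: Sequence[float],
    anchor_ratings: Sequence[int],
    level_a: int = 3,  # "minimally improved"
    level_b: int = 5,  # "minimally worse"
) -> float:
    """Difference in mean change score between two anchor levels."""
    group_a = [c for c, r in zip(change_scores, anchor_ratings) if r == level_a]
    group_b = [c for c, r in zip(change_scores, anchor_ratings) if r == level_b]
    if not group_a or not group_b:
        raise ValueError("both anchor levels need at least one patient")
    return mean(group_a) - mean(group_b)


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), with r a reliability coefficient in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def mdc95(sd: float, reliability: float) -> float:
    """Minimum detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, reliability)
```

Because MDC depends only on the score distribution and the reliability coefficient, it is the one approach here that does not use the anchor ratings at all.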
The ROC curve is a sensitivity- and specificity-based approach for calculation of MCID. When applied to PROs and used in conjunction with MCID, a sensitivity of 1 means that all true positive values have been identified (patient reports an improvement and MCID is above the therapeutic threshold). The inverse applies for a specificity value of 1.22,23 The ROC curve ideally identifies the threshold for a PRO score while keeping the greatest sensitivity and specificity. The area under the ROC curve represents the probability that a PRO score will discriminate between improved and unimproved patients. The probability values range between 0.5 (probability of discrimination is the same as a coin toss) and 1 (accurately discriminates all patients).7
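The ROC approach can be sketched without external libraries. In this illustration, `improvements` holds each patient's change re-signed so that larger values mean more improvement, and `improved` is the dichotomized anchor. The cutoff is chosen by maximizing sensitivity plus specificity (Youden's index), which is one common reading of "keeping the greatest sensitivity and specificity" and is an assumption here, not necessarily the rule the authors applied.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    improvements: Sequence[float], improved: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when 'improvement >= threshold' predicts improved."""
    tp = sum(1 for x, y in zip(improvements, improved) if y and x >= threshold)
    fn = sum(1 for x, y in zip(improvements, improved) if y and x < threshold)
    tn = sum(1 for x, y in zip(improvements, improved) if not y and x < threshold)
    fp = sum(1 for x, y in zip(improvements, improved) if not y and x >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one improved and one unimproved patient")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(improvements: Sequence[float], improved: Sequence[bool]) -> float:
    """AUC as the probability that an improved patient outscores an unimproved one."""
    pos = [x for x, y in zip(improvements, improved) if y]
    neg = [x for x, y in zip(improvements, improved) if not y]
    if not pos or not neg:
        raise ValueError("need at least one improved and one unimproved patient")
    # Ties between an improved and an unimproved patient count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(improvements: Sequence[float], improved: Sequence[bool]) -> float:
    """Observed value that maximizes sensitivity + specificity (Youden's index)."""
    candidates = sorted(set(improvements))
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(improvements, improved, t)),
    )
```

An AUC of 0.5 from `roc_auc` corresponds to the coin-toss case described above, and 1.0 to perfect discrimination.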
All statistical analyses were performed in Prism 5 for Mac OS X version 5.0c (Graphpad Software Inc, La Jolla, California) and STATA 11.1 (StataCorp, College Station, Texas). Spearman correlation coefficients were calculated to assess any possible correlation between PRO baseline scores and change scores, and to assess the relationship between anchors and PROs. Analysis of variance between groups was used to compare preoperative, 3-month, and 6-month postoperative scores. We used one-way analysis of variance with Bonferroni post hoc tests to compare change in outcome scores between groups. The difference between ROC curve areas was determined by using previously described methodology.24 Values with P < .05 were considered statistically significant.
RESULTS
Eighty-eight consecutive patients undergoing surgery for subaxial degenerative cervical spine disease were studied from our prospective blinded database. Preoperative, 3-, and 6-month postoperative PRO scores were collected from each patient. Of the 88 patients, 88 had responses to PRO questionnaires at baseline, 86 (97.72%) at 3 months after surgery, and 83 (94.31%) at 6 months after surgery. Surgeon ratings questionnaires were filled out on all patients. Mean age of patients at baseline was 56.57 ± 12.09 years. Forty-six patients (51.59%) were female, and the mean body mass index (BMI) was 27.92 ± 7.9. Twenty-one percent of the patients were either current or previous smokers, whereas 49.46% had a history of alcohol consumption. Thirty-nine patients (43.61%) were employed at baseline (Table 1).
Three complications, including 2 reoperations, occurred within 30 days of the index surgery. One patient developed a cerebrospinal fluid (CSF) leak with concomitant meningitis, treated with reoperation and antibiotics. One patient had cerebral ischemia treated with anticoagulation. The last patient underwent surgery for symptomatic adjacent level disease. The mean baseline, 3-month, and 6-month postoperative PRO scores and change in PRO scores for NDI, VAS, PCS, and MCS of the SF-36 survey are described in Table 2. All patients showed significant improvement of PROs at both 3 and 6 months after surgery (P < .001). Such improvement varied with time and was greater between baseline and 6 months (mean change for NDI, VAS, PCS, and MCS values was, respectively, −8.00 ± 9.51, −0.92 ± 1.17, 9.21 ± 16.59, and 6.74 ± 15.07 [P < .001]). The mean change between baseline and 3 months for NDI, VAS, PCS, and MCS scores was, respectively, −7.82 ± 9.42, −0.91 ± 1.06, 7.23 ± 16.59, and 7.94 ± 15.18 (P < .001).
We found strong correlations between baseline PRO scores and the percentage of mean change values for the same PRO (Figure 1). There was a negative correlation between NDI baseline and NDI change scores (R, −0.49; P < .001) (Figure 1A), VAS baseline and VAS change scores (R, −0.66; P < .001) (Figure 1B), a positive correlation between PCS baseline and PCS change scores (R, 0.32; P < .001) (Figure 1C), and a negative correlation between MCS baseline and MCS change scores (R, −0.56; P < .001) (Figure 1D). Such high correlation implies that patients with the lowest functional status in preoperative scores, such as greater neck disability, higher pain, and less mobility, were the ones that demonstrated the greatest incremental improvement after surgery. In keeping with our results, Wang et al25 have also demonstrated the effect of baseline scores on patient improvement following a given treatment.
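The baseline-versus-change correlations reported here are Spearman coefficients, which can be sketched as a Pearson correlation computed on ranks. The code below is an illustrative pure-Python version with hypothetical function names; it does not reproduce the authors' Prism or STATA analysis and does not compute P values.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson_r(average_ranks(x), average_ranks(y))
```

A negative coefficient between baseline NDI and NDI change score, for example, means that patients who start with higher disability tend to show larger decreases in score.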
With different anchor-based and distribution-based approaches (mean change, change difference, or MDC), the MCID threshold value varied widely for all patient-reported outcome measures (PCS, MCS, NDI, and VAS) (Table 3). It ranged from 18.18 to 3.36 for PCS, from 5.12 to 0.11 for MCS, from 12.08 to 2.14 for NDI, and from 1.49 to 0.26 for VAS. In comparison with the other 2 approaches, MDC appeared to be the most appropriate method for MCID calculation. First, among the 3 calculation approaches, MDC was the one that was not significantly affected by the choice of anchor, and gave the smallest variation between all PROs. MDC varied from 1.49 to 5.68 between all PROs for the HTI anchor, whereas the “change difference” method varied from 1.49 to 18.18 and the “mean change” technique varied from 1.23 to 16.59. Second, the MDC method generated a threshold of therapeutic improvement that was statistically greater than chance error from unimproved patients (>95% CI). Last, the MDC method presented the most consistent scores between improved and unimproved patients.
Among the 3 anchors evaluated, “surgeon ratings” consistently yielded the biggest area under the ROC curve (AUC). Figure 2 shows ROC curve plots with comparisons of different anchors for the same PRO. It highlights the most statistically significant and clinically meaningful anchor for each PRO measure. The purple line represents the 0.5 “coin toss” threshold. Therapeutic values placed below this cut point are believed to offer a probability of discrimination between improved and unimproved patients no better than pure chance. In Figure 2A for NDI PRO, the AUC varied between 0.66 for HTI anchor (lowest) (P = .02; CI, 0.63-0.68) and 0.79 for “surgeon ratings” (highest) (P = .001; CI, 0.76-0.82). There was a statistically significant difference between the ROC curve plot for the “surgeon ratings” anchor and the other 2 plots (patient overall status and HTI anchors) (P < .001). In Figure 2B for VAS PRO, the AUC varied between 0.69 for HTI anchor (P = .008; CI, 0.67-0.71) and 0.73 for “surgeon ratings” (P = .01; CI, 0.70-0.76). We found a statistically significant difference between the 3 curves compared here (P < .001). In Figure 2C for PCS PRO, the AUC was 0.64 for “patient overall status” anchor (P = .10; CI, 0.60-0.67), 0.69 for HTI anchor (P = .01; CI, 0.66-0.71), and 0.80 for “surgeon ratings” anchor (P = .001; CI, 0.77-0.82). All the above-described curves differed significantly from each other (P < .001). Finally, in Figure 2D for MCS PRO, the AUC varied between 0.67 for “patient overall status” anchor (P = .04; CI, 0.63-0.70) and 0.71 for “surgeon ratings” (P = .02; CI, 0.68-0.74). There was a statistically significant difference between the ROC curve plot for “patient overall status” and the other 2 curves (surgeon ratings and HTI anchors) (P = .006 and P = .01, respectively). In all analyses, the surgeon ratings anchor consistently gave us the greatest AUC. This finding suggests that surgeon ratings may be the most appropriate anchor for MCID calculation in our patient data set.
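Comparisons between AUCs derived from the same patients, such as those above, are commonly made with the Hanley and McNeil approach cited in the Methods.24 The sketch below shows the general form of that test under stated assumptions: the standard error uses the Hanley and McNeil approximation, and the correlation `r` between the two AUC estimates is supplied by the caller (in the original method it is read from a published table). Function names are hypothetical, and the code does not reproduce the P values reported in this study.

```python
import math
from typing import Tuple


def auc_standard_error(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley-McNeil approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_correlated_aucs(
    auc1: float, auc2: float, n_pos: int, n_neg: int, r: float
) -> Tuple[float, float]:
    """z statistic and two-sided P value for two AUCs from the same cases.

    r is the assumed correlation between the two AUC estimates.
    """
    se1 = auc_standard_error(auc1, n_pos, n_neg)
    se2 = auc_standard_error(auc2, n_pos, n_neg)
    variance = se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2
    if variance <= 0:
        raise ValueError("non-positive variance; check r and the AUC inputs")
    z = (auc1 - auc2) / math.sqrt(variance)
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

The group sizes and the correlation must come from the data set being analyzed; any values passed in for illustration are placeholders, not figures from this study.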
On average, all patients in our study achieved the desired MCID threshold value (Figure 3). This result suggests that all patients treated with subaxial cervical spine surgery presented a clinically meaningful improvement based on an objective external anchor such as surgeon ratings (Figure 3) as well as a subjective external criterion such as HTI or patient overall status for an optimal MCID calculation. The MCID threshold values for NDI, VAS, PCS, and MCS using surgeon ratings as an anchor were, respectively, 25.01, 2.45, 38.62, and 49.12.
DISCUSSION
The concept of MCID was developed to bring clinical relevance to measures of therapeutic outcome, which used to only focus on statistical significance. The optimal strategy is to combine both clinical relevance and statistical significance as a measure of therapeutic improvement.26,27 The motivation for this study was to evaluate MCID calculation approaches with the goal of identifying the most clinically meaningful and statistically significant MCID value for different PRO measures in cervical spine surgery. Each of the anchor-based and distribution-based approaches used in this study generated a range of MCID threshold values. As expected, lower MCID cut points would place the evaluated therapeutic intervention in a favorable light, because more patients' outcomes would cross the desired threshold. Conversely, artificially high MCID values may not appropriately recognize the importance of treatment. Other studies have evaluated different anchor-based and distribution-based approaches, such as mean change,6,8 average change,8 MDC,6,8,13 sensitivity, and specificity-based approaches (ROC curves).8,13 However, an optimal MCID threshold value or best MCID calculation method for a specific surgery or patient population has not yet been established.
Each MCID value depends on the characteristics of the population under study, the sample size, the calculation method, and the external anchor used in the calculation. In our report, we studied patients with degenerative cervical spine disease who underwent surgery on C3 to C7 levels. Analysis was performed at 2 follow-up time points with over 95% follow-up. In addition, all PRO questionnaires used in this study are validated and used in both clinical and research practices. In our findings, MDC was the optimal statistically significant and clinically relevant MCID calculation approach. It was consistently greater than measurement error (allowing for reliable interpretation of true change in treatment effectiveness), and it corresponded well to the patient perception of therapeutic improvement. For similar reasons, additional reports have also identified MDC as the most reliable MCID calculation method compared with other approaches.6,8,11,17 Our MDC values also correlated with other previously described MCID thresholds.9,13
The lack of an objective external anchor, which is common to all MCID reports in the literature, has been shown to be a drawback for achieving the most representative MCID calculation method.28,29 Subjective external anchors, the current mainstay for MCID computation, use a single-item self-report, such as the health transition item of the SF-36, to evaluate a patient's overall improvement in PRO scores. Their unrestricted use for MCID calculation becomes statistically problematic because it uses a self-report (subjective anchor) to validate another self-report (improvement in PRO scores) on the same construct. It should be highlighted, however, that the dependence on self-reports was stimulated by the absence of a reliable objective external criterion as a base for MCID computation.30,31 Behavioral measures, such as health care use, medications, and return to work, have been tested as possible objective external anchor-based approaches.32,33 None of these efforts so far have been widely adopted. As a consequence, more effort is being devoted to the development of objective external criteria that could be used in the evaluation of a specific treatment. This is the problem that the rationale introduced in the present study tries to overcome.
Based on a Medline review of the literature, this appears to be the first report to compare different anchor-based and distribution-based approaches as a measure of meaningful therapeutic efficacy in patients undergoing surgery for degenerative cervical spine disease, while introducing a new objective anchor for MCID calculation. We introduce an external and arguably more objective criterion, surgeon ratings, as a novel anchor for this surgical population. Surgeon ratings were compared with 2 other well-established subjective anchors (HTI and patient overall status). In comparison with the other 2 anchors, surgeon ratings consistently gave us the greatest AUC for different PRO surveys. Our results suggest that, among the analyzed methods, surgeon ratings may be the best anchor for MCID calculation.
Other reports have partially addressed these topics. Carreon et al13 evaluated different MCID calculation methods for patients undergoing cervical spine surgery, but, in contrast to our methodology, they used HTI as a single anchor. Stratford et al34 also introduced physician assessment as an anchor for MCID calculation in 48 patients with neck pain, but did not focus on individuals undergoing surgery for degenerative cervical spine disorders. In addition, the patients analyzed in our report present similar averages of PRO scores pre- and post-treatment in comparison with other historical literature data that evaluate patients undergoing cervical spine surgery with a variety of diagnoses, including fusion for spondylosis, herniated disk, and stenosis.12,13,34,35 The use of surgeon ratings as a method to evaluate surgical outcomes is a concept that has been studied for almost 2 decades. Unlike PROs, it encompasses both subjective and clinically objective measures. Therefore, it may be considered an objective external anchor for MCID calculation. In comparison with different PRO surveys, surgeon ratings have been shown to correlate with patient satisfaction after surgical intervention.36-44 Epstein et al36 compared surgeon assessment scales with patients' self-reported questionnaires derived from the SF-36 in patients treated with far lateral lumbar disc surgery. They found that surgeon ratings and PROs had a statistically significant correlation and proposed that surgeon ratings could be reliable indicators of patient outcomes. Lattig et al37 and Porchet et al38 have also shown that surgeons' perceptions of outcome exactly matched those of patients in half of the cases, and approximately 90% of the evaluated ratings were within ±1 grade of each other. We found similar results in our patient population.44 Taken together, these results suggest that surgeon ratings can be considered a valid external criterion for MCID calculation that correlates with patient perceptions of therapeutic outcome. In our data, surgeon ratings yielded a more consistent MCID threshold value.
CONCLUSION
MCID threshold values were highly variable depending on the calculation method. According to our results, the MDC approach was shown to be the optimal MCID calculation approach. MDC values were not affected by the choice of anchor and their threshold of improvement was statistically greater than the chance of error from unimproved patients. Taking into account the wide range of values for MCID calculation obtained from the comparison of different approaches, MDC together with the surgeon ratings anchor appears to be the most appropriate MCID method. This combination offers the greatest AUC (threshold above the 95% CI), and the choice of the anchor did not significantly affect this result.
Disclosure
Funding was received from the Spine Education and Research Fund at The University of Chicago. The authors have no personal, financial, or institutional interest in any of the drugs, materials, or devices described in this article.
REFERENCES
1. Gallagher EJ, Liebman M, Bijur PE. Prospective validation of clinically important changes in pain severity measured on a visual analog scale. Ann Emerg Med. 2001;38(6):633–638.
2. Vernon H, Mior S. The neck disability index: a study of reliability and validity. J Manipulative Physiol Ther. 1991;14(7):409–415.
3. Ware JE Jr. SF-36 health survey update. Spine (Phila Pa 1976). 2000;25(24):3130–3139.
4. Stratford PW, Binkley JM, Riddle DL, Guyatt GH. Sensitivity to change of the Roland-Morris back pain questionnaire: part 1. Phys Ther. 1998;78(11):1186–1196.
5. Copay AG, Subach BR, Glassman SD, Polly DW Jr, Schuler TC. Understanding the minimum clinically important difference: a review of concepts and methods. Spine J. 2007;7(5):541–546.
6. Parker SL, Mendenhall SK, Shau D, et al. Determination of minimum clinically important difference in pain, disability, and quality of life after extension of fusion for adjacent-segment disease. J Neurosurg Spine. 2012;16(1):61–67.
7. van der Roer N, Ostelo RW, Bekkering GE, van Tulder MW, de Vet HC. Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain. Spine (Phila Pa 1976). 2006;31(5):578–582.
8. Copay AG, Glassman SD, Subach BR, Berven S, Schuler TC, Carreon LY. Minimum clinically important difference in lumbar spine surgery patients: a choice of methods using the oswestry disability index, medical outcomes study questionnaire short form 36, and pain scales. Spine J. 2008;8(6):968–974.
9. Hägg O, Fritzell P, Nordwall A. The clinical importance of changes in outcome scores after treatment for chronic low back pain. Eur Spine J. 2003;12(1):12–20.
10. Wyrwich KW, Nienaber NA, Tierney WM, Wolinsky FD. Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care. 1999;37(5):469–478.
11. Parker SL, Mendenhall SK, Shau DN, et al. Minimum clinically important difference in pain, disability, and quality of life after neural decompression and fusion for same-level recurrent lumbar stenosis: understanding clinical versus statistical significance. J Neurosurg Spine. 2012;16(5):471–478.
12. Parker SL, Godil SS, Shau DN, Mendenhall SK, McGirt MJ. Assessment of the minimum clinically important difference in pain, disability, and quality of life after anterior cervical discectomy and fusion: clinical article. J Neurosurg Spine. 2013;18(2):154–160.
13. Carreon LY, Glassman SD, Campbell MJ, Anderson PA. Neck disability index, short form-36 physical component summary, and pain scales for neck and arm pain: the minimum clinically important difference and substantial clinical benefit after cervical spine fusion. Spine J. 2010;10(6):469–474.
14. Jensen MP, Turner JA, Romano JM. Correlates of improvement in multidisciplinary treatment of chronic pain. J Consult Clin Psychol. 1994;62(1):172–179.
15. McCarthy MJ, Grevitt MP, Silcocks P, Hobbs G. The reliability of the Vernon and Mior neck disability index, and its validity compared with the short form-36 health survey questionnaire. Eur Spine J. 2007;16(12):2111–2117.
16. Vernon H. The neck disability index: state-of-the-art, 1991-2008. J Manipulative Physiol Ther. 2008;31(7):491–502.
17. Parker SL, Adogwa O, Paul AR, et al. Utility of minimum clinically important difference in assessing pain, disability, and health state after transforaminal lumbar interbody fusion for degenerative lumbar spondylolisthesis. J Neurosurg Spine. 2011;14(5):598–604.
18. Redelmeier DA, Guyatt GH, Goldstein RS. Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol. 1996;49(11):1215–1219.
19. Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in a disease-specific quality of life questionnaire. J Clin Epidemiol. 1994;47(1):81–87.
20. Beaton DE, Bombardier C, Katz JN, et al. Looking for important change/differences in studies of responsiveness. OMERACT MCID working group. Outcome measures in Rheumatology. Minimal clinically important difference. J Rheumatol. 2001;28(2):400–405.
21. Wells G, Beaton D, Shea B, et al. Minimal clinically important differences: review of methods. J Rheumatol. 2001;28(2):406–412.
22. Jaeschke R, Guyatt GH, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The evidence-based medicine working group. JAMA. 1994;271(9):703–707.
23. Riddle DL, Stratford PW, Binkley JM. Sensitivity to change of the Roland-Morris back pain questionnaire: part 2. Phys Ther. 1998;78(11):1197–1207.
24. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148(3):839–843.
25. Wang YC, Hart DL, Stratford PW, Mioduski JE. Baseline dependency of minimal clinically important improvement. Phys Ther. 2011;91(5):675–688.
26. Cook CE. Clinimetrics corner: the minimal clinically important change score (MCID): a necessary pretense. J Man Manip Ther. 2008;16(4):E82–E83.
27. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407–415.
28. Gatchel RJ, Lurie JD, Mayer TG. Minimal clinically important difference. Spine (Phila Pa 1976). 2010;35(19):1739–1743.
29. Gatchel RJ, Mayer TG. Testing minimal clinically important difference: additional comments and scientific reality testing. Spine J. 2010;10(4):330–332.
30. Theodore BR. Methodological problems associated with the present conceptualization of the minimum clinically important difference and substantial clinical benefit. Spine J. 2010;10(6):507–509.
31. Copay AG. Commentary: the proliferation of minimum clinically important differences. Spine J. 2012;12(12):1129–1131.
32. Gatchel RJ, Mayer TG. Testing minimal clinically important difference: consensus or conundrum? Spine J. 2010;10(4):321–327.
33. Wilson HD, Mayer TG, Gatchel RJ. The lack of association between changes in functional outcomes and work retention in a chronic disabling occupational spinal disorder population: implications for the minimum clinical important difference. Spine (Phila Pa 1976). 2011;36(6):474–480.
34. Stratford PW, Riddle DL, Binkley JM, et al. Using the neck disability index to make decisions concerning individual patients. Physiother Can. 1999;51(2):107–112.
35. Auffinger BM, Lall RR, Dahdaleh NS, et al. Measuring surgical outcomes in cervical spondylotic myelopathy patients undergoing anterior cervical discectomy and fusion: assessment of minimum clinically important difference. PLoS One. 2013;8(6):e67408.
36. Epstein NE, Hood DC. A comparison of surgeon's assessment to patient's self analysis (short form 36) after far lateral lumbar disc surgery. An outcome study. Spine (Phila Pa 1976). 1997;22(20):2422–2428.
37. Lattig F, Grob D, Kleinstueck FS, et al. Ratings of global outcome at the first post-operative assessment after spinal surgery: how often do the surgeon and patient agree? Eur Spine J. 2009;18(suppl 3):386–394.
38. Porchet F, Lattig F, Grob D, et al. Comparison of patient and surgeon ratings of outcome 12 months after spine surgery: presented at the 2009 Joint Spine Section Meeting. J Neurosurg Spine. 2010;12(5):447–455.
39. McGrory BJ, Morrey BF, Rand JA, Ilstrup DM. Correlation of patient questionnaire responses and physician history in grading clinical outcome following hip and knee arthroplasty. A prospective study of 201 joint arthroplasties. J Arthroplasty. 1996;11(1):47–57.
40. McGee MA, Howie DW, Ryan P, Moss JR, Holubowycz OT. Comparison of patient and doctor responses to a total hip arthroplasty clinical evaluation questionnaire. J Bone Joint Surg Am. 2002;84-A(10):1745–1752.
41. Brokelman RB, van Loon CJ, Rijnberg WJ. Patient versus surgeon satisfaction after total hip arthroplasty. J Bone Joint Surg Br. 2003;85(4):495–498.
42. Ragab AA. Validity of self-assessment outcome questionnaires: patient-physician discrepancy in outcome interpretation. Biomed Sci Instrum. 2003;39:579–584.
43. Smith AM, Barnes SA, Sperling JW, Farrell CM, Cummings JD, Cofield RH. Patient and physician-assessed shoulder function after arthroplasty. J Bone Joint Surg Am. 2006;88(3):508–513.
44. Roitberg BZ, Thaci B, Auffinger B, et al. Comparison between patient and surgeon perception of degenerative spine disease outcomes—a prospective blinded database study. Acta Neurochir (Wien). 2013;155(5):757–764.
The authors have provided a detailed analysis of outcome data after cervical surgery for degenerative disease in search of a scientifically valid MCID (minimal clinically important difference) for assessing surgical outcome. They found that patient-reported data alone were insufficient and that their new anchor, “surgeon ratings,” provided the best results. Although this would appear to contradict the common-sense doctrine that the patient's view of postoperative function is always the best, this study provides scientific evidence that the patient's views, reports, and scales can be made more diagnostic by including the pre- and postoperative clinical analysis, radiologic analysis, and change in function as assessed by the surgeon. In simple terms, the “surgeon ratings” anchor can correct a low self-assessment score in patients who have actually undergone a significant improvement in function postoperatively but who rate their pain and function by their residual symptoms rather than by the positive change from preoperative values. This new anchor also corrects for patients who use as a baseline a clinical state predating the surgery by months or years, or who have unrealistic surgical goals.
This study provides scientific evidence that the assessment of surgical outcome is a complex subject that requires scientifically validated scales and a continuing search for optimal methodology. If neurosurgeons as a surgical specialty do not collectively, via our literature, develop scientifically validated scales for all of our surgeries, then insurance companies or government could define them for us, most likely using administrative protocols instead of developing validated scales. It is my hope that this article will stimulate others to investigate their surgical outcome databases with reference to the receiver operating characteristic (ROC) curve and to investigate other anchors.
Fred H. Geisler
KEY WORDS: Anchor-based; Anchors; Degenerative cervical spine disease; Distribution-based approaches; Minimum clinically important difference; Patient-reported outcomes
Copyright © by the Congress of Neurological Surgeons