In Neurology Today, we often report on the results of clinical trials that move the field of neurology forward by demonstrating either a positive or a negative treatment effect. Communicating the results of any trial involves interpreting the change in a primary outcome measure. Often, however, little attention is paid to whether the change in the primary outcome is truly meaningful to patients. In some instances there is near unanimity of opinion that a change, even a small one, in an outcome is important: death, stroke, or the ability to ambulate. Unfortunately, many of the neurology outcomes in current use do not have the luxury of such unambiguous interpretation.
The reason behind this difficulty is the complex nature and manifestations of neurological disease. As a result, we often use rating scales to capture this diversity. Every neurological subspecialty has a growing portfolio of scales, and they are increasingly used as outcome measures in clinical trials. The health domains captured by these scales vary and can include symptoms, functional status, impairments, and quality of life.
Two types of rating scales are in common use: single-item scales (for example, the modified Rankin Scale and the Kurtzke Expanded Disability Status Scale) and multi-item scales (such as the ALS Functional Rating Scale and the Alzheimer's Disease Assessment Scale–Cognitive). Single-item scales are straightforward and place subjects on a continuum of health. The scores they generate are relatively easy to interpret and communicate, although they may have problems with reliability and responsiveness.
Multi-item scales combine a set of items to produce a single value, and it is often the changes observed in these scales that leave patients, clinicians, researchers, and regulators scratching their heads.
What does a 10-point change on the 1–100 range Barthel Index actually mean to patients and families? Is a four-point change on the 1–40 range ALS Functional Rating Scale meaningful to patients? Do patients notice a three-point change on the Epworth Sleepiness Scale?
The risks of not knowing are not trivial. First, important changes may exist but may not be detected or interpreted as important. Indeed, some have wondered whether our spectacular failures in clinical trials across many of our conditions over the past decade have resulted from poorly developed clinical scales that lack adequate sensitivity (that is, type II errors).
Alternatively, and possibly of more relevance, multi-item scales may capture weak signals that are statistically significant but of unclear or limited clinical significance. With only a limited intuitive notion of what changes in scale scores actually mean, there is the potential to exploit the ambiguous evidence base toward one's self-interest.
We are all aware of the controversies and differences of opinion that surround acetylcholinesterase inhibitors in Alzheimer disease and dopamine agonists and monoamine oxidase inhibitors in Parkinson disease (PD), much of which can be attributed to clinical trial endpoint interpretation. The poignant title of a 2006 paper in the American Journal of Psychiatry, “Why olanzapine beats risperidone, risperidone beats quetiapine, and quetiapine beats olanzapine,” also speaks to the risks of not anchoring scale scores to something patients and families care deeply about.
This is why the discussion of the January Archives of Neurology paper by Lisa Shulman, MD, and colleagues published on page 4 of this issue is so important. It attempts to define the clinically important differences (CIDs) on the Unified Parkinson's Disease Rating Scale (UPDRS), a multi-item rating scale ranging from 0 to 144 points. For the past 20 years, it has been the most commonly used primary outcome measure in PD clinical trials. Using both anchor-based and distribution-based methods, the authors define ranges for minimal, moderate, and large CIDs in the motor UPDRS and the total UPDRS.
For example, the STEP-UP trial comparing pramipexole to placebo in early PD, reported in the Journal of the American Medical Association (JAMA) in 1997, showed a five-point improvement in the total UPDRS favoring pramipexole. The results were interpreted as demonstrating sufficient efficacy and used as evidence of “direct clinical benefit” in obtaining FDA market approval. However, in a 2000 JAMA paper, the CALM-PD trial investigators compared pramipexole with levodopa, its chief competitor, and found a five-point improvement in favor of levodopa; the results were interpreted by some as not being clinically meaningful.
Dr. Shulman's data highlight that these changes are in the minimal range, making their interpretation possibly more vulnerable to the funding effect — when multiple interpretations exist, the tendency is to interpret the trial results in favor of the sponsor's interest.
In addition to Dr. Shulman's article, there are many hopeful signs that we are on the brink of a clinically meaningful endpoint revolution — just as we are in the midst of a biomarker revolution. In December 2009, the Food and Drug Administration issued its finalized guidance on the use of Patient Reported Outcome measures to support new drug applications and labeling claims in product development.
The NIH Toolbox initiative is utilizing state-of-the-art psychometric and technological approaches to develop brief yet comprehensive assessment tools for measuring motor, cognitive, sensory, and emotional function.
The NINDS-funded Neuro-QOL [Quality of Life] project aims to develop clinically relevant and psychometrically robust health-related quality of life assessment tools for adults and children that will be responsive to the needs of researchers in a variety of neurological disorders. Even the UPDRS is undergoing extensive revisions to address many of its shortcomings.
These are all encouraging developments and many more are underway. But as the biomarker revolution provides us with an ever-increasing number of targets for early translational trials, we cannot forget the patient's voice and its proper incorporation into the design of our therapeutic programs.
Just as patient-centeredness and the “medical home” are becoming legitimate domains of health care quality in the clinical arena, so too must they become legitimate domains of measurement in the research arena. Full incorporation, however, will require subtle and possibly unsettling shifts of power and culture from those who sponsor and conduct the research to those who participate in it. Clear verdicts on therapeutic achievements will not occur without it.
• Hobart JC, Cano SJ, Thompson AJ, et al. Rating scales as outcome measures for clinical trials in neurology. Lancet Neurol 2007;6:1094–105.
• Heres S, Davis J, Leucht S, et al. Why olanzapine beats risperidone, risperidone beats quetiapine, and quetiapine beats olanzapine: an exploratory analysis of head-to-head comparison studies of second-generation antipsychotics. Am J Psychiatry
• Shulman LM, Gruber-Baldini AL, Weiner WJ, et al. The clinically important difference on the unified Parkinson's disease rating scale. Arch Neurol
• Parkinson Study Group. Safety and efficacy of pramipexole in early Parkinson disease: a randomized dose-ranging study. JAMA
• Parkinson Study Group. Pramipexole vs. levodopa as initial treatment for Parkinson's disease: a randomized controlled trial (CALM-PD). JAMA 2000;284(15):1931–1938.
• Guidance for Industry. Patient-reported outcome measures: Use in medical product development to support labeling claims. December 2009. http://bit.ly/7ROMCu
• Gershon RC, Cella D, Wagster MV, et al. Assessment of neurological and behavioural function: the NIH Toolbox. Lancet Neurol 2010;9:138–139.
• Goetz CG, Tilley BC, LaPelle N, et al. Movement Disorder Society UPDRS Revision Task Force. Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results. Mov Disord