The distinction between typical acute demyelinating optic neuritis (ON), as may occur during the course of multiple sclerosis (MS), and ON in the setting of neuromyelitis optica (NMO) is a challenge frequently faced by neuro-ophthalmologists. The separation of these 2 forms of ON is important based on the relatively poor visual prognosis associated with NMO and the differences in how these 2 conditions are treated (1–4). Patients with NMO often require immunosuppressive therapy and interferon-beta may have deleterious effects on the relapse rate in NMO patients (1,2). Although the aquaporin-4 antibody continues to be evaluated as an important diagnostic NMO biomarker in both clinically definite and limited forms (NMO spectrum disorders) (5,6), there is increasing attention to the structure and function of the anterior visual pathway as a potential system for helping to distinguish patients with NMO vs MS.
Optical coherence tomography (OCT) has emerged as a powerful tool to provide structural information about the retina and optic nerve. Thus, it is logical to ask whether OCT may be able to distinguish unilateral ON associated with NMO from that typically observed with MS. This question was one of many addressed by 2 independent reports (7,8) in this issue of the Journal of Neuro-Ophthalmology. Both articles compare peripapillary retinal nerve fiber layer (RNFL) thickness, measured by spectral-domain optical coherence tomography, in eyes with a history of ON for patients with NMO vs relapsing-remitting multiple sclerosis (RRMS). Consistent with findings in the literature from previous investigations of time-domain optical coherence tomography (9), eyes with a history of ON (>6 months before OCT) in both of these cross-sectional studies demonstrated thinner RNFL in patients with NMO compared with RRMS (Table 1).
Despite the similar imaging findings, the 2 studies demonstrated different results in terms of the magnitudes of the differences between NMO and RRMS eyes (effect sizes). The reader, therefore, may be left with some questions given the different statistical outcome of the 2 studies:
- Does peripapillary RNFL thickness by OCT have a clinical role in distinguishing eyes of patients with NMO vs RRMS in the months following acute ON?
- What factors in research design, case definition, and statistical analysis may have contributed to the observed differences (and similarities) in the findings between the 2 reports, and how can the reader apply what we learn here to future studies in which 2 investigations by established research groups yield somewhat different results?
- What do statistical tests do (not) for us, anyway? After all, if statistics are only a tool to help us determine the role of chance in our results, then why so much emphasis on the P-value? What about clinical meaningfulness … or potential bias?
Systematic comparison of the 2 studies with regard to methodologies actually shows that they are more similar than different with regard to the underlying results, and with respect to the take-home message on OCT studies of ON from this issue of the Journal. Several methodologic aspects of these studies are important to highlight here, particularly given their broader applicability to neuro-ophthalmologic studies in general:
- One eye or 2? In the case of unilateral ON, the affected eye is usually the one of primary interest. However, to the extent that the fellow eye may help to estimate a “baseline” RNFL thickness in patients with ON as a first demyelinating event, or could demonstrate evidence of subclinical RNFL axonal loss in MS cohorts, investigators often choose to include both eyes of each patient. Including both eyes, if appropriate for the research question, increases statistical power and allows for data from affected (history of ON) eyes and those without ON history. Both the study by Lange et al (7) and that by Bichuetti et al (8) compared affected eyes primarily using pairwise comparisons of NMO vs RRMS groups. Lange et al (7) also calculated intereye differences and found these to be the most significant discriminator of NMO vs RRMS eyes. Although these authors used statistical models to examine potential associations of RNFL thinning and clinical features, such as worse Expanded Disability Status Scale (EDSS) score, other statistical techniques, such as linear mixed effects or generalized estimating equation (GEE) models, could also be used to determine whether NMO vs RRMS status could be associated with RNFL thinning, accounting for within-patient intereye correlations and potentially including both eyes in patients with bilateral ON. The advantage of using a GEE regression approach is that the models adjust the variances of the observations to reflect the intercorrelation of the eyes. If the 2 eyes of each patient are very different (i.e., only weakly correlated), then little adjustment is needed and such models will yield results similar to simple linear regression. If, however, the eyes of patients tend to be more similar than different (i.e., intercorrelated), then the variances will be adjusted so that the levels of significance of associations between variables reflect these similarities of the eyes within patients.
The use of models that include this approach is therefore a win–win for study designs and analyses that include both eyes of each patient—and including both eyes when possible is a double win–win from a generalizability standpoint.
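The intuition behind adjusting for intereye correlation can be sketched with the standard design-effect approximation. This is not a GEE model and is not the method used in either study; it is a simpler illustration that assumes every patient contributes the same number of eyes and that a single intereye correlation applies. The function name and the cohort of 30 patients are hypothetical.

```python
def effective_sample_size(n_patients: int, eyes_per_patient: int, icc: float) -> float:
    """Approximate number of independent observations when eyes are clustered.

    Uses the design effect 1 + (m - 1) * rho, where m is the number of eyes
    per patient and rho is the intereye (intraclass) correlation.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    design_effect = 1.0 + (eyes_per_patient - 1) * icc
    return n_patients * eyes_per_patient / design_effect


if __name__ == "__main__":
    # Hypothetical cohort: 30 patients, both eyes included (60 eyes in total).
    for rho in (0.0, 0.5, 1.0):
        n_eff = effective_sample_size(30, 2, rho)
        print(f"intereye correlation {rho:.1f}: about {n_eff:.0f} independent observations")
```

When the eyes are uncorrelated, all 60 eyes count as independent observations, which is why the adjusted analysis resembles simple linear regression. When the eyes are perfectly correlated, the second eye adds no information and the cohort behaves like 30 observations.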
- Statistical tests: how do we choose? The choice of statistical tests is based on characteristics of the study design, variables, and outcome measures (10). Since the study design depends, by definition, on the research question, it is at this phase of the research process that the types of statistical tests should be first considered. Methods for descriptive statistics (calculation of summary measures, such as mean) and hypothesis testing (comparison of groups or measurement of associations) are both dependent upon the distributions of observations as judged by the investigator (do the data fit a normal distribution or “bell” curve, for example?). When continuous data fit assumptions for normality, including absence of skewness and relatively large sample size, then summary measures of mean and standard deviation and hypothesis tests of linear regression, t tests, and analyses of variance are used. However, when small sample size (rule of thumb approximately <20 per group) or measurable skewness are evident, then nonparametric tests that are based on ranks or “order” within the data points are the preferred methods. Table 1 demonstrates the interdependence of sample size, effect size (difference in means), and statistical methods for comparison of group data in determining the outcome of comparisons of groups with respect to statistical significance. Investigators in the study by Lange et al (7) used a Wilcoxon rank sum test to compare the NMO vs RRMS eye groups with history of ON (Table 1). This seems appropriate, given the relatively modest sample size in each group (26 NMO and 13 RRMS); the result was a P-value of 0.46, indicating that, if no true difference existed between the groups, a difference at least as large as the observed ∼10 μm difference in mean RNFL thickness would be expected by chance alone about 46% of the time in a cohort of this size.
Although neuro-ophthalmologists would consider a 10-μm difference in RNFL thickness, particularly between groups, to be clinically meaningful (∼10% of normal RNFL thickness by SD OCT), the P-value of 0.46 may falsely lead the reader to believe that no real differences exist between NMO and RRMS for RNFL thickness months following ON. Using a 2-sample t test, the difference in mean RNFL thickness in the study by Lange et al (7), given the same standard deviations and sample sizes, would have demonstrated a trend (P = 0.09). Also using a 2-sample t test, Bichuetti et al (8) found the difference in NMO vs RRMS eyes with a history of ON to be statistically significant (P = 0.004), although there was also a greater effect size (difference in means) and smaller overall sample size. In general, nonparametric tests, such as the Wilcoxon rank sum test, are more conservative with regard to demonstrating statistical significance, yet may be more appropriate in the case of small sample sizes, the most common situation in which data cannot be assured to fit assumptions for normality.
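A rank-based comparison of the kind described above can be sketched in a few lines. This is a minimal illustration of a Wilcoxon rank sum test using the large-sample normal approximation, with no continuity or tie correction to the variance, so statistical packages may report slightly different P-values, particularly for small groups. The data are hypothetical values chosen for illustration and are not taken from either study.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_p_value(group_a, group_b):
    """Two-sided P-value for the Wilcoxon rank sum test (normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])                        # rank sum of the first group
    expected = n_a * (n_a + n_b + 1) / 2.0      # mean of W under the null hypothesis
    sd = sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (w - expected) / sd
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical RNFL thickness values in micrometers, for illustration only.
    hypothetical_nmo = [62, 70, 75, 81, 90]
    hypothetical_rrms = [78, 84, 88, 93, 97]
    print(rank_sum_p_value(hypothetical_nmo, hypothetical_rrms))
```

Because the test uses only the ordering of the observations, a single extreme value changes the result far less than it would change a t test, which is part of why rank-based tests tend to be more conservative.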
- Sample size: what goes into it? The answer is: a lot! Determining the minimum sample size, or number of patients or eyes per group needed to demonstrate a statistically significant difference between groups if one actually exists, depends on the effect size (minimum clinically meaningful difference we would like to detect), variance (or standard deviations anticipated for each group), power (1 minus the maximum probability of a type II error, or chance of not detecting a difference when one exists), and alpha (maximum probability of a type I error, or chance of demonstrating a significant difference when one does not, in fact, exist). Investigators generally set power at 0.80–0.90 and alpha at 0.05. Based on the observed differences in means, the standard deviations, and power of 0.80 and alpha 0.05, the numbers of patients/eyes needed to show a statistically significant difference in the study by Lange et al (7) would be 61 with NMO and 31 with RRMS. For the investigation by Bichuetti et al (8), 16 with NMO and 32 with RRMS would be needed. These numbers assume similar distributions of NMO and MS eyes to those chosen in the published studies. It turns out, however, that fewer patients are usually needed for the comparator (or control) group compared with the diseased group of interest; this is related to the relatively smaller variances observed in groups of “controls” vs cases (see next section). Although the study by Bichuetti et al (8) had the opposite design of fewer cases (NMO) than controls (RRMS), significant differences were likely demonstrated by virtue of the larger effect size and less conservative statistical test (Table 1). Larger sample sizes also minimize the effects of sampling error, that is, the potential to observe results that do not accurately estimate the true values for measurements in the populations of interest.
Statistical software packages have become user-friendly and allow investigators to calculate sample sizes using information from the literature or from their own preliminary data.
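The ingredients listed above (effect size, standard deviations, power, alpha, and the ratio of comparator to case patients) can be combined in a short calculation. The sketch below uses the common normal-approximation formula for comparing 2 means; dedicated software that works from the t distribution or from rank-based tests will typically return somewhat larger numbers. The function name, the `ratio` parameter, and the input values are hypothetical and are not the figures from either study.

```python
from math import ceil
from statistics import NormalDist


def sample_sizes(delta, sd_cases, sd_controls, ratio=1.0, alpha=0.05, power=0.80):
    """Approximate group sizes for a two-sided comparison of 2 means.

    delta       : smallest difference in means worth detecting
    sd_cases    : anticipated standard deviation in the case group
    sd_controls : anticipated standard deviation in the comparator group
    ratio       : comparator patients enrolled per case patient
    Returns (n_cases, n_controls), each rounded up to a whole patient.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)               # power = 1 - type II error
    variance_term = sd_cases ** 2 + sd_controls ** 2 / ratio
    n_cases = ceil(variance_term * (z_alpha + z_beta) ** 2 / delta ** 2)
    return n_cases, ceil(ratio * n_cases)


if __name__ == "__main__":
    # Hypothetical inputs: detect a 10-unit difference with SD of 10 in both groups.
    print(sample_sizes(delta=10, sd_cases=10, sd_controls=10))
    # Same difference, but enrolling 1 comparator patient for every 2 cases.
    print(sample_sizes(delta=10, sd_cases=10, sd_controls=10, ratio=0.5))
```

Halving the detectable difference roughly quadruples the required sample size, which is why the choice of a clinically meaningful effect size has so much influence on study feasibility.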
- Might we need more patients with the disease of interest vs in the comparator group? Yes! Since the distributions of observations are generally “tighter” in control or comparator groups, with less variance and lower standard deviations (as in the RRMS groups in the 2 studies), more patients with the disease of interest may be needed than controls to show a significant difference. In some cases, disease-free controls or a comparator group of patients with a milder form of disease may be easier to recruit for studies, yet relatively fewer are required. Statistical software packages include options for specifying ratios of cases and controls when calculating sample size.
- When do statistical models help us, and what should they include? Models such as linear and logistic regression, and GEE (which allows adjustment for intereye correlations), are useful when examining associations between 2 or more variables or characteristics. These models enable us to account for age, for example, while determining the relation between RNFL thickness and visual function or neurologic disability. A general rule of thumb for regression and sample size is that at least 10 patients/eyes are needed per variable in the model; for example, a regression model examining the association of RNFL thickness and EDSS score, accounting simultaneously for age, involves 3 variables and will require at least 30 patients. The question of what disease characteristics to account for in regression analyses when determining associations of 2 other variables depends on the research question. If a characteristic, such as history of ON, is part of the disease definition (as in NMO), then accounting simultaneously for that variable could lessen the observed degree of association between the primary characteristics of interest. Consulting a biostatistician when considering use of regression models can be helpful to ensure that both the data and the research question are consistent with these types of analyses.
- Why does everyone keep asking about our inclusion criteria? … and case definition? Inclusion criteria, and specifically the case definition for disease, are critical to determining who is included in your study, and how they are categorized. In the case of disorders for which the spectrum is still being discovered and investigated, different studies may use slightly different case definitions. This is perhaps the most important area of potential difference between studies, particularly when diagnosis of a disease entity relies, even in part, upon tests that may be imperfect “gold standards.” Baseline likelihood of a disease within the population sampled also affects the predictive value of diagnostic tests; the less prevalent the disease, the greater the proportion of positive test results that are false positives. Differences in geographic region, genetics, and other population characteristics are likely factors in observed differences between studies that may seem otherwise similar with respect to their design, analyses, and other methodologic aspects.
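The dependence of predictive value on baseline prevalence follows directly from Bayes' rule and can be sketched as below. The sensitivity and specificity used here are hypothetical round numbers and do not describe the aquaporin-4 antibody assay or any other specific test.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a patient with a positive test truly has the disease."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical test with 90% sensitivity and 90% specificity.
    for prevalence in (0.50, 0.10, 0.01):
        ppv = positive_predictive_value(0.90, 0.90, prevalence)
        print(f"prevalence {prevalence:.2f}: positive predictive value {ppv:.2f}")
```

With the same test, lowering the prevalence in the sampled population from 50% to 10% drops the positive predictive value from 0.90 to 0.50, so two studies that recruit from different populations can classify patients quite differently even when they use an identical assay.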
- How can we put this all together? The authors of both manuscripts are to be congratulated for taking on this challenging and critical area of research in neuro-ophthalmology. Their work further emphasizes the importance of distinguishing NMO from MS and highlights how the relatively poor prognosis in NMO may be related to the marked degrees of axonal loss that have been noted consistently across OCT studies. These investigations, when viewed from the standpoint of how methodologic differences may play a role, have yielded results that are more similar than different. So, as neuro-ophthalmologists, we probably do have methods to our madness after all!
1. Warabi Y, Matsumoto Y, Hayashi H. Interferon beta-1b exacerbates multiple sclerosis with severe optic nerve and spinal cord demyelination. J Neurol Sci. 2007;252:257–261.
2. Shimizu J, Hatanaka Y, Hasegawa M, Iwata A, Sugimoto I, Date H, Goto J, Shimizu T, Takatsu M, Sakurai Y, Nakase H, Uesaka Y, Hashida H, Hashimoto K, Komiya T, Tsuji S. IFNβ-1b may severely exacerbate Japanese optic-spinal MS in neuromyelitis optica spectrum. Neurology. 2010;75:1423–1427.
3. Cree BA, Lamb S, Morgan K, Chen A, Waubant E, Genain C. An open label study of the effects of rituximab in neuromyelitis optica. Neurology. 2005;64:1270–1272.
4. Galetta SL. MS and NMO: partners no more. J Neuroophthalmol. 2012;32:99–101.
5. Marignier R, Bernard-Valnet R, Giraudon P, Collongues N, Papeix C, Zephir H, Cavillon G, Rogemond V, Casey R, Frangoulis B, De Seze J, Vukusic S, Honorat J, Confavreux C; for the NOMADMUS Study Group. Aquaporin-4-antibody-negative neuromyelitis optica: distinct assay sensitivity-dependent entity. Neurology. 2013;80:2194–2200.
6. Sato DK, Nakashima I, Takahashi T, Misu T, Waters P, Kuroda H, Nishiyama S, Suzuki C, Takai Y, Fujihara K, Itoyama Y, Aoki M. Aquaporin-4-antibody-positive cases beyond current diagnostic criteria for NMO spectrum disorders. Neurology. 2013;80:2210–2216.
7. Lange A, Sadjadi R, Zhu F, Alkabie S, Costello F, Traboulsee AL. Spectral domain optical coherence tomography of retinal nerve fiber layer thickness in NMO patients. J Neuroophthalmol. 2013;33:213–219.
8. Bichuetti DB, Camargo AS, Falcao AB, Goncalves FF, Tavares IM, Oliveira EML. The retinal nerve fiber layer of patients with neuromyelitis optica and chronic relapsing optic neuritis is more severely damaged than patients with multiple sclerosis. J Neuroophthalmol. 2013;33:220–224.
9. Ratchford JN, Quigg ME, Conger A, Frohman T, Frohman E, Balcer LJ, Calabresi PA, Kerr DA. Optical coherence tomography helps differentiate neuromyelitis optica and MS optic neuropathies. Neurology. 2009;73:302–308.
10. Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Clinical Practice. Upper Saddle River, NJ: Prentice Hall, Inc, 2009.