“Measurement” and “recovery” are interrelated concepts. We “know” that a patient “recovers,” in either a clinical or a research context, because his measurements “improve.” We “know” that specific measurements are valid because they delineate “recovery.” The two concepts elucidate each other.
There are outside referents for both ideas. Ultimately, the patient recovers because he feels better and apparent function is restored, but we look deeper than just asking the patient because nonapparent dysfunction may mean the disease is not over. We refer our ideas to the history of a large group of patients that can elucidate the relevance of pervasive, long-term, significant patterns.
The design of a clinical trial to assess treatment and recovery from an injury or illness like acute spinal cord injury (SCI), with multiple long-lasting deficits, ideally would derive from quantitative understanding of the evolution of clinical manifestations in the reference populations, integrated with the pathophysiology of the disorder. It would use assessment instruments and statistical methods appropriately matched to the clinical manifestations being studied. But what can realistically be achieved is often far from this ideal.
The scales used for the Sygen® study were selected because they have been standard, widely accepted tools readily administered in the acute SCI setting. 2,5,11,12,25,27,28,31,47–49 None of them could provide a global measure of all cord functions. Acute SCI, like traumatic brain injury and stroke, is a multidimensional continuum that evolves over time. It would be impossible to assess it from all perspectives simultaneously. Instead, we approach it from several different viewpoints and then try to arrive at a reasonable picture of the whole.
Past studies of the recovery from SCI have included the following: predicting motor recovery from early motor scores, 3 sensory root recovery, 32 motor recovery in zone of partial preservation, 44 motor recovery in upper extremities, 29 recovery of zero grade muscles, 58 motor sparing with absence of sensation, 41 72-hour examination as recovery predictor, 6 ambulation as predicted by sensory examination in motor complete patients, 14 recovery of motor strength in zone of injury, 43 recovery of upper extremity strength, 24 neurologic recovery in model SCI systems, 46 recovery of ambulation in incomplete tetraplegia, 8 bladder recovery, 57 pin sensation as a predictor of extensor carpi radialis recovery, 7 biceps and extensor carpi radialis recovery, 40 predicting recovery, 26 mortality in ventilatory patients, 10 surgical and conservative care comparison, 54 surgical timing, 30,33,56 wrist strength as predictor of functional independence, 45 motor power in the first 2 weeks, 38 multitrauma effect on neurologic outcome, 50 central cord injury management, 4,42 mortality and length of stay, 55 ambulation prediction based on quadriceps recovery, 13 MRI imaging, 51 and reflex recovery during spinal shock. 39
The present article, like the first in this series, 36 uses the data from the multicenter clinical trial of Sygen® in acute SCI to quantify some important issues in neurotrauma practice and in designing further clinical trials. It makes no attempt to discuss the effectiveness or ineffectiveness of Sygen®, a topic reserved for the third article 37 in the series.
The caveats expressed in the first of these articles still hold. Briefly, this study was not designed as epidemiology. The experience it reflects is strongly filtered by the Sygen® study’s design as a drug effectiveness study: by its inclusion–exclusion criteria and its randomization. Nonetheless, it is based on a very large sample of 760 patients, and the data from the sites were carefully monitored by a dedicated, centrally based monitoring team and crosschecked after data entry.
A total of 760 evaluable patients were recruited at 28 neurotrauma centers in North America during a 5-year recruitment period from April 1992 to January 1997. To ensure that they had an injury that was severe but still an essentially pure SCI, the study required that the injury be rostral to the T10 bony level, that at least one lower extremity have an ASIA motor score of <15 of 25, and that anatomic transections be excluded. Patients were required to have received the NASCIS II dose of methylprednisolone (MPSS) for 24 hours, starting within 8 hours of injury and completed before starting study medications.
Patients had an emergency room evaluation and then had a detailed baseline evaluation just before the first dose of study medication, which began no later than 72 hours after the injury. The baseline examinations in this study were therefore performed well after the initial trauma evaluation and resuscitation. This delayed examination is more reliable and complete than can practically be performed in the emergency room as part of the initial trauma evaluation and treatment, but it does not capture any recovery that occurred between the injury and the time of the evaluation.
Motor and Sensory Scores
Figure 1 summarizes the neurophysiology in a way that emphasizes an important problem. Motor assessment, which may seem like a clear, objective measurement, is only available at a limited number of spinal levels. Relatively large differences in anatomic injury or recovery can exist without being measurable by routine motor examination.
The American Spinal Injury Association (ASIA) motor score is measured in the group of key muscles shown in Table 1. Each muscle is evaluated according to the Medical Research Council (MRC) muscle grades shown in Table 2.
The total ASIA motor score is the sum of these measurements. Because 10 muscle groups are evaluated on two sides of the body and assigned a grade between 0 and 5, the total ASIA motor score is an integer between 0 and 100. One can also divide the total ASIA score into scores for the upper extremities and lower extremities.
The sensory examination assigns a grade of 0, 1, or 2 to the 27 index levels between 2 and 28 (Figure 1) on each side between C3 and S4 + 5, when there is absent, abnormal, or normal response, respectively, to light touch or pinprick. The total light touch or pinprick score is thus an integer between 0 and 108.
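The score ranges quoted above follow directly from the number of items examined. A minimal sketch of that arithmetic is given here; the function name `max_total_score` is illustrative only and is not part of any ASIA standard.

```python
def max_total_score(n_items: int, n_sides: int, max_grade: int) -> int:
    """Largest possible total when every item on every side gets the top grade."""
    return n_items * n_sides * max_grade

# Motor: 10 key muscle groups, 2 sides of the body, MRC grades 0 to 5.
motor_max = max_total_score(n_items=10, n_sides=2, max_grade=5)

# Sensory (light touch or pinprick): 27 index levels, 2 sides, grades 0 to 2.
sensory_max = max_total_score(n_items=27, n_sides=2, max_grade=2)

print(motor_max, sensory_max)
```

The minimum of each total is 0, reached when every item is graded 0.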
Figure 2 defines other useful motor and sensory indexes. The ASIA criterion for motor level of impairment is the segment for which the corresponding ASIA muscle has an MRC grade of at least 3. Rostral and caudal levels of impairment, for motor function or light touch or pinprick sensation, are determined, unilaterally or bilaterally, as follows. The rostral level of impairment is the first cord segment in rostro-caudal sequence at which there is no longer good function. The caudal level of impairment is the last cord segment (if any) in rostro-caudal sequence at which there is any function. There are several options for combining the data from both sides of the body: taking the average level or the highest or lowest of the two seems preferable to always taking one side only.
Change From Baseline.
The change of these quantities between baseline and the selected endpoint examination can be expressed in a surprising variety of ways, and the choice among them has a strong effect on the clinical interpretation.
As one example, it might seem good to express a patient’s recovery as the percent of the original motor deficit at baseline that has been recovered. However, if a patient entered with a near-perfect baseline total ASIA motor score of 95 and regains the lost 5 points, then he counts as 100% recovery on this system. A second patient who entered with a score of 40 and now scores 70 has regained 50% of 60 points that were lost in his injury; thus, his recovery counts only half as much as the first, less severely injured patient. Even if this second patient eventually recovers to a perfect score of 100, he still will count no more than the first patient. This may be correct in a particular application, but such choices in preconditioning of the data before statistical analysis are important and need to be explained clearly.
There is a specific clinical meaning to these mathematical choices. The one just discussed gives more weight to patients with lighter injuries, and clinical experience suggests that they are more likely to improve. Statistically, one hopes they will be randomized equally into both treatment groups, and block randomization is a potential tool for encouraging this. But even if they are numerically equal, it is a dilution of any treatment effect to include patients who are likely to improve with either treatment, and this problem is made worse if an endpoint variable is chosen that emphasizes their success.
It is perhaps best to express recovery in terms of any of these motor or sensory scores as the change from baseline.
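The contrast between the two ways of expressing recovery can be made concrete with the two illustrative patients described above. The sketch below is only an illustration; the function names are hypothetical and the scores are the example values from the text, not study data.

```python
def percent_deficit_recovered(baseline: int, followup: int, max_score: int = 100) -> float:
    """Recovery expressed as a percentage of the motor deficit present at baseline."""
    deficit = max_score - baseline
    if deficit == 0:
        raise ValueError("no deficit at baseline; percent recovery is undefined")
    return 100.0 * (followup - baseline) / deficit

def change_from_baseline(baseline: int, followup: int) -> int:
    """Recovery expressed as the simple difference in score."""
    return followup - baseline

# (baseline, follow-up) total ASIA motor scores for the two example patients.
patients = {"lightly injured": (95, 100), "severely injured": (40, 70)}

for label, (baseline, followup) in patients.items():
    pct = percent_deficit_recovered(baseline, followup)
    delta = change_from_baseline(baseline, followup)
    print(f"{label}: {pct:.0f}% of deficit recovered, change of {delta} points")
```

On the percentage scale the lightly injured patient counts twice as much as the severely injured one, whereas on the change-from-baseline scale the severely injured patient shows six times the gain.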
Benzel and ASIA Impairment Scales.
Table 3 shows the definition of the modified Benzel Classification, as used in the Sygen® study. Traction, orthopedic devices, or the patient’s vertebral instability routinely prevents walking assessment at baseline, and so the Sygen® study used the similar ASIA impairment scale, shown in Table 4, for the baseline evaluation.
As with the motor and sensory scores, a way needs to be found to express the patient’s recovery (i.e., the change over time) in these functional scales.
A binary outcome variable (“success” vs. “failure”) is often an attractive choice, despite comparative insensitivity and seeming disregard for anatomic niceties, for three reasons:
- It often can be defined to represent a degree of recovery that is readily apparent to the patient, to the family, and to the clinician as a “major improvement.”
- Statistically, the fact that there are only two possible outcomes means the exact probability distributions are fully known, and the assumptions underlying the use of the standard techniques can be guaranteed to be met.
- Any deaths or patients lost to follow-up can be included in the analysis in a natural way as “failures.” (This is in contrast to a motor or sensory score, where if a patient dies or is lost to follow-up a numerical value needs to be deleted, manufactured, or imputed. This can be done conservatively, for example, by carrying the last value forward or even by assuming the worst case. A nonconservative approach would be to analyze only the survivors and not adhere to an intent-to-treat philosophy. But none of these techniques is accurate, as the names “conservative” and “nonconservative” indicate.)
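The options for handling a missing endpoint that are listed above can be sketched in a few lines. This is a schematic illustration only: the function names and the example scores are hypothetical, and it is not the imputation procedure used in the Sygen® analysis.

```python
from typing import Optional, Sequence

def last_value_carried_forward(scores: Sequence[Optional[int]]) -> Optional[int]:
    """Conservative imputation: reuse the most recent score that was actually observed."""
    for score in reversed(scores):
        if score is not None:
            return score
    return None  # nothing was ever recorded

def worst_case(scores: Sequence[Optional[int]], worst: int = 0) -> int:
    """More conservative still: a missing endpoint is replaced by the worst possible score."""
    endpoint = scores[-1]
    return worst if endpoint is None else endpoint

def binary_success(marked_recovery: Optional[bool]) -> bool:
    """Binary endpoint: a death or loss to follow-up (None) simply counts as a failure."""
    return marked_recovery is True

# Hypothetical motor scores at baseline, week 8, and week 26 (None = not examined).
scores = [40, 55, None]
print(last_value_carried_forward(scores))
print(worst_case(scores))
print(binary_success(None))
```

The two numerical imputations give different answers for the same patient, whereas the binary endpoint needs no manufactured value at all.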
In the case of SCI, using the ASIA Impairment Scale (AIS) at baseline and the modified Benzel Classification at week 26 as the beginning and ending points of a binary improvement criterion is potentially attractive:
- These functional scales fit with focusing on recovery that should be apparent and useful to the patient.
- A large improvement in these scales is likely to reflect anatomic recovery in the central matter of the spinal cord and going through the injury zone rather than local recovery involving peripheral nerve, bone, brachial plexus, or other confounding factors. It requires a change in lower extremity function.
The Sygen® study accordingly used the proportion of patients with “marked recovery,” defined in Table 5, as the primary efficacy endpoint.
Of the 760 patients, 43 (or 5.7%) died within 365 days. Figure 3 shows the Kaplan-Meier curve, with a logarithmic time scale. The death rate was higher for the complete injuries (those with baseline AIS = A) than for patients in the B or C + D groups. The statistical comparison of the counts between A and B + C + D has P = 0.017, using Fisher’s exact test two-tailed. The Kaplan-Meier curves are shown in Figure 4, where the steeper curve in the complete group is visually clear.
Changes in Functional Scale
Figure 5 shows the distribution of change in Benzel grade at 26 weeks, for each value of the AIS scale at baseline. Few in the A group improve in grade; those in B have a mixed improvement and those in C + D recover several grades. Note that a patient with baseline severity AIS equal to D can only improve three grades because the maximum modified Benzel Classification (MBC) grade is 7. Similarly, patients entering with AIS grades of A, B, and C are limited to 6, 5, or 4 grades of improvement, respectively.
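The ceiling on improvement for each baseline group is simple subtraction. The sketch below assumes that AIS grades A through D occupy positions 1 through 4 on the same ladder as the modified Benzel Classification, which is what the limits quoted above imply; the names are illustrative.

```python
MAX_BENZEL_GRADE = 7

# Assumption: AIS A-D map to positions 1-4 on the modified Benzel ladder,
# as implied by the improvement limits stated in the text.
AIS_POSITION = {"A": 1, "B": 2, "C": 3, "D": 4}

def max_improvement(baseline_ais: str) -> int:
    """Largest number of grades a patient can gain from a given baseline AIS."""
    return MAX_BENZEL_GRADE - AIS_POSITION[baseline_ais]

ceilings = {grade: max_improvement(grade) for grade in AIS_POSITION}
print(ceilings)
```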
In Figure 6 these results are translated into the marked recovery outcome (defined in Table 5) as a function of time. It is again apparent that severity of injury at baseline was a critical determinant.
When the 26-week marked recovery outcome is further broken down by anatomic level (Figure 7), the cervical group has a slightly higher rate than the thoracic group at each baseline severity. The trend is strongest in the severely injured A group.
Figure 8 compares patients whose MPSS was started before 3 hours with those started later. It was one of the main results of the NASCIS III study that there was a difference between these two groups in their ASIA motor score. Because neither the NASCIS study nor the Sygen® study prospectively randomized patients to earlier versus later MPSS, either or both studies might show a picture of this difference that is confounded by other factors, and we cannot expect they will necessarily agree. Still, although the experience in the Sygen® study may not contradict NASCIS on this point, it does not confirm it. MPSS timing was not found to be a significant covariate in analyzing SCI recovery patterns.
The top graph in Figure 9 shows a slight trend in marked recovery in favor of patients who were not operated on. This graph illustrates the way in which clinical trial data are not epidemiologic and need cautious interpretation. These data do not represent cohorts with matched severity: the nonoperated group had less severe bony injuries. This might be interpreted in light of their baseline AIS score and their spinal level. However, Table 7 in the first article in this series 36 shows that 604 patients were operated on and only 156 were not. Further, Table 4 in that article shows that there were disproportionately few thoracic injuries and disproportionately few injuries that were not complete. Combining these problems suggests the difficulty of reaching statistically reliable characterizations in such small subsamples.
Patients with suspected central cord injuries achieved much higher marked recovery, as shown in the second graph in Figure 9. Similarly, stable injuries without fracture dislocation did better. The last graph in Figure 9 shows a “mild injury” group derived by combining the nonoperated, central cord and stable spine patients together.
Changes in Motor and Sensory Score
Figure 10 shows the progression over time of the difference from baseline in light touch, pinprick, and motor scores, broken down by baseline AIS severity and by spinal level. Visually, all three of its graphs tell similar stories: gradual improvement, much of it finished between weeks 8 and 26. The degree of improvement is greater in less severe injuries, despite their having less room to improve. The light touch score shows the AIS B group as more resembling AIS C + D, whereas the motor score shows the AIS B group more like AIS A. The light touch score may reach maximum recovery earlier than motor score does.
Light Touch Score.
Figure 11 shows the progression in the distribution of light touch scores, at baseline and at weeks 8, 26, and 52 as well as the difference from baseline to week 26. The mean value changes from 37.3 to 50.0 to 53.4 to 54.6 at the four selected time periods.
The median light touch score changes from 32 at baseline to 46.5 at week 8 to 52 at week 26, and then is the same again: 52 at week 52. This means that the middle value in the group moved by about 20 light touch score points over the study period. However, this increase did not apply to all patients equally. The first quartile was 15 at baseline, so that 25% of the patients had scores ≤15. By week 8 this number had only progressed to 21, and it was 23 and 25 at weeks 26 and 52, respectively, for a total gain of 25 − 15 = 10 points.
In contrast, the third quartile was 56 at baseline, so that 75% of the patients had this score or lower. This advanced to 70, then 75, then 79, and finally 81 at week 52. The total gain was 81 − 56 = 25 points.
Although none of these calculations applies to specific, individual patients, it is clear that, on the whole, the patients in the Sygen® study sample who started with lower light touch scores improved less than those who started with higher ones did.
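The gains at each quantile can be tabulated directly from the baseline and week 52 values reported above. The short sketch below only reproduces that subtraction; the dictionary layout is illustrative.

```python
# (baseline, week 52) light touch values as reported in the text.
reported = {
    "first quartile": (15, 25),
    "median": (32, 52),
    "third quartile": (56, 81),
}

gains = {name: week52 - baseline for name, (baseline, week52) in reported.items()}

for name, gain in gains.items():
    print(f"{name}: gain of {gain} points")
```

The gain grows from the first quartile to the third, which is the pattern summarized in the preceding paragraph.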
The stories told by the medians and quartiles are more detailed than those told by the mean values, and they indicate the lack of homogeneity within the sample. The top four curves in Figure 11 are visually not the same as the bell-shaped normal, or Gaussian probability curve; they tend too much to bunch up at the right or left end of the scale.
This lack of homogeneity is explored further in Figure 12, which divides the 26-week light touch scores by the anatomic level. The two empirical distributions shown in this figure are different from each other: the cervical curve is bathtub-shaped and the thoracic one is upside-down from the cervical. The two graphs show the closest bell-shaped, normal approximations as dotted lines and neither fits well. The different lack of fit in each reflects the different anatomies of cervical and thoracic injuries and the variable recovery across the severity strata.
The curves in Figure 12 have almost the same mean values: 52.4 and 56.5. Their medians are even closer to each other: 50 and 52. The difference is in the spread of values. In the thoracic curve the first and third quartiles are 42 and 62.5, closely flanking the median. In the cervical group the quartiles are 19 and 84.5, far from the median.
This greater dispersion in the cervical group is reflected in the flatness of the best normal fit: the standard deviation is 35.0, compared with 20.0 for the thoracic group.
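One rough way to see the lack of fit is to ask where the quartiles would fall if each group really were normally distributed with its reported mean and standard deviation, and to compare them with the observed quartiles. The sketch below does this with the Python standard library. It assumes that the mean of 52.4 belongs to the cervical group and 56.5 to the thoracic group (the text does not say so explicitly), and the normal curves it uses need not coincide with the dotted fits drawn in Figure 12.

```python
from statistics import NormalDist

# Summary values quoted in the text. Assumption: the first mean listed (52.4)
# is the cervical group and the second (56.5) is the thoracic group.
groups = {
    "cervical": {"mean": 52.4, "sd": 35.0, "q1": 19.0, "q3": 84.5},
    "thoracic": {"mean": 56.5, "sd": 20.0, "q1": 42.0, "q3": 62.5},
}

def normal_quartiles(mean: float, sd: float) -> tuple:
    """First and third quartiles of a normal distribution with this mean and SD."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.inv_cdf(0.25), dist.inv_cdf(0.75)

for name, g in groups.items():
    q1, q3 = normal_quartiles(g["mean"], g["sd"])
    print(f"{name}: normal-theory quartiles {q1:.1f} and {q3:.1f}; "
          f"observed {g['q1']} and {g['q3']}")
```

Under these assumptions the normal curve places the cervical quartiles well inside the observed values of 19 and 84.5, consistent with a bathtub shape, and places the thoracic third quartile above the observed 62.5.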
The unusual shapes of these distributions reflect the different contributions of each severity stratum, as shown in Figure 13. Visually, the cervical distribution in Figure 12 is the weighted sum of the ones shown in the left column of Figure 13, in which AIS Group A is skewed to the left and the other two are skewed to the right. Similarly, the thoracic curve in Figure 12 is the weighted sum of the right column of Figure 13, with the added complication that the sample sizes in the incomplete injury groups are extremely low.
Figure 14 shows the week 26 light touch scores plotted against baseline. The floor shows the scatter of the data points. If the normal probability distribution did apply to these data, then the ellipse would be a good fit, showing the central location of the scatter. However, it is far from doing that. The walls in this plot show the histograms.
Figure 15 shows the first in a series of analyses done on all the motor and sensory scores and levels. In it the difference in light touch score from baseline to week 26 is plotted against the baseline score. Because of the mathematical links between these two variables, the points are constrained to lie in a diagonal parallelogram.
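The constraint can be written out explicitly. In the notation below, which is introduced here for illustration and does not appear in the figures, $x$ is the baseline light touch score, $y$ is the week 26 score, and $d$ is the change from baseline.

```latex
\[
d = y - x, \qquad 0 \le x \le 108, \quad 0 \le y \le 108
\;\Longrightarrow\; -x \le d \le 108 - x .
\]
```

The points $(x, d)$ therefore fall inside the parallelogram with vertices $(0, 0)$, $(0, 108)$, $(108, 0)$, and $(108, -108)$. The upper edge $d = 108 - x$ corresponds to a perfect score at week 26, and the lower edge $d = -x$ corresponds to a week 26 score of zero.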
There is a set of contour lines indicating the empirical, nonparametric density of the points. Visually, there seem to be three groups. One group, clustered about the x-axis, represents patients who did not change much from baseline. There is a second, smaller group clustered about the upper diagonal edge. These are patients who made nearly complete recoveries. There is the suggestion of a third, still smaller group, in between these two, of patients who made a strong but not complete recovery in their light touch score.
The figure also shows the best-fit straight line. It evidently is pulled among the three groups of patients just discussed, according to their relative predominance in the makeup of the populations served by the Sygen® study SCI centers. In particular, it does not reflect any inherent underlying disease pattern.
All of the analyses in this article are post hoc secondary analyses: descriptive and exploratory, attempts to informatively characterize the particular sample drawn and to suggest hypotheses, guidelines, and resources for further study.
Ellipses and lines are drawn for the cervical and thoracic patients in Figure 16. The ellipses do not fit well but give a general indication where each group is centered, and the lines are subject to the same criticisms mentioned above. It is interesting that the lines are nearly parallel. The cervical patients gain more than the thoracic ones. As Table 6 indicates, their average starts so much lower at baseline that, even though they gain more than twice as much on the average as the thoracic patients, they still are lower at 26 weeks.
In Figure 17 this is repeated for the three AIS baseline severity groups. The line for AIS B is parallel to the one for AIS C + D and lower. The line for AIS A is not parallel. It has a lower intercept and stays lower until it intersects the other two at the right side of the graph. The AIS A group also seems to occupy a different location on the graph than the other two groups.
Completing this series, Figure 18 suggests that the age stratification does little to differentiate the patients’ light touch recovery.
Observing such relationships is an essential step to validating any analysis of covariance that would take into account the study’s randomization strata and the patients’ baseline values.
Figures 19–24 repeat this analysis for the ASIA motor score. Many of the general features repeat those of the light touch analysis and of the analyses (not shown) for pinprick and lower extremity motor score, as well as for the spinal levels associated with each of these four variables.
Many patients with complete thoracic injuries present with an ASIA motor score that is equal to 50. It is impossible to measure motor function in the region of their injury, so they have perfect scores in the muscle groups above it and they have no function in the muscle groups below. This is one cause of the spike at 50 in the histogram in Figure 19 for ASIA motor score.
Further, these same patients could recover anatomic cord integrity for several levels below their injury without improving their reported motor score beyond the initially tested 50. This is one reason for the spike at 0 in the histogram in Figure 19 for ASIA motor score change.
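The origin of the score of 50, and of the zero change, can be illustrated with an idealized model. The sketch below assumes the usual ASIA layout of five upper extremity key muscles (C5 to T1) and five lower extremity key muscles (L2 to S1) on each side, since Table 1 is not reproduced here, and it treats a complete injury as full strength above the injured segment and no strength at or below it. Both are simplifying assumptions, and the function name is hypothetical.

```python
# Cord segments in rostro-caudal order.
SEGMENTS = (
    [f"C{i}" for i in range(1, 9)]
    + [f"T{i}" for i in range(1, 13)]
    + [f"L{i}" for i in range(1, 6)]
    + [f"S{i}" for i in range(1, 6)]
)

# Assumed key-muscle segments (standard ASIA layout): none lie between T2 and L1.
KEY_MUSCLE_SEGMENTS = ("C5", "C6", "C7", "C8", "T1", "L2", "L3", "L4", "L5", "S1")

def motor_score_complete(injury_segment: str) -> int:
    """Total ASIA motor score for an idealized complete injury:
    grade 5 in key muscles above the injury, grade 0 at and below it, both sides."""
    cut = SEGMENTS.index(injury_segment)
    per_side = sum(5 for seg in KEY_MUSCLE_SEGMENTS if SEGMENTS.index(seg) < cut)
    return 2 * per_side

for level in ("C7", "T4", "T9"):
    print(level, motor_score_complete(level))

# A hypothetical descent of the injury level from T4 to T9 changes nothing measurable.
print(motor_score_complete("T9") - motor_score_complete("T4"))
```

In this model every complete injury between T2 and L2 scores 50, so anatomic recovery across several thoracic segments leaves the motor score unchanged.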
This feature of the ASIA motor score also causes an artifact in Figure 22, comparing cervical and thoracic. Many patients start with a score of 50, on the x-axis. Those who recover completely fall on a diagonal line parallel to the main diagonal for the population as a whole. The regression line for the thoracic group is nearly parallel to that line, but it is weighted down on the right by a whole set of points that start at 50 but exhibit zero improvement. These latter points are all plotted on top of each other and do not appear distinct. This behavior accounts for some of the observed differences between Figure 16 and Figure 22.
Neither Figure 23 nor Figure 24 is very different in practical interpretation from the corresponding Figures 17 and 18.
The central graph in Figure 25 shows wide apparent differences in the fraction of patients with bladder control at 26 weeks among the three baseline AIS severity groups. The other two graphs show smaller and less consistent differences between cervical and thoracic and between young and old. Similar findings were observed for bowel control, sacral sensation, and anal contraction.
Statistical Distribution of Motor and Sensory Scores
Using Normal Theory Statistics.
As mentioned above, binary (success/failure) outcome scales come equipped with probability distributions that are known to statisticians and embodied in procedures such as Fisher’s exact test. However, the statistics for measurements like the ASIA motor and sensory scores always involve assumptions and risks of error.
One is tempted to use normal theory procedures, such as the t test, or analysis of variance (ANOVA), or analysis of covariance. However, these tests require that the underlying data have the normal or Gaussian probability distribution: that they fit a smooth, bell-shaped curve. The discussion above has repeatedly indicated that these assumptions are largely untenable for data in this SCI study.
Another assumption of normal theory parametric statistics is equality of variance among the groups. The discussion of Figure 12 indicates that a t test for light touch score between the cervical and thoracic groups is unlikely to be valid. Strictly speaking, the t test and ANOVA cannot be used with SCI data like those presented in this article.
The central limit theorem of statistics does say that such tests give approximately correct answers if the sample size is large enough. It is generally unknown how large a sample must be to get a close approximation: the answer depends on the particular shape of the distribution. (If one is a believer that P < 0.05 means the effect is there and P > 0.05 means it is not, then how close an “approximation” is good enough?) It is sometimes possible and more prudent to use nonparametric statistical tests instead.
In the case of the total sensory and motor scores, nonparametric tests are less appropriate. Some clinical thought suggests that there are underlying reasons for the anomalies in the histograms.
Patients with cervical injuries have an ASIA motor score that is almost always <50. This is one reason the histogram in Figure 19 is skewed to the left. It would be less so if the recruitment in the thoracic stratum were higher.
For the ASIA motor and sensory data, using the t test or ANOVA (even in large samples) or using nonparametric tests may amount to forcing square pegs into round holes, not because the histograms are the wrong shape but because they arise from distinct subgroups and from the eccentricities of the ways that our chosen measurement scales apply in some of these subgroups. The anomalies in the histograms tip us off that we should be doing better, more thoughtful statistics based on modeling clinical reality in focused, homogeneous subgroups. Otherwise, the results that we think are “valid” may change disconcertingly with a change in measurement scales or in recruitment patterns.
There are, of course, limitations to modeling identifiable population subgroups. The Discussion of the first article in this series 36 points out that the severity strata and the injury level strata have highly nonuniform recruitment.
Types of Measurement Scale.
Measurement scales are classified by the degree to which one can take literally any implied comparisons among their grades. For example, a heart rate of 120 beats/min is literally twice as fast as a rate of 60 and 3 times as fast as a rate of 40: heart rate is a ratio scale. An SCI patient whose rostral level of injury is 20 has essentially twice as many functioning spinal levels as a patient whose rostral level is 10, although it would be an overinterpretation to say that this means he has “twice as much function.”
A muscle group with an MRC grade of 4 has more function than one with a grade of 2, but it is not clear in any sense how much that difference represents: it is more function, but it would be meaningless to say “twice as much function.” The MRC grade is ordinal.
In between the ordinal scales and the ratio scales are the interval scales, in which a 10-point difference between 20 and 30 counts for the same as the one between 70 and 80. Even less structured than the ordinal scales are the nominal scales, which have no order to them at all: nationality, gender, etc.
Using the t test and ANOVA requires that we can make sense of the parameters of the normal probability distribution. The concept of comparing the difference between two mean values needs at least an interval scale. Using the standard deviation to do so needs a ratio scale: it measures how expanded or contracted the spread of the distribution is.
Even when we have only an ordinal scale we are often tempted to go ahead with t test and ANOVA “as an approximation,” whatever this might mean. But we should remember that there are different degrees of ordinality, as shown in Figure 26. A procedure may almost make sense for one ordinal scale, but for another it may fall short of even that modest goal.
The measurement scales used in SCI are not interval or ratio scales and are not even strictly ordinal. Statistical results involving them need to be interpreted carefully.
When the MRC grades are added together as the total ASIA motor score, interpretation becomes even more problematic. Suppose two patients have complete injuries at different levels and one patient recovers from MRC Grade 0 to Grade 2 in five muscle groups while the other recovers from Grade 0 to Grade 5 in two muscle groups. Are these recoveries “the same”? Are they “equivalent” in any sense? Are they “equally useful” to the patient? Both recoveries count as the same 10-point change in total ASIA motor score. An increase in motor score on follow-up examination could result from: 1) weak muscles regaining strength, 2) paralyzed muscles regaining strength, 3) descent of the complete motor level, or 4) any combination of these.
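The equivalence of the two recoveries on the summed scale is easy to verify. In the sketch below, each list holds the MRC grades of only the muscle groups that changed; the variable and function names are illustrative.

```python
def total_change(before: list, after: list) -> int:
    """Change in summed MRC grades over the listed muscle groups."""
    return sum(after) - sum(before)

# Patient 1: five muscle groups recover from grade 0 to grade 2.
patient_1 = total_change(before=[0] * 5, after=[2] * 5)

# Patient 2: two muscle groups recover from grade 0 to grade 5.
patient_2 = total_change(before=[0] * 2, after=[5] * 2)

print(patient_1, patient_2)
```

The totals match even though one patient has gained widespread movement that cannot overcome gravity and the other has gained normal strength in two muscle groups only.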
Association Between Baseline and Endpoint Measurement
Natural Course of the Disease and the Treatment Effect.
Measuring a treatment effect means detecting the difference between a baseline and a follow-up measurement. How much did the patient improve? The placebo patients do not stand idle during the study period: they flow wherever the “natural course of the disease” takes them, as influenced by the “standard therapy” designed into the placebo group regimen.
When we measure the treatment effect we compare the beginning-to-end change in the active treatment group with the beginning-to-end change in the placebo group. Therefore, the choice and presentation of the baseline measurement are of paramount importance in defining the apparent outcome.
Two clinical trials with different baseline measurement times may not be directly comparable. The degree to which the patients are uniform and stabilized can change the reported results. An emergency room examination cannot be thorough.
Within a single study it is obvious that the two treatment arms need to be comparable in their baseline measurements. This issue can be addressed ahead of time in the randomization or afterwards by statistical adjustments.
It is regrettable how often simple randomization fails to produce comparability between arms in important prognostic factors. A reasonably effective method of ensuring it is block randomization by strata. The limitation is that when there are too many strata, the block size becomes impractically large. In the Sygen® study there were originally three AIS severity levels and two anatomic levels, making six strata. Later, the patients were further divided into two age levels, making 12 strata. If each block has two patients per stratum, this means a total of 24 patients per block. However, the average recruitment was five patients per center per year so that it takes too long for some centers to go through a complete block.
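The block-size arithmetic, and a rough sense of why it becomes impractical, can be sketched as follows. The estimate of the time to fill a block assumes that each center works through its own blocks and recruits at the average rate throughout, which is a simplification; the variable names are illustrative.

```python
severity_levels = 3   # baseline AIS severity groups
anatomic_levels = 2   # cervical, thoracic
age_levels = 2        # added later in the study

strata_original = severity_levels * anatomic_levels
strata_final = strata_original * age_levels

patients_per_stratum = 2
block_size = strata_final * patients_per_stratum

# Rough estimate only: assumes one center fills its own block at the average rate.
patients_per_center_per_year = 5
years_to_fill_block = block_size / patients_per_center_per_year

print(strata_original, strata_final, block_size, years_to_fill_block)
```

At the average recruitment rate a single center would need close to 5 years to complete one block, which is about the length of the whole recruitment period.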
Preventing imbalance is always better than statistical adjustment, which usually requires additional, possibly unfounded assumptions about the natural history of the disease. Consider an ANOVA with total ASIA motor score as the dependent variable and with treatment arm, anatomic level, and baseline AIS as the independent factors. Such a model adjusts for the covariates to some degree, but it is limited by assuming that the pattern of difference among the three AIS levels is the same for cervical injuries as it is for thoracic. This is probably not true. Thus, if there were an imbalance between the treatment arms, this model might be limited in its ability to compensate.
This problem could be alleviated by including terms for the interaction of the anatomic and the severity levels, but it is only fully solved by including interaction terms for all three pairs of independent factors and for the three factors together. Such a model is a technical and interpretive nightmare: the “treatment effect” is now partly included in the interaction terms, and there is no unique way to select which subset of factors is statistically significant. And still this model does not yet include age or other important predictors. The credibility of such models decreases rapidly with the number of factors included.
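The three models under discussion can be written side by side. The notation is conventional ANOVA notation introduced here for illustration, not taken from the study analysis plan: $Y_{ijkl}$ is the total ASIA motor score of patient $l$, $\tau_i$ is the effect of treatment arm $i$, $\alpha_j$ the effect of anatomic level $j$, $\beta_k$ the effect of baseline AIS group $k$, and $\varepsilon_{ijkl}$ the residual error.

```latex
\begin{align*}
\text{Main effects only:}\quad
  Y_{ijkl} &= \mu + \tau_i + \alpha_j + \beta_k + \varepsilon_{ijkl} \\
\text{Level-by-severity interaction added:}\quad
  Y_{ijkl} &= \mu + \tau_i + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \varepsilon_{ijkl} \\
\text{Full factorial:}\quad
  Y_{ijkl} &= \mu + \tau_i + \alpha_j + \beta_k
             + (\tau\alpha)_{ij} + (\tau\beta)_{ik} + (\alpha\beta)_{jk}
             + (\tau\alpha\beta)_{ijk} + \varepsilon_{ijkl}
\end{align*}
```

In the first model the pattern across AIS groups is forced to be the same for cervical and thoracic injuries. In the last, the treatment appears in four separate terms, so there is no single quantity that can be read off as the treatment effect.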
One can never be sanguine that baseline imbalances have been fully accounted for and adjusted. Handling them is a major hurdle for any study analysis.
State of the Art
As recently as 15 years ago, many clinicians were pessimistic about the chances of improvement as large as marked recovery in patients with severe SCI. The reports by Stover and colleagues 1,9,15–23,34,35,52,53 of the National SCI Database (1973–1985) included data, illustrated here as Figure 27, that invite comparison with our Figure 5, which uses different but related measurement scales. In one respect these two figures present a similar message: recovery depends on initial severity.
But in a crucial respect the two figures are very different. The older data suggest that there is little hope, for any group, of marked recovery, whereas the newer figure shows substantial numbers of patients with marked recovery.
The sample size for the Sygen® study was based on literature estimates by Stover et al 52,53 and on the direct advice of at least 50 active SCI practitioners. It assumed that the rate of marked recovery in the placebo group was unlikely to be >10%, and more probably would be 5–7%. In the actual trial, 28% of the placebo patients had marked recovery, and the rate (Figure 6) was >10% even in the complete injury group.
The conclusions to the first article 36 in this series suggest a number of possible factors in this change. That article seems to show a relatively disciplined, efficient delivery system for neurotrauma care. EMTs arrive early at the scene of injury and the patients are brought rapidly to tertiary care centers where they are given appropriate specialized treatment. When a protocol item like MPSS (which requires an early start followed by a 24-hour course) is identified as high priority, its timing is prompt and its dosage is accurate despite an administration that may be divided among several providers.
So the first suggestion is that recovery is better because care is better. What is difficult, though, is to identify which elements of the newer system are responsible for the improvement.
Some of the results in this article are puzzling in this respect. Many clinicians believe that aggressive procedures are necessary in the acute phase to alleviate cord compression before metabolically induced changes are irreversible and to prevent secondary injury. Many other clinicians believe in stabilizing the patient first. Regrettably, we can throw little light on this. The first of these articles showed that, whatever community pressure there may be to operate early, the majority of operations at our centers (mostly large and university affiliated) are performed much later than the acute phase. Figure 9 suggests a slight trend toward improvement in the nonoperated group, but even this vague result is confounded because the comparison does not control for whether the cord was compressed or surgery was needed.
The situation with MPSS is similarly inconclusive. After the initial report of NASCIS II, there was intense pressure to use MPSS as a de facto standard of care. This pressure was so great that the Sygen® study was designed to give the NASCIS II dose of MPSS to all patients. Because this is the major difference between the present Sygen® multicenter study and the earlier Maryland Sygen® study and because the marked recovery rate for placebo patients in that study was 1 in 14, it is tempting to assume that MPSS played a part in the success of the more recent group of patients.
This may indeed be true, but caution is indicated, as always when thinking about historical controls. On commonsense grounds, the differences in neurotrauma systems are just as plausible an explanation.
Promise of the Future
This article and the first one in the series 36 have been devoted to showing how the present state of acute SCI looks when filtered through the design assumptions of the multicenter Sygen® study.
The prognosis for these patients may have improved between the 1980s and the 1990s. In a general sense, this advance can be attributed to changes in acute SCI care. However, data and analyses are scarce that would help us identify specifically which of those changes have made the difference and which further changes could be of more help to specific classes of patients.
It appears that these patients can be helped and that some of the things we have done have made a difference. We need research that can show how to maximize and extend our gains.
- This is a retrospective analysis of recruitment and early treatment in carefully monitored data from 760 patients seen at 28 centers in North America during the mid-1990s in a clinical trial.
- The prognosis appeared better than was often assumed earlier.
- The general patterns are similar across different measurement scales, although there are intriguing differences.
- The patterns in different strata are different in specifics, and complete injuries do less well.
- Pooling data from different strata may result in probability distributions that depart from normal-theory assumptions and give misleading results depending on recruitment patterns.
Frank Dorsey, Francesca Patarnello, and Simonetta Piva provided statistical analysis support and William Taylor provided SAS software support for the data analysis.
1. Spinal Cord Injury: Clinical Outcomes From the Model Systems. Gaithersburg, MD: Aspen, 1995.
2. American Spinal Injury Association/IMSoP. International Standards for Neurologic and Functional Classification of Spinal Cord Injury. Atlanta: American Spinal Injury Association, 1992.
3. Blaustein DM, Zafonte R, Thomas D, et al. Predicting recovery of motor complete quadriplegic patients: 24 hour v 72 hour motor index scores. Am J Phys Med Rehabil 1993; 72: 306–11.
4. Bose B, Northrup BE, Osterholm JL, et al. Reanalysis of central cervical cord injury management. Neurosurgery 1984; 15: 367–72.
5. Bracken MB, Webb SB Jr, Wagner FC. Classification of the severity of acute spinal cord injury: implications for management. Paraplegia 1978; 15: 319–26.
6. Brown PJ, Marino RJ, Herbison GJ, et al. The 72-hour examination as a predictor of recovery in motor complete quadriplegia. Arch Phys Med Rehabil 1991; 72: 546–8.
7. Browne BJ, Jacobs SR, Herbison GJ, et al. Pin sensation as a predictor of extensor carpi radialis recovery in spinal cord injury. Arch Phys Med Rehabil 1993; 74: 14–8.
8. Burns SP, Golding DG, Rolle WA Jr, et al. Recovery of ambulation in motor-incomplete tetraplegia. Arch Phys Med Rehabil 1997; 78: 1169–72.
9. Charles ED, Fine PR, Stover SL, et al. The costs of spinal cord injury. Paraplegia 1978; 15: 302–10.
10. Claxton AR, Wong DT, Chung F, et al. Predictors of hospital mortality and mechanical ventilation in patients with cervical spinal cord injury. Can J Anaesth 1998; 45: 144–9.
11. Cohen ME, Ditunno JF Jr, Donovan WH, et al. A test of the 1992 International Standards for Neurological and Functional Classification of Spinal Cord Injury. Spinal Cord 1998; 36: 554–60.
12. Coleman W, Benzel E, Cahill D, et al. A critical appraisal of the reporting of the National Acute Spinal Cord Injury Studies (II and III) of methylprednisolone in acute spinal cord injury. J Spinal Disord 1999; 13: 185–99.
13. Crozier KS, Cheng LL, Graziani V, et al. Spinal cord injury: prognosis for ambulation based on quadriceps recovery. Paraplegia 1992; 30: 762–7.
14. Crozier KS, Graziani V, Ditunno JF, et al. Spinal cord injury: prognosis for ambulation based on sensory examination in patients who are initially motor complete. Arch Phys Med Rehabil 1991; 72: 119–21.
15. DeVivo MJ, Black KJ, Stover SL. Causes of death during the first 12 years after spinal cord injury. Arch Phys Med Rehabil 1993; 74: 248–54.
16. DeVivo MJ, Fine PR, Maetz HM, et al. Prevalence of spinal cord injury: a reestimation employing life table techniques. Arch Neurol 1980; 37: 707–8.
17. DeVivo MJ, Kartus PL, Rutt RD, et al. The influence of age at time of spinal cord injury on rehabilitation outcome. Arch Neurol 1990; 47: 687–91.
18. DeVivo MJ, Kartus PL, Stover SL, et al. Cause of death for patients with spinal cord injuries. Arch Intern Med 1989; 149: 1761–6.
19. DeVivo MJ, Kartus PL, Stover SL, et al. Seven-year survival following spinal cord injury. Arch Neurol 1987; 44: 872–5.
20. DeVivo MJ, Rutt RD, Black KJ, et al. Trends in spinal cord injury demographics and treatment outcomes between 1973 and 1986 [published erratum appears in Arch Phys Med Rehabil 1992;73: 1146]. Arch Phys Med Rehabil 1992; 73: 424–30.
21. DeVivo MJ, Shewchuk RM, Stover SL, et al. A cross-sectional study of the relationship between age and current health status for persons with spinal cord injuries. Paraplegia 1992; 30: 820–7.
22. DeVivo MJ, Stover SL, Black KJ. Prognostic factors for 12-year survival after spinal cord injury [published erratum appears in Arch Phys Med Rehabil 1992;73: 1146]. Arch Phys Med Rehabil 1992; 73: 156–62.
23. DeVivo MJ, Stover SL, Fine PR. The relationship between sponsorship and rehabilitation outcome following spinal cord injury. Paraplegia 1989; 27: 470–9.
24. Ditunno JF, Cohen ME, Hauck WW, et al. Recovery of upper-extremity strength in complete and incomplete tetraplegia: a multicenter study. Arch Phys Med Rehabil 2000; 81: 389–93.
25. Ditunno JF, Ditunno JF Jr. American spinal injury standards for neurological and functional classification of spinal cord injury: past, present and future. 1992 Heiner Sell Lecture of the American Spinal Injury Association. J Am Paraplegia Soc 1994; 17: 7–11.
26. Ditunno JF, Ditunno JF Jr. The John Stanley Coulter Lecture. Predicting recovery after spinal cord injury: a rehabilitation imperative. Arch Phys Med Rehabil 1999; 80: 361–4.
27. Ditunno JF Jr, Graziani V, Tessler A. Neurological assessment in spinal cord injury. Adv Neurol 1997; 72: 325–33.
28. Ditunno JF Jr, Young W, Donovan WH, et al. The international standards booklet for neurological and functional classification of spinal cord injury: American Spinal Injury Association. Paraplegia 1994; 32: 70–80.
29. Ditunno JF, Stover SL, Freed MM, et al. Motor recovery of the upper extremities in traumatic quadriplegia: a multicenter study. Arch Phys Med Rehabil 1992; 73: 431–6.
30. Duh MS, Shepard MJ, Wilberger JE, et al. The effectiveness of surgery on the treatment of acute spinal cord injury and its relation to pharmacological treatment. Neurosurgery 1994; 35: 240–8; discussion 8–9.
31. Dvir Z. Grade 4 in manual muscle testing: the problem with submaximal strength assessment. Clin Rehabil 1997; 11: 36–41.
32. Eschbach KS, Herbison GJ, Ditunno JF. Sensory root level recovery in patients with Frankel A quadriplegia. Arch Phys Med Rehabil 1992; 73: 618–22.
33. Fehlings MG, Tator CH. An evidence-based review of decompressive surgery in acute spinal cord injury: rationale, indications, and timing based on experimental and clinical studies. J Neurosurg 1999; 91: 1–11.
34. Fine PR, Kuhlemeier KV, DeVivo MJ, et al. Spinal cord injury: an epidemiologic perspective. Paraplegia 1979; 17: 237–50.
35. Fine PR, Stover SL, DeVivo MJ. A methodology for predicting lengths of stay for spinal cord injury patients. Inquiry 1987; 24: 147–56.
36. Geisler FH, Coleman WP, Grieco G, et al. Recruitment and early treatment in a multicenter study of acute spinal cord injury. Spine 2001; 26 (suppl 1): S58–67.
37. Geisler FH, Coleman WP, Grieco G, et al. The Sygen® Multicenter Acute Spinal Cord Injury Study. Spine 2001; 26 (suppl 1): S87–98.
38. Herbison GJ, Zerby SA, Cohen ME, et al. Motor power differences within the first two weeks post-SCI in cervical spinal cord-injured quadriplegic subjects. J Neurotrauma 1992; 9: 373–80.
39. Ko HY, Ditunno JF, Graziani V, et al. The pattern of reflex recovery during spinal shock. Spinal Cord 1999; 37: 402–9.
40. Kornsgold LM, Herbison GJ, Decena BF, et al. Biceps vs extensor carpi radialis recovery in Frankel grades A and B in spinal cord injury patients. Paraplegia 1994; 32: 340–8.
41. Kowalske KJ, Herbison GJ, Ditunno JF, et al. Spinal cord injury syndrome with motor sparing in the absence of all sensation. Arch Phys Med Rehabil 1991; 72: 932–4.
42. Levi AD, Tator CH, Bunge RP. Clinical syndromes associated with disproportionate weakness of the upper versus the lower extremities after cervical spinal cord injury. Neurosurgery 1996; 38: 179–83; discussion 83–5.
43. Mange KC, Ditunno JF, Herbison GJ, et al. Recovery of strength at the zone of injury in motor complete and motor incomplete cervical spinal cord injured patients. Arch Phys Med Rehabil 1990; 71: 562–5.
44. Mange KC, Marino RJ, Gregory PC, et al. Course of motor recovery in the zone of partial preservation in spinal cord injury. Arch Phys Med Rehabil 1992; 73: 437–41.
45. Marciello MA, Herbison GJ, Ditunno JF, et al. Wrist strength measured by myometry as an indicator of functional independence. J Neurotrauma 1995; 12: 99–106.
46. Marino RJ, Ditunno JF, Donovan WH, et al. Neurologic recovery after traumatic spinal cord injury: data from the Model Spinal Cord Injury Systems. Arch Phys Med Rehabil 1999; 80: 1391–6.
47. Marino RJ, Rider-Foster D, Maissel G, et al. Superiority of motor level over single neurological level in categorizing tetraplegia. Paraplegia 1995; 33: 510–3.
48. Maynard FM, Bracken MB, Creasey G, et al. International Standards for Neurological and Functional Classification of Spinal Cord Injury: American Spinal Injury Association. Spinal Cord 1997; 35: 266–74.
49. Medical Research Council of the U.K. Aids to the Examination of the Peripheral Nervous System (Memorandum No. 45). London: Her Britannic Majesty’s Stationery Office, 1976.
50. Meguro K, Tator CH. Effect of multiple trauma on mortality and neurological recovery after spinal cord or cauda equina injury. Neurol Med Chir (Tokyo) 1988; 28: 34–41.
51. Shepard MJ, Bracken MB. Magnetic resonance imaging and neurological recovery in acute spinal cord injury: observations from the National Acute Spinal Cord Injury Study 3. Spinal Cord 1999; 37: 833–7.
52. Stover SL, DeVivo MJ, Go BK. History, implementation, and current status of the National Spinal Cord Injury Database. Arch Phys Med Rehabil 1999; 80: 1365–71.
53. Stover SL, Fine PR. Spinal Cord Injury: The Facts and Figures. Birmingham, AL: University of Alabama, 1986.
54. Tator CH, Duncan EG, Edmonds VE, et al. Comparison of surgical and conservative management in 208 patients with acute spinal cord injury. Can J Neurol Sci 1987; 14: 60–9.
55. Tator CH, Duncan EG, Edmonds VE, et al. Neurological recovery, mortality and length of stay after acute spinal cord injury associated with changes in management. Paraplegia 1995; 33: 254–62.
56. Tator CH, Fehlings MG, Thorpe K, et al. Current use and timing of spinal surgery for management of acute spinal cord injury in North America: results of a retrospective multicenter study. J Neurosurg 1999; 91: 12–8.
57. Weiss DJ, Fried GW, Chancellor MB, et al. Spinal cord injury and bladder recovery. Arch Phys Med Rehabil 1996; 77: 1133–5.
58. Wu L, Marino RJ, Herbison GJ, et al. Recovery of zero-grade muscles in the zone of partial preservation in motor complete quadriplegia. Arch Phys Med Rehabil 1992; 73: 40–3.