'Statistical inference from experimental or observational data has become indispensable in medical research.' So said Abt [1]. We agree, and would like to comment on methods of comparing the development of responses over time in two groups of patients. We describe a technique known as repeated measures analysis of variance, which provides a neat way of expressing the results. We appreciate that readers of this journal are not statisticians, but we are confident that most of them have sufficient contact with statistics to form an intelligent view on what we say.
In Abt's table 1, some (artificial) data were given of the following type. There were six patients in a placebo group and six patients in an active treatment group; for each patient, the grade of intensity of skin signs was recorded at four times. Looking at the data, one can see that skin signs intensity has an approximately linear dependence on time in each patient. We have no quarrel with the principle of what Abt did. However, readers may be interested in the following succinct method of doing it. It should be noted that the design of this experiment is what is called 'repeated measures'; that 'repeated measures analysis of variance' is an appropriate technique for the analysis; and that (as Abt says) what is of interest is the 'treatment by time interaction'.
It is necessary to clarify what is meant by interaction. This refers to the question of whether the development of skin signs intensity over time is the same in the active treatment group as in the placebo group. Unless some restriction is imposed, the pattern of this development may take any form (e.g. in principle, skin signs intensity could increase from time 1 to time 2, decline at time 3, and increase again at time 4). However, we do need to impose a restriction: in view of previous experience with this dependent variable, and because of the linearity that is evident in these data, we want to insist that the relation between time and skin signs intensity is a straight line. When this is done, asking whether there is treatment by time interaction is the same as asking whether the graph of skin signs intensity against time has different gradients for the two groups of patients.
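To make the idea of 'different gradients' concrete, the following minimal Python sketch fits a least-squares straight line to each patient's four scores and compares the mean gradient of the two groups. The scores are hypothetical and chosen only for illustration (they are not Abt's data), and the names ls_slope, placebo and active are our own.

```python
from statistics import mean

def ls_slope(times, scores):
    """Least-squares gradient of scores regressed on times."""
    t_bar, y_bar = mean(times), mean(scores)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx

TIMES = [1, 2, 3, 4]

# Hypothetical grades (0-3) at the four times, one row per patient.
# These are illustrative values only, NOT the data in Abt's table 1.
placebo = [[0, 1, 2, 3], [1, 1, 2, 3], [0, 1, 1, 2]]
active = [[0, 0, 1, 1], [1, 1, 1, 2], [0, 1, 1, 1]]

placebo_slopes = [ls_slope(TIMES, p) for p in placebo]
active_slopes = [ls_slope(TIMES, p) for p in active]

# With a straight-line restriction, the interaction question is whether
# this difference in mean gradient is larger than chance would produce.
gradient_difference = mean(placebo_slopes) - mean(active_slopes)
print(placebo_slopes, active_slopes, gradient_difference)
```

The sketch only describes the gradients; deciding whether their difference is statistically significant requires a formal test.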
The following syntax indicates how this might be done in SPSS (Statistical Package for the Social Sciences).
MANOVA skin1 skin2 skin3 skin4 by group(1,2)
/wsfactors=time(4)
/contrast(time)=polynomial
/wsdesign=time(1) time(2) time(3)
/design=group.
Brief explanation:
(a) The MANOVA command is followed by a list of dependent variables (in repeated measures ANOVA using the multivariate approach, the repeats are treated as distinct dependent variables rather than as several measurements of one dependent variable), the keyword 'by', and then a list of independent variables, each followed in brackets by the codes for the minimum and maximum levels of interest in the current analysis. It is presumed here that skin signs at the four times have been named skin1, skin2, skin3, and skin4, and that the variable identifying whether the patient received placebo or treatment is named group.
(b) The wsfactors subcommand specifies that the dependent variables are really a single variable measured at four levels of a within-subjects factor, and that this factor will be called 'time' in this MANOVA analysis.
(c) Our chief concern is with the slope of the relation between time and skin signs intensity. The effect of time is therefore split into polynomial (i.e. linear, quadratic, etc.) components. This is specified by the contrast subcommand: the variable is named, and the type of contrasts is identified.
(d) The wsdesign subcommand specifies the within-subjects design. Here, analysis of linear trend [shown as time(1)], quadratic trend [time(2)], and cubic trend [time(3)] is specified. (It would be possible to include only linear trend, but it is usual to obtain results for the others as well, in case they are important.)
(e) The design subcommand specifies the between-subjects aspects of the design. In this case there is a single between-subjects factor, and its name is group.
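For readers who wish to see what the polynomial split in (c) and (d) amounts to, here is a minimal Python sketch. It uses the integer orthogonal polynomial coefficients found in standard textbook tables for four equally spaced time points; a statistical package may rescale them, so the scores below can differ from package output by a constant factor. The function names and the patient's scores are hypothetical.

```python
# Textbook orthogonal polynomial coefficients for four equally spaced times.
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)
CUBIC = (-1, 3, -3, 1)

def dot(a, b):
    """Sum of products of corresponding elements."""
    return sum(x * y for x, y in zip(a, b))

def trend_scores(scores):
    """Turn one patient's four scores into three trend scores."""
    return {
        "linear": dot(LINEAR, scores),
        "quadratic": dot(QUADRATIC, scores),
        "cubic": dot(CUBIC, scores),
    }

# The three contrasts are mutually orthogonal and each sums to zero.
assert dot(LINEAR, QUADRATIC) == dot(LINEAR, CUBIC) == dot(QUADRATIC, CUBIC) == 0
assert sum(LINEAR) == sum(QUADRATIC) == sum(CUBIC) == 0

# Hypothetical patient whose grade rises by exactly one step at each time.
patient = [0, 1, 2, 3]
print(trend_scores(patient))
```

For times coded 1 to 4, the linear trend score computed this way is 10 times the patient's least-squares gradient, which is why analysing the linear component is the same as analysing the slope.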
We now turn to the results of the analysis of the interaction of group (placebo vs. treatment) and linear trend. The relevant statistics, extracted from the SPSS output, are shown in Table 1. (Abt's first scoring scheme, of 0, 1, 2, 3 for the four grades of intensity, was used.) We see two things in Table 1. First, there is a linear effect of time on skin signs. Second, the fact that the interaction is significant tells us that the gradient of the linear effect is different for the two groups of patients. (As previously indicated, we are not intending to criticize the principle of the analysis that Abt made. Indeed, our analysis is in line with it, and our results are similar to Abt's.)
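With only two groups, the test of the group by linear trend interaction can be viewed as a comparison of the two groups' mean linear trend scores: the F statistic, on 1 and N - 2 degrees of freedom, is the square of the ordinary pooled-variance t statistic computed on those scores. The Python sketch below illustrates this under stated assumptions. The trend scores are hypothetical (they are not the values behind Table 1), the function name pooled_t is our own, and no p-value is computed.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_error, df

# Hypothetical linear trend scores, one per patient, six patients per group.
# Illustrative values only; NOT taken from Abt's table or from Table 1.
placebo_linear = [10, 7, 6, 9, 8, 8]
active_linear = [4, 3, 3, 5, 2, 4]

t_stat, df = pooled_t(placebo_linear, active_linear)
f_stat = t_stat ** 2  # F on (1, df) degrees of freedom
print(f"t = {t_stat:.3f} on {df} df; F = {f_stat:.2f} on (1, {df}) df")
```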
There is one technical point on which care is needed. The traditional approach to repeated measures data is termed 'univariate mixed model' ANOVA. An assumption of its interaction test, usually called sphericity, is that the variance (over subjects in each group) of the change between one time point and another is the same for all pairs of time points and for both groups. Unfortunately, simulation studies (e.g. [2]) have found that the technique is sensitive to violation of this assumption. Consequently, it is better to use, as we have done, the 'multivariate' approach to repeated measures, as it does not require the assumption. We might even say that it is surprising that the univariate mixed model approach still seems to predominate in the literature; perhaps this reflects the slowness with which research results make their way into textbooks.
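The assumption just described can be examined informally by computing, within a group, the variance of the change in score for every pair of time points. The Python sketch below does this for one hypothetical group of patients; the data and the function name difference_variances are our own, and the result is a descriptive check only, not a formal test of the assumption.

```python
from itertools import combinations
from statistics import variance

def difference_variances(group):
    """Variance over patients of the change between each pair of time points."""
    n_times = len(group[0])
    return {
        (i + 1, j + 1): variance([patient[j] - patient[i] for patient in group])
        for i, j in combinations(range(n_times), 2)
    }

# Hypothetical grades (0-3) at four times, one row per patient.
# Illustrative values only; NOT Abt's data.
group = [[0, 1, 2, 3], [1, 1, 2, 3], [0, 1, 1, 2], [0, 0, 1, 3]]

for pair, var in difference_variances(group).items():
    print(f"times {pair}: variance of change = {var:.3f}")
```

Under the assumption, the six variances would be roughly equal, and similar in the other group; markedly unequal values are a warning against relying on the univariate mixed model test.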
The above analysis centres on a possible difference in slopes, and a slope itself is the ratio of the difference in the dependent variable to the corresponding difference in the independent variable. This leads us to comment on having treated the dependent variable as numeric, though it was actually grade of intensity of skin signs, which is really only ordinal. Abt did this too, and it is common practice, provided that one examines the data carefully to check that nothing untoward is going on. Our comment is to draw attention to the following example [3, p. 48], which brings home (in a simpler context) the disadvantage of taking too conservative an approach to ordinal data. Consider two students: Bob's grades are A B B B B B C C D D F, whereas Carol's are A A A B B B B B C C C. Summarizing their records with a truly conservative approach, we would determine the middle score for the students, which is B for both Bob and Carol. Most people, however, will agree this is inappropriate, and that the grade point average (2.3 for Bob, and 3.0 for Carol) is a more appropriate summary.
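The arithmetic of this example is easily checked. The Python sketch below assumes the usual four-point scale (A = 4, B = 3, C = 2, D = 1, F = 0), which the example does not state explicitly but which reproduces the averages quoted; the function name summarize is our own.

```python
from statistics import mean, median

# Assumed grade points; the four-point scale is our assumption.
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

bob = "A B B B B B C C D D F".split()
carol = "A A A B B B B B C C C".split()

def summarize(grades):
    """Return (median grade points, grade point average to one decimal)."""
    points = [POINTS[g] for g in grades]
    return median(points), round(mean(points), 1)

print("Bob:  ", summarize(bob))
print("Carol:", summarize(carol))
```

Both students have a median of 3 grade points (a B), whereas the averages separate them, which is the point of the example.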
T. P. HUTCHINSON
Department of Psychology, Macquarie University, Sydney, Australia
1. Abt K. How statistics can 'lie'. Eur J Anaesthesiol
2. Boik RJ. A priori tests in repeated measures designs. Psychometrika
3. Dominowski RL. Research Methods. Englewood Cliffs, New Jersey: Prentice-Hall, 1980.