Health-related quality-of-life (HRQOL) outcomes are established components of comprehensive evaluations of health care interventions. Missing data are a significant problem in clinical trials and longitudinal studies involving HRQOL outcomes. ^{1} Clinical trials of treatment interventions for HIV-related disease, oncology, end-stage renal disease, and other chronic diseases often have missing data because of mortality or other reasons. Missing data complicate the statistical analysis and, depending on differential mortality between groups and the extent and nature of the missing data, make it difficult to interpret treatment effects and can introduce significant bias in treatment comparisons.

HRQOL measures present different problems than clinical measures in both interpretation and statistical analysis. ^{1} For example, HRQOL outcomes are often multidomain, and these different HRQOL domains are usually correlated. Missing data pose special challenges to HRQOL outcomes ^{2} and cannot be ignored without introducing bias into treatment comparisons and data analyses. The missing data produce systematic bias within treatment groups and introduce differential bias in the between-group comparisons. Clearly, prevention of missing HRQOL data, when possible, is best. ^{1} Missing data as a result of death, hospitalization, or disease progression are unavoidable in clinical trials involving HRQOL outcomes in chronic disease populations.

Missing data in clinical trials occur because of mortality, disease progression, treatment-related toxicity, and for reasons unrelated to treatment or the underlying disease. ^{1,3} Little and Rubin ^{4} identify 3 processes underlying missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In MCAR, the missing HRQOL assessments are independent of past, current, and future assessments of HRQOL. For MAR, missing assessments may depend on previous assessments but are independent of current and future HRQOL assessments. Informative (MNAR) missing data are associated with current and future HRQOL outcomes. Missing HRQOL observations are commonly MAR or MNAR. Informative missing data processes (MNAR) are the most troublesome for the comparison of treatment group differences. The bias introduced into treatment group comparisons is proportional to differences in mean HRQOL scores between subjects with and without missing data and to the proportion of missing data. ^{3} There is no way to test for MAR or MCAR without the information contained in the missing data, and different statistical analysis models may result in biased estimates of treatment effects. ^{5}
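These three mechanisms can be made concrete with a small sketch. The function below is purely illustrative (the function name and the probabilities are our assumptions, not part of the study): under MCAR the chance that a visit is missed ignores the scores entirely, under MAR it depends only on the previously observed score, and under MNAR it depends on the current, unobserved score itself.

```python
import random

random.seed(0)

def simulate_missingness(scores, mechanism):
    """Mask a subject's per-visit scores under one of three hypothetical
    missing data mechanisms; missing visits are returned as None."""
    out = list(scores)
    for t in range(1, len(out)):
        if mechanism == "MCAR":
            p = 0.2                                  # independent of any score
        elif mechanism == "MAR":
            prev = out[t - 1] if out[t - 1] is not None else 50
            p = 0.5 if prev < 40 else 0.1            # depends on prior observed score
        else:                                        # MNAR
            p = 0.5 if scores[t] < 40 else 0.1       # depends on the unseen current score
        if random.random() < p:
            out[t] = None
    return out

print(simulate_missingness([55, 45, 35, 25], "MNAR"))
```

Because an MNAR process conditions on values that are never observed, no analysis of the observed data alone can confirm or rule it out, which is the point made above.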

There is no currently accepted technique for imputing missing HRQOL scores in clinical trials. ^{1,3,5} The common approach of last value carried forward (LVCF) is acceptable when few data are missing because of mortality, but with increasing mortality in the treatment groups, LVCF may underestimate the effect of treatment. Using LVCF when mortality rates differ between treatment groups results in biased estimates of between-group differences, and when HRQOL scores are declining over time, imputation with LVCF produces upwardly biased estimates.

Diehr and colleagues ^{6} recommend several strategies for incorporating information about mortality into the analysis of HRQOL over time. Their intent was to incorporate death and HRQOL into a single outcome indicator. Many of the strategies modify continuous HRQOL scores into dichotomous or trichotomous outcomes, thus losing some of the richness of the original patient health outcomes. When an arbitrary extreme value is assigned to end-point HRQOL scores for patients who die (eg, 0), there is often little empirical justification supporting this value, and the distribution of HRQOL scores may be affected. Depending on the value selected, this approach may underweight or overweight the impact of death in the statistical analysis of HRQOL data. When mortality rates are relatively high, sensitivity of the HRQOL scores may be compromised.

Few systematic comparisons of different imputation methods have been completed. It is largely unknown what the real differences are between the alternative approaches and their characteristics under different conditions (ie, mortality rate, change in HRQOL). Little and Rubin, ^{4} Rubin, ^{7} Lavori et al, ^{8} and Heyting et al ^{9} are exploring various multivariate techniques for imputing missing values in clinical studies, but there have been few practical applications of these techniques in the HRQOL literature. More recently, a special issue of Statistics in Medicine ^{2} discussed missing data and statistical analysis techniques for oncology clinical trials including HRQOL outcomes.

This study was designed to evaluate 4 imputation methods for missing values using simulation data sets that systematically vary rates of mortality, rates of change in physical health status (PHS) scores, baseline PHS, and sample size. The 4 imputation techniques—LVCF, arbitrary substitution (ARBSUB), within-subject modeling (WSMOD), and empirical Bayes (BAYES)—were selected to cover a range of approaches to imputing missing data. The intent was to examine the bias associated with each imputation technique. A model was developed for generating the simulated data sets with a known relationship between PHS and mortality and known characteristics in terms of change in PHS and mortality rate. The simulated data represent a study in which patients are followed for 18 months, with PHS assessments at baseline and at 6, 12, and 18 months, based on a completed clinical study. ^{10} Some of the parameters for the simulated data (ie, baseline PHS scores, changes in PHS, mortality rates) were selected to be comparable to this clinical trial, ensuring that the simulated data resembled observed clinical trial data.

Methods
Simulation Study Plan
The general plan for this study involved (1) generation of PHS and mortality data for a 2-group comparative study based on a realistic model, (2) censoring of data on the basis of a mortality indicator, (3) application of the 4 imputation methods, and (4) comparison of how well each imputation technique recovers the true characteristics of the original simulated data. The data simulation replicates a study in which subjects are randomly assigned to 2 treatment groups and followed with 6-month PHS assessments over 18 months. For the simulations, it is assumed that 1 treatment group demonstrates superiority on both mortality and PHS outcomes. The following dimensions were varied.

1. Sample size. We used 100, 200, or 500 subjects in each treatment group. These group sample sizes were selected to cover the range of sample sizes observed in clinical trials incorporating PHS outcomes.

2. Mortality rates. The mortality rates were 0%, 10%, 20%, or 30% over 18 months, selected to conform to the range of rates observed in PHS studies of patients with chronic medical conditions. A 30% mortality rate was considered a reasonable upper bound, since greater rates of missing data may require alternative data analysis procedures.

3. Change in PHS from baseline to 18 months. These changes were −20, −10, −5, 0, 5, 10, or 20 points, assuming scale scores ranging from 0 to 100, with higher scores indicating better PHS. These increases and decreases were selected to correspond to small, moderate, and large differences between the treatment groups, representing the range of differences in PHS outcomes observed in clinical trials.

4. Baseline mean PHS scores of 45 or 70 (on the same 100-point scale). We were concerned that baseline status might affect estimates of change in PHS over time (ie, floor effects), especially for groups starting with relatively low baseline scores.
The relatively low value (ie, 45) was based on observed physical function scale scores in the ESRD study. ^{10} The 70 baseline value was based on the mean physical function score for the SF-36 normative population ≥45 years of age. ^{11}

All combinations of the simulation study parameters were examined. We completed 50 replications of each combination of sample size, mortality rate, and PHS change.
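The fully crossed design described above can be enumerated directly; this sketch simply counts the cells (the variable names are ours, not the authors'):

```python
from itertools import product

# Factor levels taken from the study design described above.
sample_sizes = [100, 200, 500]                # subjects per treatment group
mortality_rates = [0.0, 0.10, 0.20, 0.30]     # over 18 months
phs_changes = [-20, -10, -5, 0, 5, 10, 20]    # baseline-to-18-month change, 0-100 scale
baseline_means = [45, 70]
replications = 50

conditions = list(product(sample_sizes, mortality_rates, phs_changes, baseline_means))
print(len(conditions))                 # 3 * 4 * 7 * 2 = 168 design cells
print(len(conditions) * replications)  # 8400 simulated data files in total
```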

Imputation Techniques
Four imputation techniques for missing values were examined: LVCF, ARBSUB, WSMOD, and BAYES. These specific imputation techniques were selected to include a simple and commonly used technique (eg, LVCF) and another recommended simple but unevaluated technique (eg, ARBSUB) to contrast with more complicated imputation methods (eg, WSMOD and BAYES). In all the simulations, if an imputed value was <0, it was assigned a value of 0.

Last Value Carried Forward.
In the LVCF approach, the last nonmissing PHS observation is used as the end-point score for all subsequent missing observations. This is a common technique for dealing with missing values in data analysis. This technique reflects an implicit model that PHS scores do not change after the last observation.
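As a minimal sketch, LVCF can be implemented in a few lines (missing assessments are represented here as None; this is our illustration, not the authors' code):

```python
def lvcf(scores):
    """Last value carried forward: replace each missing (None) entry with
    the most recent non-missing observation."""
    out = []
    last = None
    for s in scores:
        if s is not None:
            last = s
        out.append(last)
    return out

print(lvcf([60, 55, None, None]))  # [60, 55, 55, 55]
```

The implicit no-change assumption is visible in the output: a subject who died after the 6-month assessment keeps a score of 55 through month 18.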

Arbitrary Substitution.
ARBSUB is comparable to an imputation method described by Diehr et al ^{6} and assumes that death is an indicator of worsening health. Missing PHS scores are assigned the value of the previous complete assessment minus a decrement to account for the fact that the current assessment is missing because of mortality. A 15-point decrement was used, corresponding to the difference between mean SF-36 physical function scale scores for persons reporting that their health is unchanged and those reporting that their health worsened. ^{11} This difference was observed in a sample of ESRD patients in a clinical trial. ^{10} We arbitrarily selected the physical function score as the indicator. This decrement has some empirical basis and was thought more reasonable than assigning 0 or some other arbitrary low score.
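A sketch of ARBSUB under one plausible reading of the rule, in which the 15-point decrement is applied cumulatively at each successive missing assessment (the cumulative application is our assumption; it is consistent with the floor effects at 0 reported later):

```python
DECREMENT = 15  # SF-36-based decrement described above

def arbsub(scores):
    """Arbitrary substitution: carry the last observed score forward,
    subtracting DECREMENT for each successive missing (None) assessment
    and truncating at 0, per the rule that imputed values <0 are set to 0."""
    out = []
    last = None
    for s in scores:
        if s is not None:
            last = s
        elif last is not None:
            last = max(0, last - DECREMENT)
        out.append(last)
    return out

print(arbsub([60, 45, None, None]))  # [60, 45, 30, 15]
```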

Within-Subject Modeling.
For WSMOD, the trajectory of PHS scores for subjects with missing values is estimated on the basis of the nonmissing observations using ordinary least squares (OLS) regression. ^{12} This approach requires the availability of PHS data for a minimum of 2 measurement occasions and estimates the missing PHS scores from the observed rate of change for subjects with missing values. The predicted WSMOD estimates are then substituted for missing values.
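In outline, WSMOD fits an ordinary least squares line through each subject's observed (time, score) pairs and reads the missing occasions off that line. The sketch below uses only the standard library; the function name and the None convention are ours:

```python
def wsmod(times, scores):
    """Within-subject modeling: fit an OLS line to a subject's observed
    assessments and predict the missing (None) occasions from it.
    Requires at least 2 observed time points."""
    obs = [(t, s) for t, s in zip(times, scores) if s is not None]
    if len(obs) < 2:
        raise ValueError("need at least 2 observed assessments")
    n = len(obs)
    mt = sum(t for t, _ in obs) / n
    ms = sum(s for _, s in obs) / n
    slope = (sum((t - mt) * (s - ms) for t, s in obs)
             / sum((t - mt) ** 2 for t, _ in obs))
    intercept = ms - slope * mt
    return [s if s is not None else intercept + slope * t
            for t, s in zip(times, scores)]

# Assessments at 0, 6, 12, and 18 months; the last two are missing.
print(wsmod([0, 6, 12, 18], [60, 50, None, None]))  # extrapolates to 40 and 30
```

Because the fitted line extrapolates the subject's own observed trend, a subject who was declining before death is imputed as continuing to decline.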

Empirical Bayes.
The BAYES with informed censoring imputation method was developed by the authors and is based on previous work by Mori et al. ^{13} This method groups subjects with n−r sequential observations and r sequential missing observations because of death. For each subject, we estimate an individual slope that is a weighted average of the individual’s slope for available time points and the OLS estimate of the population slope. The weights associated with the population slope and the individual slope are V_{I}/(V_{I} + A_{I}) and 1 − V_{I}/(V_{I} + A_{I}), respectively, where V_{I} is the variance of an individual’s slope and A_{I} is the variance of the population slope. In addition to this weighting procedure, a censoring parameter is incorporated into the population slope estimate. This allows the population slope to be conditioned on censoring time, which in this example is equivalent to the number of observations per subject. If subjects who die earlier have a higher rate of PHS decline over time, that steeper decline is reflected in the BAYES estimates for these individuals. Whereas Mori et al ^{13} estimate only these slopes, we also estimate an analogous intercept so that predicted values can be computed from the model. These predicted values are substituted for the missing data.
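The core shrinkage step can be sketched as follows; the censoring-conditioned population slope and the analogous intercept described above are omitted for brevity, and all names are ours:

```python
def eb_slope(ind_slope, ind_var, pop_slope, pop_var):
    """Empirical Bayes shrinkage of a subject's slope: a weighted average
    of the individual OLS slope and the population slope, with weight
    V_I / (V_I + A_I) on the population slope (V_I = variance of the
    individual slope, A_I = variance of the population slope)."""
    w_pop = ind_var / (ind_var + pop_var)
    return w_pop * pop_slope + (1.0 - w_pop) * ind_slope

# A noisy individual slope (large V_I) is pulled strongly toward the
# population slope; here -3.0 is shrunk to -1.4.
print(eb_slope(ind_slope=-3.0, ind_var=4.0, pop_slope=-1.0, pop_var=1.0))
```

Subjects with few observed time points have large V_{I}, so their imputed trajectories borrow heavily from the censoring-conditioned population slope rather than from their own noisy estimate.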

Rationale and Development of Simulated Data Files
One difficulty in assessing the effectiveness of missing data imputation techniques is establishing a standard for evaluating the relative success of each method. Diehr et al ^{6} made use of actual data sets, applied several imputation methods, analyzed the data, and contrasted the results. If there was little variation in the resultant parameter estimates across imputation methods, they concluded that the analysis was robust to the method of imputation. If there was variability in the parameter estimates, greater caution in interpreting the results was warranted. We considered using either actual or simulated PHS data sets. Although the use of actual PHS data from clinical studies has practical advantages, we were concerned about uncertainty in knowing whether missing data were MAR or MNAR. We therefore decided to complete a data simulation study to assess the adequacy of the selected imputation methods. In these simulations, we were able to specify different “known” patterns of missing PHS data according to a known and specified missing data mechanism. In addition, we were able to examine a range of missing data rates and different changes in PHS and the subsequent effect of varying these characteristics on the different imputation methods. The strength of the simulation approach is the investigator’s control over key parameters as part of the evaluation of imputation techniques.

The data simulation strategy consists of 3 components. The first component involved developing a simple structural equation model ^{14,15} for generating the longitudinal simulated data sets. The model was specified to ensure a “known” relationship between mortality and PHS that is consistent with observed empirical data. ^{10} We specified a structural equation model with 4 latent (unobserved) variables (the Figure). Each latent variable corresponds to a measurement occasion and has 2 observed indicators, a mortality indicator and a PHS indicator. The latent variables are specified as continuous, normally distributed variables. For any 2 adjacent time points, corresponding latent variables have a 0.800 correlation. The mortality indicator has a loading of 0.995 with its associated latent variable, and the PHS indicator has a loading of 0.964. The loading of the PHS indicator was selected to correspond to the square root of the reliability of the SF-36 physical function scale. ^{11} The observed variables have associated error terms that are assumed to be independent of each other, and the variances of the latent variables were fixed at 1.0. We did not specify correlated error terms for adjacent time points in the model because there was no basis to make a good estimate from the literature.

Fig: Structural equation model for simulated data. Q1 through Q4 indicate observed PHS at times 1 through 4; M1 through M4, observed mortality indicator at times 1 through 4; H1 through H4, latent (unobserved) health status at times 1 through 4; and e, error terms in the model for observed PHS and mortality indicators.

The second component consists of using the specified model to generate the base simulated data set. This was accomplished by invoking a random number generator to create the values for the latent variables with the associated restrictions. The observed variables were then generated as a linear combination of the latent variables plus an appropriately generated random error term. In this way, we generated the simulation data with known population characteristics.
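A sketch of this generation step for a single subject, using an AR(1)-style construction so that adjacent latent variables correlate 0.800 and the observed indicators have the stated loadings (all on a standardized scale; the rescaling to 0-to-100 PHS points and the group-specific shifts described below are omitted, and the function is our illustration):

```python
import random

random.seed(42)

def simulate_subject(n_times=4, rho=0.800, load_m=0.995, load_q=0.964):
    """Generate latent health values H1..H4 with corr(H_t, H_{t+1}) = rho
    and unit variance, then observed PHS (Q) and mortality (M) indicators
    as loading * latent + independent error, matching the stated loadings."""
    h = [random.gauss(0, 1)]
    for _ in range(n_times - 1):
        # AR(1) step keeps unit variance and gives correlation rho between
        # adjacent time points.
        h.append(rho * h[-1] + random.gauss(0, (1 - rho ** 2) ** 0.5))
    q = [load_q * x + random.gauss(0, (1 - load_q ** 2) ** 0.5) for x in h]
    m = [load_m * x + random.gauss(0, (1 - load_m ** 2) ** 0.5) for x in h]
    return h, q, m

h, q, m = simulate_subject()
print(len(h), len(q), len(m))  # 4 4 4
```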

Finally, simulated data sets were constructed for each treatment group in which the change in PHS over time and the mortality rates in each group were varied in prespecified ways. Increases or decreases in PHS over time were introduced to define specified treatment group effects over time. For example, we constructed a simulated data file with a 10-point increase in PHS for group A and a 10-point decrease in PHS scores for group B. This data set was the “complete” version of the simulated data set. The final step was to modify the files with missing data resulting from death. Mortality for each individual subject was derived from the mortality indicator and a draw of a random number from a distribution similar to that of the mortality indicator. SAS was used to generate the simulated data sets. ^{16}

Data Analysis
A subset of the total number of combinations evaluated is included in this report and was selected to represent a plausible range in differences between treatment groups and mortality rates and to simplify the presentation of results. The selected subset is representative of the complete simulation study findings.

Two approaches were used to compare how well the 4 imputation techniques recovered the true characteristics of the simulated PHS data. First, we calculated a pseudo–root mean square residual (RMSR) for the differences between the true and imputed values for each simulation. ^{17} Lower RMSR values indicate less biased estimation of the true population characteristics, ie, group means at each time point. No differences in RMSRs or estimates of recovered slopes were observed by sample size (P = 0.469); therefore, we collapsed the data into single files for analysis. Second, we compared the ability of the 4 imputation methods to recover the true population slopes reflecting rate of change in PHS in the 2 treatment groups by evaluating the difference between the true and estimated PHS slopes for each imputation technique.
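The pseudo-RMSR can be written directly from its definition (this formulation is ours; the original may average over different units):

```python
def rmsr(true_vals, imputed_vals):
    """Pseudo root mean square residual: square root of the mean squared
    difference between true and imputed values."""
    pairs = list(zip(true_vals, imputed_vals))
    return (sum((t - i) ** 2 for t, i in pairs) / len(pairs)) ** 0.5

# True scores at two censored visits vs one method's imputed values.
print(rmsr([40, 30], [45, 40]))  # sqrt((25 + 100) / 2) ≈ 7.91
```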

We compared RMSRs and differences between true and estimated slopes for the 4 imputation methods using analysis of variance (ANOVA). The ANOVAs included factors for mortality rate (6 levels), PHS change (11 levels), and imputation method (4 levels). Planned contrasts were made between the parameter estimates based on LVCF and the other 3 imputation methods. ANOVAs were run separately for baseline PHS scores of 45 and 70. Given the large number of statistical comparisons, a Bonferroni-adjusted ^{18} value of P = 0.0001 was used to evaluate statistical significance (0.05/528 = 0.00009, rounded to 0.0001).

Finally, we compared between-group differences in end-point PHS means between the true population and the imputation methods using ANOVA. Statistically significant overall differences were followed by Tukey-adjusted comparisons between pairs of imputation methods.

Results
Comparison of RMSRs by Imputation Method
When the baseline mean was 45, the ANOVA comparing RMSRs by imputation method yielded an overall F statistic of 312.47 (P <0.0001). We found significant interaction effects for mortality rate by imputation method (F = 96.50, P <0.0001), change in PHS by imputation method (F = 2.13, P = 0.0005), and mortality rate by change in PHS (F = 3.18, P <0.0001). The 3-way interaction between mortality rate, change in PHS, and imputation method was not significant (F = 0.38, P = 0.999). The more data that were missing because of mortality, the greater the RMSR (not shown) (P <0.0001).

Table 1 (top) summarizes mean RMSRs by imputation method for the main effects levels of mortality between the 2 simulated treatment groups when baseline means were 45. The RMSR data indicate that ARBSUB results in statistically significantly lower mean RMSR values across all categories of mortality rate (P <0.0001). The RMSR for BAYES does not differ from the RMSR for ARBSUB when the group A mortality rate is 0% and group B mortality rate is 30% (P = 0.027) and when the group A mortality rate is 20% and group B mortality rate is 30% (P = 0.070). In general, the rank ordering of imputation methods by lowest mean RMSR across all simulations was ARBSUB, BAYES, LVCF, and WSMOD. RMSR values increase as the amount of missing data increases, regardless of imputation method.

Table 1: Comparison of RMSRs by Imputation Method and Mortality Rate for Baseline Means of 45 and 70

RMSR results for change in PHS are summarized in Table 2. There were statistically significant differences between each pair of imputation methods (P <0.0001). For every level of PHS change, ARBSUB has the lowest mean RMSRs (5.10–5.79). The rank ordering of imputation methods by RMSR values was as follows: ARBSUB, BAYES, LVCF, and WSMOD. In most cases, the ARBSUB and BAYES RMSRs were not significantly different.

Table 2: Comparison of RMSRs by Imputation Method and Change in PHS for Baseline Means of 45 and 70

When the baseline mean was 70, the ANOVA comparing RMSRs by imputation method yielded an overall F statistic of 275.84 (P <0.0001). All 2-way interactions were significant (mortality rate by imputation method, F = 111.60, P <0.0001; change in PHS by imputation method, F = 5.90, P <0.0001; mortality rate by change in PHS, F = 2.89, P <0.0001). The 3-way interaction between mortality rate, change in PHS, and imputation method was not significant (F = 0.41, P = 0.999). The more data that were missing because of mortality, the greater the RMSR (not shown) (P <0.0001).

Table 1 (bottom) summarizes the mean RMSRs by imputation method for the main effects of levels of mortality between the 2 simulated treatment groups for a baseline mean of 70. The RMSR data indicate that overall, BAYES provided the best estimates of the missing scores. In only 1 situation, when group A mortality was 0% and group B mortality was 10%, did ARBSUB have a lower RMSR, and this RMSR was not significantly different from that generated with BAYES. Although the rank ordering of imputation methods was less consistent than that observed when the baseline mean was 45, BAYES appears to generate the smallest RMSRs, and WSMOD generates the largest. The RMSRs from ARBSUB and LVCF tended to be smaller than those from WSMOD and larger than those from BAYES. The RMSRs for LVCF and ARBSUB were not statistically different when group A mortality was 0% and group B mortality was 20%, when group A mortality was 0% and group B mortality was 30%, or when group A mortality was 10% and group B mortality was 20%. Overall, RMSRs increase as the amount of missing data increases, regardless of imputation method.

The RMSR findings for change in PHS are reported in Table 2. BAYES generated the smallest RMSRs (7.05–7.34), and WSMOD generated the largest (9.81–10.16). ARBSUB and LVCF produced RMSRs that were smaller than those from WSMOD and larger than those from BAYES. In many cases, RMSRs from ARBSUB and LVCF were not significantly different from each other.

Comparison of True and Estimated Slopes by Imputation Method
An ANOVA was completed comparing the difference between true and estimated slopes, including main effects for mortality rate, PHS change, imputation method, and 2-way and 3-way interactions. When a baseline mean of 45 was specified, the overall ANOVA model was statistically significant (F = 11.96, P <0.0001). The only statistically significant interaction term was for mortality rate by imputation method (F = 37.55, P <0.0001).

When there are few missing data because of mortality (row 1), the BAYES, WSMOD, and LVCF methods yield comparable and slightly high estimates of the true slopes (Table 3). The ARBSUB method slightly underestimates the true slope and differs significantly from the LVCF (P <0.0001) and WSMOD (P <0.0001) approaches.

Table 3: Comparison of Difference Between True Slope and Estimated Slope by Imputation Method and Mortality Rate for Baseline Means of 45 and 70

In the simulations with more missing data, comparable rank ordering and patterns of differences between imputation methods are observed (Table 3). The more missing data specified in the simulations, the greater the differences between the true and estimated slopes, regardless of imputation method. The BAYES estimates result in the least deviation from the true slopes (absolute value of differences, 0.0007–0.0043), and these differences between estimated and true slopes are significantly different from those of the other imputation methods (P <0.0001). In most cases, the estimated slopes are slightly higher than the true slope. The exception is the case with the greatest amount of missing data (row 6), for which there is a slight underestimation of the slope (0.0043).

ARBSUB underestimates the true slopes in every simulation (Table 3), and the differences between estimated and true slopes are significantly different from those of the other imputation methods (P <0.0001). LVCF and WSMOD are comparable in estimating the true slopes (P = 0.008–0.436), with the WSMOD estimates deviating slightly less than those of LVCF.

When a baseline mean of 70 was specified, the overall ANOVA model was statistically significant (F = 14.21, P <0.0001). The only statistically significant interaction term was for mortality rate by imputation method (F = 40.74, P <0.0001). Over the range of mortality rates and missing data, the magnitude of the deviation from the true slope is comparable for BAYES and WSMOD (Table 3), although BAYES underestimates the true slope while WSMOD overestimates it. LVCF tended to overestimate the true slope, and ARBSUB tended to underestimate it. In all cases, BAYES provided the closest estimate to the true slope, with WSMOD providing the next best estimate in simulations with fewer missing data (rows 7 and 8, Table 3). ARBSUB provided a better estimate of the true slope than LVCF, although the reverse was observed for simulations with more missing data.

Contrast of Treatment Group Differences by Imputation Method
We examined estimated mean end points for groups A and B by imputation method across all mortality conditions. There were no differences between imputation methods for estimating group A end-point means (P = 0.840), regardless of percent of missing data resulting from mortality (range, 0% to 20% missing). There were statistically significant differences between the imputation methods for estimating group B end-point means (P <0.0001), with these differences primarily attributable to differences among imputation methods with 30% missing data (P = 0.0004).

We compared the group A and B end-point HRQOL mean differences for the true population and by imputation method. The true difference was 7.73 points. The BAYES method (difference, 7.93 points) was not significantly different from the true between-group difference (Tukey-adjusted P >0.05). Both the WSMOD (7.05-point difference) and LVCF (6.35-point difference) underestimated the true between-group mean differences; only the contrast between the true and LVCF was significant (Tukey-adjusted P <0.05). ARBSUB (8.99-point difference) overestimated the true between-group differences (Tukey-adjusted P <0.05).

Discussion
This is the first study to examine different methods for imputing PHS scores when data are missing as a result of death or other MNAR reasons. From the RMSR findings, the ARBSUB and BAYES techniques appear to most closely estimate the end-point means and other parameters, regardless of the amount of missing data and differences in changes in PHS scores. When baseline PHS scores were low (ie, 45), there was evidence of illogical values (ie, imputed scores <0) and floor effects (ie, imputed scores equal to 0) associated with ARBSUB that may have constrained RMSR estimates. Therefore, BAYES was best at reproducing population parameters regardless of baseline HRQOL score, and it did not have the problem of illogical values (ie, <0) or floor effects. LVCF and WSMOD consistently provided more biased parameter estimates when baseline mean PHS scores were low (ie, 45). There were few differences between ARBSUB and LVCF when baseline means were 70, and WSMOD still consistently had higher RMSRs. LVCF and WSMOD introduce more variance when estimating missing PHS scores as the amount of missing data increases.

BAYES clearly was the most successful at estimating the true slopes for changes in PHS scores, regardless of the extent of missing data and changes in PHS. The absolute differences between the true and estimated slopes range from 0.0007 to 0.0206 over all group mortality rates examined in this simulation. In most situations, BAYES slightly underestimated the true slope; in the case with the most missing data, it slightly overestimated the slopes.

ARBSUB consistently overestimated the true slopes across each of the mortality rates. This overestimation was associated with baseline PHS scores; ie, when baseline means were 45, ARBSUB seemed to replicate the true slopes more closely. Problems with illogical values (ie, <0) and floor effects for imputed scores were demonstrated, so caution is needed when using ARBSUB. In cases with large amounts of missing data resulting from mortality, ARBSUB resulted in PHS outcomes that are significantly higher than would be observed if complete data were available. The WSMOD and LVCF techniques consistently underestimated the true slopes, and this problem was exacerbated as mortality rates increased.

On the basis of these results, sample size has less impact on these imputation methods than the proportion of MNAR data within each group. However, the smallest sample size evaluated in this study was 100 per group, and smaller samples may be affected differently by missing data. It appears that the nature and amount of missing data are more important than sample size when estimating within-group change and between-group differences in PHS.

These findings may be generalizable to other MNAR reasons for missing PHS data, such as hospitalization or disease progression. From these results, BAYES is likely to work well at imputing missing PHS scores when there are moderate amounts of missing data. When PHS data are MCAR, the LVCF, BAYES, and other imputation methods adequately reproduce the missing PHS data. ^{19} It is unknown whether BAYES produces unbiased estimates of missing data when rates of missing data exceed 30% across groups. Unbalanced sample sizes between the groups are likely to impact the estimates of these imputation methods. A greater proportion of MNAR data in a group with a small sample size may have dramatic effects on end-point PHS estimates, and it is likely that these imputation methods cannot overcome this problem. Additional research is needed to extend these findings beyond the mortality rates examined in the present study.

Given the methods for constructing the simulation model, these findings are generalizable only to measures of physical functioning and health. The results may not be generalizable to indicators of mental health and psychological functioning that may not be as strongly correlated with mortality.

Research is needed to compare the BAYES technique to multiple imputation and mixed-model (random effects) ANOVA approaches to handling missing HRQOL data in statistical analyses. This research should be completed using simulated and actual data from clinical trials. Multiple imputation methods are technically complex, although computer programs are becoming more generally available. ^{7,8} Mixed-model ANOVA assumes that missing data are MAR and can handle incomplete HRQOL data. ^{5,20,21} Although there is ongoing research on multivariate statistical techniques for handling missing values in the analysis of end-point clinical and HRQOL scores, ^{5,7,8,20–26} techniques focused on providing unbiased and robust estimates of missing scores are still needed.

What recommendations can be made from this study about imputation of missing values due to mortality in the analysis of between-group differences in PHS outcomes? First, when there are very few missing data (<10%) and comparable rates of mortality across treatment groups, all the imputation methods provide consistent and unbiased estimates of treatment differences. Even when missing data resulting from mortality are differential between groups but relatively few (<15% across groups), these imputation methods are fairly comparable, although there is a definite advantage for BAYES. ARBSUB produces PHS values <0 when baseline scores are low and when there are larger amounts of missing data. When missing data rates exceed 20%, the BAYES technique is the best approach. WSMOD is also recommended when there is a moderate amount (20%) of missing data; it slightly underestimates change in PHS and is therefore somewhat more conservative than BAYES. WSMOD is not recommended when missing data rates exceed 20% and baseline mean PHS scores are low (ie, 45), because in this case it significantly underestimates change in PHS.

It is clear that LVCF does not adequately capture between-group differences when there are moderate to high rates of missing data. Other methods for imputing missing PHS scores, such as the BAYES technique, may provide better estimates of differences between treatment groups. Missing HRQOL data in clinical trials cannot be ignored, and improved methods are needed both for imputing missing scores and for the statistical analysis of data with missing values.

Acknowledgment
This research was supported in part by Amgen, Thousand Oaks, Calif.

References
1. Bernhard J, Cella DF, Coates AS, et al. Missing quality of life data in cancer clinical trials: Serious problems and challenges. Stat Med 1998; 17: 517–532.

2. Bernhard J, Gelber RD, eds. Workshop on missing data in quality of life research in cancer clinical trials: Practical and methodological issues. Stat Med 1998; 17: 511–796.

3. Curran D, Molenberghs G, Fayers PM, et al. Incomplete quality of life data in randomized trials: Missing forms. Stat Med 1998; 17: 697–710.

4. Little RJ, Rubin DB. Statistical analysis with missing data. New York, NY: John Wiley & Sons; 1987.

5. Fairclough DL, Peterson HF, Cella D, et al. Comparison of several model-based methods for analyzing incomplete quality of life data in cancer clinical trials. Stat Med 1998; 17: 781–796.

6. Diehr P, Patrick DL, Hedrick S, et al. Including deaths when measuring health status over time. Med Care 1995; 33 (suppl): AS164–AS172.

7. Rubin DB. Multiple imputation for nonresponse in surveys. New York, NY: John Wiley & Sons; 1987.

8. Lavori PW, Dawson R, Shera D. A multiple imputation strategy for clinical trials with truncation of patient data. Stat Med 1995; 14: 1913–1925.

9. Heyting A, Tolboom JTBM, Essers JGA. Statistical handling of drop-outs in longitudinal clinical trials. Stat Med 1992; 11: 2043–2061.

10. Besarab A, Bolton WK, Browne JK, et al. The effects of normal as compared with low hematocrit values in patients with cardiac disease who are receiving hemodialysis and epoetin. N Engl J Med 1998; 339: 584–590.

11. Ware JE, Snow KK, Kosinski M, et al. SF-36 Health Survey: Manual and interpretation guide. Boston, Mass: The Health Institute, New England Medical Center; 1993.

12. Kleinbaum DG, Kupper LL. Applied regression analysis and other multivariable methods. North Scituate, Mass: Duxbury Press; 1978.

13. Mori M, Woodworth GG, Woolson RF. Application of empirical Bayes inference to estimation of rate of change in the presence of informative right censoring. Stat Med 1992; 11: 621–631.

14. Long JS. Covariance structure models: An introduction to LISREL. Newbury Park, Calif: Sage University Press; 1983.

15. Bollen KA. Structural equations with latent variables. New York, NY: John Wiley & Sons; 1989.

16. SAS/STAT user’s guide, release 6.03. Cary, NC: SAS Institute; 1996.

17. Zar JH. Biostatistical analysis. 3rd ed. Upper Saddle River, NJ: Prentice Hall; 1996.

18. Miller RG. Simultaneous statistical inference. 2nd ed. New York, NY: Springer-Verlag; 1981.

19. Revicki DA. Incorporating mortality into quality of life data analysis: Methods and issues. Presented at Quality of Life: Constructs and Measures, Drug Information Association, January 1997, Scottsdale, Ariz.

20. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics 1982; 38: 963–974.

21. Wu AW, Gray SM, Brookmeyer R. Application of random effects models and other methods to the analysis of multidimensional quality of life data in an AIDS clinical trial. Med Care 1999; 37: 249–258.

22. Zwinderman AH. Statistical analysis of longitudinal quality of life data with missing measurements. Qual Life Res 1992; 1: 219–224.

23. Zee BC. Growth curve model analysis for quality of life data. Stat Med 1998; 17: 757–766.

24. Troxel AB. A comparative analysis of quality of life data from a southwest oncology group randomized trial of advanced colorectal cancer. Stat Med 1998; 17: 767–780.

25. Fairclough DL. Summary measures and statistics for comparison of quality of life in a clinical trial of cancer therapy. Stat Med 1997; 16: 1197–1209.

26. Troxel AB, Fairclough DL, Curran D, et al. Statistical analysis of quality of life with missing data in cancer clinical trials. Stat Med 1998; 17: 653–666.