The assessment of efficacy of treatment in clinical trials that aim to prevent or treat acute kidney injury (AKI) and of the predictive values of AKI biomarkers depends critically on the choice of outcome metric. Most of these are surrogate outcomes for renal replacement therapy or death and are plasma creatinine based. Predictive values are usually assessed against a categorical metric, such as the relative or absolute increase in creatinine above a prespecified level (^{1}^{,}^{2}). Most clinical trials have used a categorical metric as primary outcome (^{3}^{,}^{4}); however, some trials have used a noncategorical, or continuous, metric, such as peak creatinine (^{5}^{,}^{6}). For both types of metric, change in creatinine is a surrogate for change in GFR. A lack of consensus in definition of AKI and its outcomes and in timing of intervention in clinical trials has impaired comparison among trials and perhaps contributed to negative outcomes of some trials. Two groups, the Acute Dialysis Quality Initiative (ADQI) and the Acute Kidney Injury Network (AKIN), have produced creatinine-based definitions and grading of severity of AKI, namely the RIFLE and AKIN criteria, respectively (Table 1) (^{7}^{,}^{8}). These criteria incorporate relative (percentage) changes in creatinine. A third criterion was recently proposed on the basis of absolute change in creatinine within specified time periods (Waikar and Bonventre [W&B]) (^{9}). Only RIFLE incorporates the alternative of using GFR decrease. The latter definition was recently corrected (^{10}). AKIN defines AKI as an abrupt rise over 48 h, and W&B defines AKI as an abrupt rise in 24 to 48 h. The original published RIFLE definition did not define the period, although ADQI recommended (http://www.ccm.upmc.edu/adqi/ADQI2/ADQI2g1.pdf) and later defined AKI as a sustained rise (>24 h) within 7 d (^{11}^{,}^{12}). 
Both RIFLE and AKIN make use of change in creatinine relative to a baseline, which is either measured or estimated using the Modification of Diet in Renal Disease (MDRD) formula (^{13}). RIFLE and AKIN criteria have been evaluated in epidemiologic studies (^{12}^{,}^{14}^{–}^{17}). Despite agreement on the importance of classifying severity, a striking variation remains between estimates of AKI incidence (11 to 67%) and mortality (17 to 70%).

Recently, Solomon and Segal (^{18}) used pharmacokinetic modeling to demonstrate how differing baseline values, or basing the definition of AKI on relative or absolute change in creatinine, can alter diagnosis for an identical change in GFR. Similarly, Waikar and Bonventre (^{9}) showed that a categorical metric that is based on absolute rise in creatinine and incorporates time enables earlier diagnosis of AKI than a metric that is based on relative rise.

Two possible new outcome metrics that incorporate the extent, rate, and duration of change in creatinine are the average change in creatinine above baseline (AVC) and the AVC relative to baseline (RAVC), which we introduce here (Figure 1). These are continuous metrics, which we hypothesized would detect small treatment benefits that reflected amelioration of GFR reduction.

We assessed these metrics by modeling creatinine change in 10,000 virtual inpatients (VIPs) with varying degrees of severity of AKI typical of a general intensive care unit (ICU) population. We compared continuous and categorical metrics in both prevention and intervention trials, and we used the model to examine the effect on outcome metrics of estimating baseline creatinine by the ADQI-recommended method of back-calculation from an assumed GFR (^{7}).

## Materials and Methods

Plasma creatinine changes with time were modeled with a one-compartment pharmacokinetic model (^{19}^{,}^{20}). Full details of the derivation of the formula used are given in Appendix 1. Briefly, creatinine as a function of time is as follows:

$$C(t) = C_{ss} - (C_{ss} - C_{0})\,e^{-kt} \tag{1}$$

where *C*_{0} is the concentration at time *t* = 0, *k* is the renal elimination rate constant, and *C*_{ss} is the steady-state concentration (see Figure 1).

When there is a decrease in GFR, the creatinine concentration must increase over time to a new steady state:

$$C_{ss} = \frac{100}{100 - \Delta g}\,C_{b} \tag{2}$$

where Δ*g* is the percentage decrease in GFR and *C*_{b} is the baseline creatinine.
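The two relationships described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard one-compartment solution C(t) = C_ss − (C_ss − C_0)·exp(−kt) and a new steady state of C_ss = C_b × 100/(100 − Δg), which follows from creatinine being inversely proportional to GFR. The function names are hypothetical and are not taken from the study.

```python
import math


def steady_state_creatinine(c_b, delta_g):
    """New steady-state creatinine (mg/dl) after a GFR decrease of delta_g percent.

    Assumes creatinine is inversely proportional to GFR.
    """
    if not 0 <= delta_g < 100:
        raise ValueError("delta_g must be in the range [0, 100)")
    return c_b * 100.0 / (100.0 - delta_g)


def creatinine_at(t, c_0, c_ss, k):
    """Creatinine (mg/dl) at time t (h) for a one-compartment model.

    c_0 is the concentration at t = 0, c_ss the steady-state concentration,
    and k the renal elimination rate constant (per hour) in effect after t = 0.
    """
    return c_ss - (c_ss - c_0) * math.exp(-k * t)
```

For example, a 50% decrease in GFR from a baseline of 1.0 mg/dl gives a steady state of 2.0 mg/dl, which the time course approaches exponentially.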

### Output Metrics

The proportion of patients in each categorical severity stage metric defined in Table 1 was determined from change in creatinine from baseline (*C*_{b}). Total RIFLE and total AKIN were evaluated as trial outcome metrics. The severity stage metrics R, I, and F from RIFLE and I, II, and III from AKIN are not normally used to define trial outcomes but are presented to aid understanding of how treatment changes the proportion of patients in each category; for simplicity, only the largest group of patients, RIFLE R and AKIN I, are shown in the figures.
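To make the categorical staging concrete, the sketch below assigns a stage from a peak and a baseline creatinine. The thresholds are an assumption: they are a simplified reading of the commonly published creatinine criteria (RIFLE R, I, F at 1.5-, 2-, and 3-fold baseline; AKIN I at a rise of at least 0.3 mg/dl or 1.5-fold, II above 2-fold, III above 3-fold). Urine output, the absolute 4.0 mg/dl criteria, GFR-based alternatives, and time windows are deliberately omitted, and Table 1 of the study takes precedence wherever it differs. The function names are hypothetical.

```python
def rifle_stage(c_peak, c_b):
    """Simplified creatinine-only RIFLE stage, or None if no stage is met."""
    ratio = c_peak / c_b
    if ratio >= 3.0:
        return "F"
    if ratio >= 2.0:
        return "I"
    if ratio >= 1.5:
        return "R"
    return None


def akin_stage(c_peak, c_b):
    """Simplified creatinine-only AKIN stage, or None if no stage is met."""
    ratio = c_peak / c_b
    if ratio > 3.0:
        return "III"
    if ratio > 2.0:
        return "II"
    # Stage I is met by either a relative or an absolute rise (mg/dl).
    if ratio >= 1.5 or (c_peak - c_b) >= 0.3:
        return "I"
    return None
```

A patient rising from 2.0 to 2.4 mg/dl meets this simplified AKIN I through the absolute criterion but not RIFLE R, which illustrates how the choice of relative or absolute change can alter classification.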

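The AVC (mg/dl; see Figure 1) over time period *t* is:

$$\mathrm{AVC} = \frac{1}{t}\int_{0}^{t}\bigl(C(\tau) - C_{b}\bigr)\,d\tau \tag{3}$$

The RAVC is the AVC relative to baseline creatinine:

$$\mathrm{RAVC} = \frac{\mathrm{AVC}}{C_{b}} \tag{4}$$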

To simulate a clinical situation with a limited number of measurement times, the RAVC was calculated numerically using the trapezoidal method for the discrete time points *t*_{m} = {0, 12, 24, 48, 72, 96, 120, 144} hours. The mean RAVC for the VIP population is presented in all graphs.
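The trapezoidal calculation can be sketched as follows. This is an illustration rather than the study's code: it assumes that the AVC is the time-averaged rise in creatinine above baseline over the evaluation period and that the RAVC is that average divided by baseline, returned here as a fraction (scaling to a percentage is a presentation choice). The function names are hypothetical.

```python
TIME_POINTS_H = [0, 12, 24, 48, 72, 96, 120, 144]


def avc_trapezoid(times, creatinine, c_b):
    """Average creatinine rise above baseline (mg/dl), trapezoidal rule.

    times and creatinine are equal-length sequences of sampling times (h)
    and the creatinine values (mg/dl) measured at those times.
    """
    if len(times) != len(creatinine) or len(times) < 2:
        raise ValueError("need at least two matched time/creatinine points")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        rise_prev = creatinine[i - 1] - c_b
        rise_curr = creatinine[i] - c_b
        area += 0.5 * dt * (rise_prev + rise_curr)
    return area / (times[-1] - times[0])


def ravc_trapezoid(times, creatinine, c_b):
    """AVC relative to baseline creatinine, as a fraction of baseline."""
    return avc_trapezoid(times, creatinine, c_b) / c_b
```

To evaluate the metric at 48 h rather than 144 h, pass only the time points up to 48 h and the matching creatinine values.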

### VIP Population for Comparison of Trial Outcome Metrics

Equations 1 and 2 were used to model a population of 10,000 VIPs with initial conditions *C*_{0n}, *C*_{bn}, Δg_{n}, and *k*_{bn}, where subscript *n* denotes VIP number *n*. The rate constant after a decrease in GFR for VIP *n* then becomes *k*_{n} = *k*_{bn}*C*_{bn}/*C*_{ssn}. By substituting for *k*_{n}, *C*_{0n}, *C*_{bn}, and *C*_{ssn} into equation 1 we calculated the creatinine concentration at time points *t*_{m} after a decrease in GFR at time 0.
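The per-VIP calculation can be sketched as below. It assumes the one-compartment solution C(t) = C_ss − (C_ss − C_0)·exp(−kt), a steady state of C_ss = C_b × 100/(100 − Δg), and the post-decrease rate constant k = k_b × C_b/C_ss. The three VIPs are hypothetical illustrative values, not data from the study.

```python
import math

TIME_POINTS_H = [0, 12, 24, 48, 72, 96, 120, 144]


def vip_time_course(c_0, c_b, delta_g, k_b, times=TIME_POINTS_H):
    """Creatinine (mg/dl) at each time point for one virtual inpatient.

    c_0: initial creatinine, c_b: baseline creatinine, delta_g: percentage
    decrease in GFR at time 0, k_b: baseline elimination rate constant (/h).
    """
    c_ss = c_b * 100.0 / (100.0 - delta_g)  # new steady state
    k = k_b * c_b / c_ss                    # rate constant after the GFR decrease
    return [c_ss - (c_ss - c_0) * math.exp(-k * t) for t in times]


# Hypothetical three-VIP population; the study used 10,000 VIPs.
vips = [
    {"c_0": 0.8, "c_b": 0.8, "delta_g": 10.0, "k_b": 0.1},
    {"c_0": 1.0, "c_b": 1.0, "delta_g": 40.0, "k_b": 0.1},
    {"c_0": 1.6, "c_b": 1.6, "delta_g": 70.0, "k_b": 0.1},
]
courses = [vip_time_course(**vip) for vip in vips]
```

Each entry of `courses` holds one VIP's creatinine values at the eight sampling times, which can then be passed to a categorical or continuous outcome metric.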

### Initial Values

Realistic initial conditions were defined from seven large hospital inpatient or ICU population studies (Table 2). A distribution of 10,000 initial creatinine values (Figure 2) was created using a log-normal distribution with a mean and SD comparable to the populations reported in Table 2 (^{26}^{–}^{28}). Each individual VIP was assigned an initial creatinine *C*_{0n} from this distribution. For calculating outcome metrics, the initial creatinine value served as the baseline creatinine (*C*_{bn} = *C*_{0n}).

Because GFR is inversely proportional to creatinine, the distribution of reported change in creatinine (Table 2) was used to estimate the distribution of initial change in GFR (Table 2, Figure 3). The studies of Ahlström and Hoste had much higher rates of AKI than the others and were treated as outliers. Of the remaining studies, 76 ± 10% (mean ± SD) of patients had a GFR decrease of <33%, 12 ± 4% of 33 to 50%, and 11 ± 6% of >50%. To create a VIP population with the same distribution of AKI, decreases in GFR, Δg_{n}, were randomly distributed among the 10,000 VIPs. Thus, 77% of VIPs had a GFR decrease of <33%, 13% a decrease between 33 and 50%, and 12% a decrease >50% (Figure 3). The third initial condition, *k*_{bn}, is not typically measured. It may be calculated using the patient weight and creatinine clearance (see Appendix 1). For the simulation, our calculations showed that a single value of *k*_{bn} could be used for all VIPs, in this case 0.1/h, without losing the power of the model to discriminate between differences in GFR change.

### Calculated Changes

To simulate a prevention trial, we randomly assigned VIPs to two equal-sized groups, untreated and treated (each *n* = 5000). In the untreated group, there was no change in initial conditions, including GFR decrease Δg_{n}; therefore, the creatinine time course remained unaltered for each VIP. In the treated group, the GFR decrease was ameliorated for each VIP at time *t* = 0, and the creatinine time course was recalculated. The extent of amelioration (called the treatment efficacy) was initially set at 10% for all VIPs (*i.e.*, Δg_{n}[treated] = 0.9 × Δg_{n}[before treatment]). The procedure was repeated for treatment efficacies of 20 to 90% in steps of 10%. A 100% treatment efficacy is equivalent to no change in GFR and complete prevention of AKI. A 0% treatment efficacy produced no amelioration of GFR decrease.

We undertook three subgroup analyses to simulate intervention. VIP population subsets (subgroups) were selected to match predetermined decreases in GFR, which were severe enough to be ranked as the corrected R and F categories by RIFLE (^{10}), so that populations had respective decreases of >33.3% (VIP-33; *n* = 2286) and >66.7% (VIP-67; *n* = 427), thus corresponding to the R or higher and F categories in Table 1. A third high-risk subgroup was selected on the basis of an initial creatinine of >1.5 mg/dl (VIP-HR; *n* = 710). The same approach to treatment efficacy was then applied as for the prevention trial; that is, treatment produced a degree of amelioration of GFR decline referred to as the treatment efficacy. The ideal assumption is made here that AKI has been diagnosed by a biomarker of injury before GFR has declined, so although the patients are predicted to sustain a decrease in GFR sufficient to increase creatinine above the threshold for classification as AKI, this decrease was either fully or partially reversed by the intervention. As for the prevention trial, each subgroup was randomly divided into control and treated groups, with recalculation of the subsequent creatinine time course after the GFR decrease of the treated cohort was ameliorated at *t* = 0. An alternative simulation could be performed by delaying either the reversal of GFR or the timing of intervention until after an increase in creatinine was detected. This was not attempted, because there are no published data that could be used to define the otherwise arbitrary distribution of timing of such a delay.
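The treatment step can be sketched as below. This is a minimal illustration under the definition given above, in which a treatment efficacy of *e* percent scales each VIP's GFR decrease by (1 − *e*/100). The function names, the fixed random seed, and the dictionary-based population are hypothetical choices made for the example.

```python
import random


def treated_gfr_decrease(delta_g, efficacy_pct):
    """GFR decrease (%) after treatment of the given efficacy (%)."""
    if not 0 <= efficacy_pct <= 100:
        raise ValueError("efficacy_pct must be between 0 and 100")
    return delta_g * (1.0 - efficacy_pct / 100.0)


def split_control_treated(vip_ids, seed=0):
    """Randomly divide VIP identifiers into two equal-sized groups."""
    shuffled = list(vip_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def select_subgroup(delta_g_by_id, min_decrease_pct):
    """Identifiers of VIPs whose GFR decrease exceeds the threshold (%)."""
    return [vid for vid, dg in delta_g_by_id.items() if dg > min_decrease_pct]
```

For a subgroup such as VIP-33, one would first call `select_subgroup` with a threshold of 33.3, split the result into control and treated groups, and recalculate the creatinine time course of the treated group with the reduced GFR decrease.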

### VIP Population for Assessing the Methods of Defining Baseline Creatinine

The lowest plasma creatinine measured within 24 h of ICU admission of 1037 consecutive patients who were admitted to a general ICU in 12 mo (Christchurch Hospital, 2007) defined the initial creatinine distribution, *C*_{0}, of a second VIP population. Patients who had an initial creatinine >3.0 mg/dl (*n* = 14) or were younger than 17 yr (*n* = 4) were excluded. The distribution of the decrease in GFR applied to this group was the same as for the first VIP population (Figure 3).

We used three methods for determining the baseline creatinine, *C*_{b}, for each VIP. For method A, the baseline was the same as the initial ICU creatinine, *C*_{0}. For methods B and C, baseline creatinine was back-calculated from the MDRD formula using an assumed GFR of 75 or 100 ml/min, respectively (^{13}).
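Back-calculation of a baseline creatinine from an assumed GFR can be sketched as follows. The coefficients are an assumption: they are those of the commonly cited four-variable MDRD equation (186 × creatinine^−1.154 × age^−0.203, × 0.742 if female, × 1.210 if black), and the version used in the study should be checked against its reference. The function names are hypothetical.

```python
def mdrd_gfr(scr_mg_dl, age_yr, female, black=False):
    """Estimated GFR (ml/min per 1.73 m^2) from the four-variable MDRD form."""
    gfr = 186.0 * scr_mg_dl ** -1.154 * age_yr ** -0.203
    if female:
        gfr *= 0.742
    if black:
        gfr *= 1.210
    return gfr


def baseline_creatinine_from_gfr(assumed_gfr, age_yr, female, black=False):
    """Creatinine (mg/dl) that the MDRD form maps to the assumed GFR."""
    factor = 186.0 * age_yr ** -0.203
    if female:
        factor *= 0.742
    if black:
        factor *= 1.210
    # Invert gfr = factor * scr ** -1.154 for scr.
    return (assumed_gfr / factor) ** (-1.0 / 1.154)
```

Because the assumed GFR is the same for every patient, the back-calculated baseline varies only with age, sex, and race, which is why it spans a narrower range than a measured baseline.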

### Statistical Analysis

The effect size is a measure of the relative ability of the outcome metric to distinguish between the treated and control groups at a given treatment efficacy. The effect size (Cohen's *d* statistic) was calculated as the difference between the means divided by the pooled SD; this is independent of the number of VIPs (^{29}).
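The effect size calculation can be sketched as below. It assumes the usual pooled SD, in which each group's sample variance is weighted by its degrees of freedom; the source does not state which pooling variant was used. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(control, treated):
    """Cohen's d: difference between group means divided by the pooled SD."""
    n1, n2 = len(control), len(treated)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / (
        n1 + n2 - 2
    )
    return (mean(control) - mean(treated)) / math.sqrt(pooled_var)
```

With a continuous metric such as the RAVC, the inputs are the per-VIP values in each group; with a categorical metric, each VIP can be coded as 1 (classified as AKI) or 0.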

## Results

### Time Course of Creatinine

The creatinine distribution for the VIP population was calculated as a function of time using equation 1 for an initial decrease in GFR, Δ*g*_{n}, and using the input values in Table 3. Figure 4 depicts the increase in creatinine relative to baseline for all 10,000 VIPs.

### Prevention Trial

Preventative treatment reduced RAVC approximately in proportion to the treatment efficacy, from 23.5 in the untreated group at 48 h to 1.6 in the treated group with a 90% treatment efficacy (Figure 5A). This corresponded to an effect size of 1.01 (Figure 5C). The relationship between treatment efficacy and effect size for the RAVC was approximately linear. The AVC showed a similar result with linear increase in effect size from 0.12 to 0.91. Because the AVC behaved very similarly to the RAVC in all groups, it is not shown in the graphs.

The categorical metrics had a negative sigmoidal (nonlinear) relationship to treatment efficacy (Figure 5, A and B). The gradual reduction in total RIFLE and total AKIN at treatment efficacies below 40% was not evenly reflected in the severity categories, as illustrated by the inconsequential changes in RIFLE R and AKIN I below a treatment efficacy of 40%. Total RIFLE, total W&B, and total AKIN classified no patients with AKI at treatment efficacies above 60, 70, and 80%, respectively.

At 48 h, the effect size was greater for RAVC than the categorical metrics for all treatment efficacies (Figure 5C). At 144 h, the effect size of RAVC was reduced for treatment efficacies above 40 to 50%, whereas there was little change in the effect size of the other metrics (Figure 5D).

### Intervention Trials

For intervention in the VIP-33 subgroup, RAVC and total W&B exhibited a linear relationship with treatment efficacy, and total RIFLE and AKIN showed an inverse sigmoidal relationship (Figure 6, A and B). The categorical variables erroneously classified all patients as showing 100% amelioration at treatment efficacies above 60% (total RIFLE and W&B) and above 80% (total AKIN). Below a treatment efficacy of 40%, more patients were classified as AKIN I than in the untreated cohort. This results from more patients having dropped down from a higher severity category than having dropped below the threshold for AKIN I as the result of treatment. The effect sizes were considerably greater for this intervention trial than for the prevention trial using the full cohort (compare Figure 6, C and D, with Figure 5, C and D), especially for total RIFLE and total AKIN above treatment efficacies of 40 and 60%, respectively, for which these metrics are more discriminatory than RAVC.

For intervention in the VIP-67 subgroup, only RAVC exhibited an approximately linear relationship with treatment efficacy (Figure 7). The nonlinearity of the categorical metrics that was observed in VIP-33 was greatly exaggerated. Total RIFLE and total AKIN were unable to detect any treatment effect below 40 to 50% (Figure 7, C and D). At treatment efficacies of 20 to 50%, some VIPs whose initial classification was RIFLE F or AKIN III became classified as RIFLE R or AKIN I, increasing these metrics.

The nonlinearity that was observed in the categorical variables of the total VIP cohort was exaggerated in VIP-HR (Figure 8). Total W&B and total AKIN were less discriminatory at treatment efficacies below 40% in this subgroup than in the total cohort (compare Figure 8, C and D, with Figure 5, C and D).

### Using MDRD to Estimate Baseline Creatinine

The effect of using an assumed GFR to back-calculate baseline creatinine is contrasted with the use of the initial value (model A) in Table 4. Model B (assumed GFR of 75 ml/min) resulted in a decrease in all categorical and continuous metric classifications of AKI compared with model A, with the exception of AKIN severity stage III and the W&B stages 2 and 3. Model C (assumed GFR of 100 ml/min) resulted in an increase in all categorical and continuous metric classifications of AKI compared with model A. The lower mean baseline creatinine in model C compared with model B explains the higher RAVC, AVC, and categorical metrics.

Despite the excess of misclassification by model C, the mean baseline creatinine values for models A and C were not significantly different. The difference in output metrics is due to the difference in distribution of baseline creatinine. The variance of model C was significantly smaller (*P* < 0.0001) and the median significantly higher (*P* < 0.0001) than for model A. In other words, estimating baseline creatinine values using a “one size fits all” GFR (model C) compressed the range of baseline values compared with a measured baseline population (model A). This resulted in an overestimation of the proportion of patients who were classified by total RIFLE, total AKIN, and total W&B of 36, 21, and 92%, respectively.

## Discussion

The optimal outcome metric for clinical studies of AKI is uncertain (^{9}^{,}^{18}). In this study, the RAVC (and AVC) showed an approximately linear relationship with treatment efficacy, correctly reflecting the extent of the targeted intervention. At low treatment efficacy, the categorical metrics tended to underestimate and at high treatment efficacy to overestimate the extent of intervention. These deviations were most pronounced in intervention trials (subgroups VIP-33 and especially VIP-67); however, as the risk of AKI in a population increases (*e.g.*, VIP-67 *versus* VIP-33), categorical metrics will become progressively less discriminatory between the treated and untreated groups. In contrast, continuous metrics (*e.g.*, RAVC) maintain discrimination.

The RIFLE and AKIN criteria have been used in retrospective epidemiologic studies but less widely in prospective clinical trials. Two published prevention trials used RIFLE as a secondary outcome metric (^{30}^{,}^{31}), and one used RIFLE R as a randomization criterion (^{32}). RIFLE has been used as a gold standard against which to assess the predictive value of novel biomarkers (^{33}^{–}^{35}). Many other measures of creatinine continue to be used as outcomes of clinical trials, including the proportion of patients with a serum creatinine of >150 μmol/L (^{36}) or a >25% increase in 72 h (^{30}). Solomon and Segal (^{18}) argued that individuals with high initial creatinine or low GFR were more likely to be identified as having kidney injury by the AKIN criteria. Waikar and Bonventre (^{9}) argued for AKI to be classified by absolute rather than relative changes and to incorporate time, effectively making the rate of change the important metric.

Both continuous and categorical outcome metrics depend crucially on correct determination of baseline creatinine. Lower baseline creatinine estimates that were determined using an assumed GFR of 100 ml/min (MDRD) increased the prevalence of AKI regardless of the metric used. Similarly, in a pediatric population, Zappitelli *et al.* (^{37}) showed that a baseline creatinine derived from an assumed GFR of 100 or 120 ml/min (Schwartz) progressively increased the prevalence of AKI compared with a measured baseline creatinine. Bagshaw *et al.* (^{38}) demonstrated in a population with severe AKI that back-calculation by the MDRD formula overestimates AKI by RIFLE category. Importantly, it was the narrower distribution (smaller interquartile range) of the back-calculated baseline distribution (100 ml/min), compared with the measured creatinine baseline distribution, that resulted in more patients' being classified as having AKI. In other words, even when the estimated baseline creatinine distribution has the same mean as the measured baseline creatinine of the population, the use of this method for calculating baseline creatinine values will overestimate the incidence of AKI.

The implication for clinical trials is that it is not merely the mean and SD of the creatinine that enable us to compare trials but the distribution. For facilitation of comparisons between trials and epidemiologic studies, the medians and upper and lower quartiles of the populations should be published. Furthermore, every effort should be made to use reasonable measured baseline creatinine values rather than estimated ones. In some cases, no preadmission creatinine is available. A strategy that involves using the lowest of the ICU values (^{39}) or even a post-ICU creatinine concentration (when this is lower than the ICU values) seems to be preferable to a formulaic estimate of baseline creatinine.

If total RIFLE, AKIN, or W&B is used as the primary or secondary outcome for a trial, then the results will depend on the baseline creatinine. These results highlight the potential for misinterpreting trial outcomes. The effect of partially efficacious treatment is to reduce the proportions in the more severe AKIN or RIFLE stages while increasing the proportion in the less severe stages. This may lead to the treatment arm's having higher proportions in AKIN I or AKIN II, for example, than the placebo arm. Caution is required when patients are stratified according to stage of severity; AKIN and RIFLE stages are inappropriate as stand-alone outcomes in clinical trials.

The results show that categorical metrics suggest perfect treatment when the amelioration of GFR decrease is not 100%. This is because categorical outcomes aggregate all creatinine changes below a cutoff (*e.g.*, 0.3 mg/dl in the case of AKIN) into one “non-AKI” category. A noncategorical metric, such as the RAVC, incorporates the differences among all patients. The RAVC also incorporates several time points rather than relying on a single point to classify a patient into a RIFLE or an AKIN stage. The importance of time in a definition of AKI was recently highlighted by the W&B model, which demonstrated that absolute change in creatinine was an earlier marker of a change in GFR for patients who had chronic kidney disease and developed AKI (^{9}). This study has not sought to establish RAVC as an alternative definition of AKI for an individual patient; rather, it is limited to RAVC as an outcome metric in clinical trials. The question of whether a snapshot of renal function or an average over several days in the ICU is a better predictor of outcome, such as death, needs validation in clinical studies.

A proportion of patients who enter the ICU will have had time for the disease to evolve from insult to GFR decrease and then to the point that this becomes detectable by an increase in creatinine. Intuitively, this highlights that using the entry to ICU value of creatinine as a baseline underestimates the prevalence of AKI. The presence of a proportion of both treated and placebo groups with a GFR decrease will then mask treatment efficacy in an intervention trial.

This study has limitations. Our evaluation is limited to “treatment efficacy,” defined as a proportional reduction in GFR decrease irrespective of initial GFR. It is possible that some treatments are more effective at ameliorating or preventing GFR decrease (*e.g.*, in patients who have a higher initial GFR). Within a population of treated patients, treatment efficacy is likely to be normally distributed. When the change in the outcome metric is approximately linear across this distribution, the modeled outcomes will be unaffected; however, when the changes are not symmetrical on either side of the mean treatment efficacy, the outcome metric curves will be “smoothed out” with less abrupt changes with treatment efficacy. The model, at present, does not allow for a transient rise in plasma creatinine, which is known to occur in some patients.

Our model further assumes that the rate of production of creatinine and the volume of distribution are constant during the 7 d and that nonglomerular creatinine excretion is negligible. In the ICU situation, total muscle mass changes at a rate of approximately 2%/d (^{40}). Plasma creatinine concentration does not decrease immediately and may in fact increase a little (^{41}). During 48 to 144 h, these changes are negligible. The volume of distribution may change, for example as a result of high volumes of fluid intake; however, this is unlikely to have a major effect on the rate of change of creatinine when the rate of elimination of creatinine is changing rapidly as in AKI. Furthermore, at any given GFR, tubular secretion will reduce creatinine and therefore affect both categorical and noncategorical outcome metrics, similarly reducing both RAVC and the chance of categorical classification of AKI. This does not change any of the conclusions that arise from this study.

Only one continuous metric has been modeled. Others, such as the maximal change in creatinine, are possible. Any difference between these categories and AVC and RAVC would require further evaluation that takes into account both the time point and the duration during which each metric is evaluated.

This creatinine-kinetic modeling highlights the importance to clinical trials of choosing an outcome metric in AKI that enables detection of small differences in GFR change. Categorical metrics are poorly discriminatory as outcome variables in populations with high proportions of patients with severe AKI. The RAVC (and AVC) are discriminatory irrespective of cohort composition and reflect treatment outcome across all degrees of treatment efficacy. When a method is used to estimate the baseline creatinine values of a population, it is important that the choice of estimated GFR approximate the real mean GFR of the population; however, a measured baseline creatinine is better still.

## Disclosures

None.

This study was supported by Health Research Council of New Zealand grant 05/131.

Published online ahead of print. Publication date available at www.cjasn.org.

## Appendix 1

A one-compartment pharmacokinetic model can be used to calculate the changes in creatinine with time (^{19}^{,}^{20}):

$$\frac{dC}{dt} = \frac{R}{V} - kC \tag{A1}$$

where *C* is the creatinine concentration, *t* is the time, *R* is the rate of production of creatinine, *V* is the volume of distribution, and *k* is the renal elimination rate constant. *R* and *V* are assumed to be constant; therefore, equation A1 can be solved to give the creatinine as a function of time:

$$C(t) = C_{ss} - (C_{ss} - C_{0})\,e^{-kt} \tag{A2}$$

where *C*_{0} is the concentration at *t* = 0 and *C*_{ss}, the steady-state creatinine concentration, is

$$C_{ss} = \frac{R}{kV} \tag{A3}$$

Because creatinine is assumed to be entirely eliminated by renal clearance, the renal clearance, *Cl*, is simply the renal elimination rate constant multiplied by volume of distribution:

$$Cl = kV \tag{A4}$$

If we estimate the volume of distribution as 0.6*w* L, where *w* is the weight of the patient, and use a measured or estimated clearance, then we may estimate the initial renal elimination rate constant, *k*. Alternatively, knowing that the half-life of creatinine, *t*_{1/2}, is simply 0.693/*k*, we may calculate *k* if we measure or estimate the half-life. Figure A1 is a histogram of *k* calculated this way for 382 patients who were admitted to ICU at Christchurch or Dunedin hospital and enrolled in the study EARLYARF (Early Intervention in Acute Renal Failure) (ACTRN12606000032550; http://www.actr.org.au). The mean ± SD was 0.12 ± 0.08/h and median (interquartile range) was 0.10/h (0.06 to 0.17/h).
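The two routes to *k* described above can be sketched as follows. The only assumptions beyond the text are the unit choices: weight in kilograms (so that 0.6*w* is in liters), clearance in ml/min, and *k* per hour. The function names are hypothetical.

```python
def k_from_clearance(clearance_ml_min, weight_kg):
    """Renal elimination rate constant (/h) from clearance and body weight.

    Uses k = Cl / V with the volume of distribution V = 0.6 * weight (L).
    """
    volume_l = 0.6 * weight_kg
    clearance_l_h = clearance_ml_min * 60.0 / 1000.0  # ml/min -> L/h
    return clearance_l_h / volume_l


def k_from_half_life(half_life_h):
    """Renal elimination rate constant (/h) from the creatinine half-life (h)."""
    return 0.693 / half_life_h


def half_life_from_k(k_per_h):
    """Creatinine half-life (h) from the elimination rate constant (/h)."""
    return 0.693 / k_per_h
```

For a 70-kg patient with a clearance of 70 ml/min, this gives *k* = 0.1/h, which is the single value adopted for all VIPs in the simulation.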

## References

*versus* estimated baseline creatinine for determination of RIFLE class in patients with acute kidney injury. Nephrol Dial Transplant 24: 2739–2744, 2009