Albuminuria is often used as a clinical trial end point to establish drug efficacy in early stages of drug discovery. In these trials, the change in albuminuria is often determined between two predetermined time points: randomization and end of treatment (1–3). However, many more urine samples for albuminuria measurement are usually collected during the trial but are not used to determine drug efficacy. Relying on only the baseline and end-of-treatment albuminuria values may be problematic because albuminuria is subject to large day-to-day variability (4), which may hamper the accuracy and precision of drug effect estimates.
To our knowledge, no study has systematically assessed the effect of the number of albuminuria measurements on antialbuminuric drug effect estimates. First, we questioned whether multiple consecutive albuminuria collections per visit at baseline and end of treatment alter the average drug effect. Second, we questioned whether using the albuminuria data of multiple follow-up visits alters the average drug effect estimates. Finally, we assessed whether the precision of the drug effect changes when the frequency of urine collections is increased during follow-up.
Materials and Methods
Patients and Clinical Trials
Data from the Aliskiren Combined with Losartan in Type 2 Diabetes and Nephropathy (AVOID), Selective Vitamin D Receptor Activation for Albuminuria Lowering (VITAL), and Residual Albuminuria Lowering with Endothelin Antagonist Atrasentan (RADAR) trials were used for this study (1,2,5). These trials all enrolled patients with type 2 diabetes and macroalbuminuria.
In the AVOID trial, 599 patients with type 2 diabetes and a urinary albumin/creatinine ratio (UACR)>300 mg/g (or UACR>200 mg/g for patients on medication blocking the renin-angiotensin-aldosterone system [RAAS]) and eGFR≥30 ml/min were enrolled (2). After a run-in period of 3 months in which every patient received losartan 100 mg/d, patients were randomly allocated to either placebo or active treatment with aliskiren 150 mg/d with a dose increase to 300 mg/d after 12 weeks. During the 24-week follow-up period, first morning voided (FMV) urine was collected at randomization and at weeks 4, 8, 12, 16, and 24.
In the VITAL trial, 281 patients on stable RAAS medication with type 2 diabetes and nephropathy defined as a UACR of 100–3000 mg/g and eGFR between 15 and 90 ml/min were randomly allocated to 24 weeks of treatment with either placebo or active treatment with paricalcitol 1 µg/d or 2 µg/d (1). FMV urine collections were performed at randomization and 4, 8, 12, 16, 20, and 24 weeks thereafter. The study demonstrated a dose-dependent effect of paricalcitol on albuminuria. Paricalcitol 1 µg/d did not decrease albuminuria, whereas paricalcitol 2 µg/d did when all study visits over time were taken into account. Because there was no effect of paricalcitol 1 µg/d on albuminuria, we excluded this treatment arm for the purpose of this study.
The RADAR trial enrolled 211 patients taking the maximum recommended antihypertensive dose of RAAS medication who were diagnosed with type 2 diabetes and had a UACR of 300–3500 mg/g and an eGFR of 30–75 ml/min (5). After a 4- to 12-week run-in period, patients were randomly allocated to 12 weeks of treatment with placebo or atrasentan at either 0.75 mg/d or 1.25 mg/d. FMV urine collections were performed at randomization and 2, 4, 6, 8, 10, and 12 weeks thereafter.
Urine Collections and Albuminuria Measurement
In all three trials, patients collected three consecutive FMV urine samples for albuminuria before each scheduled follow-up visit. Patients started collecting urine samples 2 days before each study visit and then collected the third FMV urine sample on the morning of the study visit. In all trials, urinary albumin (milligrams per liter) and urinary creatinine (grams per liter) were measured and subsequently expressed as the UACR (milligrams per gram). Within each trial, all assessments of urinary albumin and creatinine were performed by a single company (CRL Medinet in AVOID, Quintiles in VITAL, and QUEST in RADAR) in a central laboratory located in the United States, Europe, or Japan. The urinary albumin concentration was determined by immunoassays.
Statistical Analyses
Baseline characteristics are presented as the mean and SD for continuous variables and counts and proportions for discrete variables. Because the distribution of albuminuria is skewed, albuminuria is presented as the geometric mean and SD.
Statistical analyses of albuminuria data were conducted using log-transformed UACR values, taking the non-normal distribution of albuminuria into account. All patients who collected three consecutive urine samples at all study visits were included in this study. The treatment effect was calculated based on a single urine sample, the average of two samples, or the average of three samples collected before each follow-up visit. For a single measurement, the albuminuria value derived from the urine collected on the morning of the study visit was used. For a duplicate measurement, the urine samples collected on the day before and on the day of the study visit were used. For a triplicate measurement, all three urine collections were averaged.
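The single, duplicate, and triplicate definitions described above can be illustrated with a minimal sketch. It assumes that replicates are averaged on the log scale (the text does not state whether averaging preceded or followed the log transformation), and the function name and UACR values are hypothetical.

```python
import math

def visit_log_uacr(samples_mg_per_g, n_samples):
    """Return the visit-level log(UACR) from the last n_samples collections.

    samples_mg_per_g: three UACR values ordered [day -2, day -1, visit day].
    n_samples: 1 (visit day only), 2 (day -1 and visit day), or 3 (all).
    Assumption: replicates are averaged on the natural-log scale.
    """
    if n_samples not in (1, 2, 3):
        raise ValueError("n_samples must be 1, 2, or 3")
    if len(samples_mg_per_g) != 3:
        raise ValueError("expected three consecutive samples")
    selected = samples_mg_per_g[-n_samples:]
    return sum(math.log(x) for x in selected) / n_samples

# Hypothetical UACR values (mg/g) for one patient at one visit.
samples = [520.0, 610.0, 450.0]
single = visit_log_uacr(samples, 1)
duplicate = visit_log_uacr(samples, 2)
triplicate = visit_log_uacr(samples, 3)
```

Exponentiating the returned value gives the geometric mean of the selected samples in mg/g.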
The drug effect and its precision from baseline to end of treatment were assessed by analysis of covariance. Change in albuminuria was analyzed with treatment as the classification variable and baseline UACR as the covariate. The drug effect was back-transformed to the natural scale and is presented as the percent change. The precision of the log drug effect is reflected by its SEM.
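The back-transformation from the log scale to a percent change can be sketched as follows. This is an illustration only: it assumes natural logarithms and a normal approximation for the confidence interval, and the input values are hypothetical rather than taken from the trials.

```python
import math
from statistics import NormalDist

def percent_change(log_effect, sem, level=0.95):
    """Back-transform a log-scale treatment effect to percent change.

    log_effect: placebo-adjusted difference in log(UACR), natural log.
    sem: standard error of log_effect.
    Returns (estimate, lower, upper) in percent. The interval uses a
    normal approximation, which is an assumption of this sketch.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)

    def to_pct(b):
        return (math.exp(b) - 1.0) * 100.0

    return (to_pct(log_effect),
            to_pct(log_effect - z * sem),
            to_pct(log_effect + z * sem))

# Hypothetical inputs: a 20% reduction with an SEM of 0.10 on the log scale.
est, lo, hi = percent_change(math.log(0.80), 0.10)
```

Because the SEM applies on the log scale, the resulting interval is asymmetric around the percent change.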
Longitudinal analyses (across all follow-up visits) were based on a generalized estimating equation (6). The model contains treatment as a classification variable and visit, albuminuria level at baseline, and a treatment-by-visit interaction as covariates. To account for the decreasing within-patient correlation between measurements over time, we compared the fit of models with antedependence, autoregressive, and Toeplitz covariance structures using the corrected Akaike information criterion. Based on the model fit, a full Toeplitz model was used in which equal SDs for log(UACR) were assumed at each follow-up visit, and correlations between albuminuria measurements at different visits were assumed to depend on the difference between the respective visit numbers (Supplemental Tables 1 and 2 provide details on the models used).
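To make the Toeplitz structure concrete, the sketch below builds a covariance matrix in which every visit has the same SD and the correlation between two visits depends only on the difference in visit number. The function name and the numerical values are hypothetical; the fitted estimates are those reported in the supplemental tables.

```python
def toeplitz_covariance(sd, lag_correlations):
    """Build a Toeplitz covariance matrix with equal SDs at every visit.

    sd: common SD of log(UACR) at each follow-up visit.
    lag_correlations: correlations at lag 1, 2, ..., n_visits - 1.
    Entry (i, j) depends only on |i - j|.
    """
    n = len(lag_correlations) + 1
    rho = [1.0] + list(lag_correlations)
    return [[sd * sd * rho[abs(i - j)] for j in range(n)] for i in range(n)]

# Hypothetical values: four visits with correlation decaying by lag.
cov = toeplitz_covariance(1.0, [0.8, 0.7, 0.6])
```

An autoregressive structure is the special case in which the lag-k correlation equals the lag-1 correlation raised to the power k; the full Toeplitz form leaves each lag correlation free.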
To determine how the SEM would change if more than three albuminuria measurements were considered at each time point, we performed simulations using input from a patient-specific linear mixed model that was applied to the data. In this simulation, we used the estimated correlation of UACR measurements between time points within patients. The correlation between consecutive measurements collected the days before a study visit in a participant was set to 0, because this would maximize the simulated gain in precision given by each additional measurement collected before a follow-up visit. In order to present the estimated SEMs with their 95% confidence intervals, we performed 1000 simulations.
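The reasoning behind such a simulation can be sketched with a simplified two-component model. This is not the patient-specific linear mixed model used in the study: it assumes that each visit value is the sum of a component shared by all replicates and independent day-to-day noise (correlation 0 between the noise terms), and all parameter values are hypothetical.

```python
import math
import random
import statistics

def analytic_sem(sd_visit, sd_day, n_per_group, k):
    """SEM of a two-group difference in mean log(UACR) with k replicates."""
    return math.sqrt(2.0 * (sd_visit ** 2 + sd_day ** 2 / k) / n_per_group)

def simulated_sem(sd_visit, sd_day, n_per_group, k, n_sims=1000, seed=1):
    """Empirical SD of the simulated between-group difference."""
    rng = random.Random(seed)

    def group_mean():
        values = []
        for _ in range(n_per_group):
            level = rng.gauss(0.0, sd_visit)  # shared by all replicates
            # Day-to-day noise is drawn independently for each replicate.
            reps = [level + rng.gauss(0.0, sd_day) for _ in range(k)]
            values.append(sum(reps) / k)
        return sum(values) / n_per_group

    diffs = [group_mean() - group_mean() for _ in range(n_sims)]
    return statistics.stdev(diffs)

# Hypothetical SDs on the log scale and 30 patients per group.
for k in (1, 2, 3, 6):
    print(k, round(analytic_sem(0.5, 1.0, 30, k), 3))
```

Under this model only the day-to-day component shrinks with additional replicates, so each extra collection yields a smaller gain in precision than the previous one.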
All analyses were performed using SAS 9.2 software. A two-sided P value <0.05 was considered statistically significant. The additional data simulation was conducted using R 3.1.0 software and power calculations were performed using G*Power 3.1.9 software.
Results
A total of 464 patients (77.4%) from AVOID, 129 patients (68.8%) from VITAL, and 180 patients (85.3%) from RADAR had three albuminuria measurements available at each study visit and were included in this study. Key baseline characteristics of included trial participants are shown in Table 1. Baseline characteristics were well balanced among randomized treatment groups. In all included trials, the average age was 60–65 years; the majority of participants were men (66%–81%) and were Caucasian (43%–87%). Baseline albuminuria was similar across the three trials.
Treatment Effect between Baseline and End of Treatment
The treatment effects of aliskiren, paricalcitol, and atrasentan estimated from the baseline and end-of-treatment measurements are summarized in Figure 1. Relative to placebo, active treatment decreased albuminuria by approximately 20% in the AVOID and VITAL trials, 35% in the RADAR low-dose trial, and 40% in the RADAR high-dose trial. We observed minimal variation in the mean albuminuria-lowering treatment effect if the effect was calculated from single, double, or triple consecutive urine collections at baseline and end of treatment (Figure 1). For example, the treatment effect ranged between −17% and −18% in the VITAL trial and was not statistically significant (P=0.09 for three consecutive urine collections at baseline and end of treatment) and ranged between −39% and −41% in the RADAR trial (1.25 mg/d; P<0.01).
The SEM of the treatment effect estimate decreased (increased precision) if albuminuria measurements from two consecutive urine collections at baseline and end of treatment were averaged (Figure 1, bottom panels) compared with using single urine collections. For example, the SEM decreased from 0.13 to 0.11 in the VITAL trial. There was a small further decrease in SEM if the treatment effect was estimated from three consecutive urine collections in the AVOID trial but not in the VITAL or RADAR trials. Additional simulation studies showed no appreciable decrease in SEM beyond three urine collections (Supplemental Figure 1).
Treatment Effect Considering All Follow-Up Visits
The albuminuria-lowering treatment effects of aliskiren, paricalcitol, and atrasentan estimated from albuminuria measurements collected at follow-up visits during the treatment phase are summarized in Figure 2 (top panels). Across the study, urine was collected at five follow-up visits in the AVOID trial and six follow-up visits in the VITAL and RADAR trials. The albuminuria-lowering treatment effect was consistent regardless of (1) the number of consecutive urine collections at each follow-up visit and (2) the number of follow-up visits used to calculate the albuminuria-lowering effect (all P values >0.1).
Figure 2 (bottom panels) shows that the SEM of the treatment effect decreases with increasing number of visits at which urine was collected during follow-up without reaching a plateau below which the SEM did not further decrease. In the AVOID and VITAL trials, the smallest SEM (highest precision) was observed when three consecutive urine collections were performed at all follow-up visits. In the RADAR trial, the SEM was not different between double or triple urine collections at all follow-up visits. As a result of the increase in precision, the treatment effect of paricalcitol in the VITAL study became statistically significant (P=0.047 for using three consecutive urine collections at all follow-up visits).
Table 2 shows the effect of the gain in precision on sample size requirements when all albuminuria measurements during follow-up were selected versus using only the baseline and end-of-treatment measurements. The sample sizes only modestly decreased when triple, compared with double or single, consecutive urine collections were performed at each visit. However, there was a clear decrease in the sample size when all albuminuria measurements at all follow-up visits were considered. The sample size requirements to detect a 30% reduction in albuminuria decreased 4- to 6-fold compared with only using the baseline and end-of-treatment measurements (Table 2).
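The link between precision and sample size can be illustrated with a standard two-sample normal approximation, in which the required number of patients scales with the square of the outcome SD. This sketch is not the G*Power calculation used for Table 2; the 80% power, the SD values, and the function name are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(sd_log, pct_reduction, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect a given percent reduction.

    sd_log: SD of the per-patient log(UACR) outcome (hypothetical input).
    pct_reduction: targeted reduction versus placebo, e.g. 30 for 30%.
    Uses the two-sample normal approximation with a two-sided alpha.
    """
    delta = math.log(1.0 - pct_reduction / 100.0)
    z = NormalDist().inv_cdf(1.0 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z * sd_log / delta) ** 2)

# Hypothetical SDs: halving the outcome SD cuts the sample size about 4-fold.
n_wide = n_per_group(1.0, 30)
n_narrow = n_per_group(0.5, 30)
```

Under this approximation, a 4- to 6-fold reduction in sample size corresponds to an SEM that is roughly 2 to 2.5 times smaller at a fixed number of patients.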
Discussion
This study was conducted to assess the effect of the number of urine collections at one visit and the number of study visits on albuminuria-lowering drug effects. We observed that increasing the frequency of urine collections at two predefined time points or at multiple follow-up visits did not alter the estimate of the mean albuminuria-lowering drug effect. However, the precision of the drug effect increased with an increased frequency of urine collections at baseline and end of treatment, and markedly increased when more albuminuria measurements were used during follow-up. The increase in precision in the absence of a change in the average drug effect led to a marked increase in statistical power, which translated into a 4- to 6-fold lower sample size requirement for trials designed to test the efficacy of an albuminuria-lowering drug. Future studies should use consecutive albuminuria measurements per visit and at multiple follow-up visits.
The largest effect on the precision of the drug effect was observed when the frequency of follow-up visits was increased. As a result of the increase in precision, the albuminuria-lowering effect of paricalcitol 2 µg/d in the VITAL study became statistically significant, whereas it was not when the baseline and end-of-treatment values alone were analyzed (1). We did not identify a threshold above which the precision did not further increase, suggesting that the larger the number of urine collections during a trial, the greater the precision and gain in statistical power. However, increasing the number of urine collections comes at the expense of additional burden on the patient and increases the costs and complexity of the trial.
Several additional considerations should be taken into account. First, when using multiple albuminuria measurements over time, one should also take into account the time course of the antialbuminuric drug effect. Importantly, the drugs included in this analysis are known to have relatively rapid onsets of action. The time course of other albuminuria-lowering drugs may be different. One should be cautious about analyzing albuminuria measurements obtained directly after randomization if the drug does not have an immediate effect on albuminuria because including such measurements may dilute the overall drug effect and decrease statistical power. Second, the albuminuria-lowering drug effect needs to be stable over time in order to justify using all albuminuria measurements over time. If this is not the case, statistical power can be significantly compromised. Third, the correlation between albuminuria measurements likely depends on the time interval between visits. Therefore, the effect of using multiple albuminuria measurements over time on statistical power may depend on the time interval between visits.
We suggest that the decision on the timing and frequency of urine collections in a clinical trial should be made on a trial-by-trial basis, taking into account drug characteristics, pharmacodynamics, patient population, aims of the trial, and operational aspects. We recommend, however, that multiple collections be used rather than baseline and end of treatment alone.
Increasing the number of urine collections at each study visit also increased the precision of the treatment effect. In the AVOID and VITAL studies, the highest precision was observed when three urine samples at each study visit were collected; in the RADAR study, the precision reached a maximum with two urine samples. Because no more than three consecutive urine collections were performed in each trial, we performed a simulation analysis to estimate the SEMs of the treatment effect for more than three urine samples. Those results confirmed that no appreciable gain in precision was observed when collecting more than three consecutive urine samples. Previous studies on the intraindividual variability in albuminuria also concluded that three consecutive FMV urine samples and measurement of the UACR should be performed to quantify albuminuria (7). These conclusions are reflected in current clinical practice guidelines advocating collection of three consecutive urine samples for diagnostic and prognostic purposes (8,9). Because we found that the optimal SEM was achieved with three consecutive urine collections and because three urine collections are recommended to diagnose and monitor albuminuria in clinical practice, we recommend that albuminuria assessment in clinical trials at each study visit should be based on the average of three consecutive FMV collections.
Various operational aspects of albuminuria measurements have been studied in the past. Prior studies investigated whether FMV or daytime urine samples can replace the gold standard of 24-hour urine samples for prediction of renal and cardiovascular disease (10,11). In addition, the intraindividual variability in albuminuria over time and the stability of albuminuria after prolonged frozen storage have been thoroughly investigated (4,12). However, to the best of our knowledge, this is the first systematic analysis on the effect of the frequency of urine collections during a clinical trial to determine drug efficacy.
Our study has strengths and limitations. The strengths are the use of individual patient data from different clinical trials, resulting in a well characterized cohort of participants with diabetes and nephropathy on an appropriate modern therapeutic background. Hence, our results and conclusions are mostly useful for the assessment of treatment effects on albuminuria as a surrogate outcome in trials of diabetic nephropathy. In addition, the use of different drug classes increases the generalizability of our results. The limitations are that this is a post hoc analysis and that the trials were not designed and powered to address our research question. We were also unable to compare our results with gold-standard measurements because 24-hour urine samples were not collected at each follow-up visit in the included trials. Finally, baseline albuminuria was defined as the average albuminuria level collected on 3 consecutive days before the randomization visit. Unfortunately, albuminuria was not assessed at multiple visits before randomization. We were therefore unable to assess the effect of the frequency of urine collections to define an optimal baseline measurement.
In conclusion, increasing the number of urine collections per study visit and the number of visits over time does not change the average drug effect estimate but markedly increases the precision of the estimate, thereby enhancing the statistical power of clinical trials. Because many trials already collect multiple urine samples for albuminuria measurements during follow-up, considering these data in the final efficacy analysis is a simple and practical way to increase statistical power. Hence, the design of diabetic nephropathy trials that use albuminuria as an end point can be significantly improved, leading to smaller sample sizes and less complex, more cost-effective trials.
D.d.Z. is a consultant for and received honoraria (to employer) from AbbVie, Astellas, AstraZeneca, Chemocentryx, Johnson & Johnson, HemoCue, Novartis, Reata, Takeda, and Vitae. D.L.A. is employed by AbbVie and owns AbbVie stock. F.P. is employed by Steno Diabetes Center, a nonprofit institution owned by Novo Nordisk; in addition, F.P. has received lecture fees from Novo Nordisk, Novartis, Eli Lilly, Boehringer Ingelheim, as well as advisory honoraria from Bristol-Myers Squibb and AstraZeneca. H.H.P. reports having equity in Merck and Novo Nordisk and receiving consulting and lecture fees from AstraZeneca, Abbott, Novartis, and Reata. H.J.L.H. is a consultant for and received honoraria (to employer) from AbbVie, Astellas, Janssen Pharmaceuticals, Reata, and Vitae.
This research was presented in part at the American Society of Nephrology Annual Meeting, held November 5–10, 2013, in Atlanta, Georgia. For this study no funding or other financial support was received.
Published online ahead of print. Publication date available at www.cjasn.org.
This article contains supplemental material online at http://cjasn.asnjournals.org/lookup/suppl/doi:10.2215/CJN.07780814/-/DCSupplemental.
References
1. de Zeeuw D, Agarwal R, Amdahl M, Audhya P, Coyne D, Garimella T, Parving HH, Pritchett Y, Remuzzi G, Ritz E, Andress D: Selective vitamin D receptor activation with paricalcitol for reduction of albuminuria in patients with type 2 diabetes (VITAL study): A randomised controlled trial. Lancet 376: 1543–1551, 2010
2. Parving HH, Persson F, Lewis JB, Lewis EJ, Hollenberg NK; AVOID Study Investigators: Aliskiren combined with losartan in type 2 diabetes and nephropathy. N Engl J Med 358: 2433–2446, 2008
3. Bakris GL, Toto RD, McCullough PA, Rocha R, Purkayastha D, Davis P; GUARD (Gauging Albuminuria Reduction With Lotrel in Diabetic Patients With Hypertension) Study Investigators: Effects of different ACE inhibitor combinations on albuminuria: Results of the GUARD study. Kidney Int 73: 1303–1309, 2008
4. Witte EC, Lambers Heerspink HJ, de Zeeuw D, Bakker SJ, de Jong PE, Gansevoort R: First morning voids are more reliable than spot urine samples to assess microalbuminuria. J Am Soc Nephrol 20: 436–443, 2009
5. de Zeeuw D, Coll B, Andress D, Brennan JJ, Tang H, Houser M, Correa-Rotter R, Kohan D, Lambers Heerspink HJ, Makino H, Perkovic V, Pritchett Y, Remuzzi G, Tobe SW, Toto R, Viberti G, Parving HH: The endothelin antagonist atrasentan lowers residual albuminuria in patients with type 2 diabetic nephropathy. J Am Soc Nephrol 25: 1083–1093, 2014
6. Hanley JA, Negassa A, Edwardes MD, Forrester JE: Statistical analysis of correlated data using generalized estimating equations: an orientation. Am J Epidemiol 157: 364–375, 2003
7. Smulders YM, Slaats EH, Rakic M, Smulders FT, Stehouwer CD, Silberbusch J: Short-term variability and sampling distribution of various parameters of urinary albumin excretion in patients with non-insulin-dependent diabetes mellitus. J Lab Clin Med 132: 39–46, 1998
8. American Diabetes Association: Standards of medical care in diabetes—2014. Diabetes Care 37[Suppl 1]: S14–S80, 2014
9. Chronic Kidney Disease Working Group: KDIGO 2012 clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney Int Suppl 3: 1–150, 2013
10. Lambers Heerspink HJ, Brantsma AH, de Zeeuw D, Bakker SJ, de Jong PE, Gansevoort RT; PREVEND Study Group: Albuminuria assessed from first-morning-void urine samples versus 24-hour urine collections as a predictor of cardiovascular morbidity and mortality. Am J Epidemiol 168: 897–905, 2008
11. Lambers Heerspink HJ, Gansevoort RT, Brenner BM, Cooper ME, Parving HH, Shahinfar S, de Zeeuw D: Comparison of different measures of urinary protein excretion for prediction of renal events. J Am Soc Nephrol 21: 1355–1360, 2010
12. Lambers Heerspink HJ, Nauta FL, van der Zee CP, Brinkman JW, Gansevoort RT, de Zeeuw D, Bakker SJ: Alkalinization of urine samples preserves albumin concentrations during prolonged frozen storage in patients with diabetes mellitus. Diabet Med 26: 556–559, 2009