ppmHR: A Privacy-protecting Tool to Fit Inverse Probability Weighted Cox Models in Multisite Studies


Shu, Di; Toh, Sengwee

Epidemiology 32(2):p e6-e7, March 2021. | DOI: 10.1097/EDE.0000000000001300

To the Editor:

It is increasingly common in epidemiologic research to analyze data from multiple sources. For example, the Sentinel System, funded by the US Food and Drug Administration, uses data from a diverse group of health plans and delivery systems to monitor the safety of approved medical products.1 However, data sharing is often a challenge in multisite studies because of contractual or legal restrictions, or concerns about breaches of patient privacy, unauthorized reuse of transferred data, or incorrect analysis of the data.2,3 Analytic methods and software that enable valid statistical analysis while minimizing the sharing of granular individual-level data are therefore valuable. In this letter, we introduce an R package, ppmHR,4 which implements a one-step risk set approach5 to fit inverse probability weighted (IPW) Cox models without sharing individual-level data in multisite studies.

The ppmHR package allows each data-contributing site to share only two eight-column summary-level risk set tables with the analysis center,5 one using unstabilized (or conventional) weights and the other using stabilized weights (Figure part A). Each row of a table represents one risk set, and its eight columns hold the following quantities for that risk set: the sum of weights for the exposed cases (sumEC), the sum of weights for all cases (sumC), the sum of weights for all exposed patients (sumE), the sum of weights for all unexposed patients (sumUnE), the sum of squared weights for the exposed cases (sumSquareEC), the sum of squared weights for the unexposed cases (sumSquareUnEC), the sum of squared weights for all exposed patients (sumSquareE), and the sum of squared weights for all unexposed patients (sumSquareUnE). Because the cells contain only summary-level information on weights, the risk set tables offer better privacy protection than sharing individual-level data.
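To make the eight columns concrete, the following is a minimal Python sketch of how one site could derive such a table from its own individual-level data. It is not the ppmHR implementation: the function name `risk_set_table`, the `Row` type, and the choice of one row per distinct event time (with patients counted as at risk when their follow-up time is at or after that event time) are assumptions made for illustration. Only the column names are taken from the text above.

```python
from collections import namedtuple

# Column names follow the letter; the Row type itself is hypothetical.
Row = namedtuple("Row", ["sumEC", "sumC", "sumE", "sumUnE",
                         "sumSquareEC", "sumSquareUnEC",
                         "sumSquareE", "sumSquareUnE"])


def risk_set_table(times, events, exposed, weights):
    """Return one Row per distinct event time, ordered by time.

    times   : follow-up time of each patient
    events  : 1 if the patient had the outcome at that time, 0 if censored
    exposed : 1 if exposed, 0 if unexposed
    weights : inverse probability weight of each patient
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    rows = []
    for t0 in event_times:
        s = dict.fromkeys(Row._fields, 0.0)
        for t, d, a, w in zip(times, events, exposed, weights):
            if t < t0:
                continue  # patient left follow-up before t0: not at risk
            is_case = (d == 1 and t == t0)
            if a == 1:
                s["sumE"] += w
                s["sumSquareE"] += w * w
                if is_case:
                    s["sumEC"] += w
                    s["sumSquareEC"] += w * w
            else:
                s["sumUnE"] += w
                s["sumSquareUnE"] += w * w
                if is_case:
                    s["sumSquareUnEC"] += w * w
            if is_case:
                s["sumC"] += w  # all cases, exposed or not
        rows.append(Row(**s))
    return rows


# Toy data for five patients (values invented for illustration only).
table = risk_set_table(
    times=[1, 2, 2, 3, 4],
    events=[1, 1, 0, 1, 0],
    exposed=[1, 0, 1, 1, 0],
    weights=[2.0, 1.0, 1.5, 2.0, 1.0],
)
for row in table:
    print(row)
```

Each printed row contains only sums, so the analysis center never sees which patient contributed which weight.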

FIGURE. Screenshots of the first five rows of an example risk set table with stabilized weights (A), and outputs of overall (B1) and site-specific hazard ratio (B2, B3, and B4 for sites 1, 2, and 3, respectively) estimation using the ppmHR package. Each analysis returns results using both the unstabilized (or conventional) inverse probability weights and the stabilized inverse probability weights.

Each data-contributing site can use the function createRisksetTable in ppmHR to create its two site-specific risk set tables from its individual-level data. The function estimates the propensity scores with a logistic regression model and offers optional weight truncation. After receiving the risk set tables from all sites, the analysis center can use the function estimateStratHR to estimate the overall hazard ratios and robust sandwich variances using IPW Cox models stratified on data-contributing site, and the function estimateSiteHRs to estimate site-specific hazard ratios and robust sandwich variances. The algorithm is available in the eAppendix; https://links.lww.com/EDE/B744.
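For readers less familiar with the two weighting schemes, the sketch below shows the textbook definitions of unstabilized and stabilized inverse probability weights for a binary exposure, given propensity scores that have already been estimated. The function `ip_weights` and its arguments are hypothetical and are not part of ppmHR. In particular, truncation is shown here as clipping to fixed lower and upper bounds, which is an assumption; the package's own truncation rule may differ.

```python
def ip_weights(exposed, ps, stabilized=False, truncate=None):
    """Inverse probability weights for a binary exposure.

    exposed    : 1 if exposed, 0 if unexposed
    ps         : estimated propensity score P(exposed = 1 | covariates)
    stabilized : if True, multiply by the marginal probability of the
                 exposure level actually received
    truncate   : optional (lower, upper) bounds used to clip the weights
                 (an illustrative choice, not the ppmHR rule)
    """
    p_exposed = sum(exposed) / len(exposed)  # marginal exposure prevalence
    weights = []
    for a, e in zip(exposed, ps):
        if a == 1:
            numerator = p_exposed if stabilized else 1.0
            weights.append(numerator / e)
        else:
            numerator = (1.0 - p_exposed) if stabilized else 1.0
            weights.append(numerator / (1.0 - e))
    if truncate is not None:
        lower, upper = truncate
        weights = [min(max(w, lower), upper) for w in weights]
    return weights


# Toy propensity scores for four patients (invented for illustration).
exposed = [1, 0, 1, 0]
ps = [0.5, 0.5, 0.25, 0.2]
unstabilized = ip_weights(exposed, ps)
stabilized = ip_weights(exposed, ps, stabilized=True)
truncated = ip_weights(exposed, ps, truncate=(1.5, 3.0))
print(unstabilized, stabilized, truncated)
```

Stabilized weights rescale each exposure group by its marginal prevalence, which usually reduces the variability of the weights without changing the estimand.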

Here, we illustrate the use of ppmHR with three simulated datasets (site1, site2, and site3) included in the package, which represent individual-level data from three sites. We tested whether the results from the functions estimateStratHR and estimateSiteHRs matched the results from the corresponding pooled individual-level data analyses using the function coxph in the package survival.6 Replication R code is available in the eAppendix; https://links.lww.com/EDE/B744.

Figure parts B1–B4 report results of the summary-level data analysis using ppmHR. For example, the estimated overall log hazard ratio and robust sandwich variance-based standard error were 0.4834492 and 0.1414647, respectively, using the unstabilized weights, and 0.4923302 and 0.1443092, respectively, using the stabilized weights. As expected, these results from ppmHR were equivalent to the results from the corresponding pooled individual-level data analysis using the standard package survival.
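One way to see why a summary-level analysis can reproduce the pooled point estimate is that, for a single binary exposure, the weighted partial likelihood score depends on the data only through summed weights per risk set. The Python sketch below illustrates this under stated assumptions: each table row is reduced to the four columns (sumEC, sumC, sumE, sumUnE), rows correspond to distinct event times, tied events are handled in the Breslow manner, and stratification on site is achieved by adding the site-specific score contributions at a common log hazard ratio. The function names are hypothetical, the toy numbers are invented, and the robust sandwich variance (which uses the squared-weight columns) is not reproduced here; the authors' algorithm is given in their eAppendix.

```python
import math


def score_and_information(beta, site_tables):
    """Score and observed information of the weighted partial likelihood.

    site_tables : list with one table per site; each table is a list of
                  (sumEC, sumC, sumE, sumUnE) tuples, one per risk set.
    """
    score, info = 0.0, 0.0
    hr = math.exp(beta)
    for table in site_tables:
        for sum_ec, sum_c, sum_e, sum_une in table:
            # Weighted share of the risk set's hazard carried by the exposed.
            p = sum_e * hr / (sum_une + sum_e * hr)
            score += sum_ec - sum_c * p
            info += sum_c * p * (1.0 - p)
    return score, info


def log_hr_from_tables(site_tables, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of the stratified score equation."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, site_tables)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Two toy site tables (numbers invented for illustration only).
site_a = [(2.0, 2.0, 5.5, 2.0), (0.0, 1.0, 3.5, 2.0), (2.0, 2.0, 2.0, 1.0)]
site_b = [(1.0, 1.0, 3.0, 4.0), (0.0, 2.0, 2.0, 3.0)]

beta_hat = log_hr_from_tables([site_a, site_b])
print("estimated log hazard ratio:", beta_hat)
```

Because each site's rows enter only through its own risk sets, no site's patients are ever compared with another site's patients, which is what stratification on site requires.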

We have illustrated that ppmHR enables valid hazard ratio estimation using only one file transfer of summary-level risk set tables between the data-contributing sites and the analysis center. This package also provides functions to check for local and global covariate balance before and after weighting. Future work will generate unadjusted and IPW-adjusted Kaplan–Meier survival curves,7 using only summary-level information shared by sites.


1. Behrman RE, Benner JS, Brown JS, McClellan M, Woodcock J, Platt R. Developing the sentinel system–a national resource for evidence development. N Engl J Med. 2011;364:498–499.
2. Hill EM, Turner EL, Martin RM, Donovan JL. “Let’s get the best quality research we can”: public awareness and acceptance of consent to use existing data in health research: a systematic review and qualitative study. BMC Med Res Methodol. 2013;13:72.
3. Mazor KM, Richards A, Gallagher M, et al. Stakeholders’ views on data sharing in multicenter studies. J Comp Eff Res. 2017;6:537–547.
4. Shu D, Toh S. ppmHR: Privacy-protecting hazard ratio estimation in distributed data networks, 2020. Available at: https://CRAN.R-project.org/package=ppmHR. R package version 1.0.
5. Shu D, Yoshida K, Fireman BH, Toh S. Inverse probability weighted Cox model in multi-site studies without sharing individual-level data. Stat Methods Med Res. 2020;29:1668–1681.
6. Therneau T. survival: Survival analysis, 2020. Available at: https://CRAN.R-project.org/package=survival. R package version 2.44-11.
7. Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Comput Methods Programs Biomed. 2004;75:45–49.


Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved.