Government datasets are newly available on open data platforms. These data are publicly accessible, published in nonproprietary formats, free of charge, and carry unlimited rights to use and redistribute. They offer opportunities for health research, but their quality and usability are unknown.
The objectives were to describe available open health data, to assess whether the data are presented in line with best practices and in a form usable by researchers, and to examine differences across platforms.
Using a standard coding guide, two reviewers systematically reviewed data offerings on NYC OpenData (New York City; all offerings, n = 37), Health Data NY (New York State; random 25% sample, n = 71), and HealthData.gov (US Department of Health and Human Services; random 5% sample, n = 75).
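A minimal sketch of drawing a fixed-fraction simple random sample of offerings from one platform's catalog. The function name `sample_offerings`, the fixed seed, and the rounding rule for the sample size are illustrative assumptions, not the study's documented procedure.

```python
import random


def sample_offerings(offering_ids, fraction, seed=0):
    """Hypothetical helper: simple random sample, without replacement,
    of a fixed fraction of a platform's offerings."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Sort first so the draw depends only on the seed, not on input order.
    ids = sorted(offering_ids)
    # Assumed rounding rule: nearest whole offering.
    k = round(fraction * len(ids))
    return random.Random(seed).sample(ids, k)
```

Setting `fraction=1.0` corresponds to reviewing every offering (as for NYC OpenData), while fractions such as 0.25 or 0.05 give the partial samples; fixing the seed makes the draw reproducible for a second reviewer.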
The setting comprised three open health data platforms, one each at the federal, New York State, and New York City levels.
Data characteristics recorded with the coding guide were aggregated into summary indices of intrinsic data quality, contextual data quality, adherence to the Dublin Core metadata standards, and rating on the 5-star open data deployment scheme.
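As an illustration only, the sketch below scores one offering on the 5-star open data deployment scheme, whose levels run from available on the web under an open license, through structured data, nonproprietary format, and use of URIs, to linked to other data. It treats the levels as cumulative, as the scheme is usually read. The `Offering` fields and the `summary_index` helper (share of coding-guide items met) are hypothetical assumptions, not the study's actual coding variables or index formulas.

```python
from dataclasses import dataclass


@dataclass
class Offering:
    # Hypothetical coded attributes for one data offering.
    on_web_open_license: bool   # 1 star: on the web under an open license
    structured: bool            # 2 stars: machine-readable structured data
    nonproprietary_format: bool # 3 stars: e.g., CSV rather than a proprietary format
    uses_uris: bool             # 4 stars: URIs identify things
    linked_to_other_data: bool  # 5 stars: linked to other data for context


def five_star_rating(o: Offering) -> int:
    """Count stars cumulatively: a level counts only if all lower levels are met."""
    levels = [
        o.on_web_open_license,
        o.structured,
        o.nonproprietary_format,
        o.uses_uris,
        o.linked_to_other_data,
    ]
    stars = 0
    for met in levels:
        if not met:
            break
        stars += 1
    return stars


def summary_index(items: dict[str, bool]) -> float:
    """Hypothetical summary index: proportion of coding-guide items satisfied."""
    if not items:
        raise ValueError("no coded items")
    return sum(items.values()) / len(items)
```

For example, a chart posted on the web as an image would score 1 star, while a CSV download that neither uses URIs nor links to other data would score 3.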
One quarter of the offerings were structured datasets; other presentation styles included charts (14.7%), documents describing data (12.0%), maps (10.9%), and query tools (7.7%). Health Data NY scored higher on intrinsic data quality, contextual data quality, and adherence to the Dublin Core metadata standards (P < .001 for each). All offerings met the basic “web availability” open data standard; fewer met the higher standard of being “hyperlinked to other data.”
Although all platforms need improvement, they already provide readily available data for health research. Sustained efforts to improve open data websites and metadata are needed to ensure that researchers use these data, which would increase their research value.
Supplemental digital content is available in the text.
Nelson A. Rockefeller Institute of Government, Albany, New York (Dr Martin); Rockefeller College of Public Affairs & Policy (Dr Martin and Ms Law), College of Computing and Information (Ms Ran), and School of Public Health (Dr Birkhead), University at Albany, Albany, New York; and New York State Department of Health, Albany, New York (Drs Helbig and Birkhead).
Correspondence: Erika G. Martin, PhD, MPH, Rockefeller College of Public Affairs & Policy, University at Albany, 1400 Washington Ave, Milne 300E, Albany, NY 12222 (firstname.lastname@example.org).
The authors are grateful to Courtney Burke, Patricia Lynch, Theresa Pardo, and Ozlem Uzuner for providing comments on an early draft; Christopher Kotfila for providing JSON technical support to assist with the metadata scrape; and Oscar Alleyne, Erich Bremmer, Sharon Dawes, Janine Jurkowski, Jacqueline Lawler, Kimberly Libman, Rachel Manes, Erin Pascaretti, Giri Tayi, Johnson Qian, and Mike Zdeb for providing feedback on how health data are used, characteristics of data and metadata with high quality and usability, and the conceptual model.
This work was supported by a grant from the Robert Wood Johnson Foundation's Public Health Services & Systems Research Program (Grant ID#71597 to E.G.M. and G.S.B.). G.S.B. and N.H. are employees of the New York State Department of Health, which maintains the Health Data NY open data platform reviewed in this study.
Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's Web site (http://www.JPHMP.com).
The authors declare no conflicts of interest.