In this issue, Smith and colleagues1 discuss how the “cornucopia of data” produced by mobile technologies and social networks presents new opportunities and challenges for academic medicine. The data collected through these new systems, they argue, provide an opportunity for individuals and health care professionals alike to gain a greater understanding of health conditions and to identify novel ways to change behavior and treat illness. On the other hand, the highly personal nature of these interconnected data and the many reports of their unauthorized use create in the public a growing “climate of fear and distrust.”1 As personal data sharing becomes more commonplace, society is sailing into uncharted waters. Academic medicine, accustomed to uncertainty, has both an opportunity and a responsibility to help navigate the changes ahead.
The pace of growth in technology has been dramatic. Only two decades ago, the machinery of the academic medical enterprise was fueled by paper. Over the next few years, paper-based information systems were abandoned and replaced by digital systems that afforded new ways of working. Digital library systems, electronic health records, research databases, genome sequencers, and other technologies conferred efficiencies and brought forth new and more economical research, education, and treatment opportunities. With these new opportunities came new challenges. Unlike the photocopy machine, camera, or audio tape, digital systems allowed for an unlimited number of perfect copies of any piece of data recorded. In theory, one could misappropriate any book, song, invention, or data set and, through the Internet, release it without permission to a worldwide audience. This transition from paper to digital data collection, distribution, and archiving necessitated a comprehensive examination of the laws governing individual privacy, intellectual property, and many other important topics.
As computer systems became more interconnected over the next decade, shared databases, analytic systems, and collaboration software transformed how individuals and teams could conduct research, disseminate knowledge, educate, and provide care. When combined with federal mandates to implement interoperable health records, these new and interconnected systems made feasible large-scale research programs like the Clinical and Translational Science Awards, the Patient-Centered Outcomes Research Institute, and the more recent Precision Medicine Initiative.2–4 As was the case with the earlier transition from paper to local digital technologies, the new approaches required yet another (and still ongoing) reexamination of policies governing privacy, intellectual property, and data management.
Through each of these transitions, our way of delivering health care still relied largely on clinical and administrative transactional data, which could be obtained only from the hospital or clinic. Information remained under the control of professionals. What changed was the relative enormity of aggregated data sets and the expanded range of scientific questions to which these data sets could be applied. In contrast, Smith and colleagues1 describe a far more dramatic expansion of the settings in which data will be collected, the types of data produced, and the inferences that can be drawn when these data are analyzed. Each new technology contributes to a clearer picture of the individual: Search engines describe what one needs; social networks detail who one knows; wearable sensors report how one is; location services expose where one is; and personal health records record what health information one values. In contrast to previous eras of health information management, the creation and initial control of data arise from consumers and not biomedical researchers; the financial support descends from commercial concerns and not health care providers.
These emerging data networks will dramatically expand our ability to monitor health behaviors and to intervene in ways that promote health.5–7 The data arising from this expansion may lead to a new notion of “phenotype” that more explicitly incorporates the behaviors, activity levels, social support, living environments, and other factors that determine health outcomes.8–11 The growing importance of these massive, detailed, and integrated data sets, we are told, heralds an era of “big data” and the ascension of the “data scientist”—a professional class whose work pundits proclaim to be the “sexiest job of the 21st century.”12,13
These new technologies and professions are harbingers of a promising new era, but it will take many years for this seemingly inchoate market of ideas and products to mature and to affect medicine and daily life more systematically. A new commerce of ideas will require both “buyer” and “seller” to reach a consensus on value. The negotiation will be driven by the fact that these powerful and intrusive technologies can both benefit and harm. The order that society seeks in the current and turbulent “Wild West” of big data cannot be imposed by a highly prescriptive regulatory apparatus, nor can it be maintained by any single authority. A collective effort—governed by principles, policies, laws, and practices—is necessary. Newly created systems must help foster societal trust by demonstrating that they collect, aggregate, and use data only in ways that are consistent with societal expectations.14,15
Concerns about data collection are traditionally addressed by limiting the amount of data collected and by obtaining consent at the time data are collected. With some exceptions, every new collection or use of an individual’s personal health information requires additional consent or authorization. This approach has practical limitations: Complex and time-consuming consent processes may actually make choices more difficult and may be perceived as taking too much of the limited time clinicians spend with patients.16 When personal health information is to be collected and retained for incompletely specified and indeterminate future uses, trust must be transferred from a known entity to larger and less familiar organizations. Clear rules for data stewardship and governance are essential. Organizations should explicitly describe how both identified and deidentified data will be used and shared with others.
Aggregating and linking disparate deidentified health information data sets increases the threat of reidentification. Although each individual data set may obscure individual identity through the removal of names, dates of birth, or other identifiers, it is possible at times to link multiple disparate resources together and reidentify an unsuspecting and previously “deidentified” individual.17 By linking what one needs, who one knows, where one is, how one is, and how one feels, the data scientist can develop a highly detailed and stark characterization of an individual and, through comparison with millions of other individuals, develop a fairly clear prediction of that individual’s future behavior and health.
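The linkage mechanism described above can be sketched concretely. The following is a minimal, hypothetical illustration (all records are invented for this example, not drawn from any real data set): two “deidentified” record sets that share quasi-identifiers such as zip code, birth year, and sex can be joined to reattach a name to a sensitive clinical record.

```python
# Toy illustration of a linkage attack. All data are invented.
# Each data set alone appears "deidentified," but joining them on
# shared quasi-identifiers reattaches a name to a diagnosis.

# A "deidentified" clinical data set: names removed, diagnosis retained.
clinical = [
    {"zip": "02138", "birth_year": 1954, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_year": 1970, "sex": "M", "diagnosis": "asthma"},
]

# A public data set (e.g., a voter roll) that pairs names with the
# same quasi-identifiers.
public = [
    {"name": "J. Doe", "zip": "02138", "birth_year": 1954, "sex": "F"},
    {"name": "R. Roe", "zip": "02140", "birth_year": 1981, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def link(records_a, records_b, keys=QUASI_IDENTIFIERS):
    """Join two record lists on shared quasi-identifiers."""
    index = {tuple(r[k] for k in keys): r for r in records_b}
    matches = []
    for record in records_a:
        hit = index.get(tuple(record[k] for k in keys))
        if hit is not None:
            # Merging the two records reattaches the name to the diagnosis.
            matches.append({**hit, **record})
    return matches

reidentified = link(clinical, public)
# One clinical record now carries both a name and a diagnosis.
```

In practice an attacker would tolerate approximate matches and draw on many more sources, which is why removing obvious identifiers alone does not guarantee anonymity.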
The collection and linking of new data sets promise dramatic improvements in society’s ability to monitor health and prevent or mitigate the consequences of illness. Most would willingly consent to extensive data collection and linking if the resulting analyses were used only by those responsible for their care or by trusted health researchers studying how to help others. Concerns arise because one cannot be certain that linked and possibly reidentified personal health data will not be used for more sinister purposes. As the number of ways personal data can be collected increases and the cost of linking and analyzing population data decreases, more third parties will seek to obtain and use these data for commercial purposes far beyond the expectations set when the data were initially generated. One need only extrapolate from the highly personalized advertisements of social networking sites and search engines to see a future in which data scientists may claim that they know more about us than we know about ourselves.
Academic medicine must engage in this debate in a more concerted fashion. Our professions rely on our continued commitment to the Hippocratic Oath. Our patients and research participants already share with us their most intimate concerns and will continue to do so even as new technologies merely confirm what has already been disclosed to us. Patients will, one hopes, expect us to explain and to advocate on their behalf, holding us to higher standards than those they expect of a firm selling plumbing supplies. The public will continue to raise their expectations, seeking from us better explanations, sounder advice, greater scientific discovery, and more effective treatments. Because we are trusted and informed professionals, those responsible for making laws and policies governing privacy will look to us for help in finding a balance between information sharing and data privacy. Should trust be violated or credibility be called into question through cavalier privacy practices or fuzzy data governance policies, the very underpinnings of our research, teaching, and patient care missions may be seriously compromised. The threat is real.
In 1960, the management theorist Theodore Levitt claimed that the passenger railroad industry declined because the markets created by automobiles and airplanes were better suited to meet growing and rapidly evolving transportation needs. The railroads, Levitt argued, seemed incapable of changing their business models and failed to accommodate the new technologies. They “assumed themselves to be in the railroad business rather than in the transportation business.”18
Personal health information is a foundation of academic medicine. Centuries ago, Hippocrates emphasized this point in a way that still resonates in the modern era. Without trusted stewardship over personal information, patient care, research, and education will suffer. Academic medicine, therefore, is first and foremost in the “trust business.” Facing new technologies and behaviors, it now confronts an era every bit as disruptive to its operations as the existential threat the airplane and the automobile posed to the railroads. Academic medicine must not simply react to events but, rather, must anticipate trends and maintain public trust as new social and technological challenges arise.
1. Smith RJ, Grande D, Merchant RM. Transforming scientific inquiry: Tapping into digital data by building a culture of transparency and consent. Acad Med. 2016;91:469–472.
2. Leshner AI, Terry SF, Schultz AM, Liverman CT. The CTSA Program at NIH: Opportunities for Advancing Clinical and Translational Research. Washington, DC: National Academies Press; 2013.
3. Selby JV, Lipstein SH. PCORI at 3 years—progress, lessons, and plans. N Engl J Med. 2014;370:592–595.
4. Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med. 2015;372:793–795.
5. Estrin D. Small data, where n = me. Commun ACM. April 2014;57:32–34.
6. Chiauzzi E, Rodarte C, DasMahapatra P. Patient-centered activity monitoring in the self-management of chronic health conditions. BMC Med. 2015;13:77.
7. Elenko E, Underwood L, Zohar D. Defining digital medicine. Nat Biotechnol. 2015;33:456–461.
8. Jain SH, Powers BW, Hawkins JB, Brownstein JS. The digital phenotype. Nat Biotechnol. 2015;33:462–463.
9. Ziegelstein RC. Personomics. JAMA Intern Med. 2015;175:888–889.
10. Grossmann C, Goolsby WA, Olsen L, McGinnis JM. Engineering a Learning Healthcare System: A Look at the Future: Workshop Summary. Washington, DC: National Academy of Sciences; 2011.
11. Beachy SH, Olson S, Berger AC. Genomics-Enabled Learning Health Care Systems: Gathering and Using Genomic Information to Improve Patient Care and Research: Workshop Summary. Washington, DC: National Academy of Medicine; 2015.
12. McAfee A, Brynjolfsson E, Davenport TH, Patil D, Barton D. Big data: The management revolution. Harv Bus Rev. October 2012;90:61–67.
13. Davenport TH, Patil DJ. Data scientist: The sexiest job of the 21st century. Harv Bus Rev. 2012;90:70–76, 128.
14. Nissenbaum HF. Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford, Calif: Stanford Law Books; 2010.
15. Nissenbaum H. A contextual approach to privacy online. Daedalus. Fall 2011;140:32–48.
16. Goldstein MM. Health information technology and the idea of informed consent. J Law Med Ethics. 2010;38:27–35.
17. El Emam K, Rodgers S, Malin B. Anonymising and sharing individual patient data. BMJ. 2015;350:h1139.
18. Levitt T. Marketing myopia. Harv Bus Rev. July–August 1960;38(4):24–47.