Invited Commentaries

A Call to Investigate the Relationship Between Education and Health Outcomes Using Big Data

Chahine, Saad PhD; Kulasegaram, Kulamakan (Mahan) PhD; Wright, Sarah MBA, PhD; Monteiro, Sandra PhD; Grierson, Lawrence E. M. PhD; Barber, Cassandra MA; Sebok-Syer, Stefanie S. PhD; McConnell, Meghan PhD; Yen, Wendy MA; De Champlain, Andre PhD; Touchie, Claire MD, MHPE, FRCPC

doi: 10.1097/ACM.0000000000002217


Teachers, students, and society invest substantial resources in medical education on the assumption that exceptional training will produce expert physicians who, in turn, will deliver exceptional care.1–5 This causal argument underlies ongoing innovation and improvement in medical education. Although several theories posit such a relationship among education, clinical practice, and outcomes, as of 2018 empirical research investigating these relationships remains scarce.

Medical education is an important, albeit expensive and time-consuming, enterprise for clinician educators and students. Because society is committed to educating physicians, substantial time and financial resources are invested in improving and innovating medical education. Given this investment, we believe the time has come to evaluate what continued dedication to improving medical education has achieved. Where is the evidence that the investments being made in medical education result in more competent physicians and better patient care?

In this issue, Triola and colleagues6 raise this fundamental question and ask us to consider more deliberate use of publicly available, open data, such as those available through the Medicare system (e.g., the Centers for Medicare & Medicaid Services Physician Compare and Part D Prescriber data). This approach has many advantages: there are few barriers to accessing and using these data, and publicly available data are typically meaningful to stakeholders such as policy makers, regulators, educators, and researchers. Further, these data are important sources of evidence that could influence educational innovation and policy and place medical education on an evidence-informed and accountable footing.

While publicly available data provide important insights, other sources of data are also needed to evaluate the relationship between education and health outcomes. Many questions relevant to medical education cannot easily be answered with publicly available data alone; such data are not always accessible or meaningful in every health system; and many non-publicly available data remain isolated in separate silos. Connecting these siloed data and publicly available data to previous and future educational experiences will require deliberate planning and the construction of databases and secure storage havens.

We agree with Triola and colleagues6 that a rigorous, long-term approach to evaluating the full spectrum of medical education, from admission through clinical practice and continuing professional development, is needed so that the relationships between education and health outcomes can be investigated. So far, the lack of longitudinal studies investigating these links makes it difficult to determine empirically the extent to which educational practices translate into improved health care outcomes for patients and society. One potential strategy, therefore, is to systematically harness the immense amount of data collected across the training spectrum. We propose that using the “big” longitudinal data already collected on medical trainees, institutions, and clinical outcomes is an ideal way to investigate the impacts of education on health care quality.

The Power of Big Data

“Big data” has become a popular term for large, complex datasets that require novel analytic strategies to examine patterns and relationships.7,8 Researchers use big data to understand physical, psychological, and social patterns and, more recently, to make discoveries that can enhance health care quality (e.g., proposing treatment options for patients and detecting postoperative complications).9,10

A common criticism of big data is that associations can be found between variables that have little relevance to one another. Relationships may indeed exist among a wide variety of variables, but correlation does not imply causation. Caution is required to ensure that the relationships being investigated represent meaningful and relevant linkages that can address social needs.11,12 The following are examples of carefully executed studies that highlight meaningful uses of big data and epidemiological approaches to better inform health care quality and medical training.

In a U.S. context, Asch and colleagues13 (in 2014) investigated maternal complication rates for deliveries performed by over 2,000 obstetricians, followed longitudinally for nearly 20 years and representing over 2 million births. This research revealed associations between complication rates and the number of years of clinical experience, the location of the residency training program, and initial skill upon entering practice. When the highest- and lowest-performing quartiles of newly practicing obstetricians were compared, the authors found that a physician’s initial skill explained more of the variance in overall maternal delivery quality than the physician’s number of years of clinical experience. Specifically, it took approximately 15 years for residents who graduated with the highest complication rates to catch up to residents who graduated with average complication rates. Additionally, graduates of programs with the highest complication rates never caught up with graduates of programs with the lowest complication rates. These findings built on the authors’ earlier work showing the potential of using clinical data to evaluate training programs.14,15

In a Canadian context, Tamblyn and colleagues16 (in 2002) investigated the associations between licensing exam scores and patient care among approximately 900 primary care physicians who cared for 3.4 million patients. Their research showed a significant association between licensing exam performance and future practice performance that was sustained for more than seven years after the exam. Specifically, physicians with higher exam scores ordered more mammograms, had higher rates of disease-specific prescribing relative to symptom-relief prescribing, and had lower rates of contraindicated prescribing. In a subsequent study, Tamblyn and colleagues17 found that prescribing habits differed by training institution, suggesting that curricular approaches can influence clinical practice.

Drawing on the kind of careful data collection exhibited in these examples, other studies have investigated the intricate relationship between training and patient care by linking datasets.18–20 These studies yield important conclusions about the impact of education at the individual and program levels and on health care outcomes. They answer important questions about the quality of educational outcomes and raise new ones about how education can be optimized. Unfortunately, such studies are still scarce, likely because of the resources required to collect and link multiple datasets. These analyses require large datasets that coherently link education to health care practice or outcomes. Assembling such datasets on a study-by-study basis is expensive and limits how far the findings generalize to the broader question of how medical training and patient outcomes are linked.

Despite these challenges, the studies highlighted above, as well as a handful of others,21–27 leverage big data to demonstrate a relationship between training and care. Collectively, this work has the potential to shape educational and policy change. Triola and colleagues6 make the case for using the open data movement to study the effects of education on health care systems. Although this conversation is evolving slowly, the open data movement recently emerging in the United States could change how medical educators study the influence of their programs on patient care. We echo this call and encourage open data initiatives globally, as medical education and health care lag behind industry in using big data to inform change.28

Now Is the Time: Fulfilling the Big Data Promise

Calls to harness longitudinal data have been made before.29 The need to do so, however, is now more pressing than ever. Educational practice is undergoing the competency-based medical education (CBME) revolution.30 CBME is based on outcomes relevant to professional practice and societal needs.31,32 Through the implementation of CBME, training institutions are increasingly accountable for empowering learners with personalized learning opportunities, continuous feedback, and support through assessment of key milestones.33,34 While cutting-edge research is using big data to study learning gains,35,36 it remains unknown whether these CBME-related changes in education, and the resulting learning gains, translate into improvements in health care quality.

It is the responsibility of medical education researchers and academic leaders to meaningfully evaluate the effects of CBME and other major innovations on education systems. Prompted by CBME and by the lack of systematic data collection, we have organized a grassroots initiative that includes universities in one Canadian province and national licensing organizations. Together, these institutions are working to collect, organize, link, and analyze big data to study the relationship between pedagogical approaches to medical training and patient care outcomes.

As an unofficial consortium, we are exploring the conceptual and practical possibilities of integrating educational big data with health care system data. We envision a spectrum of data extending from admissions through continuing professional development, enabling ongoing and thorough research, evaluation, and improvement not only of CBME but also of other pressing issues, including expanding diversity, detecting unmet training needs, and addressing health human resources, while improving patient care and population health outcomes and reducing health care costs.

Our consortium is aware of the ethical and legal challenges, as well as the logistical and technical issues, that may arise from using big data and linking educational and health care processes and outcomes. Cooperation among stakeholders and the use of a third-party linking service are considered best practices, and we are attempting to follow them.37 However, the linking process is not perfect, and advances continue to be made to ensure that the privacy of learners, faculty, and patients is protected.38,39 Although much of this work is technical, and advances in computing and analytics could support secure data acquisition and verification, the work ultimately rests on human relationships. Negotiating across sectors (including training, health care, testing, and regulation) at each level of training to create a linked database for multiple specialties across Canada will be a delicate and potentially contentious process. As a consortium, we believe it is fundamental to build trust among all parties involved. Such trust will help ensure uniformity and data quality as we investigate this novel application of big data to study the effects of education on the health care system.

As a way forward, we encourage pilot collaborations that allow institutions across the different levels of training to work together to identify barriers to and enablers of the successful use of big data and the linkage of educational and health care data. Each specialty or geographic region could begin pilot projects under the aegis of research before institutionalizing the collection and use of big data. Some institutions already have such data systems in place, such as the Research on Medical Education Outcomes registry at New York University’s School of Medicine, which has collected and stored resident and fellow data since 2008.40 However, apart from individual institutional databases and recent initiatives such as the Ontario Physician Human Resources Data Centre,41 there is a lack of concerted effort to link data from across institutions into a single network. One potential model for such a network is the Institute for Clinical Evaluative Sciences (ICES),42 which has five offices housed at different universities across Ontario. These offices provide secure storage of clinical data obtained directly from the health care system and serve as hubs of active research. Through a central office, ICES coordinates efforts to produce research on clinical and health services questions (e.g., setting hospital benchmarks for cardiac care).43


Big data is already a feature of the digital and commercial landscape, and it will inevitably become part of scientific and educational activities. Medical education research needs to go beyond the outcomes of training to study practice and clinical outcomes as well. Systematically harnessing data can shift efforts to improve education away from informed, and sometimes costly, guesses toward approaches that are more evidence informed and accountable. As the social, time, and financial investments in medical education continue to increase, it is imperative to understand the relationship between education and health outcomes. We believe that embracing big data will make this possible and thus help answer a key question in medical education: How do educational approaches influence the health of patients and the well-being of society?

Acknowledgments: This Invited Commentary reflects the personal views of an unofficial community of researchers, informally known as the Barcelona Consortium.


1. Frenk J, Chen L, Bhutta ZA, et al. Health professionals for a new century: Transforming education to strengthen health systems in an interdependent world. Lancet. 2010;376:1923–1958.
2. Carraccio CL, Englander R. From Flexner to competencies: Reflections on a decade and the journey ahead. Acad Med. 2013;88:1067–1073.
3. Carraccio C, Wolfsthal SD, Englander R, Ferentz K, Martin C. Shifting paradigms: From Flexner to competencies. Acad Med. 2002;77:361–367.
4. Hawkins RE, Welcher CM, Holmboe ES, et al. Implementation of competency-based medical education: Are we addressing the concerns and challenges? Med Educ. 2015;49:1086–1102.
5. Malone K, Supri S. A critical time for medical education: The perils of competence-based reform of the curriculum. Adv Health Sci Educ Theory Pract. 2012;17:241–246.
6. Triola MM, Hawkins RE, Skochelak SE. The time is now: Using graduates’ practice data to drive medical education reform. Acad Med. 2018;93:826–828.
7. Jin X, Wah BW, Cheng X, Wang Y. Significance and challenges of big data research. Big Data Res. 2015;2:59–64.
8. Pecaric M, Boutis K, Beckstead J, Pusic M. A big data and learning analytics approach to process-level feedback in cognitive simulations. Acad Med. 2017;92:175–184.
9. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA. 2013;309:1351–1352.
10. Li J, Tao F, Cheng Y, Zhao L. Big data in product lifecycle management. Int J Adv Manuf Tech. 2015;81:667–684.
11. Khoury MJ, Ioannidis JP. Medicine. Big data meets public health. Science. 2014;346:1054–1055.
12. Ellaway RH, Pusic MV, Galbraith RM, Cameron T. Developing the role of big data and analytics in health professional education. Med Teach. 2014;36:216–222.
13. Asch DA, Nicholson S, Srinivas SK, Herrin J, Epstein AJ. How do you deliver a good obstetrician? Outcome-based evaluation of medical education. Acad Med. 2014;89:24–26.
14. Asch DA, Nicholson S, Srinivas S, Herrin J, Epstein AJ. Evaluating obstetrical residency programs using patient outcomes. JAMA. 2009;302:1277–1283.
15. Epstein AJ, Srinivas SK, Nicholson S, Herrin J, Asch DA. Association between physicians’ experience after training and maternal obstetrical outcomes: Cohort study. BMJ. 2013;346:f1596.
16. Tamblyn R, Abrahamowicz M, Dauphinee WD, et al. Association between licensure examination scores and practice in primary care. JAMA. 2002;288:3019–3026.
17. Tamblyn R, Abrahamowicz M, Dauphinee WD, et al. Effect of a community oriented problem based learning curriculum on quality of primary care delivered by graduates: Historical cohort comparison study. BMJ. 2005;331:1002.
18. Wenghofer E, Klass D, Abrahamowicz M, et al. Examinations predict quality of care in future practice. Med Educ. 2009;43:1166–1173.
19. Cadieux G, Tamblyn R, Dauphinee D, Libman M. Predictors of inappropriate antibiotic prescribing among primary care physicians. CMAJ. 2007;177:877–883.
20. Kawasumi Y, Ernst P, Abrahamowicz M, Tamblyn R. Association between physician competence at licensure and the quality of asthma management among patients with out-of-control asthma. Arch Intern Med. 2011;171:1292–1294.
21. Norcini JJ, Boulet JR, Opalek A, Dauphinee WD. The relationship between licensing examination performance and the outcomes of care by international medical school graduates. Acad Med. 2014;89:1157–1162.
22. Cuddy MM, Young A, Gelman A, et al. Exploring the relationships between USMLE performance and disciplinary action in practice: A validity study of score inferences from a licensure examination. Acad Med. 2017;92:1780–1785.
23. Papadakis MA, Teherani A, Banach MA, et al. Disciplinary action by medical boards and prior behavior in medical school. N Engl J Med. 2005;353:2673–2682.
24. Papadakis MA, Hodgson CS, Teherani A, Kohatsu ND. Unprofessional behavior in medical school is associated with subsequent disciplinary action by a state medical board. Acad Med. 2004;79:244–249.
25. Grace ES, Wenghofer EF, Korinek EJ. Predictors of physician performance on competence assessment: Findings from CPEP, the Center for Personalized Education for Physicians. Acad Med. 2014;89:912–919.
26. Hess BJ, Weng W, Holmboe ES, Lipner RS. The association between physicians’ cognitive skills and quality of diabetes care. Acad Med. 2012;87:157–163.
27. Smirnova A, Ravelli ACJ, Stalmeijer RE, et al. The association between learning climate and adverse obstetrical outcomes in 16 nontertiary obstetrics-gynecology departments in the Netherlands. Acad Med. 2017;92:1740–1748.
28. Wang Y, Kung L, Byrd TA. Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations. Technol Forecast Soc. 2016;126:3–13.
29. Cook DA, Andriole DA, Durning SJ, Roberts NK, Triola MM. Longitudinal research databases in medical education: Facilitating the study of educational outcomes over time and across institutions. Acad Med. 2010;85:1340–1346.
30. Frank JR, Snell L, Englander R, Holmboe ES; ICBME Collaborators. Implementing competency-based medical education: Moving forward. Med Teach. 2017;39:568–573.
31. Touchie C, ten Cate O. The promise, perils, problems and progress of competency-based medical education. Med Educ. 2016;50:93–100.
32. Frank JR, Mungroo R, Ahmad Y, Wang M, De Rossi S, Horsley T. Toward a definition of competency-based education in medicine: A systematic review of published definitions. Med Teach. 2010;32:631–637.
33. Carraccio C, Englander R, Van Melle E, et al.; International Competency-Based Medical Education Collaborators. Advancing competency-based medical education: A charter for clinician–educators. Acad Med. 2016;91:645–649.
34. Holmboe ES, Sherbino J, Englander R, Snell L, Frank JR; ICBME Collaborators. A call to action: The controversy of and rationale for competency-based medical education. Med Teach. 2017;39:574–581.
35. Warm EJ, Held JD, Hellmann M, et al. Entrusting observable practice activities and milestones over the 36 months of an internal medicine residency. Acad Med. 2016;91:1398–1405.
36. Warm EJ, Mathis BR, Held JD, et al. Entrustment and mapping of observable practice activities for resident assessment. J Gen Intern Med. 2014;29:1177–1182.
37. Kelman CW, Bass AJ, Holman CD. Research use of linked health data—A best practice protocol. Aust N Z J Public Health. 2002;26:251–255.
38. Vatsalan D, Sehili Z, Christen P, Rahm E. Privacy-preserving record linkage for big data: Current approaches and research challenges. In: Zomaya AY, Sakr S, eds. Handbook of Big Data Technologies. Cham, Switzerland: Springer; 2017:851–895.
39. Azaria A, Ekblaw A, Vieira T, Lippman A. MedRec: Using blockchain for medical data access and permission management. Paper presented at: 2nd International Conference on Open and Big Data; August 22–24, 2016; Vienna, Austria. Accessed February 27, 2018.
40. Gillespie C, Zabar S, Altshuler L, et al. The Research on Medical Education Outcomes (ROMEO) registry: Addressing ethical and practical challenges of using “bigger,” longitudinal educational data. Acad Med. 2016;91:690–695.
41. Ontario Physician Human Resources Data Centre. Ontario Physician Human Resources Data Centre (OPHRDC) website. Accessed February 27, 2018.
42. Institute for Clinical Evaluative Sciences. Institute for Clinical Evaluative Sciences (ICES) website. Accessed February 27, 2018.
43. Tu JV, Donovan LR, Lee DS, et al. Effectiveness of public report cards for improving the quality of cardiac care: The EFFECT study: A randomized trial. JAMA. 2009;302:2330–2337.
Copyright © 2018 by the Association of American Medical Colleges