Innovation Reports

The Research on Medical Education Outcomes (ROMEO) Registry: Addressing Ethical and Practical Challenges of Using “Bigger,” Longitudinal Educational Data

Gillespie, Colleen PhD; Zabar, Sondra MD; Altshuler, Lisa PhD; Fox, Jaclyn; Pusic, Martin MD; Xu, Junchuan MD, MS; Kalet, Adina MD, MPH

doi: 10.1097/ACM.0000000000000920



Efforts to evaluate and optimize the effectiveness of medical education—particularly graduate medical education (GME)—have consistently been limited by the difficulty of designing education research that adequately supports causal attributions, permits generalizations, and includes meaningful outcomes. Medical education, as a field, therefore lacks a strong evidence base for establishing the most effective educational approaches.

Initial responses to the need for stronger evidence focused on calling for more tightly controlled research designs. Results from studies addressing the calls, while convincing in terms of causality, have limited generalizability. This has led to increased appreciation for a broader range of study designs and analytics, including observational, natural history, case–control, cohort, and quasi-experimental studies; time and interrupted time series designs; and data mining and predictive modeling techniques associated with big data.1,2 In addition, given the difficulty of linking educational efforts to patient outcomes, there is a growing consensus that we must refine our causal models to focus on intermediate, more proximal outcomes that can be measured with greater precision.3,4 While improving measures continues to be an important goal, the availability of resources (growing increasingly scarce) and the state of assessment science (improving but with many unresolved challenges) limit precision, suggesting that we should make better use of the full matrix of education and outcomes data routinely collected in our academic health center environments.1–5

These new analytic approaches require more data, on more individuals, from a greater number of sources, collected over more time points and across more settings than has been possible to date. Given the wealth of data academic health centers collect on residents and medical students, it should be possible to compile data on a wide range of relevant educational processes and outcomes, from many different data sources, into a single institution-wide relational database.1 This database would permit tracking of progress over time and identification of associations between and among educational processes and both education- and health-related outcomes—a truly epidemiological view.1 Linking multiple cohorts of trainees—medical students and especially residents and fellows (given their smaller numbers)—together would dramatically increase a study’s sample size to the point where it could represent close to the entire population of interest and allow for less biased and more complete understandings of the natural history of medical education.2,4,5

“Big data” is commonly defined as a collection of datasets from multiple sources that are so large and complex as to require new analytic techniques and approaches. The analytic approaches being used successfully in the private sector to make sense of large, heterogeneous, relatively undefined sets of data provide new methods for identifying patterns and testing predictions while making fewer assumptions about sample characteristics and estimation. These strategies challenge many of the traditional ways we have approached the study of both education and health care and allow us to reduce our need for highly curated and prospectively defined data and instead make maximum use of readily available data.2 While education data may not yet rise to the “big data” level of size and complexity, our “bigger” data nonetheless share many of the same challenges, albeit on a smaller scale.

A critical barrier to harnessing these bigger-data, longitudinal epidemiological approaches in medical education research is ensuring the privacy, confidentiality, and ethical use of educational data that goes beyond individual, prospectively designed studies. Compiling educational data across multiple sources and over time presents new ethical challenges. In this report, we describe our approach to solving this problem: the Research on Medical Education Outcomes (ROMEO) Registry, the educational data registry we established at New York University (NYU) School of Medicine (SOM) in 2008 to facilitate the conduct of medical education outcomes research. We broadly outline how we implemented the ROMEO Registry, and we share the research outcomes we have achieved thus far. We conclude by discussing the next steps for scaling up this innovation for the broader medical education research community.


When applied to medical education, research registries can provide a naturalistic view of the complex process of education, allow for estimating and improving the quality of measures, directly link process and outcome variables, and explicitly incorporate a developmental, longitudinal, epidemiological view.1

The ROMEO Registry is a deidentified, longitudinal database of learning, performance, quality assurance/improvement, and clinical practice assessments of NYU SOM students, residents, and fellows. We initially established it in 2008 for resident and fellow data; we first incorporated medical student data in 2010. The goals of the ROMEO Registry are to

  1. build the evidence base for medical education by supporting ongoing evaluation of the undergraduate medical education (UME) and GME curricula;
  2. assess the quality of routinely collected measures of educational outcomes;
  3. facilitate medical education research by providing access to routinely collected educational data on medical students, residents, and fellows; and
  4. enable research on the longitudinal development of clinical competence and the links between educational process and outcome variables.

The ROMEO Registry was established with federal funding support from the Health Resources and Services Administration. It is maintained by the ROMEO unit of the NYU SOM’s Program for Medical Education Innovations and Research. We developed the registry in close collaboration with our institutional review board (IRB) and directly modeled it on the IRB’s templates and processes for clinical research registries. It is modeled after patient registries in that it

  1. compiles existing data, collected in a naturalistic manner for operational purposes;
  2. provides these data for answering research questions that are not yet fully determined;
  3. does not have restrictive inclusion/exclusion criteria;
  4. combines data from multiple sources and at multiple levels (e.g., individual, setting, system);
  5. includes measures of both processes and outcomes; and
  6. follows registry participants over time and, when possible, across settings/systems.

As of the time of writing in January 2015, the ROMEO Registry includes 1,225 residents and fellows enrolled over the past seven years across 12 residency programs (of a possible 1,735 residents/fellows; 71% consent rate) and 841 medical students enrolled over the past four years (of a possible 980 medical students; 86% consent rate) (see Table 1 for additional details). On the resident consent form, we ask residents who graduated from NYU SOM for permission to link their UME and GME data. We have such linked data on more than 250 individuals.
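As a quick arithmetic check, the consent rates quoted above follow directly from the enrollment counts. The short Python sketch below reproduces them; the function name consent_rate is ours, introduced only for illustration, and the counts are the ones reported in the text.

```python
def consent_rate(enrolled: int, eligible: int) -> int:
    """Return the consent rate as a whole-number percentage."""
    if eligible <= 0:
        raise ValueError("eligible must be a positive count")
    return round(100 * enrolled / eligible)


# Counts reported in the text as of January 2015.
residents_fellows_rate = consent_rate(1225, 1735)
medical_students_rate = consent_rate(841, 980)

print(f"Residents/fellows: {residents_fellows_rate}%")  # 71%
print(f"Medical students: {medical_students_rate}%")    # 86%
```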

Table 1: Trainees Consenting to Enrollment in the New York University School of Medicine Research on Medical Education Outcomes (ROMEO) Registry by Program and Cohort as of January 2015

Data inclusion guidelines

All data routinely collected as part of trainees’ educational experiences can be included in the ROMEO Registry. In addition, data on the quality of care provided by residents (and, to some degree, students)—including patient satisfaction, chart review, and clinical systems data—are covered provided that such data are collected for and serve educational purposes. For a full description of registry data elements, see Table 2.

Table 2: Types of Data Included in the New York University School of Medicine Research on Medical Education Outcomes (ROMEO) Registry

The following guidelines determine whether data can be included in the registry:

  1. Data must be routinely collected as part of trainees’ educational experience. The registry does not cover data collected solely for the purposes of research.
  2. Data must be collected on all participating trainees, and all trainees must have access to the same curriculum. The registry does not include data from experimental (e.g., random) assignment of participants to curricular experiences.
  3. Data cannot be anonymous. An individual’s data can only be incorporated into the registry if the consent status of that individual can be determined.
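The three guidelines above amount to a simple screening rule, which can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of any registry software: the DataSource fields, the inclusion_problems function, and the two example sources are hypothetical and were chosen only to mirror the wording of the guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of a candidate data source (not a registry schema)."""
    name: str
    routinely_collected: bool        # part of trainees' educational experience
    research_only: bool              # collected solely for research purposes
    collected_on_all_trainees: bool  # every participating trainee is covered
    experimental_assignment: bool    # e.g., random assignment to a curriculum
    identifiable: bool               # each individual's consent status can be determined


def inclusion_problems(source: DataSource) -> list[str]:
    """Return the guidelines a source fails; an empty list means it qualifies."""
    problems = []
    if not source.routinely_collected or source.research_only:
        problems.append("guideline 1: not routinely collected educational data")
    if not source.collected_on_all_trainees or source.experimental_assignment:
        problems.append("guideline 2: not collected on all trainees under a common curriculum")
    if not source.identifiable:
        problems.append("guideline 3: anonymous, so consent status cannot be determined")
    return problems


# Hypothetical examples for illustration only.
skills_exam = DataSource("clinical skills exam checklist", True, False, True, False, True)
anonymous_survey = DataSource("anonymous course survey", True, False, True, False, False)

print(inclusion_problems(skills_exam))       # no problems: the source qualifies
print(inclusion_problems(anonymous_survey))  # fails guideline 3
```

Returning the list of failed guidelines, rather than a single yes/no answer, makes it easier to explain to a data owner why a source was excluded.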

ROMEO Registry procedures

Obtaining consent.

Many IRBs consider medical students and residents to be vulnerable populations; therefore, minimizing the risks associated with choosing not to participate is a critical element of our consent and data management procedures. Informing potential participants about these procedures is essential to ensuring fully voluntary consent.

To avoid actual or perceived coercion and to preserve the privacy and confidentiality of the decision to participate in the registry, the process of obtaining informed consent is conducted by the ROMEO Registry research staff (i.e., individuals who do not have responsibility for the education and evaluation of trainees).

We present the ROMEO Registry to cohorts of trainees in groups: We schedule recruitment presentations for residents during orientation and for first-year students during one of the early sessions of their Practice of Medicine course. In the recruitment sessions, we review the consent form and answer questions; then, to ensure the privacy of potential participants’ decisions regarding consent, we provide each trainee with an envelope and the consent form. The consent form requires a signature whether the trainee agrees or declines to participate, so the act of signing does not publicly disclose the decision. All trainees are also given a copy of the consent form, which contains relevant contact information, for subsequent review. The greatest challenge to obtaining consent is reaching individuals who do not attend the session where recruitment occurs: Consent rates based on the denominator of trainees present usually exceed 90%, but enrollment as a share of the full population is lower because of the difficulty of recruiting trainees who did not attend (see Table 1).

Data storage, access, management, and mapping.

The ROMEO Registry team of investigators and research staff maintains a distinct, secure database of consenting participants that is compliant with the Health Insurance Portability and Accountability Act of 1996 and the Family Educational Rights and Privacy Act of 1974. The team feeds educational data into the registry only for those students, residents, and fellows from whom we have obtained informed consent. A link between registry participants’ names and a unique registry ID in the consent database is maintained in a secure location; this information is used only to match individuals across data sets and over time. Information on which individuals consented to be in the registry is not accessible to anyone other than registry staff, which helps maintain participant confidentiality and minimize risk to trainees.
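The consent-gated, deidentified linkage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the registry's actual implementation: the ConsentDatabase class, the deidentify function, the field names, and the sample rows are all hypothetical, and a real system would keep the name-to-ID link in separately secured storage.

```python
import secrets


class ConsentDatabase:
    """Hypothetical stand-in for the secure link between names and registry IDs."""

    def __init__(self):
        self._registry_ids = {}  # trainee name -> registry ID (consenting trainees only)

    def record_consent(self, trainee_name: str) -> str:
        """Assign a random registry ID on first consent; reuse it afterward."""
        if trainee_name not in self._registry_ids:
            self._registry_ids[trainee_name] = "R" + secrets.token_hex(8)
        return self._registry_ids[trainee_name]

    def registry_id(self, trainee_name: str):
        """Return the registry ID, or None if the trainee has not consented."""
        return self._registry_ids.get(trainee_name)


def deidentify(rows, consent_db, name_field="name"):
    """Keep only consenting trainees and replace the name with the registry ID."""
    registry_rows = []
    for row in rows:
        rid = consent_db.registry_id(row[name_field])
        if rid is None:
            continue  # no consent on file: the row never enters the registry
        clean = {key: value for key, value in row.items() if key != name_field}
        clean["registry_id"] = rid
        registry_rows.append(clean)
    return registry_rows


# Made-up rows for illustration only.
consent_db = ConsentDatabase()
consent_db.record_consent("Trainee A")

exam_scores = [{"name": "Trainee A", "score": 4}, {"name": "Trainee B", "score": 3}]
feedback = [{"name": "Trainee A", "rating": 5}]

print(deidentify(exam_scores, consent_db))
print(deidentify(feedback, consent_db))
```

Because the same registry ID is returned for a given trainee every time, rows from different data sets and time points can be matched without exposing names to anyone who works with the registry data.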

We have created a data request/use certification form that both records the data request and summarizes the various policies and procedures of the ROMEO Registry. In general, as in patient registries, medical education researchers at our institution who wish to routinely access registry data become co-investigators on the registry. Other NYU investigators interested in accessing registry data submit a brief application, complete the data use agreement, and then submit an exempt application to our local IRB. Researchers from outside NYU file an IRB application with their host institution and then complete a data use agreement with NYU.

To fully harness the power of the ROMEO Registry’s educational data, we have devoted considerable human resources to cleaning, organizing, defining, and merging data elements and data sets. Data come in many forms, including paper-based checklists and questionnaires that must be entered; spreadsheet files with little data definition; and highly structured, nested data sets. We have a minimum of three registry research staff members—including an intern, an entry-level data manager, and a data analyst—supervised by our team of educational investigators, working to maintain a consistent identification scheme to preserve confidentiality, to define discrete variables consistently across data sets, to restructure data sets, and to develop a data catalog to support external use of our data. We have found REDCap6—a secure Web-based application for building and maintaining online surveys and databases, freely available through the Clinical and Translational Science Awards program of the National Center for Advancing Translational Sciences—to be an invaluable resource for collecting, storing, managing, and defining data. REDCap is user-friendly; has mechanisms both for collecting data through forms and surveys and for importing data files; has built-in data definition and code book functions; provides multiple levels of user access to protect confidentiality; has auditing capabilities; and exports data in all of the most useful formats.

We have also leveraged NYU Health Sciences Library resources to collaborate on a data catalog, to take advantage of data management training materials, and to work with an experienced medical ontologist (J.X.) on an ontology mapping process. This process is intended to ensure the syntactic and semantic interoperability necessary for integration of information and analysis across different systems. The mapping, in turn, will allow us to link educational activities and assessments to standard terminologies used to organize clinical and competency-based data.


To date, 72 studies based on ROMEO Registry data have been presented and/or published. These studies have used UME (19; 26%), GME (50; 69%), and UME/GME (3; 4%) data. By focus, they include curriculum evaluations (19; 26%), competency assessments (16; 22%), needs assessments (12; 17%), skills development studies (8; 11%), studies focused on assessment methodology (5; 7%), and studies about remediation (2; 3%). A small but growing number of studies are incorporating clinical data; these studies include 4 (6%) focused on patient outcomes (e.g., patient weight loss), 4 (6%) assessing quality of clinical care, and 2 (3%) focused on transfer of skills from educational to practice settings. (For a list of studies by focus, see Supplemental Digital Appendix 1.)

The registry has helped support considerable educational scholarship. To date, 51 individuals, including residents, fellows, and faculty members, have published and presented studies using ROMEO Registry data. We believe the availability of the registry data as well as the collaboration and technical assistance fostered as by-products of sharing and making use of shared data have enhanced our faculty members’ productivity and professional development in medical education research.

Our experience with this registry at NYU SOM over the last seven years has demonstrated that the registry not only is acceptable to trainees and those concerned with research ethics but also has provided a welcome approach for facilitating medical education scholarship among our faculty. The registry has enabled researchers from various perspectives to conduct medical education research ethically, efficiently, and using longitudinal, multiple-source data. By lowering the barriers to and supporting faculty development in the study of new curriculum or assessment measures, we have been able to create a community of medical education practice—that is, a joint enterprise to develop a shared social structure, common values, and resources.7 However, much more work needs to be done to ensure that we use the registry to its fullest, including further dissemination beyond our local medical education research community.

Next Steps

In a “big data” paradigm, the database is only as good as the comprehensiveness and quantity of its data; in a standard research paradigm, data quality is paramount. In the ROMEO Registry, we are seeking to maximize both quantity and quality in parallel, which is difficult work. Achieving this will only be possible through close collaboration with educators to ensure that routine medical education assessments for ongoing trainee evaluation and feedback are as high in quality as possible and include critical variables. For example, our medical educators are working to incorporate more workplace-based assessment and feedback (including using unannounced standardized patients to assess actual clinical practice, expanding interprofessional multisource feedback measures to include more data points, and incorporating more proximal measures of patient-important outcomes). We are working closely with them to ensure a consistent, institution-wide approach.4

Securing continuing financial support for the registry and the work of maintaining it is an ongoing challenge. Full-time staff with data management, quality assurance, data definition, analysis, and reporting skills are required, and advanced data modeling and mining analytics skills have become increasingly vital to managing and analyzing the registry’s complex data.

Finally, we aim to expand the ROMEO Registry across institutional boundaries to create medical education research collaborations in which academic health centers agree to use shared measures and then share data to arrive at generalized understandings of the impact of medical education on practice and on patients. We are pleased to discuss our approach and share our materials; for example, our consent form is available as Supplemental Digital Appendix 2. We encourage others with similar registries to do the same. We are seeking additional grant support to facilitate the sharing of registry data with researchers across more departments within our institution and with researchers outside our institution.

Ultimately, a well-maintained, multi-institutional medical education registry will allow medical education researchers to conduct increasingly meaningful, large-scale research capable of assessing the relative impact of education on distant health outcomes, accounting for individual-, institution-, and system-level contributions. Here, we have reported on our early experience of establishing an ethical, pragmatic approach to studying the medical education continuum at NYU SOM. This foundational work needs to be scaled up to maximize its potential impact. The concept of a learning health care system8—with its integration of academic medicine’s research, education, and clinical service missions—may be one point of leverage for ensuring necessary resources to support the registry. Significant resources will be required to achieve the goal of linking medical education to the public’s health, a goal that must be achieved to justify the societal investment in medical training and to ensure the provision of high-quality medical care. Creating and maintaining medical education research registries like ours is one important step toward achieving that goal.

Acknowledgments: The authors would like to acknowledge the invaluable assistance of Elan Czeisler, the director of the New York University School of Medicine institutional review board, in the initial conceptualization and development of this educational registry.


1. Carney PA, Nierenberg DW, Pipas CF, Brooks WB, Stukel TA, Keller AM. Educational epidemiology: Applying population-based design and analytic approaches to study medical education. JAMA. 2004;292:1044–1050.
2. Ellaway RH, Pusic MV, Galbraith RM, Cameron T. Developing the role of big data and analytics in health professional education. Med Teach. 2014;36:216–222.
3. Cook DA, West CP. Perspective: Reconsidering the focus on “outcomes research” in medical education: A cautionary note. Acad Med. 2013;88:162–167.
4. Kalet AL, Gillespie CC, Schwartz MD, et al. New measures to establish the evidence base for medical education: Identifying educationally sensitive patient outcomes. Acad Med. 2010;85:844–851.
5. Cook DA, Andriole DA, Durning SJ, Roberts NK, Triola MM. Longitudinal research databases in medical education: Facilitating the study of educational outcomes over time and across institutions. Acad Med. 2010;85:1340–1346.
6. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap): A metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42:377–381.
7. Wenger E, McDermott R, Snyder W. Cultivating Communities of Practice: A Guide to Managing Knowledge. Boston, Mass: Harvard Business School Press; 2002.
8. Olsen LA, Aisner D, McGinnis JM; Institute of Medicine Roundtable on Evidence-Based Medicine. The Learning Healthcare System: Workshop Summary. Washington, DC: National Academies Press; 2007. Accessed July 24, 2015.
Copyright © 2016 by the Association of American Medical Colleges