Knowledge generation, a complex multistep process, is the core task of research and the basis for scientifically founded public health and medicine. When the same study, redone under the same conditions, gives the same results, it is called reproducible. A related but different question is to what extent patterns, magnitudes of associations, and scientific inferences are replicable in new studies addressing the same scientific question in a different population, under different conditions, or with different approaches.1,2 In parallel to the distinction between reproducibility and replicability as defined by the National Academies of Sciences, Engineering, and Medicine,2 the terms methods reproducibility, results reproducibility, and inferential reproducibility have been proposed.3 Lack of reproducibility or replicability may stem from cognitive biases, insufficient training in research methods or absence of methodologists on research teams, and poor or selective reporting stimulated by skewed academic reward systems.4–8
To increase reproducibility, several fundamental changes have been proposed, including the use of reproducible reporting software during statistical analysis,1,9–11 sharing of study data12,13 and analysis code,14–17 guidelines for manuscript standardization and reporting,18,19 and combinations of these elements.8,20,21
Efforts to enhance research reproducibility have focused primarily on how research results are documented and reported. However, few initiatives address how biomedical and public health research is conducted, starting with the conception of study ideas. An exception is that the National Institutes of Health (NIH) now mandates text sections on rigor and reproducibility in grant proposals and requires online and classroom training modules.5 Methodologic resources provided by the NIH focus on prespecifying techniques such as randomization and blinding as well as analytic plans, many of which are applicable to experimental studies such as clinical trials and some nonhuman studies.22,23 However, most epidemiologic and clinical research is observational, not experimental.
Research integrity as discussed here refers to the conduct of high-quality, relevant research that requires commitment from researchers, institutions, and the scientific ecosystem, not just an absence of fraud, fabrication, and plagiarism.24 A comprehensive approach to promoting integrity of observational research needs to address all steps of the knowledge generation process, long before disseminating end results. Integrating training components in the conduct of studies could make research integrity part of a “hidden curriculum” that promotes high-quality research practices throughout an investigator’s career. Anchoring such an approach in an institutional culture may promote its sustainability. Here, we describe a decades-long experience with a research community-based approach to conducting observational research aimed at promoting reproducibility and a culture of integrity.
SETTING AND COMMUNITY MEMBERS
The epidemiology research community described here is formed around five ongoing large-scale prospective cohort studies (Table), benefitting from participants who are exceptionally devoted to this research decades after initial enrollment.
Ongoing Prospective Cohort Studies
Researchers who constitute the community are both internal and external to the institutions that administer the cohort studies. These institutions are the Chronic Disease Epidemiology unit within the Channing Division of Network Medicine, a research division in the Department of Medicine at Brigham and Women’s Hospital and Harvard Medical School, and the Departments of Epidemiology and Nutrition at Harvard T.H. Chan School of Public Health. Researchers assess causes and consequences of many common chronic diseases; use questionnaire- and biomarker-based information on demographics, lifestyle, diet, genetics, and -omics; develop epidemiologic and statistical methods; contribute to international research consortia; and support educational programs based on classroom instruction, informal teaching, and mentoring, partly through training grants.
Researchers include master’s and doctoral students, postdoctoral fellows, staff scientists, and faculty. Outside collaborations are welcome and common. Analysis and grant proposals can be submitted by any academic researcher. The application process and directions are provided on the public websites. Collaboration or co-authorship of internal investigators or cohort principal investigators is not required. Indeed, publications outside the scientific areas of the principal investigators generally do not include them as coauthors, and numerous publications have reanalyzed data underlying previous publications, sometimes using alternative frameworks or different analytical approaches, at times independent from the initial investigators, and occasionally with different inferential results. Applications from external investigators are not denied due to lack of novelty or because they are duplicative of prior or ongoing research, though investigators are notified of overlap. With a cohort computing account, which requires documentation of training in human subjects protections, external investigators have access to study data and code in the same manner and to the same degree as local investigators. Previous predoctoral or postdoctoral trainees at the affiliated departments who gained familiarity with the complex longitudinal cohort data and analytical approaches during their training often continue as external investigators.
Central meeting points for the research community are the cohort meetings, held biweekly for each of the cohorts. In addition to the research teams, cohort meetings are attended by technical, administrative, and programming staff of the cohorts, principal investigators of the cohorts, other faculty members, and trainees. Participation of a faculty member from the research team is required if the presenting team member is a trainee.
ELEMENTS OF RESEARCH INTEGRITY: THE LIFE COURSE OF A RESEARCH PROJECT
Analysis Proposals and Grant Proposals
The approach to research integrity (Figure) starts at the origin of the research process with hypotheses and study plans. In this research community, both internal and external researchers submit a proposal of approximately two to three pages. This step applies to grant proposals and substudies as well as analysis proposals.
Cohort meetings discuss all grant proposals and any substudy requiring direct contact with participants, such as studies collecting additional data (e.g., supplemental surveys, assays on prospectively collected biospecimens, validation studies), for which funds are typically required to conduct new data collection. The meetings address potential burden on participants and other human subjects considerations. Analyses based on biospecimens undergo closer scrutiny for scientific merit and feasibility, given that those resources are irreplaceable. Proposals typically present the grant’s specific aims as well as relevant preliminary data (e.g., pilot results for novel assays) and questionnaires as applicable.
Analysis proposals outline the scientific background, hypotheses of the study, study design, exposure definitions, outcomes (primary and secondary), covariates (confounders and effect modifiers), and a brief statistical analysis plan. Analysis proposals are either presented at cohort meetings held biweekly or sent via biweekly e-mails to the research community for review and input. The extent of discussion is highly variable. Recommendations pertain both to methodology and subject matter. Typical comments include recommendations to consider prior or ongoing efforts with potentially synergistic or complementary approaches to the same questions; these recommendations are not binding. Other comments might be suggestions to include investigators familiar with the exposure or outcome data to add to the study team’s expertise; to consider additional exposure or outcome definitions, confounders, and effect modifiers; and to consider alternative analytical approaches and/or including methodologists in the study team. Meeting minutes, which include the proposals, and brief notes on any recommendations that arose during the meeting are circulated to affiliated researchers and staff and are archived in a searchable database.
For analysis proposals, very early presentation is strongly encouraged, although feasibility and achievable precision or power should be evaluated in advance. Cohort principal investigators plus designated faculty members briefly review grant proposals for feasibility and support needs. Grant proposals are then presented at least 4–6 weeks in advance of institutional deadlines to allow for separate budget discussions with principal investigators after the meeting presentations and before grant submission, ensuring appropriate funds will be available for the work. Researchers are encouraged to familiarize themselves with basic exposure and outcome data, for example, through prior publications, to consider data collection processes and whether exposure and outcome are frequent enough to yield a meaningful study.
Researchers return to cohort meetings for results presentations, including draft abstracts, tables, and figures as appropriate. Researchers are not expected to have finalized the manuscript, and indeed authors are encouraged to present when they can still take into account the suggestions that arise. Discussions of analysis results revolve around similar topics as analysis proposals but are typically longer. Rarely, researchers may be asked to report again at a subsequent cohort meeting if the presentation was unclear or if major additional analyses or interpretations are suggested. Analysis proposals and results that build on more than one cohort are presented at only one of the cohort meetings and are circulated at the others.
After presentation at a cohort meeting and when the manuscript is completed, but before submission for publication, researchers are required to complete Data ID forms that document the location (path) of read-in data files, analytical code, and output from statistical analysis software. The cited code outlines the study design, core assumptions, definitions of the variables used, and the analytic framework. Unix-based server systems at the Channing Division of Network Medicine or at Harvard’s Faculty of Arts and Sciences serve as programming environments for all researchers and as long-term storage. Data are managed by cohort programmers, and pseudonymized data versions for analyses are stored as fixed-width, comma-separated, or SAS binary files, with version control typically by including dates in file names. Although most existing analytical code relies on SAS, more recent analyses frequently employ R. The source for every number in the manuscript, whether in a table, a figure, or the text (by page number), is documented in the Data ID form, referring to the respective software log file. All researchers have access to all cohort-related data that have been collected (e.g., questionnaire data, derived variables, assay data from biospecimens), except participants’ protected health information. Moreover, all analytical code from all prior studies is freely available to everyone with access to the computer system. Although implementations of git and GitHub for version control are available, they are not mandatory, and most data and code predate their use. For analytical projects, no requirements are currently in place for specific file structures or for use of reproducible reporting tools such as markdown. Prior analyses, referenced by their Data ID forms, have been searchable electronically since 2008; older projects are archived on paper.
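Because dataset versions are distinguished by dates embedded in file names, an analysis can deterministically resolve which version it reads. Below is a minimal sketch of that convention, in Python purely for illustration (the community’s own code is largely SAS and R); all file names are hypothetical:

```python
import re
from datetime import date

# Hypothetical illustration of date-in-file-name version control:
# each dataset version carries a YYYYMMDD stamp in its name.
DATE_RE = re.compile(r"(\d{4})(\d{2})(\d{2})")

def file_date(name):
    """Extract the embedded YYYYMMDD date stamp from a versioned file name."""
    m = DATE_RE.search(name)
    if m is None:
        raise ValueError(f"no date stamp in {name!r}")
    year, month, day = (int(g) for g in m.groups())
    return date(year, month, day)

def latest_version(names):
    """Return the most recently dated file name among the versions."""
    return max(names, key=file_date)

versions = ["diet_20190301.csv", "diet_20200102.csv", "diet_20210615.csv"]
print(latest_version(versions))  # the 2021 file is the newest version
```

A Data ID form would then record the resolved path alongside the code and log files, so the exact dataset version behind each published number remains retrievable.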
A dedicated expert cohort programmer reviews all the programming code related to the specific study. Since 2017, the data analyst for a project has been asked to complete a self-directed checklist to preempt common issues encountered on review. During program review, special emphasis is placed on logic, consistency, reproducibility, read-in of original data files rather than intermediate or temporary versions, and inclusion of in-house developed software modules (e.g., SAS macros or R functions) for commonly used data management operations and analyses, rather than custom code fulfilling similar functions, unless necessary for the analysis. Researchers are asked to correct programming code if errors are identified and to then update the Data ID form. Because datasets are dynamic, ever growing as new exposures and outcomes are ascertained, each version of the dataset is documented and archived so that all results can be reproduced, even after an interval of decades.
Over time, the reviewing programmer initiates changes to reduce the risk of repeated errors by adding warning or explanatory notes for particular data files that are prone to being used erroneously. In addition, the programming group develops new macros for oft-used programming steps to minimize error. Model programs are provided for the most common types of analyses and some research groups have developed “common code” that includes appropriate coding of known or putative risk factors as well as disease characteristics to ensure comparability across studies. Also, at periodic faculty meetings, the reviewing programmer presents common errors and ways they can be avoided.
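The “common code” idea, shared and reviewed helpers for recurring coding of known risk factors, can be pictured with a hypothetical Python stand-in for such a SAS macro or R function, here classifying body mass index into the standard WHO categories:

```python
def bmi_category(bmi):
    """Code body mass index (kg/m^2) into standard WHO categories.

    A shared, reviewed helper like this (a stand-in for an in-house SAS
    macro or R function) ensures every analysis classifies this known
    risk factor the same way, instead of re-implementing the cutoffs.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"

print([bmi_category(b) for b in (17.0, 22.0, 27.5, 31.2)])
# ['underweight', 'normal', 'overweight', 'obese']
```

Centralizing such definitions is what makes results comparable across studies that draw on the same risk factors.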
Concordance of results presented in the manuscript and the analytical software’s output log is verified by a member of the primary team other than the author who wrote the code and drafted the manuscript, typically the second author. Each number in the paper is checked against the computer output, and the manuscript is checked to be sure that all numbers in the text match those in the tables, percentages are accurate, and the data are consistent within the manuscript. Code has been developed to directly output results into tabular formats to reduce transcription errors.
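Such a number-by-number check can be partly mechanized. As an illustrative sketch only (not the community’s actual tooling, and in Python rather than SAS or R), one might compare the numeric tokens in a manuscript table cell against the corresponding excerpt of the software log:

```python
import re

# Matches integers and decimals, including negative values.
NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")

def numbers(text):
    """All numeric tokens in a text fragment, in order of appearance."""
    return NUM_RE.findall(text)

def concordant(manuscript_cell, log_excerpt):
    """True if the manuscript cell and the log excerpt contain the same
    numbers in the same order; a mismatch flags a transcription error."""
    return numbers(manuscript_cell) == numbers(log_excerpt)

# Hypothetical example: a hazard ratio with its confidence interval.
print(concordant("HR 1.24 (1.08, 1.43)", "HR=1.24 lower=1.08 upper=1.43"))  # True
print(concordant("HR 1.42 (1.08, 1.43)", "HR=1.24 lower=1.08 upper=1.43"))  # False
```

Writing results directly into tabular formats, as the community’s code increasingly does, removes the transcription step that such a check guards against.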
Researchers submit the complete manuscript (reviewed by all coauthors), the Data ID form, and a checklist (Manuscript Checklist form) for final review. An administrative assistant coordinates the final review and verifies completion of the steps described above, documentation of institutional review board approval, and inclusion of funding acknowledgments. Scientific review is conducted by a senior faculty member and focuses on the abstract, main findings, and their interpretation. In comparison with notes from the discussion of the results presentation at cohort meetings, the manuscript is occasionally returned to the authors if corrections or clarifications are deemed necessary. The manuscript is then given formal approval for submission to scientific journals, and the abstract is sent, with a request for confidentiality, to the entire research community to facilitate communication.
Researchers are requested to submit a copy of the final manuscript for local archiving and to ensure submission to PubMed Central. If analyses are revised or expanded during the external peer-review process, researchers are requested to submit an updated Data ID form. If major new analyses are done, a new program review may be needed.
TRAINING AND COMMUNICATION
Many studies are led by scientists-in-training who typically have had rigorous classroom training in observational study design and analysis, but who may have little to no prior practical experience with large-scale cohort studies or programming. Design of the cohorts, questionnaires, biospecimens, validation studies, shared code, and forms are documented on intranet pages (“CohortDocs”) accessible to all users, both internal and external, with a cohort computing account. A detailed beginner’s guide on “Conducting analyses” is provided as well, which covers common issues such as creating analytical plans, defining study populations and outcomes, defining variables, and handling missing data. Analysis proposals, Data ID forms, and all manuscripts are archived in a searchable intranet database. This archive provides major efficiency gains, because investigators can modify previously written code from related analyses, rather than start from scratch each time.
The five cohort studies have also been longstanding contributors of data to consortia and pooled analyses across cohort studies in which data analyses are conducted offsite. Such projects are handled similarly to the process described above, with approval and assignment of an investigator to review participation in specific analyses for that consortium, ensuring that appropriate data are available and that regulatory and process requirements are met. The main difference is that, for those consortia, a limited dataset is prepared and exported, subject to data use agreements.
The community of affiliated researchers is connected through cohort-specific mailing lists. Cohort meeting agendas with titles of proposals and results are circulated in advance of meetings. Meeting protocols summarize presentations and discussion items. Abstracts from studies are circulated on all mailing lists after they have gone through final review, noting their unpublished and confidential status. This process serves to keep the community informed of new findings and analyses, and sometimes elicits additional suggestions. Mailing lists also serve to disseminate other updates, such as extended follow-up data for disease and mortality outcomes, updated derived variables, or major changes to shared analysis programs.
DISCUSSION
We describe a research community-based approach to promoting integrity in observational research that has been implemented in our academic setting for decades. Rather than focusing on end products of science such as analysis results and published articles, the approach considers all steps of scientific inquiry, starting with hypotheses and study design. At the heart of the approach are team-based discussions, education, and sharing findings and institutional memory throughout the process. The technical core encompasses shared computing environments with mandatory code review and long-term archival of analytical code and results, strengthening reproducibility of results published in thousands of research articles.
Some elements of the approach described here bear resemblance to research consortia, where researchers submit proposals and obtain approval before final manuscript submission. Other elements, such as archiving or reviewing study data and analytical code, are elements of reproducible reporting suggested by others.1,11,25,26 What most characterizes our approach is its array of efforts jointly aimed at research integrity, education, and integration in an academic community, together with our long-term experience. Clearly, our approach does not focus on a single numeric end result of a given hypothesis, even though statistical issues are considered at multiple steps of the process. Computing platforms with long-term storage are essential, but technical or software solutions are just one ingredient. Instead, our approach accompanies the entire research process from early on, a practice previously termed “prevention,” in contrast to a “medication” angle to research integrity that would not start until a finalized manuscript undergoes external peer review.27 Our approach is meant to allow for the complexity of scientific questions, data, and methodologies that are central to observational research. It promotes reproducibility of every result, no matter if decades old, perhaps one of the most basic features of research integrity. However, neither our approach nor any other could ensure that findings will always be replicable in other studies, because replication is simply not always to be expected, nor would it always be desirable.28,29
Some additional potential benefits of the approach are noteworthy. First, cohort meetings provide forums for scientific discussion beyond individual research teams, fostering scientific discourse and a sense of belonging within the broader research community. These forums also provide stimuli to ensure adequate expertise on study teams, lead to closer intrainstitutional collaborations, and can stimulate novel research ideas across disease areas. Second, the value of reproducibility is underscored by the technical review of the manuscript being completed by a coauthor, usually the second author of a manuscript. Third, there are obvious efficiency gains for researchers who can build on previously reviewed code and results from previous projects that are retrievable through their publications and linked Data ID forms.
External collaborators are integrally included in the process through cohort meetings and on mailing lists, and they obtain similar access to original unabridged data, code, and expertise as local investigators. Providing researchers with personal logins (as opposed to public access) is critical, because the amount and granularity of longitudinal information makes reidentification of deidentified data a real threat to the privacy of cohort participants.12 This approach to data sharing could be compared with approaches that regard “data” as freestanding ingredients of single published papers, which then can more easily be shared on the Internet, inevitably in abridged versions. Our NIH-approved “data enclave” approach may seem more restrictive at first glance. Balancing data access and participant confidentiality, this approach is a partial solution to challenges related to data use agreements and institutional review board approvals for sharing of easily reidentifiable data. Yet the approach also ensures that researchers working with data benefit from full contextual information, enabling them not only to reproduce original findings but also to advance scientific inquiry by harnessing the richness of contextual data, executing decades of analytical code in its native computing environment, and drawing on the expertise of other investigators.
Articulation of all studies’ hypotheses and their fundamental study design is the first element of the research process described here.30 The clearest benefits are that study teams form, evaluate their priorities, and consider methodologic questions upfront rather than after data analysis. Preregistration for observational research is far more controversial than for experimental clinical research (clinical trials).13,31,32 Although preregistration might help avoid duplicative efforts in a few cases,33 it is unknown whether it increases reproducibility and replicability. Initial empirical data cast doubt on whether preregistration of observational studies in public databases would have only its intended consequences.34
At first glance, our experience might be most relevant for large-scale cohort studies in epidemiology. Indeed, the long timespan of up to several decades between prospective exposure measurements and follow-up for outcomes probably helped promote continuity and consistency, which, in turn, benefitted the research. However, nearly every step implemented in our approach might also be informative for the conduct of observational studies in other settings, such as clinical medicine. Instead of a cohort, a unit of organization might be a specialty division or a disease-oriented group. Expertise and formal training in study design and analysis (i.e., in epidemiology), a skill shared in our research community, could be important for such implementation. Indeed, our approach addresses some of the elements for reproducible research proposed for “real-world evidence” by the U.S. Food and Drug Administration35 and expert societies.36,37
The long tradition of our approach makes it difficult to evaluate which of its elements are most beneficial for research integrity, which could be eliminated without much negative effect, and which may be counterproductive. For example, program reviews prevent some errors that would alter scientific results, but they add days to weeks of review time to the timeline of each project, and the evolution of analytical software, for example, for high-dimensional data, can be a bottleneck for reviews. New implementations of similar efforts at other institutions might be well suited for formal process evaluations, where stakeholders may be more likely to provide fundamental feedback than when the approach is engrained in the community, as is the case in our example. We hope that our experiences, perhaps together with accounts from other research communities, could be empirically assessed in such settings. Nevertheless, our experience clearly demonstrates that a multimodal approach combining several elements of reproducibility and replicability independently proposed by others is feasible and sustainable. Of course, our approach does not guarantee that every result will be “right.” Confirmation bias, a general unintended consequence of reproducibility, is related to the risk of perpetuating analyses that consistently produce reproducible yet incorrect results, as compared with a (generally unknown) truth. The main goal of our approach is that reported results be reproducible, and that the inevitable errors that arise be detectable at multiple checkpoints before manuscripts are published and before they have the chance to inform health decision-making. Advancing scientific inquiry may benefit from first being able to reproduce what was done previously and then making explicit changes to the approach, rather than simply getting a different result when starting from scratch. Our approach does not control what researchers do along the way.
The currently implemented approach also does not check whether analyses have begun before analysis proposals are presented, nor does it require that all secondary outcomes be prespecified, that analyses adhere closely to the analysis proposal, or that projects be completed and published. We believe that process elements such as presenting projects at two different stages of completion, coupled with the ability to retrieve prior work, create a feeling of belonging and accountability in the research community that might go beyond the effect of strictly enforced rules and stipulations. This aspect is one of many that might deserve further empirical evaluation.
In summary, we describe how an entire research community can conduct a process of multiple steps that jointly foster integrity of observational research. These experiences might be helpful for other institutions conducting observational research that are considering how value can be added to the research process beyond mere reproducibility. New implementations of such approaches are probably best accompanied by empirical evaluations. Gains in consistency and efficiency may be most apparent in the long term. On a note of caution, we are finding it increasingly challenging to fund the efforts described here in times of ever-tighter federal infrastructure budgets. We believe it would be preferable if such costs could be fully covered through cohort infrastructure grants. Requiring internal and external investigators to budget for a share of these costs on each research project, as we are currently forced to request, creates barriers for investigators and adds administrative burden. In any case, with substantial personal efforts and demands on computing infrastructure, research reproducibility and integrity do not come for free, unlike perfunctory reproducibility checklists. Our experiences suggest that such investments are well spent, particularly when they contribute to strengthening a research community. The most important impact may be trust building—both in the integrity of research results and within a research community.
We thank the participants of the cohort studies for their dedicated and generous contributions, which are the foundation of this research community, and our cohort staff for their expert help. We also thank focus group participants, Brenda Birmann, Daniel Wang, Hari Iyer, Kevin Kensler, and Samantha Molsberry, for sharing their perspectives.
1. Peng RD, Dominici F, Zeger SL. Reproducible epidemiologic research. Am J Epidemiol. 2006;163:783–789.
2. National Academies of Sciences, Engineering, and Medicine. Reproducibility and Replicability in Science. The National Academies Press; 2019.
3. Goodman SN, Fanelli D, Ioannidis JP. What does research reproducibility mean? Sci Transl Med. 2016;8:341ps12.
4. Begg CB, Berlin JA. Publication bias and dissemination of clinical research. J Natl Cancer Inst. 1989;81:107–115.
5. Collins FS, Tabak LA. Policy: NIH plans to enhance reproducibility. Nature. 2014;505:612–613.
6. Franzoni C, Scellato G, Stephan P. Science policy. Changing incentives to publish. Science. 2011;333:702–703.
7. O’Boyle EH, Banks GC, Gonzalez-Mulé E. The Chrysalis Effect: how ugly initial results metamorphosize into beautiful articles. J Manag. 2016;43:376–399.
8. Munafò MR, Nosek BA, Bishop DVM, et al. A manifesto for reproducible science. Nat Hum Behav. 2017;1:0021.
9. Fowler J, San Lucas FA, Scheet P. System for quality-assured data analysis: flexible, reproducible scientific workflows. Genet Epidemiol. 2019;43:227–237.
10. Goecks J, Nekrutenko A, Taylor J, Galaxy T. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11:R86.
11. Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten simple rules for reproducible computational research. PLoS Comput Biol. 2013;9:e1003285.
12. Colditz GA. Constraints on data sharing: experience from the nurses’ health study. Epidemiology. 2009;20:169–171.
13. Poole C. A vision of accessible epidemiology. Epidemiology. 2010;21:616–618.
14. Assel M, Vickers AJ. Statistical code for clinical research papers in a high-impact specialist medical journal. Ann Intern Med. 2018;168:832–833.
15. Goldstein ND. Toward open-source epidemiology. Epidemiology. 2018;29:161–164.
16. Goldstein ND, Hamra GB, Harper S. Are descriptions of methods alone sufficient for study reproducibility? An example from the cardiovascular literature. Epidemiology. 2020;31:184–188.
17. Goldacre B, Morton CE, DeVito NJ. Why researchers should share their analytic code. BMJ. 2019;367:l6365.
18. Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med. 2007;4:e297.
19. Chambers C. The registered reports revolution: lessons in cultural reform. Signif. 2019;16:23–27.
20. Laine C, Goodman SN, Griswold ME, Sox HC. Reproducible research: moving toward research the public can really trust. Ann Intern Med. 2007;146:450–453.
21. Nosek BA, Alter G, Banks GC, et al. Promoting an open research culture. Science. 2015;348:1422–1425.
22. Landis SC, Amara SG, Asadullah K, et al. A call for transparent reporting to optimize the predictive value of preclinical research. Nature. 2012;490:187–191.
23. National Institutes of Health. Research Methods Resources, 2019. Available at: https://researchmethodsresources.nih.gov/. Accessed 14 September 2019.
24. Research integrity is much more than misconduct. Nature. 2019;570:5.
25. Vable AM, Diehl SF, Glymour MM. Code review as a simple trick to enhance reproducibility, accelerate learning, and improve the quality of your team’s research. Am J Epidemiol. 2021;190:2172–2177.
26. Platt RW. Code review: an important step towards reproducible research. Am J Epidemiol. 2021;190:2178–2179.
27. Leek JT, Peng RD. Opinion: reproducible research can still be wrong: adopting a prevention approach. Proc Natl Acad Sci U S A. 2015;112:1645–1646.
28. Lash TL. The harm done to reproducibility by the culture of null hypothesis significance testing. Am J Epidemiol. 2017;186:627–635.
29. Lash TL, Collin LJ, Van Dyke ME. The replication crisis in epidemiology: snowball, snow job, or winter solstice? Curr Epidemiol Rep. 2018;5:175–183.
30. Cole P. The hypothesis generating machine. Epidemiology. 1993;4:271–273.
31. The Editors of Epidemiology. The registration of observational studies–when metaphors go bad. Epidemiology. 2010;21:607–609.
32. Lash TL, Vandenbroucke JP. Should preregistration of epidemiologic study protocols become compulsory? Reflections and a counterproposal. Epidemiology. 2012;23:184–188.
33. The Editors of Epidemiology. On the death of a manuscript. Epidemiology. 2002;13:495–496.
34. Boccia S, Rothman KJ, Panic N, et al. Registration practices for observational studies on ClinicalTrials.gov indicated low adherence. J Clin Epidemiol. 2016;70:176–182.
35. U.S. Food and Drug Administration. Framework for FDA’s Real-World Evidence Program, 2018. Available at: https://www.fda.gov/media/120060/download. Accessed 20 March 2022.
36. Berger ML, Sox H, Willke RJ, et al. Good practices for real-world data studies of treatment and/or comparative effectiveness: recommendations from the joint ISPOR-ISPE Special Task Force on real-world evidence in health care decision making. Pharmacoepidemiol Drug Saf. 2017;26:1033–1039.
37. Wang SV, Schneeweiss S, Berger ML, et al. Reporting to improve reproducibility
and facilitate validity assessment for Healthcare Database Studies V1.0. Pharmacoepidemiol Drug Saf. 2017;26:1018–1032.