Horn, Susan D. PhD; Gassaway, Julie MS, RN
Comparative effectiveness research (CER) examines relationships between medical treatments and patient outcomes. To make meaningful comparisons of medical interventions, one must consider clinical heterogeneity of patient populations, intervention combinations, and outcomes. Conventional randomized controlled trials (RCTs) are of limited generalizability because they severely restrict enrollment to ensure a homogeneous study cohort. Also, RCTs analyze treatment effects individually, rather than in combinations that reflect clinical practice.1–4 CER offers a way to hasten the discovery of best practices and contributes to personalized medicine by accounting for patient differences and variations in treatment combinations.5
Researchers, clinicians, and policy makers seek alternatives to RCTs to compare effectiveness of overlapping and diverse treatments in real-world patient populations.5 Practice-based evidence (PBE) research methodology is an alternative that can be applied in many clinical settings and specialties.1,4,6–13 PBE is a prospective, observational cohort design and shares some similarities with registry, cohort, and other observational study designs. However, PBE studies do not limit the numbers and types of interventions or artificially restrict patient variability. PBE accommodates multiple concurrent interventions and patient characteristics that reflect actual clinical practice, using data from natural settings to describe the content and timing of treatments that are associated with better outcomes (including patient-reported outcomes) for patients with specific characteristics. Medical care is neither suspended nor altered in PBE studies, comprehensive and multidimensional patient severity descriptors are included in analyses, and variability is controlled statistically rather than through enrollment criteria and randomization. PBE designs have high external validity because they include virtually all patients with or at risk for the condition of interest, as well as potential confounders that could alter treatment responses.
PBE study designs address comparative effectiveness by creating a comprehensive set of patient, treatment, and outcome variables and analyzing them to identify treatments that are associated with better outcomes for specific types of patients. This methodology incorporates 4 elements of practical clinical trials critical to increase value of clinical research: comparison of clinically relevant alternative interventions, inclusion of diverse study populations, participant recruitment from heterogeneous practice settings, and data collection covering a broad range of health outcomes.14 Most PBE projects select study sites that are diverse in terms of geographic location, setting type (eg, academic, community, rural), and race/ethnicity of patients.
PBE is an example of “participatory action research”4,12 as it involves clinicians engaged in actual patient care. A key step in a PBE study is assembling a diverse project team with multiple areas of relevant expertise: multidisciplinary care providers; administrative, technological, and research experts; and consumer groups. For example, the project team in a PBE spinal cord injury rehabilitation study included physicians, nurses, therapists, psychologists, social workers, information technology experts, and patients who had survived a spinal cord injury.
Cooperation is engendered by engaging front-line clinicians in all aspects of PBE projects. The project team defines the variables to be included in the PBE project based on initial study hypotheses, a literature review, and their clinical experience and training. Many relevant details about patients, treatments, and outcomes are recorded in patient medical records; however, the project team often identifies additional critical variables that must be collected in supplemental standardized documentation developed specifically for the PBE study.
Here, we provide an overview of how PBE measures and controls for heterogeneity of patients, treatments, and outcomes seen in day-to-day clinical settings, with reference to 4 example PBE studies we have conducted (Table 1), followed by a discussion of measurement error issues associated with PBE data sources, and appropriate analytic techniques.
PBE addresses patient heterogeneity by measuring a wide variety of patient characteristics that go beyond race, gender, age, payer, and other variables in administrative databases. For some studies, it is important to incorporate genetic or genomic/proteomic information. The goal is to measure all variables that contribute to a comprehensive patient description and control for treatment selection bias: individually, in combination, or through assimilation into an overall measure of clinical complexity that medical personnel take into account when treating a patient.
The Comprehensive Severity Index (CSI)26–29 provides a comprehensive measure of how ill (extent of deviation from “normal”) a patient is at the time of presentation to the healthcare system and over time within the system. CSI is age- and disease-specific, independent of treatments, and provides an objective, consistent method to define patient severity of illness levels based on over 2100 signs, symptoms, and physical findings related to a patient's disease(s), not just diagnostic information (ICD-9-CM coding) alone. CSI uses weighting algorithms based on ICD-9 codes and data management rules to compute severity scores for each patient overall and separately for each of the patient's diseases (principal and each secondary diagnosis). These severity scores permit researchers to ask “how sick is the patient?” in clinically relevant ways during analysis.
Often, it is appropriate to use the full “patient as a whole” CSI score because of the large effect that comorbidities have on patient outcomes.9,26–29 For example, a patient admitted to a hospital with pneumonia typically is more challenging to return to baseline health if that patient also has complicating comorbidities such as severe congestive heart failure or renal failure, or develops complications such as sepsis or adult respiratory distress syndrome. Researchers can examine the overall severity score for the patient that includes all of the patient's complications and comorbidities, or they can look at the admitting diagnosis of pneumonia or each secondary diagnosis, depending on what is most relevant for specific analytic questions. Upon initiation of each new PBE study, CSI criteria for the study condition, including the 4-level categorization of individual variables (level 1: normal to mild; level 2: moderate; level 3: severe; level 4: catastrophic or life threatening), are reviewed and edited if the team thinks it appropriate.9
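The 4-level categorization and the disease-specific versus "patient as a whole" scores can be illustrated with a toy rollup. This is a minimal sketch only, not the actual CSI algorithm: the point weights, summation rules, and finding names below are all invented for illustration.

```python
# Hypothetical sketch of a CSI-style severity rollup (NOT the real CSI algorithm).
# Each sign/symptom/finding is categorized on the 4-level scale described above;
# the weights and the simple summation used here are illustrative assumptions.

LEVEL_POINTS = {1: 0, 2: 1, 3: 2, 4: 4}  # assumed: normal/mild .. catastrophic

def disease_severity(findings):
    """Roll one disease's finding levels into a disease-specific score."""
    return sum(LEVEL_POINTS[level] for level in findings.values())

def overall_severity(diseases):
    """'Patient as a whole' score: combine principal and secondary diagnoses."""
    return sum(disease_severity(f) for f in diseases.values())

# Hypothetical patient: pneumonia (principal) with heart failure comorbidity.
patient = {
    "pneumonia":     {"respiratory_rate": 3, "oxygen_saturation": 2, "temperature": 2},
    "heart_failure": {"ejection_fraction": 3, "edema": 2},
}
print(overall_severity(patient))                # whole-patient score -> 7
print(disease_severity(patient["pneumonia"]))   # principal diagnosis only -> 4
```

In the real CSI, disease-specific weighting algorithms and data management rules replace the naive point sum used here; the sketch shows only why the overall and per-diagnosis scores can answer different analytic questions.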
CSI is used along with other measures of patient differences such as level of spinal cord injury, tumor stage and type for cancer, etc, to control for patient differences and can help to assess treatment selection bias when that is a question. CSI has been validated extensively for more than 25 years in numerous studies in inpatient (adult and pediatric), ambulatory, rehabilitation, and long-term care settings.26–29
While clinical guidelines strive to standardize care for specific conditions, treatment in clinical settings is often determined by facility standards, regional differences, and clinician training. Therefore, like patient heterogeneity, treatment heterogeneity must be measured and controlled during PBE study analyses. PBE project teams identify and define treatment variables to be measured and specify how to collect them in a standard manner for all participating sites. Consistent measurement standards are important to minimize the influence of documentation practices and completeness on data quality. PBE does not require providers to follow treatment protocols or exclude certain treatment practices. It uses variation to identify better practice by examining different approaches to care while controlling for patient variables.
Some treatment processes are described consistently in conventional medical chart documentation. Medication administration, equipment use, and surgical approaches are examples of variables that, if relevant to a particular study, can be found in patient medical records.
Supplemental documentation strategies may be necessary to measure all variations in treatment practiced in participating sites. Each PBE project team determines if supplemental documentation is necessary, identifies variables for inclusion, and defines each variable so that clinicians in all participating centers interpret and use each variable consistently. For example, in PBE studies in rehabilitation settings an important focus of documentation is content of therapy, in addition to duration information that is documented conventionally. In some rehabilitation facilities, physical therapy (PT) may occur in 2 standardized 20- to 30-minute sessions each day; others provide daily hour-long sessions.13 Standardized point-of-care documentation captures details of therapy provided, and time can be quantified by activity, session, day, or week as deemed clinically appropriate to account for different practices among centers. Where content of therapy has been measured, it has allowed researchers to uncover significant associations between specific content and better outcomes.8 Such studies move beyond previous work that made no distinctions among therapy components and recorded only total time spent in PT or occupational therapy per day.30
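The quantification of therapy time at different grains (activity, session, day) is straightforward once point-of-care records exist. A minimal sketch, with hypothetical records and field names:

```python
# Illustrative sketch: summarizing point-of-care PT documentation at different
# grains. The records, session labels, and activity names are hypothetical.
from collections import defaultdict

records = [
    # (day, session, activity, minutes)
    (1, "AM", "gait training", 20),
    (1, "AM", "transfers", 10),
    (1, "PM", "gait training", 25),
    (2, "AM", "strengthening", 30),
]

minutes_by_day = defaultdict(int)
minutes_by_activity = defaultdict(int)
for day, session, activity, minutes in records:
    minutes_by_day[day] += minutes
    minutes_by_activity[activity] += minutes

print(dict(minutes_by_day))       # {1: 55, 2: 30}
print(dict(minutes_by_activity))  # {'gait training': 45, 'transfers': 10, 'strengthening': 30}
```

Aggregating by activity, rather than recording only total daily PT minutes, is exactly the distinction that lets analyses associate specific therapy content with outcomes.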
Pilot testing is used to ensure that point-of-care documentation and other chart data collection capture all elements that clinicians believe may affect the outcomes of their patients. If a variable is not measured, it cannot be used for statistical control or as a confounder. PBE studies require effort from all types of clinicians involved with a patient to ensure that data acquisition is as comprehensive as necessary.
Once a PBE project team decides to include supplemental standardized documentation, it must determine the most appropriate process to implement the documentation. These standardizations have been incorporated into a variety of data collection tools, including (1) optical character recognition paper forms that are scanned into a database, (2) paper forms that are key-entered into a web-based system, (3) electronic applications that are loaded onto personal digital assistants (PDAs), and (4) electronic health record systems.
Selection of a data collection method depends in part on the detail required. For example, a study requiring very detailed process information is better served by a menu-driven PDA application than paper forms, because the latter are typically limited to 1 to 2 pages. Examples of process details included in specific PBE projects are included in Table 1. In the spinal cord injury study, the PDA approach was chosen to allow documentation of many details of functional treatments.18,19 In contrast, the traumatic brain injury project used a paper-based/web-entry system, since less detailed descriptions of therapy content were needed. The stroke rehabilitation study used an optical character recognition form, which captured required details about poststroke rehabilitation therapies, such as use and timing of feeding,15 early introduction of gait therapy by PT,8,16 and use of atypical antipsychotic medications.8,17 Finally, the nursing home pressure ulcer healing project incorporated standardized documentation of pressure ulcer characteristics and treatments into existing electronic health record systems in nursing homes; wound care nurses from sites across the US standardized pressure ulcer assessment documentation (including ulcer length and width, color, drainage, odor, etc.) and treatments, and designed feedback reports based on these data to monitor and improve outcomes.
Project-specific outcome variables are identified and defined by the project team. PBE studies are not limited to assessing a single outcome; they can include multiple outcomes. Where appropriate standardized outcome measures exist, PBE studies can use them to take advantage of industry-wide measures and training programs. For example, the Functional Independence Measure (FIM) is a widely used measure of performance across 13 motor and 5 cognitive areas with “acceptably high” interrater reliability.31–33 Inpatient rehabilitation settings use FIM to determine reimbursement, so employees nationwide routinely receive standardized training in FIM scoring.34 If PBE studies consider functional status an important outcome, improvement in FIM ratings can be used to determine “success” or “failure” of a given intervention without requiring significant additional training or change in documentation for participating sites. FIM also can be a control variable because it is measured routinely on admission to a healthcare setting to identify level of patient disability. Thus, FIM can help identify homogeneous groups of patients on admission for comparison of interventions and outcomes.
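Using FIM both as an admission control and as an outcome can be sketched simply. FIM's 18 items (13 motor, 5 cognitive) are each rated 1 (total assistance) to 7 (complete independence), so totals range 18 to 126; the item ratings and the "success" threshold below are invented for the example and are not a clinical standard.

```python
# Sketch: FIM as admission control and FIM gain as outcome.
# 18 items rated 1-7; the sample ratings and success cutoff are hypothetical.

def fim_total(item_ratings):
    assert len(item_ratings) == 18 and all(1 <= r <= 7 for r in item_ratings)
    return sum(item_ratings)

admission = [3] * 13 + [5] * 5   # hypothetical motor + cognitive ratings
discharge = [5] * 13 + [6] * 5

gain = fim_total(discharge) - fim_total(admission)
success = gain >= 20             # assumed threshold for this illustration only
print(fim_total(admission), gain, success)   # 64 31 True
```

The admission total (here 64) can stratify patients into comparable disability groups, while the gain serves as the outcome compared across interventions.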
The Braden Scale for risk of pressure ulcer development is another industry-wide standard measure that has been used as both a control and an outcome variable in PBE studies. Because the Braden Scale is administered at multiple time points, increases in scores are an indicator that interventions are reducing pressure ulcer risk.22–24 A third example is CSI, which may be used as a stratification or control variable on admission as well as an outcome variable (either at discharge or as a longitudinal measure when documented at various time points throughout the episode of care). Changes in CSI scores over time indicate improving (lower severity) or worsening (higher severity) status of patients.
Although PBE teams strive to incorporate as many standard measures as possible, they typically include other outcome measures specific to the study topic. Some outcomes, such as discharge destination (home, community, institution), length of stay, or death are common in administrative databases. However, for project-specific outcomes, the project team must determine both the definition and how to capture the variable. Many variables (eg, repeat stroke, deep vein thrombosis, electrolyte imbalance, anemia) are found in traditional chart documentation. Such variables, however, typically are available only up to discharge from the care setting.
For PBE studies that seek to follow patients after discharge, patient-reported outcomes (PROs) can provide measures of the influence of inpatient treatment on continued treatment, participation in community-based activities, and other outcomes. Validated PRO instruments such as the Craig Hospital Assessment and Reporting Technique,20,21 36-Item Short Form Health Survey (SF-36),35–37 and Edmonton Symptom Assessment Scale,38 are valuable tools where appropriate to the study topic. Additional PRO variables may be defined by PBE project teams and may include utilization variables (eg, hospitalizations, emergency department visits, outpatient therapy), participation variables (eg, involvement in community or religious organizations), and other healthcare-type variables (eg, bowel and bladder management techniques, assistive devices/durable medical equipment, medications).
A PBE study draws data from multiple sources; the goal with each source is to have as little measurement error as possible. If front-line clinicians decide and agree on definitions of variables to be included for data collection, measurement error is less likely than if variables are defined by a researcher alone. Regardless of data source, all variables in a PBE study should be checked for reliability.
For data obtained through medical record abstraction, data abstractors at participating sites are trained through instructive and practice sessions in the process of abstracting project-specific PBE data (eg, CSI severity of illness and other patient, process, and outcome data in medical records). After completion of the training program, the trainer conducts reliability testing by thoroughly reviewing blinded copies of each data abstractor's first 4 charts, as well as a random selection of charts throughout the project. A 90% agreement rate between the data abstractor and trainer is required; if necessary, additional training is provided until 90% agreement is achieved.
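The 90% agreement check reduces to an element-by-element comparison of paired abstractions. A minimal sketch, with hypothetical field names and values; real projects compute agreement across many charts and variables:

```python
# Sketch of the abstractor-vs-trainer reliability check: compare paired data
# elements from one chart and test against the 90% threshold described above.
# Field names and values are hypothetical.

def percent_agreement(abstractor, trainer):
    """Exact-match agreement across the variables in one chart."""
    matched = sum(abstractor[k] == trainer[k] for k in trainer)
    return 100.0 * matched / len(trainer)

trainer_chart    = {"severity_level": 3, "dvt": False, "los_days": 12, "fim_adm": 64}
abstractor_chart = {"severity_level": 3, "dvt": False, "los_days": 14, "fim_adm": 64}

rate = percent_agreement(abstractor_chart, trainer_chart)
print(rate, rate >= 90.0)   # 75.0 False -> retraining would be indicated
```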
For clinician point-of-care documentation, the project team assists discipline teams to write patient scenarios that depict typical treatment sessions and include all aspects of the documentation system relevant to that discipline. Clinicians in each discipline involved in the study's point-of-care documentation read the scenario and document it as they would an actual patient treatment session. Documentation entries are compared with expected entries drafted by the team that wrote the scenario; again, 90% agreement is expected.
PBE studies use multivariable analyses to identify variables that are most strongly associated with outcomes. Detailed characterization of patients and interventions allows researchers to unravel relationships that might not otherwise become apparent. CSI (overall or its components) is used in data analysis to balance the effect of comorbid and co-occurring conditions with the principal diagnosis. If a positive outcome is found to be associated with a specific intervention or combination of interventions, the subsequent statistical approach is to add or remove confounding patient variables or combinations of variables in an attempt to make that association disappear. The association may remain robust or other variables may be identified that explain the outcome more adequately. Interactions between interventions or between an intervention and specific patient characteristics are explored through multivariable statistical analyses (multiple regression, Cox proportional hazards regression, hierarchical regression, and other models); these associations may not be uncovered using less detailed or less comprehensive databases. Large numbers of patients (usually >1000) and considerable computing power are required to perform PBE analyses. When multiple outcomes are of interest and there is little information on effect size of predictor variables, sample size is based on the project team's desire to find small, medium, or large effects of patient and process variables on outcomes.39
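The add-a-confounder robustness step can be demonstrated with synthetic data: a crude treated-versus-untreated comparison looks strongly positive because sicker patients receive the treatment more often, but controlling for a severity variable (here, via simple stratification on a hypothetical 4-level score; PBE analyses would use multivariable regression) shrinks the estimate toward the true treatment effect. All data below are simulated for illustration.

```python
# Sketch of confounder robustness testing: crude association vs the same
# association within severity strata. Synthetic data, stdlib only.
import random
from statistics import mean

random.seed(0)
patients = []
for _ in range(2000):                          # PBE analyses typically use >1000 patients
    severity = random.choice([1, 2, 3, 4])     # hypothetical 4-level severity score
    treated = random.random() < severity / 5   # treatment selection bias: sicker -> treated
    outcome = 10 * severity + 2 * treated + random.gauss(0, 3)  # true treatment effect = 2
    patients.append((severity, treated, outcome))

def effect(rows):
    """Mean outcome difference, treated minus untreated."""
    return (mean(o for s, t, o in rows if t)
            - mean(o for s, t, o in rows if not t))

crude = effect(patients)                       # confounded by severity
within = [effect([p for p in patients if p[0] == lvl]) for lvl in (1, 2, 3, 4)]
print(round(crude, 1))                 # inflated well above the true effect of 2
print([round(e, 1) for e in within])   # each stratum estimate near 2
```

If the stratified (or regression-adjusted) estimates had remained as large as the crude one, the association would be robust to this confounder; here the severity variable explains most of the apparent effect, which is exactly the kind of alternative explanation the analytic process is designed to surface.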
The PBE approach does not infer causality directly; strength of evidence is built through the research process: (1) alternative hypotheses regarding possible cause and effect are tested using the large number of available variables to identify mediating and moderating influences on outcomes. Results can be used to eliminate potential hypotheses regarding causality, and to generate additional specific analytic questions. Analyses continue until the project team cannot suggest any other variables to explain outcomes, and the overall equation explanatory power is higher than in previous known research; (2) predictive validity of significant findings can be ascertained by introducing findings into clinical practice and evaluating their outcomes: when treatments change, do outcomes change as predicted? For example, in a PBE study to prevent pressure ulcers in “at-risk” nursing home residents, findings of better practices to treat incontinence (using disposable briefs), weight loss (using standard medical nutritional supplements), and disruptive behaviors (using a combination of newer selective serotonin reuptake inhibitors and antipsychotics) were implemented and were followed by a significant decrease in development of pressure ulcers22,25; and (3) studies are repeated in different healthcare settings and yield essentially the same findings.
PBE methodology addresses comparative effectiveness by creating a comprehensive set of patient, treatment, and outcome variables, and analyzing them to identify treatments that are associated with better outcomes for specific types of patients. A PBE study depends heavily on knowledge of front-line practicing clinicians who design the PBE study and inform project analyses. They base project decisions on initial study hypotheses, additional hypotheses that develop during the study, literature reviews, and their training and clinical experience. This multidisciplinary approach ensures inclusion of a wide spectrum of variables so that heterogeneity among patients, treatments, and outcomes is measured and controlled. Inclusion of heterogeneous characteristics helps to reflect patient populations seen in routine clinical practice and to determine which of the multiple arrays of treatments provided to these patients are associated with better outcomes. Measurement is key; the goal of PBE studies is to have as few unmeasured confounders as possible. For all measured variables, it is critical to minimize measurement error and missing documentation. After data collection begins, the project team tests reliability of data collected and provides retraining in areas where reliability is deficient.
PBE studies and RCTs can be considered complementary study designs, with design choice optimized to answer important questions about what works in health care. Findings from PBE studies are easily translated into practice because the studies include a wide range of patient types and also a wide range of treatments used for a specific condition. PBE studies also can be used to produce evidence needed before investing major resources in an RCT, since they “[focus] on actionable findings that can be implemented to improve effectiveness of care.”4 Thus, PBE studies can be both hypothesis-testing and hypothesis-generating.
Although not as expensive as most large RCTs, large PBE studies can be costly, since many variables are obtained from abstraction of existing paper medical records or from new standardized documentation. However, PBE studies can answer many more questions than most RCTs, which are powered to test a single primary outcome. And, as more healthcare systems move toward electronic records, data acquisition may become easier and less costly.
Observational studies are hardly new, but there is growing recognition of their importance in medical research, provided they contain sufficient detail about patient and process differences. Their resurgence has been facilitated by advances in severity of illness measurement, statistical methods, and computing power that have elevated observational designs in their degree of rigor and in the amount and type of information that can be gleaned from them.1 Practice-based research was described recently as the “blue highways” on the NIH roadmap,3 with major highways, typically depicted in red on maps, analogous to RCTs. Blue lines extend from highways to communities where patients live and medical care typically is administered. Each road type, like each trial design, has its unique role and importance, but all must interconnect to provide a complete picture. Given inadequate and fragmented evidence to support clinical practice in complex care environments with diverse populations, along with recent developments in nonrandomized research methodologies and growing recognition of their value, we propose that the next step to advance medical science and comparative effectiveness research is to conduct more prospective, large-scale observational cohort studies with the rigor described here for PBE studies. This rigor provides controlled measurement of outcomes related to multiple interventions and a variety of patient characteristics in diverse clinical settings.
Because healthcare is complex, study designs other than RCTs are needed to determine what works best for specific patient types and provide clinicians with a rational basis for treatment recommendations for individual patients. There is not enough money or time to examine each treatment step or combination of treatments using RCTs. PBE study designs provide a holistic picture of patients, treatments, and outcomes with no preset limits to the number of variables that can be included. PBE studies can address priority areas for comparative effectiveness research funded by government agencies and others and support buy-in and dissemination by clinical providers. Such an approach is needed for high quality comparative effectiveness research.
1. Horn S, DeJong G, Ryser D, et al. Another look at observational studies in rehabilitation research: going beyond the holy grail of the randomized controlled trial. Arch Phys Med Rehabil. 2005;86(12 suppl 2):S8–S15.
2. Berwick D. The John Eisenberg lecture: health services research as a citizen in improvement. Health Serv Res. 2005;40:317–336.
3. Westfall J, Mold J, Fagnan L. Practice-based research—“blue highways” on the NIH roadmap. JAMA. 2007;297:403–406.
4. Horn S, Gassaway J. Practice-based evidence study design for comparative effectiveness research. Med Care. 2007;45(10 suppl 2):S50–S57.
5. Garber A, Tunis S. Does comparative-effectiveness research threaten personalized medicine? New Engl J Med. 2009;360:1925–1927.
6. Willson D, Horn S, Hendley J, et al. The effect of practice variation on resource utilization in infants hospitalized for viral lower respiratory illness (VLRI). Pediatrics. 2001;108:851–855.
7. Willson D, Landrigan C, Horn S, et al. Complications in infants hospitalized for bronchiolitis or respiratory syncytial virus pneumonia. J Pediatr. 2003;143(suppl 5):S142–S149.
8. Horn S, DeJong G, Smout R, et al. Stroke rehabilitation patients, practice, and outcomes: is earlier and more aggressive therapy better? Arch Phys Med Rehabil. 2005;86(12 suppl 2):S101–S114.
9. Gassaway J, Horn S, DeJong G, et al. Applying the clinical practice improvement approach to stroke rehabilitation: methods used and baseline results. Arch Phys Med Rehabil. 2005;86(12 suppl 2):S16–S33.
10. Connor S, Horn S, Smout R, et al. The National Hospice Outcomes Project (NHOP): development and implementation of a multi-site hospice outcomes study. J Pain Symptom Manage. 2005;29:286–296.
11. Antonow J, Smout R, Gassaway J, et al. Variation among ten pediatric hospitals: sepsis evaluations for infants with bronchiolitis. J Nurs Care Qual. 2001;15:39–49.
12. Horn S. Real-world efficacy: alternative evidence-based study designs as measures of comparative effectiveness. Drug Discov Dev. 2007;10:30–32.
13. DeJong G, Hsieh C, Gassaway J, et al. Characterizing rehabilitation services for patients with knee and hip replacement in skilled nursing and inpatient rehabilitation facilities. Arch Phys Med Rehabil. 2009;90:1269–1283.
14. Tunis S, Stryer D, Clancy C. Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA. 2003;290:1624–1632.
15. James R, Gines D, Menlove A, et al. Nutrition support (tube feeding) as a rehabilitation intervention. Arch Phys Med Rehabil. 2005;86(12 suppl 2):S82–S92.
16. Latham N, Jette D, Slavin M, et al. Physical therapy during stroke rehabilitation for people with different walking abilities. Arch Phys Med Rehabil. 2005;86(12 suppl 2):S41–S50.
17. Conroy B, Zorowitz R, Horn S, et al. An exploration of central nervous system medication use and outcomes in stroke rehabilitation. Arch Phys Med Rehabil. 2005;86(12 suppl 2):S73–S81.
18. Gassaway J, Whiteneck G, Dijkers M. Clinical taxonomy development and application in spinal cord injury rehabilitation research: the SCIRehab Project. J Spinal Cord Med. 2009;32:260–269.
19. Whiteneck G, Dijkers M, Gassaway J, et al. SCIRehab: a new approach to study the content and outcomes of spinal cord injury rehabilitation. J Spinal Cord Med. 2009;32:251–259.
20. Whiteneck G, Brooks C, Charlifue S, et al, eds. Guide for Use of CHART: Craig Hospital Assessment and Reporting Technique. Englewood, CO: Craig Hospital; 1992.
21. Whiteneck GG, Charlifue SW, Gerhart KA, et al. Quantifying handicap: a new measure of long-term rehabilitation outcomes. Arch Phys Med Rehabil. 1992;73:519–526.
22. Horn S, Bender S, Ferguson M, et al. The National Pressure Ulcer Long-term Care Study (NPULS): pressure ulcer development in long-term care residents. J Am Geriatr Soc. 2004;52:359–367.
23. Bergstrom N, Horn S, Smout R, et al. The National Pressure Ulcer Long-Term Care Study (NPULS): outcomes of pressure ulcer treatments in long-term care. J Am Geriatr Soc. 2005;53:1721–1729.
24. Bergstrom N, Smout R, Horn S, et al. Stage 2 pressure ulcer healing in nursing homes. J Am Geriatr Soc. 2008;56:1252–1258.
25. Horn SD, Sharkey SS, Hudak S, et al. Pressure ulcer prevention in long term-care facilities: a pilot study implementing standardized nurse aide documentation and feedback reports. Adv Skin Wound Care. 2010;23:120–131.
26. Averill R, McGuire T, Manning B, et al. A study of the relationship between severity of illness and hospital cost in New Jersey hospitals. Health Serv Res. 1992;27:587–617.
27. Horn S, Sharkey P, Buckle J, et al. The relationship between severity of illness and hospital length of stay and mortality. Med Care. 1991;29:305–317.
28. Ryser D, Egger M, Horn S, et al. Measuring medical complexity during inpatient rehabilitation following traumatic brain injury. Arch Phys Med Rehabil. 2005;86:1108–1117.
29. Willson D, Horn S, Smout R, et al. Severity assessment in children hospitalized with bronchiolitis using the pediatric component of the Comprehensive Severity Index (CSI). Pediatr Crit Care Med. 2000;1:127–132.
30. Heinemann A, Hamilton B, Linacre J, et al. Functional status and therapeutic intensity during inpatient rehabilitation. Am J Phys Med Rehabil. 1995;74:315–326.
31. Fiedler R, Granger C. Functional independence measure: a measurement of disability and medical rehabilitation. In: Chino N, Melvin JL, eds. Functional Evaluation of Stroke Patients. Tokyo, Japan: Springer-Verlag; 1996:75–92.
32. Hamilton BB, Laughlin JA, Fiedler RC, et al. Interrater reliability of the 7-level functional independence measure (FIM). Scand J Rehabil Med. 1994;26:115–119.
33. Heinemann AW, Linacre JM, Wright BD, et al. Prediction of rehabilitation outcomes with disability measures. Arch Phys Med Rehabil. 1994;75:133–143.
34. UB Foundation Activities Inc. Section III: the FIM instrument: underlying principles for use of the FIM™. In: IRF-PAI Training Manual. Buffalo, NY: UB Foundation Activities Inc; 2004.
35. MacKenzie EJ, McCarthy ML, Ditunno JF, et al. Using the SF-36 for characterizing outcome after multiple trauma involving head injury. J Trauma. 2002;52:527–534.
36. Findler M, Cantor J, Haddad L, et al. The reliability and validity of the SF-36 health survey questionnaire for use with individuals with traumatic brain injury. Brain Inj. 2001;15:715–723.
37. Corrigan JD, Smith-Knapp K, Granger CV. Outcomes in the first 5 years after traumatic brain injury. Arch Phys Med Rehabil. 1998;79:298–305.
38. Chang VT, Hwang SS, Feuerman M. Validation of the Edmonton Symptom Assessment Scale. Cancer. 2000;88:2164–2171.
39. Cohen J, ed. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates Inc; 1988.
© 2010 Lippincott Williams & Wilkins, Inc.