THIS ARTICLE, the second in a series of articles designed to acquaint advanced practice (AP) emergency nurses with essential knowledge and skills for implementing evidence-based practice (EBP), focuses on the process of critical appraisal, a vital step in harvesting knowledge from the literature to guide the EBP change. In this article, we purposefully differentiate critical appraisal from research critique and illustrate how it relates to the implementation of EBPs. Expanding on the framework developed by Craig and Smyth (2007), this article provides tools to assist the AP emergency nurse in answering three fundamental questions: (a) When is the evidence good enough to use the results? (b) Are the findings applicable in my setting? and (c) If I adopt this practice, what will it mean to my patients? The article ends with an introduction to the process of synthesizing the evidence, determined through appraisal to be important and useful, into a proposed EBP change.
The first article in this series (Shapiro, 2007) described the essential components of the EBP process: defining the problem; searching, critically appraising, and synthesizing the relevant evidence; identifying the gaps between current practice and preferred practice; and designing, implementing, and evaluating a small test of the change in practice. Most AP nurses are taught the fundamentals of research critique in their graduate programs, and although research critique and critical appraisal are closely related, the two processes are materially different. Research critique, as the name implies, is concerned primarily with the internal and external validity of research reports, in an effort to systematically determine the quality of a study and the credibility of its results. Its aim is to identify possible sources of bias and threats to validity, and to judge the methodological integrity of the study and the broad usefulness of its results across many settings. Critical appraisal, by contrast, assesses the strength of the evidence in relation to a particular clinical practice issue and the applicability of the reported findings in a particular practice setting. Whereas research critique is a pivotal step in examining the strength of a single scientific source of evidence, critical appraisal is the first step in the EBP process of synthesizing a body of evidence to identify an evidence-based, best-practice solution to a specific clinical problem.
Despite the differences in focus and scope between research critique and critical appraisal, many of the skills learned in research critique coursework are directly applicable to critically appraising evidence about new or improved healthcare practices. As will be seen, determining whether the reported evidence is “good enough” to use relies heavily on assessments of the strength and limitations of research design, sample size and selection, and the validity and reliability of measurement tools. However, this is just one part of the critical appraisal process. The subsequent steps, that is, the applicability of the research results in a particular practice setting, the feasibility and fit of the intervention, and the unique considerations related to patient population and setting, further distinguish the critical appraisal process from research critique.
THE CRITICAL APPRAISAL PROCESS
Is the Evidence Good Enough to Use?
The first challenge to the AP emergency nurse embarking on the critical appraisal of a body of evidence is to separate research evidence from clinical reports. This has become much more challenging in recent years as more and more clinicians are publishing the results of their own unit- or facility-based performance improvement (PI) projects, and as the designs of these projects—along with the statistics used to measure their impact—have become more sophisticated (Newhouse, Pettit, Poe, & Rocco, 2006; Shojania & Grimshaw, 2005). However, there are differences between research studies and PI projects that cover issues ranging from investigator intent, through study design and analysis, to human subjects considerations (Casarett, Karlawish, & Sugarman, 2000; Newhouse et al., 2006). Briefly, research studies are designed to generate new knowledge or confirm accepted knowledge, with a goal of producing generalizable results. While there may be other valid sources of evidence such as hospital- or unit-level PI projects, clinical narratives, or expert opinions (Melnyk & Fineout-Overholt, 2005), this article focuses on the critical appraisal process as applied to research results.
To critically appraise a research article, the AP emergency nurse must understand the key concepts related to research design and analysis. Some of these are described briefly in Appendix 1 and can be referred to throughout this article. Critical appraisal starts with individual articles and reports with a goal of assembling a body of evidence related to a single practice issue. The strength of the evidence is gauged on the basis of the study design itself, the methods used to conduct the study, and the way the data are analyzed and reported. Certain research designs such as the randomized controlled trial (RCT) are considered inherently strong because the design itself limits as many sources of bias as possible, including selection bias and investigator bias. The rigor of the RCT design maximizes the internal validity of a study, making it more likely that the results reported were due to the intervention(s) being tested and not to some other extraneous factor such as the way subjects were assigned to the treatment or control group. The oft-cited hierarchy of evidence (Figure 1) is a visual representation of how well various designs do in limiting possible sources of bias. In general, prospective studies and those with control groups do a better job in controlling bias than observational or retrospective studies; however, that does not mean that observational or retrospective studies are without value in building a body of evidence. Depending on the research question, it may not be ethically or practically possible to conduct an RCT. Furthermore, a well-designed, well-conducted prospective cohort study may provide more useful evidence than that provided by a poorly designed RCT.
Reading research reports takes practice as well as a basic understanding of design and statistics. There are many resources available to guide the AP emergency nurse through the process of critical appraisal, three of which (Craig & Smyth, 2007; Crombie, 1996; Melnyk & Fineout-Overholt, 2005) the author has used in teaching these skills, and there are many more available online. Tools and guides such as these provide a template for a thorough appraisal of each section of a study in greater detail than is possible in an article of this length. However, there are two important aspects of the critical appraisal that deserve special attention. The first is the internal consistency of the report; that is, did the author(s) answer the question(s) they said they were going to? Every research study states the purposes, aims, questions, and/or hypotheses the investigators seek to answer/clarify, and every other part of the study, from design through conclusions, and especially the statistical analyses, should refer back to, and be consistent with, those initial statements. Some questions that might be asked in relation to this are as follows: (a) Did the investigators measure what they said they were going to measure, and were those measurements valid and reliable? (b) Were the analyses appropriate to the kinds of data that were collected? (c) Were the conclusions reasonable given the reported results, and were they appropriately connected to the initial purposes/aims/questions? Our experience has been that staff nurses in clinical practice and their AP colleagues can typically answer these questions well with the insight and expertise gleaned from their rich clinical background.
The second important aspect of any research evidence being used to inform practice is an assessment of the clinical significance, or impact, of the findings. All research reports include statements regarding the statistical significance of their findings, but statistical significance refers only to the probability that findings of the observed size could have arisen by chance alone. Perhaps more important is some estimate of the clinical significance, or the impact on patients and clinical operations. This is discussed more fully below, but it is only recently that measures of clinical impact have begun appearing in the literature. Advanced practice emergency nurses may not be familiar with some of these measures, such as risk ratios, odds ratios, and numbers needed to treat (see Appendixes 1 and 2), but it is worth taking the time to learn about them; in some ways they are easier to understand conceptually than p values and statistical significance. At the very least, readers of research should always ask themselves the “so what?” question whenever presented with statistically significant results. For example, is a statistically significant drop in systolic blood pressure of 4 mmHg clinically important? Is a decrease in throughput of 5 min important to our patients? If the difference is clinically or operationally inconsequential, then regardless of the statistical significance, it may not be worth making any change in practice. Conversely, if the evidence is limited but the reported benefit to patients is high and the risk of implementation, from both institutional and patient perspectives, is low, then careful adoption of a practice change with concurrent evaluation might be clinically important and support EBP quality improvement.
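The distinction between statistical and clinical significance can be made concrete with a few lines of code. The sketch below uses wholly invented numbers (a 4-mmHg mean drop in systolic blood pressure, standard deviation 15 mmHg, 2,000 patients per arm) and a simple normal-approximation z test; neither the numbers nor the test comes from any study cited in this article. It shows that with a large enough sample, even a clinically trivial difference produces an extremely small p value.

```python
import math

# Hypothetical trial, for illustration only: mean systolic BP drop of
# 4 mmHg in the intervention group vs. no change in the control group.
mean_diff = 4.0        # mmHg difference between group means
sd = 15.0              # assumed standard deviation in each group
n_per_group = 2000     # assumed sample size per arm

# Two-sample z test on the difference in means (normal approximation).
se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference
z = mean_diff / se
# Two-sided p value from the standard normal distribution.
p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(f"z = {z:.2f}, p = {p_value:.2g}")
# The p value is effectively zero (highly "significant"), yet whether a
# 4-mmHg drop matters to patients is a clinical judgment, not a statistical one.
```

The "so what?" question is exactly the step the code cannot perform: deciding whether the 4-mmHg difference is worth a practice change.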
Clearly, there is no perfect study: the pragmatics of research are such that even the best-designed studies, geared to answer important research questions, will be imperfect. Thus it is important to review and consider a cluster of studies that constitutes a body of evidence, rather than a single study, when considering the evidence to support a possible clinical practice change. Fortunately, many organizations are currently engaged in critically reviewing bodies of evidence on a variety of nursing practice issues and distilling them into clinical practice guidelines or authoritative recommendations. As reported previously (Shapiro, 2007), these include organizations such as the Cochrane Collaboration (2007), the Agency for Healthcare Research and Quality (2007), specialty nursing organizations, and many others. That is good news for AP emergency nurses. The caution here is that it is as important to critically appraise published guidelines as it is to appraise individual studies; fortunately, there are resources available to help with this as well (GRADE Working Group, 2004; Guyatt, Gutterman, et al., 2006; Guyatt, Vist, et al., 2006; Helfand, 2005).
Are the Findings Applicable in My Setting?
Along with evaluating the study design and results, the AP emergency nurse needs to ask whether the reported results are truly applicable to his or her practice setting. Whereas the first question focused mostly on the internal validity of the study, the question of applicability to a particular clinical practice is an assessment of the external validity of a study, or the extent to which the findings from the study can reasonably be transferred to a different setting and expected to produce similar results. There are several factors that influence the generalizability or external validity of a study, including where the study sample came from, how complex the intervention was, and how similar the study environment was to the environment in which the practice change will occur.
Similarity of the Sample
There are very few health-related studies that occur with a random sample of subjects; almost all published studies are conducted on a convenience sample of one sort or another. A truly random sample means that all people in the entire population being studied had an equal probability of being included in the study sample. Because most studies take place in just a few sites, not every potential subject from all possible sites has an equal probability of being chosen to participate. Thus it is important to identify where the study sample came from, and to determine how that pool of eligible subjects might be similar to, or different from, the patients at your own facility. Consider, for example, how similar or different the following patients may be to your own: patients at Veterans Administration facilities; patients in rural or frontier settings; patients from ethnically diverse, densely populated urban settings; patients from Indian Health Service facilities. Each of these groups of patients likely has characteristics that are systematically different from those of other subpopulations, limiting the extent to which findings from such samples may be broadly generalized. In addition, some eligible patients will choose to participate in a study, and others will decline for a wide variety of reasons. Thus it is a very particular sample of patients from a limited group of eligible patients that participates in any clinical research study, making it important to determine just how similar the setting and sample were to one's own clinical setting and patients. On the other hand, once a subject is enrolled in a study, especially one that is testing an intervention using comparison groups, truly random assignment to intervention or control groups effectively reduces the sources of bias that might, ultimately, affect the results of the study.
Critical appraisal, then, considers how the study sample was selected, how subjects were assigned to study groups, and how the characteristics of the subjects affect the applicability of the results to another practice setting.
Complexity of the Intervention
The complexity of a study intervention is another factor to be considered in a critical appraisal. Rogers (2003) includes both complexity and compatibility as factors that influence people's attitudes toward a proposed change. Highly complex interventions, or those that are either new to practice or radically different from previous practice, are inherently more difficult to implement. One of the recognized limitations of randomized trials, for example, is that the interventions are usually highly circumscribed and precisely implemented. A great deal of attention is paid to ensuring that all study participants receive the intervention or control in precisely the same way, delivered by study personnel according to strict guidelines. While this increases the internal validity of the study, it limits the external validity, as it is not likely that such precision will be achieved in any “real world” practice setting. The complexity of the intervention may also be related to its cost, and although evaluating cost is not traditionally included in the critical appraisal process, when one is undertaking a literature review and synthesis with the intent of implementing a practice change, cost will always be an important consideration.
Similarity in Practice Settings
The applicability of study findings to one's own setting depends not only on the similarity of the study sample to one's own patients and the complexity of the proposed practice change, but also on system similarities such as staffing patterns (including nurses, physicians, and other staff); payor mix; emergency medical services system configuration; and so forth. Which system attributes need to be considered depends on the nature of the proposed practice change. The AP emergency nurse is in an ideal position to synthesize all these elements into an overall appraisal of the feasibility of applying the study findings in his or her own practice setting.
What the Practice Change Would Mean to Patients
Both Craig and Smyth (2007) and Melnyk and Fineout-Overholt (2005) devote considerable attention to appraising the clinical impact of a study. It is unfortunate that so few investigators include these measures in their reports, as they provide numerical evidence of the impact of the intervention on patient outcomes. Ideally, whenever an investigator achieves statistically significant findings, that is, findings unlikely to be due to chance alone, he or she should proceed to calculate a measure of clinical impact. Appendix 2 illustrates how a 2 × 2 contingency table may be used to calculate measures such as the risk ratio (also known as relative risk), the odds ratio, and the number needed to treat (NNT). The calculations involve only simple arithmetic, yet the results are powerful. If such measures are not included in a research report, the AP emergency nurse may want to construct the table on his or her own and at least calculate the NNT; the smaller the NNT, the greater the positive impact on practice.
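Because these measures involve only simple arithmetic, they are easy to script. The following sketch computes the risk ratio, odds ratio, absolute risk reduction, and NNT from a 2 × 2 table; the counts are invented for illustration and do not come from any study discussed here.

```python
# Hypothetical 2 x 2 contingency table (counts invented for illustration):
# rows are the study groups, columns are outcome present / outcome absent.
a, b = 40, 60   # intervention group: desired outcome / no desired outcome
c, d = 20, 80   # control group: desired outcome / no desired outcome

risk_intervention = a / (a + b)          # risk of the outcome with the intervention
risk_control = c / (c + d)               # risk of the outcome without it
rr = risk_intervention / risk_control    # risk ratio (relative risk)
odds_ratio = (a * d) / (b * c)           # cross-product odds ratio
arr = risk_intervention - risk_control   # absolute risk reduction
nnt = 1 / arr                            # number needed to treat

print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}, ARR = {arr:.2f}, NNT = {nnt:.1f}")
# → RR = 2.00, OR = 2.67, ARR = 0.20, NNT = 5.0
```

With these hypothetical counts, five patients would need to receive the intervention for one additional patient to benefit; a small NNT like this signals a large clinical impact.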
The importance of this component of the critical appraisal cannot be overemphasized. Once one recognizes the difference between statistical significance and clinical impact, one begins to question the wisdom of adopting any change in practice unless the clinical impact can be demonstrated as consistently as the statistical significance. This is especially true when the proposed change is complex or radically different. Such radical, complex, or expensive interventions can be justified only when the clinical impact has been clearly demonstrated in the body of literature and can be effectively reproduced in your clinical setting.
Synthesizing the Evidence
Critically appraising individual studies, systematic reviews, and EBP guidelines takes time, but it is a relatively straightforward skill that can be learned through repetition and practice. Synthesis, on the other hand, is a process that requires a high degree of immersion in the body of literature and a deep understanding of the strengths and limitations of the current evidence (Shapiro, 2007).
It is very helpful to begin the synthesis process by constructing an evidence table (Shapiro, 2007). The elements of the table—which form the columns—may vary somewhat, but they should include some identification of the study and its purpose; the methods, including sample selection, intervention, measurements, and analyses; the results, including final sample size and clinical impact; the conclusions, uniting the purpose of the study with the results and placing those results in the context of the current science; and the limitations, both those identified by the authors and those identified through the appraisal process (Table 1). The appraiser enters in each column both a description of that section of the report and his or her assessment of its strengths and limitations. The final column should be the appraiser's overall assessment of the study's quality and its applicability to the practice problem being addressed. As more evidence is entered into the table, it becomes easier to see patterns in the evidence and to identify which practices are reported as most successful in specific settings. An example of an entry in an evidence table is provided in Table 2.
Evidence tables facilitate the synthesis process but do not complete it. Synthesis is actually high-level thinking in which patterns, trends, and themes emerging from a group of studies are viewed as a whole. Synthesis requires time to reflect on the evidence that has been assembled, to weigh each contribution carefully, and to identify patterns that emerge from among the many studies reviewed. The end product of the synthesis process is the identification of the most highly supported evidence-based practice; included in this synthesis is the potential benefit of adopting the identified change into one's own practice setting. The AP emergency nurse who has embarked on an EBP change becomes the expert on the topic—the person in that setting with the most current knowledge of the best available evidence. This acquired expertise, combined with an intimate knowledge of the practice setting, patients, and staff, makes the AP emergency nurse precisely the right person to champion the next step in the process, a small test of change to evaluate the proposed practice in the real-world work environment.
CRITICAL APPRAISAL AND EVIDENCE-BASED PRACTICE
Evidence-based practice is defined as the conscientious, explicit, and judicious use of the best available evidence, in combination with the professional's clinical expertise and the patient's preferences, in making decisions about care (Craig & Smyth, 2007; Jennings & Loan, 2001; Melnyk & Fineout-Overholt, 2005; Sackett, Straus, Richardson, Rosenberg, & Haynes, 2000). The critical appraisal process focuses on finding the best available evidence, not just in terms of the scientific strength but also in terms of the feasibility of implementing that practice and the likely results for patients and operations.
The focus of this discussion of critical appraisal has been limited to the appraisal of published research. However, there are many other possible sources of evidence available to AP emergency nurses engaged in an EBP project (Melnyk & Fineout-Overholt, 2005). These include published reports of unit-based performance improvement projects, expert opinion, and clinical narratives, as well as evidence gleaned from the patient's medical and nursing history, physical examination, and test results. Advanced practice emergency nurses and their clinical colleagues would do well to sharpen their critical appraisal of all the evidence they use, relying on the three questions that began this article: Is the evidence good enough to use? Are the results applicable in my setting? And what will the results mean to my patients? Taking time to answer these questions will help ensure that the care we provide is based on evidence that is valid and reliable, thus improving consistency in practice and the predictability of outcomes.
The AP emergency nurse contributes the professional expertise referred to in the definition of EBP. Some proposed practice changes, for example, the introduction of “needleless” intravenous locking devices and needle-guarded intravenous catheter devices, may be so obvious that they should be implemented ahead of any formal testing (Smith & Pell, 2003). The AP emergency nurse is in an ideal position to make such a decision, on the basis of the issues and the likely impact on clinical and system outcomes. Advanced practice emergency nurses have the opportunity to collaboratively engage their department leaders, in both medicine and nursing, in the search for the answers to questions regarding best practices for patient care and system operations. The practices most likely to succeed will be those identified by critically appraising and synthesizing the extant evidence for its strength, applicability to the particular practice setting, and likely impact on patient and system outcomes. Once that process is complete, the AP emergency nurse will be ready to embark on a small test of change—the topic to be addressed in the next article in this series.
REFERENCES
Agency for Healthcare Research and Quality. (2007). About National Guidelines Clearinghouse. Retrieved May 28, 2007, from http://www.guideline.gov/about/about.aspx
Casarett, D., Karlawish, J. H. T., & Sugarman, J. (2000). Determining when quality improvement initiatives should be considered research: Proposed criteria and potential implications. JAMA.
The Cochrane Collaboration. (2007). Retrieved June 28, 2007, from http://www.cochrane.org/index.htm
Craig, J. V., & Smyth, R. L. (2007). The evidence-based practice manual for nurses (2nd ed.). Edinburgh: Churchill Livingstone Elsevier.
Crombie, I. (1996). The pocket guide to critical appraisal. London: BMJ Publishing Group.
GRADE Working Group. (2004). Grading quality of evidence and strength of recommendations. British Medical Journal.
Guyatt, G., Gutterman, D., Baumann, M. H., Addrizzo-Harris, D., Hylek, E. M., Phillips, B., et al. (2006). Grading strength of recommendations and quality of evidence in clinical guidelines. Chest.
Guyatt, G., Vist, G., Falck-Ytter, Y., Kunz, R., Magrini, N., & Schunemann, H. (2006). An emerging consensus on grading recommendations? American College of Physicians Journal Club.
Helfand, M. (2005). Using evidence reports: Progress and challenges in evidence-based decision making. Health Affairs, 24.
Jennings, B. M., & Loan, L. A. (2001). Misconceptions among nurses about evidence-based practice. Journal of Nursing Scholarship.
Melnyk, B., & Fineout-Overholt, E. (2005). Evidence-based practice in nursing & health. Philadelphia: Lippincott Williams & Wilkins.
Motulsky, H. (1995). Intuitive biostatistics. New York: Oxford University Press.
Newhouse, R. P., Pettit, J. C., Poe, S., & Rocco, L. (2006). The slippery slope: Differentiating between quality improvement and research. Journal of Nursing Administration.
Polit, D. F., & Hungler, B. P. (1999). Nursing research: Principles and methods (6th ed.). New York: Lippincott.
Rogers, E. (2003). Diffusion of innovations (5th ed.). New York: Free Press.
Sackett, D. L., Straus, S. E., Richardson, W. S., Rosenberg, W., & Haynes, R. B. (2000). Evidence based medicine: How to practice and teach EBM (2nd ed.). London: Churchill Livingstone.
Shapiro, S. E. (2007). Evidence-based practice for advanced practice emergency nurses. Advanced Emergency Nursing Journal.
Shojania, K. G., & Grimshaw, J. M. (2005). Evidence-based quality improvement: The state of the science. Health Affairs, 24.
Smith, G. C. S., & Pell, J. P. (2003). Parachute use to prevent death and major trauma related to gravitational challenge: Systematic review of randomised controlled trials. British Medical Journal.
Valanis, B. (1999). Epidemiology in health care (3rd ed.). Stamford, CT: Appleton & Lange.
Appendix 1: Glossary of Some Important Concepts in Research Design and Statistics1
Bias: Anything that could distort the results of a study, reducing the likelihood that the findings are “true,” that is, due to the intervention/treatment/exposure. Different kinds of bias can reduce internal or external validity (see below).
Confidence interval: A range of values within which a population parameter, for example, a mean, is estimated to lie with a certain probability, for example, 95%. So a mean of 3 with a confidence interval of 1–6 (written “mean = 3 [95% CI, 1–6]”) means that the authors are 95% confident that the true value of the mean lies somewhere between 1 and 6. This interval is calculated from the variability of the sample around the mean.
Correlations: The degree to which one variable changes in relation to another. Correlation does not imply causation; the fact that two variables are correlated does not mean that one causes the other.
Efficacy: The extent to which an intervention produces the desired effect under ideal conditions, for example, a randomized controlled trial.
Effectiveness: The extent to which an intervention produces the desired effect in actual practice, for example, pre- and post-implementation studies in a clinical practice setting.
External validity: Aspects of design that make it more likely that the results from one study can be applied to a different sample in a different setting with similar results.
Internal validity: The likelihood that the results obtained in a study are due to the treatment, and not to some other factor. Better research designs have stronger internal validity.
Number needed to treat: NNT is a numerical estimate of the clinical impact of an intervention; it is the number of people who would need to have the intervention in order for one more person to be helped than would have been helped without the intervention. The smaller the NNT, the more effective the intervention.
p Value: The probability that results as extreme as those observed would occur by chance alone, that is, if the intervention truly had no effect. A p value of <.05 means there is less than a 5% probability that results this extreme would arise by chance alone.
Risk Ratio (Relative Risk)/Odds Ratio (RR/OR): Measures derived from classic 2 × 2 contingency tables describing risks/odds of an outcome, given an exposure/intervention. If the outcome is desirable, for example, survival, then RR/OR greater than 1 is desirable. If the outcome is undesirable, for example, death, then RR/OR less than 1 is desirable.
Statistical significance: A statement indicating that the reported results are unlikely to have occurred by chance, with some level of probability (see “p value,” above, for example). This is not the same as clinical significance, which is a matter of judgment on the part of a clinician.
Appendix 2: Classic 2 × 2 Contingency Table and Associated Measures2
In the classic 2 × 2 table, the rows are the study groups and the columns are the outcomes:

                     Desired outcome    No desired outcome
Intervention group         a                   b
Control group              c                   d

Measures of effectiveness
Risk ratio/relative risk: [a/(a + b)]/[c/(c + d)] (Risk of a good outcome among those in the intervention group divided by the risk of a good outcome among those in the control group). Applicable only where the incidence of the outcome is known, for example, randomized trials, cohort studies, and so forth.
Odds ratio: ad/bc (Odds of a good outcome among those in the intervention group divided by the odds of a good outcome among those in the control group). Odds ratios are used in case-control studies, where it is not possible to calculate a relative risk. When the disease is rare (that is, prevalence is low), the odds ratio is a good estimate of the risk ratio.
Number needed to treat: The number of people who would need to have the intervention in order for one more person to be helped than would have been helped without the intervention. Calculating NNT is a multistep process:
- Calculate absolute risk reduction (ARR) or absolute risk increase = the risk of developing the desired outcome among those in the intervention group minus the risk among those in the control group = [a/(a + b)] − [c/(c + d)]
- NNT = 1/ARR
Example of calculating measures of clinical impact
Measures of effectiveness
Risk ratio: [a/(a + b)]/[c/(c + d)] = (28/123)/(7/127) = 4.13
That is, people in the intervention group are 4.13 times as likely to have the desired outcome as those in the control group. (Note that rounding the two risks to two decimal places before dividing, 0.23/0.06, would yield 3.83; carrying full precision gives the more accurate 4.13.)
Number needed to treat:
Step 1: the absolute risk reduction: [a/(a + b)] − [c/(c + d)] = 0.228 − 0.055 = 0.173
Step 2: NNT = 1/ARR = 1/0.173 = 5.8
That is, you would need to expose six people (rounding the NNT up to the next whole person) to the intervention before one more person had the desired outcome than would have had it without the intervention. This would be a very successful intervention, indeed!
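Calculations like these are easy to check in code. A minimal Python sketch, using the counts implied by the worked example (a = 28, b = 95, c = 7, d = 120, so that a + b = 123 and c + d = 127), computes the same measures without rounding the intermediate risks:

```python
# Counts implied by the worked example above.
a, b = 28, 95    # intervention group: desired outcome / no desired outcome
c, d = 7, 120    # control group: desired outcome / no desired outcome

risk_intervention = a / (a + b)          # ≈ 0.228
risk_control = c / (c + d)               # ≈ 0.055
rr = risk_intervention / risk_control    # risk ratio
arr = risk_intervention - risk_control   # absolute risk reduction
nnt = 1 / arr                            # number needed to treat

print(f"RR = {rr:.2f}, ARR = {arr:.3f}, NNT = {nnt:.1f}")
# → RR = 4.13, ARR = 0.173, NNT = 5.8
```

Note that rounding the two risks to two decimals before dividing (0.23/0.06) would give 3.83 instead; carrying full precision yields RR ≈ 4.13 and an NNT of about 6.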
1Adapted from Polit and Hungler (1999), Melnyk and Fineout-Overholt (2005), Motulsky (1995), and Valanis (1999).
2Adapted from Melnyk and Fineout-Overholt (2005), and Motulsky (1995).