Grading Evidence for Practice

Shapiro, Susan E. RN, PhD

Author Information
Advanced Emergency Nursing Journal: January 2010 - Volume 32 - Issue 1 - p 59-67
doi: 10.1097/TME.0b013e3181cad4ec


The Advanced Emergency Nursing Journal recently published a series of articles describing the steps of the evidence-based practice (EBP) process (Shapiro, 2007; Shapiro & Donaldson, 2008a, 2008b). Although these articles explain the steps involved in the process, from identifying the need for practice improvement through planning, implementing, and evaluating a small test of change, they do not address the tough questions about whether, and to what extent, the evidence should be applied to practice. The purposes of this article are to summarize some of the current literature on evaluating evidence for practice and to provide advanced emergency nurses with a deeper understanding of how evidence rating systems and grading recommendations can be useful in evaluating evidence for practice.


Evans (2003) provides a good summary of early efforts to evaluate published research evidence. These initial efforts produced the "hierarchies of evidence" described by Shapiro and Donaldson (2008a), which focused primarily on single reports of quantitative research studies, although systematic reviews and meta-analyses were also included. With the widespread adoption of EBP principles in medicine and nursing over the past 15–20 years came the need for more robust structures for evaluating bodies of evidence that might inform practice, especially with an eye toward developing best-practice recommendations or guidelines. Emergency nurses, advanced and otherwise, are probably most familiar with this process through the Advanced Cardiac Life Support (ACLS) guidelines of the American Heart Association (AHA, 2005). Advanced life support procedures change regularly because published research evidence yields sometimes surprising or conflicting results. As a result, the ACLS guidelines are updated regularly, and the recommendations are now accompanied by detailed explanations of the evidence on which changes are based (AHA, 2005).

There have been many articles recommending different systems for grading evidence, including published practice guidelines. In an early review, the Agency for Healthcare Research and Quality (AHRQ, 2002) found more than 100 articles describing various systems for evaluating systematic reviews and performance guidelines, all published before 2002. Other authors have proposed a general approach to grading evidence or practice guidelines (The AGREE Collaboration, 2001; Grade Working Group, 2004; Guyatt et al., 2006), a more specialized approach for particular forms of evidence such as clinical decision rules (McGinn et al., 2000), or an approach for a specific specialty such as nephrology (Uhlig et al., 2006). Within nursing, examples include Evans's proposed framework (2003), which describes a multidimensional approach to grading evidence; the newly revised grading criteria from the American Association of Critical-Care Nurses (AACN; Armola et al., 2009); and the elegant framework used by the Joanna Briggs Institute (JBI, 2009), an international nursing collaborative that conducts and publishes systematic reviews of evidence for nursing practice (JBI, 2008).

As Evans (2003) explains, these grading schemes are all attempts to simplify communication of a very complex process: critically appraising and synthesizing the evidence to make recommendations for practice. He also identified a major shortcoming of the schemes proposed in the medical literature: their exclusive focus on the effectiveness of an intervention and the resulting reliance on results from quantitative studies, most especially randomized controlled trials (RCTs). By essentially excluding evidence obtained in observational and interpretive studies, many of the early or medically focused grading schemes ignore evidence of the appropriateness, meaningfulness, or feasibility of implementing the reported practice in a different setting. In fact, when it comes to translating evidence into practice, results from RCTs may actually prove less reliable than these other kinds of research, which are more grounded in the actual practice environment (Horn & Gassaway, 2007). Such evidence has an advantage over the more tightly controlled RCT in that it more accurately reflects how staff implement, and patients respond to, interventions when conditions are not artificially controlled to maintain the experiment.

Emergency nurses who are reviewing evidence need to consider more than just evidence of effectiveness; they also need to consider the appropriateness of the proposed change to the patients being cared for, meaningfulness of the intervention to patients and their families, and the feasibility of implementing the practice in their particular setting or context (Evans & Boyce, 2008; Rycroft-Malone et al., 2004; Shapiro & Donaldson, 2008b). The JBI also includes the domain of meaningfulness to address evidence of the meanings of the illness experience to patients and significant others (JBI, 2008, 2009). Whereas Evans (2003) and the JBI provide guidance for grading evidence in these domains, the AACN's levels of evidence (Armola et al., 2009) do not explicitly address domains other than effectiveness, although they include qualitative and interpretive studies in their hierarchy. Table 1 summarizes some of the key components of the AACN evidence-leveling hierarchy, Evans' evidence framework, and the JBI levels of evidence and grading recommendations.

Table 1: Key components of hierarchies of evidence and grading criteria for selected nursing references



Effectiveness, a term used similarly by both Evans (2003) and the JBI, refers to the reliability with which a given practice or intervention results in the desired outcome. This is the only domain included in the medical grading systems referred to above, and it is certainly a critical component of any grading system. Meta-analyses, systematic reviews, and multicenter trials are considered the strongest evidence in this domain. The RCT is generally considered the next strongest in demonstrating the efficacy of any single treatment. Because of the rigor involved in conducting a well-designed RCT, the outcome achieved can be reasonably attributed to the intervention (e.g., a medication) and not to some other factor that may have been present, such as the patient's physical condition, other medications or treatment the patient may be undergoing, or variation in the way the intervention or medication was delivered. Other research designs used to establish effectiveness are rated lower than the RCT by all three nursing examples included here. Advanced emergency nurses should keep in mind, however, that although the RCT is an excellent design choice to test interventions in controlled settings (i.e., the efficacy* of the intervention), it is not a particularly good design to study effectiveness in clinical settings where multiple patient-level, provider-level, and institution-level variables interact simultaneously to affect outcomes (Horn & Gassaway, 2007).

Horn and Gassaway (2007) propose relying more on practice-based evidence (PBE) when testing an intervention for effectiveness. Studies using PBE draw on the expertise of care providers to identify multiple patient-level variables and processes of care. Detailed data are collected on all possible patient and process variables that could affect the outcome(s), and analytic tools are then used to isolate a given process or intervention from the "noise" of the other variables. Using this method, it is possible to evaluate more than one intervention at a time and to determine the relative benefits of each for different types of patients: for example, men versus women; patients older than 65 years versus those younger than 65 years; or those with higher acuity scores versus those with lower scores. In the traditional hierarchies of evidence, these observational trials are generally not considered as strong as RCTs and have been accorded a grade of low or very low in some grading recommendations (Grade Working Group, 2004; Guyatt et al., 2006). Evans (2003) places well-designed observational studies alongside RCTs as good evidence of efficacy, whereas the JBI and the AACN still rate RCTs more highly in terms of effectiveness. It is important to note the emphasis the JBI places on the narrowness of confidence intervals in RCTs and on the consistency of findings across multiple studies; both criteria add to the user's confidence that the intervention being evaluated is predictably effective in a variety of settings.
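The subgroup comparisons that Horn and Gassaway describe can be illustrated with a toy example. The following Python sketch is purely hypothetical: the records, field names, and subgrouping are invented for illustration and are not drawn from any actual PBE study; real PBE analyses use far richer data and more sophisticated analytic tools.

```python
# Hypothetical sketch of the practice-based-evidence idea: with detailed
# data on patients and processes, subgroup comparisons can suggest which
# intervention works better for which patients. All data and field names
# below are invented for illustration.

from collections import defaultdict

records = [
    {"intervention": "A", "age_over_65": True,  "improved": 1},
    {"intervention": "A", "age_over_65": True,  "improved": 0},
    {"intervention": "A", "age_over_65": False, "improved": 1},
    {"intervention": "B", "age_over_65": True,  "improved": 1},
    {"intervention": "B", "age_over_65": False, "improved": 0},
    {"intervention": "B", "age_over_65": False, "improved": 0},
]

def improvement_rate_by_subgroup(rows):
    """Mean improvement per (intervention, subgroup) cell."""
    totals = defaultdict(lambda: [0, 0])  # key -> [improved count, n]
    for r in rows:
        key = (r["intervention"], r["age_over_65"])
        totals[key][0] += r["improved"]
        totals[key][1] += 1
    return {key: hits / n for key, (hits, n) in totals.items()}

rates = improvement_rate_by_subgroup(records)
```

The point of the sketch is only that the same data set yields separate effect estimates for each patient subgroup, rather than the single average effect an RCT is designed to estimate.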

Appropriateness, Meaningfulness, and Feasibility

Studies that address the appropriateness of their findings to various patient populations are especially helpful to emergency nurses. Different emergency departments (EDs) see different types of patients and thus evidence from a study in one ED may or may not be applicable to others. Evans (2003) uses appropriateness to address the psychosocial aspects of care, relating to the patients' experiences, their understanding of health and illness, and the outcomes they hope to achieve with their ED visit. The JBI (JBI, 2008) includes many of these considerations in its domain of meaningfulness, along with ethical considerations. Both Evans and the JBI appreciate that these dimensions of care are best addressed with nonexperimental research designs, including both quantitative and qualitative methods. The JBI emphasizes metasyntheses and consistency of findings across studies to add confidence that the results point to a true understanding of the phenomenon under study.

Feasibility acknowledges that change is difficult, and that in environments such as EDs and hospitals it remains difficult even in the face of strong evidence of effectiveness and appropriateness. Here, as with effectiveness and appropriateness, systematic reviews and multicenter trials provide excellent sources of evidence; RCTs, observational studies, and interpretive studies may also yield good evidence, especially about aspects of organizational culture that affect how readily a new practice is accepted and how best to implement it (Evans, 2003).


Evaluating evidence and making recommendations for practice changes are two separate, albeit closely related, processes. Evidence that supports a recommendation for practice must not only meet standards for good internal and external validity (that is, be considered "strong" enough to instill confidence in those ready to apply the practice) but also be appropriate to both the patients and the practice setting, as well as feasible to implement. Using a systematic approach and making explicit the criteria for assigning recommendations gives users of that evidence confidence that their recommendations for practice change reflect the current evidence (Grade Working Group, 2004). The premise is that stronger evidence from high-quality sources is more likely to truly represent best practices than that from lesser quality sources (Newhouse, Dearholt, Poe, Pugh, & White, 2007). Some organizations have chosen simple approaches such as labeling their recommendations "high," "good," or "low/major flaw" (Newhouse et al., 2007, p. 90), whereas others such as the AHA (2005) and McGinn et al. (2000) have somewhat more complex levels of recommendation, although these are still focused solely on evidence of effectiveness. Among the nursing resources reviewed here, only the JBI makes explicit recommendations for practice based on the results of the evidence review, and it does this across the four domains of effectiveness, appropriateness, meaningfulness, and feasibility. Table 2 shows how the JBI applies its grades to the evidence.

Table 2: The Joanna Briggs Institute grades of recommendations

Advanced emergency nurses who are considering a systematic practice change should agree on a grading scheme for use with their review of published evidence. It does not have to be a complex scheme but should contain the domains included by Evans (2003) and the JBI (2008). Although evidence of effectiveness is necessary before recommending or adopting a change in practice, it is not sufficient to address the multiple dimensions that must be considered when nurses apply interventions in their practice setting. Advanced emergency nurses must also consider more than just the design of the studies or recommendations they are evaluating. Studies must be both well designed and well conducted for them to add meaningfully to the scientific basis for practice; “studies of poor methodological quality” provide poor evidence of effectiveness, appropriateness, meaningfulness, and feasibility, and it remains up to the advanced emergency nurse to critically appraise studies with this in mind (Evans, 2003, p. 79).
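As a sketch of what an agreed-on scheme might look like in practice, the toy Python helper below maps a study design to a numeric level and records which of the four domains a study informs. The design labels, numeric rankings, and the `grade` function are illustrative assumptions for this sketch only; they are not the actual JBI, AACN, or Evans criteria, which should be consulted directly.

```python
# Toy sketch of a simple evidence-grading scheme. The design labels and
# numeric levels below are illustrative simplifications, not the actual
# JBI, AACN, or Evans criteria.

DESIGN_RANK = {  # 1 = strongest, following the usual design-based hierarchies
    "meta-analysis": 1,
    "systematic review": 1,
    "multicenter trial": 2,
    "randomized controlled trial": 2,
    "observational study": 3,
    "interpretive study": 4,
    "expert opinion": 5,
}

# The four domains discussed in the article.
DOMAINS = ("effectiveness", "appropriateness", "meaningfulness", "feasibility")

def grade(study_design, domains_addressed):
    """Return a toy evidence level and the recognized domains a study informs."""
    level = DESIGN_RANK.get(study_design.lower(), 5)  # unknown designs rank lowest
    covered = sorted(d for d in domains_addressed if d in DOMAINS)
    return {"level": level, "domains": covered}

summary = grade("Randomized controlled trial", {"effectiveness", "feasibility"})
```

Even a minimal scheme like this forces reviewers to record not just how strong a study's design is but which domains it actually speaks to, which is the point Evans (2003) and the JBI make about looking beyond effectiveness alone.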

Many other grading schemes have been proposed both in medicine (AHRQ, 2002; Guyatt et al., 2006; Uhlig et al., 2006) and in nursing (Newhouse et al., 2007), as well as for reviewing clinical guidelines or recommendations (The AGREE Collaboration, 2001). Table 3 provides an overview of some of these schemes that advanced practice nurses may find especially useful. In applying an evidence rating system or assigning a graded recommendation for practice, advanced emergency nurses should rely on the principle of transparency, making explicit to their readers the ways in which evidence has been evaluated and grades assigned.

Table 3: Selected references for grading evidence

Advanced emergency nurses are expected to be able to review current and emerging evidence for emergency care and recommend whether or not to implement any practice changes on the basis of this review. Whether they participate in actually reviewing the evidence and assigning the recommendations, or confine themselves to evaluating the recommendations of their professional associations and colleagues, they need to understand and appreciate the meaning of the levels of evidence and associated recommendations for adoption.


Agency for Healthcare Research and Quality. (2002). Systems to rate the strength of the evidence: A summary (No. 47). Retrieved
American Heart Association. (2005). Part 1: Introduction. Circulation, 112(22, Suppl.), III-1–III-4.
Armola, R. R., Bourgault, A. M., Halm, M. A., Board, R. M., Bucher, L., Harrington, L., ... Medina, J. (2009). Upgrading the American Association of Critical-Care Nurses' evidence-leveling hierarchy. American Journal of Critical Care, 18, 405–409.
Evans, D. (2003). Hierarchy of evidence: A framework for ranking evidence evaluating healthcare interventions. Journal of Clinical Nursing, 12, 77–84.
Evans, K. D., & Boyce, K. E. (2008). Focusing on the issues. Fostering a hierarchy of evidence within the profession. Journal of Diagnostic Medical Sonography, 24(3), 183–188.
Grade Working Group (2004). Grading quality of evidence and strength of recommendations. BMJ, 328, 1–8.
Guyatt, G., Gutterman, D., Baumann, M. H., Addrizzo-Harris, D., Hylek, E. M., Phillips, B., ... Schünemann, H. (2006). Grading strength of recommendations and quality of evidence in clinical guidelines. Chest, 129, 174–181.
Horn, S. D., & Gassaway, J. (2007). Practice-based evidence study design for comparative effectiveness research. Medical Care, 45(10), S50–S57.
Joanna Briggs Institute. (2008). The JBI approach to evidence-based practice. Retrieved September from
Joanna Briggs Institute. (2009). JBI levels of evidence. Retrieved from
McGinn, T. G., Guyatt, G. H., Wyer, P. C., Naylor, C. D., Stiell, I. G., & Richardson, W. S. (2000). Users' guides to the medical literature XXII: How to use articles about clinical decision rules. JAMA, 284, 79–84.
Newhouse, R. P., Dearholt, S. L., Poe, S. E., Pugh, L. C., & White, K. M. (2007). Johns Hopkins nursing evidence-based practice model and guidelines. Indianapolis, IN: Sigma Theta Tau.
Rycroft-Malone, J., Seers, K., Titchen, A., Harvey, G., Kitson, A., & McCormack, B. (2004). What counts as evidence in evidence-based practice? Journal of Advanced Nursing, 47(1), 81–90.
Shapiro, S. E. (2007). Evidence-based practice for advanced practice emergency nurses. Advanced Emergency Nursing Journal, 29, 331–338.
Shapiro, S. E., & Donaldson, N. E. (2008a). Evidence-based practice for advanced practice emergency nurses, Part II: Critically appraising the literature. Advanced Emergency Nursing Journal, 30, 139–147.
Shapiro, S. E., & Donaldson, N. E. (2008b). Evidence-based practice for advanced practice emergency nurses, Part III: Planning, implementing, and evaluating an evidence-based small test of change. Advanced Emergency Nursing Journal, 30, 322–332.
The AGREE Collaboration. (2001). Appraisal of Guidelines for Research & Evaluation (AGREE) instrument. Retrieved
Uhlig, K., MacLeod, A., Craig, J., Lau, J., Levey, A. S., Levin, A., ... Eknoyan, G. (2006). Grading evidence and recommendations for clinical practice guidelines in nephrology. A position statement from Kidney Disease: Improving Global Outcomes (KDIGO). Kidney International, 70, 2058–2065.

*Efficacy means that the intervention works in the controlled research environment; effectiveness means that the intervention works in “real-life” clinical environments.


Keywords: evidence-based practice; grading evidence

© 2010 Lippincott Williams & Wilkins, Inc.