The idea of guidelines is beguilingly simple. You gather together all the evidence, provide it to a group of experts who sift it through their own knowledge and expertise, and arrive at a ‘what to do and (sometimes) how to do it’ recipe that is aimed at improving patient care and clinical outcomes. And it can work: think only of antiplatelets after a heart attack, or use of statins in high-risk patients, and note the strong decline in mortality due to myocardial infarction of 64% for men and 47% for women between 1987 and 2007.1 In Norway in 2007, mortality had fallen by approximately 70% from the peak for age groups under 80, and by about 40% for those over 80 years of age.
Of course, this example is of a situation with particular advantages: a high-profile condition with high mortality, with measurable risk factors (blood pressure, cholesterol), with an astonishingly good evidence base (typically with large individual patient analyses), with effective therapies (antiplatelets, antihypertensives, statins) and frequently with a significant organisational push from government. Nothing has a higher priority than treating cardiovascular risk factors; it is a common marker of quality in primary care, and is often supported by cash incentives for guideline implementation.
Elsewhere the picture is not so rosy. Consider, for example, guidance for co-therapy of non-steroidal anti-inflammatory drugs with a gastroprotective agent. The evidence base for risk of gastrointestinal bleeding is enormous, the risk factors well understood and easy to recognise (age being the most important), the consequences serious (5–20% mortality2) and the benefits of protection proved in randomised trials. And yet in this case, the implementation of guidance is woeful: surveys show that at best about one patient in four with a risk factor actually gets a gastroprotective agent.3
It is tempting to draw the obvious conclusion – that political will and cash make things happen, and that with them we can get guidelines taken seriously. Unfortunately, there are two other problems we should consider before we try to draw any conclusions.
The first problem involves answering simple questions about whether evidence-based guidelines work: are they implemented, and do they deliver the expected improvement in clinical outcomes? The two examples given so far show that the answer can be yes, and that it can be no. Actual evaluation of guidelines to determine their effects is rare. When they are evaluated, the results can reveal a situation of complete ignorance, as in the case of UK National Guidelines for HIV testing, in which knowledge of the guidelines was scant among non-experts, and fewer than 14% of patients who should have been tested were actually tested.4 Ignorance of, and lack of use of, guidelines is the most common finding in the few studies that evaluate them.
The second problem is one of evidence. It is not that there is too little: there are about 33 600 papers on PubMed with ‘guidelines’ in the title, and almost 1000 with the addition of ‘evidence-based’. Just because a guideline claims to be evidence-based does not necessarily mean that it gives the correct advice. For example, when 20 evidence-based guidelines for anticoagulation in atrial fibrillation were tested on the same 100 consecutive patients, the proportion for whom they recommended anticoagulation was evenly distributed between 13 and 100%.5
The trouble with evidence-based medicine is that most of it is wrong. There are many ways in which evidence can be wrong. We know about randomisation and blinding, for example, and that lack of randomisation and open instead of double-blind studies can result in major bias – all in the same direction of increasing the size of the treatment effect. However, there are other major potential biases that are largely ignored and which endanger the evidence on which we want to build our guidelines. Here are three:
- Size – or rather lack of it – is one of the most frequent problems. Statistical analyses are often performed on small datasets, in which the random play of chance is potentially huge. Examples of small studies tending to overestimate treatment effects have existed for some time, but the problem is now being more widely acknowledged.6 In anaesthesia, although larger trials are often conducted, the majority remain small, as shown in a secondary analysis of granisetron trials, which also pointed out other problems.7 But size is only a problem when it is ignored. There is long-standing evidence that when there are fewer than 200 events, beneficial or harmful, there is a significant possibility of an incorrect result because of the random play of chance.8
- Non-Gaussian distributions reported as averages represent another major problem; their use is known to be incorrect,9 yet is common, particularly in pain and anaesthesia. In acute pain, for instance, we know that patients obtain either good pain relief or little,10 a result echoed in chronic pain situations as well. Anaesthetic interventions to reduce postoperative pain typically use postoperative opioid consumption as an outcome, a situation in which mean, median and mode are quite different from one another, and in which the mean is quite unrepresentative of the majority of patients.10 The use of incorrect averages in trials is compounded as meaningless means in meta-analyses, and runs the risk of informing guidelines that are just plain wrong.
- Other major sources of bias are being discovered. In chronic pain, for example, short duration trials overestimate efficacy compared with those of longer duration, and different methods of imputation to deal with patients who withdraw from studies can completely change the result of a trial.
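The point about averages in skewed data can be illustrated with a few lines of Python. This is only a sketch: the 20 opioid consumption values below are hypothetical, invented to mimic a distribution in which most patients need little and a few need a great deal, and are not taken from any trial.

```python
import statistics

# Hypothetical 24-hour opioid consumption (mg) for 20 patients.
# These values are invented for illustration only.
consumption_mg = [0, 0, 0, 0, 0, 0, 2, 2, 4, 4,
                  5, 5, 6, 8, 10, 30, 45, 60, 80, 99]


def summarise(values):
    """Return mean, median, mode and the share of patients below the mean."""
    mean = statistics.mean(values)
    below_mean = sum(1 for v in values if v < mean)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "share_below_mean": below_mean / len(values),
    }


summary = summarise(consumption_mg)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up dataset the mean is four times the median, and three patients in four use less than the mean amount, so the mean describes almost nobody.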
Evidence is only helpful if it is reliable – unimpeached by sources of bias known to medical science and large enough to be trusted. All too often, what we are given is unreliable or wrong.
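The effect of size on the random play of chance can also be sketched by simulation. The example below assumes a hypothetical treatment with no true effect at all (the same 50% response rate in both arms) and compares how far the observed difference strays from zero in small and large trials. The trial sizes, the response rate and the function name are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_risk_differences(n_per_arm, control_rate, treated_rate,
                              n_trials, seed=0):
    """Simulate two-arm trials and return the observed risk differences."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    differences = []
    for _ in range(n_trials):
        control_events = sum(rng.random() < control_rate
                             for _ in range(n_per_arm))
        treated_events = sum(rng.random() < treated_rate
                             for _ in range(n_per_arm))
        differences.append((treated_events - control_events) / n_per_arm)
    return differences


# No true effect: both arms respond at the same hypothetical rate of 50%.
small_trials = simulate_risk_differences(20, 0.5, 0.5, n_trials=200)
large_trials = simulate_risk_differences(1000, 0.5, 0.5, n_trials=200)

spread_small = statistics.pstdev(small_trials)
spread_large = statistics.pstdev(large_trials)

print(f"Spread of observed differences, 20 per arm:   {spread_small:.3f}")
print(f"Spread of observed differences, 1000 per arm: {spread_large:.3f}")
print(f"Largest apparent 'effect', 20 per arm:   {max(map(abs, small_trials)):.2f}")
print(f"Largest apparent 'effect', 1000 per arm: {max(map(abs, large_trials)):.2f}")
```

Under these assumptions, some of the small trials will appear to show a substantial benefit or harm when none exists, while the large trials cluster tightly around the true difference of zero.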
However frustrated we are with the state of the evidence we are often given, evidence remains important, and is the foundation of our efforts to make things better. But guidelines alone do not cut it. Indeed, there is increasing evidence that it is relatively simple but important managerial efforts by teams that deliver the best results.
As is often the case, thinking in anaesthesia is at the forefront. For example, Gurses et al.11 developed a conceptual framework to provide a comprehensive and systematic guide identifying barriers to guideline compliance, and to design effective interventions to improve patient safety. Their conclusion was that an interdisciplinary approach is needed to improve clinicians’ compliance with evidence-based guidelines. In other words, people have to buy into the evidence, and buy into the need to use it. A multidisciplinary approach can be used in a range of different activities including, for example, designing better workflows in theatres.12 In this case, the main aims of using Operational Research approaches were related to increasing patient throughput, improving satisfaction of patients, surgeons, and operating theatre staff, maximising the utilisation of theatre resources, reducing surgery cancellations and reducing time loss due to late starts and changeovers. The non-medical world uses operational research all the time, as part of good management and quality control.
The principles work. Although treatment of acute myocardial infarction (AMI) has improved greatly in recent decades, there is still considerable variation between hospitals. What distinguished high-performing hospitals was an organisational culture supporting efforts to improve AMI care across the hospital. Evidence-based protocols and processes are necessary but on their own are not sufficient to achieve high performance.13 Pain relief/control in hospitals is known to be poor, and a survey in Italy demonstrated the link between high prevalence of severe pain and low use of analgesics, with a decoupling of attitudes to pain between patients and nurses.14 Yet in a postoperative ward handling British military casualties, severe pain could be almost abolished by a team approach to eliminate it.15
If the road from evidence-based medicine to guidelines and recommendations is long and winding, the fault is two-fold.
- First, the evidence is too often wrong, or irrelevant in the way it is described. After 20 years that is shameful; by now, we ought to be able to tell what constitutes reliable evidence. Health Technology organisations, journals and particularly organisations like the Cochrane Collaboration need to beef up their quality standards substantially. At present, too many guidelines include poor quality evidence that may well be harming more patients than it helps.
- Second, if we want guidelines and recommendations to improve clinical outcomes, then we need to accept that they constitute nothing more than hot air unless multidisciplinary local ownership and local action make them work.
Finally, we need to embrace the concept of failure as a marker of not getting things right, and measure it. Here is an example of a marker of failure – when a nurse in a major British hospital asks a postoperative patient to score their pain on a scale of 0–10 and, on getting a reply of 6, responds that that means only mild pain that requires no treatment. Measuring failure on an ongoing basis is the essence of good quality control, because neither evidence nor guidelines are of any value if failure is unmeasured or ignored. Measuring failure in ongoing audit loops offers an alternative approach if the definition of failure is credible and useful.
This article was checked and accepted by the editors, but was not sent for external peer-review.
1. Reikvam A, Hagen TP. Changes in myocardial infarction mortality. Tidsskr Nor Laegeforen
2. Straube S, Tramèr MR, Moore RA, et al. Mortality with upper gastrointestinal bleeding and perforation: effects of time and NSAID use. BMC Gastroenterol
3. Moore RA, Derry S, Phillips CJ, McQuay HJ. Nonsteroidal anti-inflammatory drugs (NSAIDs), cyclooxygenase-2 selective inhibitors (coxibs) and gastrointestinal harm: review of clinical trials and clinical practice. BMC Musculoskelet Disord
4. Gupta ND, Lechelt M. Assessment of the implementation and knowledge of the UK National Guidelines for HIV Testing (2008) in key conditions at a UK district general hospital. Int J STD AIDS
5. Thomson R, McElroy H, Sudlow M. Guidelines on anticoagulant treatment in atrial fibrillation in Great Britain: variation in content and implications for treatment. BMJ
6. Nüesch E, Trelle S, Reichenbach S, et al. Small study effects in meta-analyses of osteoarthritis trials: meta-epidemiological study. BMJ
7. Moore RA, Derry S, McQuay HJ. Fraud or flawed: adverse impact of fabricated or poor quality research. Anaesthesia
8. Flather MD, Farkouh ME, Pogue JM, Yusuf S. Strengths and limitations of meta-analysis: larger studies may be more reliable. Control Clin Trials
9. Delucchi KL, Bostrom A. Methods for analysis of skewed data distributions in psychiatric clinical studies: working with many zero values. Am J Psychiatry
10. Moore RA, Mhuircheartaigh RJ, Derry S, McQuay HJ. Mean analgesic consumption is inappropriate for testing analgesic efficacy in postoperative pain: analysis and alternative suggestion. Eur J Anaesthesiol
11. Gurses AP, Marsteller JA, Ozok AA, et al. Using an interdisciplinary approach to identify factors that affect clinicians’ compliance with evidence-based guidelines. Crit Care Med 2010; 38 (8 Suppl): S282–S291.
12. Guerriero F, Guido R. Operational research in the management of the operating theatre: a survey. Healthcare Manag Sci
13. Curry LA, Spatz E, Cherlin E, et al. What distinguishes top-performing hospitals in acute myocardial infarction mortality rates? A qualitative study. Ann Intern Med
14. Visentin M, Zanolin E, Trentin L, et al. Prevalence and treatment of pain in adults admitted to Italian hospitals. Eur J Pain
15. Aldington DJ, McQuay HJ, Moore RA. End-to-end military pain management. Philos Trans R Soc Lond B Biol Sci