A study of 169 cancer clinical practice guidelines for lung, breast, prostate, and colorectal cancers found that none of them fully met the standards created by the Institute of Medicine in its 2011 “Clinical Practice Guidelines We Can Trust” report (iom.edu/Reports/2011/Clinical-Practice-Guidelines-We-Can-Trust.aspx). The guidelines assessed in the study, published in the July 10 issue of the Journal of Clinical Oncology (2013;31:2563-2568), met an average of only 2.75 of the eight criteria.
“The takeaway is bringing that gap to light,” the lead author, Sandra L. Wong, MD, MS, Associate Professor of Surgery at the University of Michigan Medical School, said in an interview. “One of the challenges is to actually examine those eight criteria—which ones are absolutely critical? Which ones are important, but if a guideline fails to meet that standard, do not render that guideline completely untrustworthy?
“The findings beg the question: How pragmatic are the IOM standards? Are they too stringent? We know we can't have high-level evidence for everything we do in medicine, but it's important to know the level of evidence behind the guidelines we use as clinicians,” she said.
The study analyzed oncology clinical practice guidelines published between 2005 and 2010 for the leading causes of cancer-related mortality (non-small-cell lung cancer [NSCLC], prostate cancer, and colorectal cancer for men; and NSCLC, breast cancer, and colorectal cancer for women). Guidelines were included only if they pertained to the screening, diagnosis, treatment, or follow-up care of one of the four selected cancer types. Each guideline was scored twice: first against the eight standards in the IOM report, and then against 20 subcriteria of those standards.
Each guideline was scored independently by each of the study's four authors (in addition to Wong: Bradley N. Reames, MD; Robert W. Krell, MD; and Sarah N. Ponto), and the scores were then tabulated and summary statistics were generated. There was minimal variability between the scores, Wong noted.
Beyond the overall finding that none of the guidelines fully met the criteria, and that on average the guidelines met fewer than three of the eight IOM criteria, the study also found that:
* Only four of the 20 subcriteria were met by more than half of the guidelines;
* Lung cancer guidelines scored higher than guidelines for the other cancer types (meeting 11.4 of the 20 subcriteria);
* Guidelines from the U.S. on average scored higher than guidelines from international groups; and
* Guidelines from groups that produced more guidelines during the study period scored higher than guidelines from groups that produced fewer than four.
Perfect vs. Trustworthy
Wong noted that the scores, though low, were not entirely surprising, since previous research had shown similar results. What was worrisome, she said, was how poorly the guidelines scored on the standard that systematic literature reviews should be used. “It questions how well informed the recommendations are if there's no literature review to back up the guidelines,” she said.
Asked for his opinion for this article, Sheldon Greenfield, MD, Executive Co-Director of the Health Policy Research Institute and the Donald Bren Professor of Medicine in the School of Medicine at the University of California, Irvine, who was a coauthor of the IOM standards, said that guidelines for which there is good evidence should meet the standards, adding that “the scores observed in this study are still too low.”
Wong said it may be important in the future to begin grading guidelines. “It's important for clinicians to know that the guidelines that they're following are either really high-quality or just a consensus guideline—written by experts, but without a review of the evidence.”
Agreeing with Wong, Greenfield said there is a need to consider updating the standards to weight the most important ones. “Some are going to be less important than others and a lot of us agree about that.”
He said the two criteria for trustworthy guidelines that he would weight most heavily are (1) conflict of interest and the composition of the committee; and (2) systematic review. Having an unbiased guideline development committee is important because, although the experts who typically sit on these committees know the topics they are writing about, they may not recognize their own conflicts of interest.
Look at the guidelines for mammography as an example, he said: “The evidence is very good, but even then, the professional societies disagree, and that's because of preferences, values, professional issues, and biases. That's why having broad representation in the composition of the committee is so important.”
Additionally, systematic review is important to determine how good the evidence is and what it directs doctors to do, he said. “Evidence is key to the entire thing. We know there's a huge need for opinions because there are many places where evidence doesn't exist. But, wherever the evidence is good, it should be used.”
Any decision on how to weight the current IOM standards would ultimately be made by a national committee after national studies were conducted, Greenfield noted.
Need to ‘Operationalize’ Standards
Also commenting for this article, David F. Ransohoff, MD, Professor of Medicine and Epidemiology at the University of North Carolina School of Medicine, the coauthor (with Harold C. Sox, MD) of an accompanying editorial—titled “Guidelines for Guidelines: Measuring Trustworthiness” (JCO 2013;31:2530-2531)—said there is more of a need to “operationalize” the IOM's current standards, rather than rewriting or re-evaluating them.
The Institute of Medicine standards describe conduct that produces trustworthy guidelines, but do not actually judge the quality of that conduct. “The IOM principles are fine, but the IOM did not create a tool that could actually be applied; such a scoring system would specify how to apply each standard and how to derive a score for each standard,” he said via email. Some standards would likely be weighted as more important than others, he added. This type of scoring system, if developed, would answer the question of “how trustworthy” the guidelines actually are, he said.