The Fable of Babel and Building a Foundation for Quality

Detterbeck, Frank C. MD

Department of Thoracic Surgery, Yale University School of Medicine, New Haven, Connecticut.

Disclosure: The author declares no conflicts of interest.

Address for correspondence: Frank C. Detterbeck, MD, Division of Thoracic Surgery, Department of Surgery, Yale University School of Medicine, BB205, 330 Cedar St., New Haven, CT 06520-8062.

There is no question that accurate staging, particularly of the mediastinum, plays a central role in our current approach to lung cancer. The status of nodal involvement has a profound impact on prognosis, and it is a critical decision point in our treatment algorithms. One could equate initiating a treatment plan without clearly defining the mediastinal node status with setting out on a journey without knowing exactly where one is trying to go, as opposed to programming a global positioning system for the shortest, fastest (or perhaps most cost-effective?) route.

There is also no question that, at least in North America, we have not done a good job of assessing nodes in lung cancer. An assessment of surgical lung cancer cases in 2001 found that only 27% of patients underwent preoperative invasive mediastinal staging, and in half of the mediastinoscopies not even a single node was sampled.1 A more recent study associated better long-term outcomes with a greater extent of preoperative staging.2 There are data that addressing variations in care would save many times more lives than what we hail as “breakthrough advances” in new drugs.24

Furthermore, there is a focus on quality of care in medicine. There is a push to develop process measures (quality metrics) as a tool to assess in real time when we are doing well and when we are not, much like the interest in biomarkers as prognostic or predictive indicators. Very few validated quality metrics exist in lung cancer; at this point we have more speculative measures than validated ones. Many groups are actively trying to develop such metrics. An obvious candidate, for the reasons noted above, is a measure of the quality of mediastinal staging.

Osarogiagbon et al5 deserve praise for an ongoing effort to develop data to help define quality indicators, as is addressed in an article in this issue of the Journal of Thoracic Oncology. They have assessed the extent of mediastinal staging at the time of resection (i.e., mediastinal lymph node dissection [MLND], systematic sampling [SS], random sampling [RS], and no sampling [NS]) in a regional quality improvement project involving all lung cancer resections in the Memphis area from 2004 to 2007. They compared the extent of mediastinal staging as determined by the surgeon (the procedure name in the title of the operative report), by an independent review of the operative procedure, and by the pathology report.

Osarogiagbon and coworkers1,2,5 found that the extent of nodal staging at the time of resection was poor, consistent with other reports. By the surgeon's assessment, 48% had NS, 8% RS, and 45% MLND. However, by the pathology report, 42% had NS, 50% RS, 9% SS, and none had an MLND. What is striking is the extremely low concordance between how the mediastinal assessment was classified by the surgeon, by an independent audit of the procedure, and by the pathology report. In fact, most of the concordant cases were those with no nodes assessed; with these excluded, the concordance was only 11%. This essentially eliminates, at least at this time, using the extent of intraoperative node assessment as a quality indicator.

Is it a matter of documentation? How often are level 10 nodes mobilized and removed en bloc with the specimen, and thus not specifically mentioned in the operative report or pathology report? How often are nodes in stations 5 and 6 or 2R and 4R removed as one packet but labeled only as station 5 or 4R and therefore not meeting the predefined criteria for a complete MLND or SS? The authors performed a secondary analysis to account for this but found that this only minimally improved the results, demonstrating that this was not a major factor.

Are we simply operating at such a low level that no marker of high quality can emerge? In this series, the approximately 185 resections per year were divided among 8 hospitals and 21 surgeons. Only two surgeons had a general thoracic practice (and they in fact performed only a few of the resections); all others had a combined cardiac and thoracic practice.6 Nevertheless, the situation represents the reality in the United States, and we have to find metrics and interventions that work in this environment to improve care.

Reading between the lines, it seems that the major source of discordance is that we are speaking different languages, as in the story of the tower of Babel. If our ability to understand what we mean is limited, then we are clearly not in a position to collectively build a foundation for quality metrics, at least not with these building blocks. We cannot implement a quality indicator that might be met in name but will vary in terms of what is actually done and therefore not be useful. This is particularly important because most quality indicators are process measures that are thought to correlate with actual outcomes but in fact never get reevaluated once introduced to see if they actually do.

Given these difficulties, perhaps we should focus less on process measures (although these are favored in the United States) and consider structural measures (as done in Europe). However, only limited indirect data are available to validate measures that have been embraced in parts of Europe (e.g., presenting each case before a regional multidisciplinary panel of experts or completing the evaluation within a designated time frame). Furthermore, there are differences between health care in Europe and the United States. In much of Europe, for example, thoracic surgical care is often delivered in dedicated centers (often former tuberculosis hospitals) that have some depth to the thoracic team devoted to caring for these patients. Data from countries like the Netherlands, where care was dispersed but more recently has become more regionalized, support the concept of organized programs.7

It is good that groups like Osarogiagbon et al5 are carrying out their work. It is clear that we have a long way to go and that we have not yet defined simple quality indicators. It is also clear that we have a major educational challenge to even get to the point where we can discuss terms and be comfortable that we are referring to the same thing. The volume of literature and the multiple clinical guidelines have not yet brought about sufficient consistency in care. Perhaps we are struggling to find ways to improve the quality of care not despite, but rather because of, the explosion of literature and the difficulty in staying abreast of it without a dedicated team of sufficient critical mass.

