Perhaps the most important goal of medical education is to train competent, knowledgeable, and self-directed physicians, a process that begins early in training. For instance, most inpatient laboratory tests in teaching hospitals are ordered by interns under the supervision of more senior residents and attending physicians. This system of graded responsibility with supervision serves educational and clinical goals. However, the shared responsibility for test ordering makes it difficult to disentangle individual test-ordering characteristics—characteristics that might be important both to improve the use of laboratory tests and to understand variation in supervision and authority.
Understanding the use of laboratory tests is important clinically and economically,1,2 to ensure comprehensive but not excessive evaluation of patients' conditions. Current efforts to improve or reduce laboratory test use often include education for those ordering the tests,3–9 unbundling of laboratory panels so that the unwanted tests are avoided,10,11 and restricting repeat lab orders.12,13
To understand these issues more fully, we examined the frequency and cost of laboratory test ordering in a large general medical inpatient service at the Hospital of the University of Pennsylvania. Laboratory test utilization serves as a test case to understand how supervision and control can be measured in hierarchical clinical teams. Using data on more than 10,000 patient-days, we tested the hypothesis that variation between patient-days in lab test utilization could be explained by differences across interns, across supervising residents, or across attending physicians. Further, using survey data from the same group of residents and interns, we tested the hypotheses that interns and residents were aware of their lab test ordering patterns and that they believed they influenced lab test utilization on inpatient medical services.
Our general approach here is to examine the relative sources of variation in the number and cost of lab tests ordered for a patient on any given day. Conceptually, we seek to assess the extent to which different numbers of tests are ordered on two randomly selected patient-days, which differ only in who the intern was. We want to compare that amount of variation with the amount of variation present on two randomly selected days that differ only in who the attending physician was. If interns are particularly influential in laboratory utilization, then there should be greater differences in the number of tests ordered when comparing two interns than when comparing two attending physicians. If attendings are closely supervising interns, conversely, then we would expect greater variation between patient-days with different attending physicians than patient-days with different interns. We implement this approach more generally using a regression framework to assess explainable variation.
To examine control of day-to-day laboratory utilization, we made use of the overlapping but distinct rotation schedules of interns, residents, and attendings. Any given patient-day can be uniquely assigned to a care team made up of an intern, supervising resident, and attending physician. The variation in schedules means that each physician participates in several distinct team configurations, an important advantage of our study over past work.17 We examined this variation in our statistical analyses to determine the degree to which variation in day-to-day laboratory utilization can be attributed to different levels of supervision. We complemented this statistical analysis with surveying of the trainees.
This work was reviewed by the institutional review board of the University of Pennsylvania under protocol no. 805231.
This study was conducted on the hospitalist service of the Hospital of the University of Pennsylvania. This service is responsible for roughly 5,000 general medical admissions each year—approximately 40% of the admissions for the Department of Medicine. The admissions were made up of patients admitted through the Emergency Department without an identified physician in the system (80%) or those cared for by general internists in the faculty group clinical practice. At any given time, there were four teams of one faculty member, one resident, and two interns. The length of the rotations was 14 days for faculty and typically 28 days for interns and residents. Attending physicians were full-time hospitalists, with a median of seven years of practice at the time of this research.
Measurement of inpatient utilization
The unit of analysis was the patient-day, defined as a 24-hour period from noon until noon. Each patient on the hospitalist service had a uniquely identifiable primary intern. These data were recorded with a high degree of accuracy, as patient assignments drive the electronic sign-out system for interns and were also the primary mechanisms by which nurses and consulting physicians could identify physicians on call. Our primary outcome was the number of lab tests, although we considered total lab costs as a secondary outcome.
Lab tests were simultaneously attributed to each patient's primary intern, resident, and attending. Tests ordered by night coverage interns or residents were included because our program had explicit norms that planning for evening laboratory needs should be considered part of the day teams' responsibility. Lab tests ordered in the outpatient setting or during the initial workup in the emergency room were not attributed to the inpatient teams. Data for patients who were transferred off the hospitalist service (e.g., to the ICU) were included only prior to the patients' transfer.
Common lab tests were aggregated to clinically meaningful composites if they were collected at the same time. Thus, a complete blood count was considered a single test, as was a basic set of chemistries—sodium, potassium, chloride, bicarbonate, BUN, and creatinine—whether it was ordered as a bundle or separately. Point-of-care fingerstick tests for blood glucose levels were excluded because these are routinely done by nurses without physician input. Any laboratory specimens received by pathology—including tests on nonblood body fluids—were included. Lab costs were based on marginal variable supply costs obtained from the hospital cost-accounting system, which reflects the minimal costs associated with processing the lab tests, excluding any overhead, personnel fees, or markups.
We examined utilization data for the period from January 3, 2007, to June 19, 2007. This period was chosen to be late enough in the year that all interns would be familiar with the mechanisms of lab ordering and operations of the hospitalist service.
There were 14,736 patient-days on the hospitalist service during this period. We excluded patient-days that could not be mapped to the scheduling software, usually the result of coverage by interns rotating through the Department of Medicine from another department. We also excluded patient-days cared for by covering interns, residents, or attending physicians by requiring that all providers included in the analysis had cared for at least 30 patient-days.
We conducted a brief, Web-based survey of all interns and residents shortly after the completion of the 2006–2007 academic year. Participation in the survey was voluntary. Questions were validated with cognitive interviewing and pretesting with house officers18 and are reproduced in Appendix 1. Many nonmedicine interns rotate through the hospitalist service, but we surveyed only Department of Medicine residents. Residents who rotated in the first half of the academic year were included in the survey but not in the measurement of inpatient laboratory test use. Thus, the denominators differ somewhat from those analyzed in the direct measurement of inpatient utilization.
Our analyses of control of laboratory utilization used ordinary least squares regression.19 Overall explained variation was defined as the R² for a model containing indicator variables for each intern, resident, and attending. These indicator variables provide a fixed effect, controlling for all stable characteristics of a provider without needing to directly measure those characteristics.20 We calculated the uniquely explained variation for each category of provider as the difference in R² between models with and without that category's indicator variables, after inclusion of the other categories, and then divided by the overall explained variation. We used the Huber–White sandwich estimator to adjust standard errors for clustering of patient-days within patient-visits. We calculated 95% confidence intervals (CIs) for the partitioning of the explained variance by bootstrapping 1,000 replicate samples at the patient-day level.21,22 We report the percentile-based CIs. All analyses and simulations were conducted using Stata 9.2 (Stata Corporation Inc., College Station, Texas). In a sensitivity analysis of the effect of patient severity on our results, we used each patient's Charlson score, a well-validated comorbidity index.23–25
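The variance-partitioning procedure described above can be sketched in code. The following is a minimal illustration in Python (the study itself used Stata) on simulated data; the column names, team sizes, and effect sizes are illustrative assumptions, not the study's dataset or exact procedure.

```python
# Minimal sketch of partitioning explained variation across team roles.
# Simulated data; names and effect sizes are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000  # simulated patient-days
df = pd.DataFrame({
    "intern": rng.integers(0, 20, n).astype(str),
    "resident": rng.integers(0, 10, n).astype(str),
    "attending": rng.integers(0, 5, n).astype(str),
})
# Give each intern a stable ordering tendency plus day-to-day noise.
intern_effect = {str(i): rng.normal(0, 1) for i in range(20)}
df["labs"] = 6 + df["intern"].map(intern_effect) + rng.normal(0, 6, n)

def r2(formula):
    """R-squared of an OLS model with the given indicator variables."""
    return smf.ols(formula, data=df).fit().rsquared

overall = r2("labs ~ C(intern) + C(resident) + C(attending)")
# Uniquely explained variation for interns: the drop in R-squared when
# the intern indicators are removed from the full model, as a share of
# the overall explained variation.
without_intern = r2("labs ~ C(resident) + C(attending)")
intern_share = (overall - without_intern) / overall
print(f"overall R2 = {overall:.3f}, intern share = {intern_share:.1%}")
```

The same subtraction is repeated for residents and attendings; whatever R² cannot be assigned to a single role by this subtraction is the portion "not uniquely attributable" to any level of the team.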
Patient and physician populations
There were 10,908 (74.0%) patient-days in our analytic sample. These represented 2,066 patients cared for across 2,351 hospitalizations by 85 interns, 56 residents, and 27 attending physicians. Median length of stay was 3 days, with an interquartile range of 2 to 6; mean length of stay was 5.2 days. Mean patient age was 57.3 years (SD 18.2 years), and 56.0% of patients were female; 47.5% of patients were black, 13.4% were white, and 31.9% did not report race information. The mean Charlson score was 1.9 (SD 2.1).
Variation in lab test ordering
The average patient-day had 5.9 lab tests ordered (SD 6.4), with a median of 5 (interquartile range: 0 to 8); 4.1% of the variation in the number of laboratory tests ordered could be explained on the basis of members of the care team. As shown in Figure 1, 45% (95% CI: 39–53) of that variation was explained by the individual intern, 26% (CI: 21–34) by the resident, and 9% (CI: 7–16) by the attending physician; 20% (CI: 6–25) of the variation could not be uniquely attributed to any level of the care team.
A complementary approach is to ask whether adding interns to the equation after controlling for resident and attending increases the explanatory power of the model more than chance alone. In each case, adding interns, residents, or attendings independently and statistically significantly improved explanatory power beyond the other two categories alone; however, the effect for attendings was of more marginal significance. For example, adding intern to a model already containing resident and attending had an F statistic for the change between the two models of 2.47 with 84 df, P < .001. In contrast, adding attending to a model already containing interns and residents had an F statistic of 1.60 with 26 df, P = .027.
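The sequential test reported above is a standard F test on nested models. A hedged sketch on simulated data, with statsmodels' `compare_f_test` standing in for the corresponding Stata command; all names and effect sizes are illustrative assumptions:

```python
# Sketch of the nested-model F test: does adding intern indicators
# improve a model that already contains resident and attending?
# Simulated data; names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "intern": rng.integers(0, 20, n).astype(str),
    "resident": rng.integers(0, 10, n).astype(str),
    "attending": rng.integers(0, 5, n).astype(str),
})
df["labs"] = 6 + rng.normal(0, 6, n)

reduced = smf.ols("labs ~ C(resident) + C(attending)", data=df).fit()
full = smf.ols("labs ~ C(resident) + C(attending) + C(intern)",
               data=df).fit()
# compare_f_test returns (F statistic, p-value, difference in df).
f_stat, p_value, df_diff = full.compare_f_test(reduced)
print(f"F = {f_stat:.2f} with {int(df_diff)} df, P = {p_value:.3f}")
```

With 20 simulated interns, the degrees-of-freedom difference is 19 (one indicator per intern less the reference category), mirroring the 84 df reported above for the study's 85 interns.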
The patient data allow another way to express the substantial observed variation between low-ordering and high-ordering interns and residents. Interns and residents were divided into quintiles based on the mean number of lab tests they ordered per patient-day. For interns, members of the five quintiles ordered an average of 4.48, 5.38, 5.97, 6.64, and 7.48 labs per patient-day. For residents, who were associated with less variation, the corresponding values were 4.80, 5.43, 6.00, 6.52, and 7.34 labs per patient-day.
Variation in lab test costs
Variation in laboratory test costs follows a similar pattern. The total number and total cost of labs were well correlated at 0.75. Laboratory test costs were determined using internal accounting data as marginal variable supply costs. Of the variation in daily cost, 3.2% was explained by the identity of team members: 46% (95% CI: 40–58) of that variation was attributable to the intern, 27% (95% CI: 22–38) to the resident, and 10% (95% CI: 6–16) to the attending. In sequential F testing of nested models, interns and residents separately added explanatory power at high levels of significance (P < .001). In contrast, adding attending did not statistically improve the explanatory power of models already containing only intern identities (F = 0.79, 26 df, P = .762), only resident identities (F = 1.09, 26 df, P = .345), or both (F = 1.33, 26 df, P = .123).
Interns' and residents' perceptions of control and relative laboratory use
The survey completion rate for residents was 57% (50/87) and for interns was 52% (30/58). As shown in Table 1, 52% of residents perceived that they had “much” or “total” control over issues of resource utilization; only 6% believed they had “very little” or “no” control. Only 20% of interns perceived themselves to have “much” or “total” control. (Residents perceived greater control than did interns, P = .001 by Mann–Whitney test.) Further, interns were asked to rank the relative control of the attendings, residents, and interns. Fifty-seven percent of interns perceived that interns had the least control, and 47% of interns felt that attendings had the most control.
In the survey, we asked residents and interns, “Compared to your peers on the hospitalist service, do you think you used more or less of the following?” The results for “labs per day” and “cost of labs” are shown in Figure 2. Two-thirds of respondents felt that they were in the middle 20%. Fewer than 10% of residents and fewer than 20% of interns believed themselves to be in the 40% of their group with the highest use. No residents believed themselves to be in the highest-using quintile, and no interns believed themselves to be in the lowest-using quintile.
We conducted sensitivity analyses of our findings to determine their robustness. Teams might vary in their utilization of lab tests as a result of differences in their patient composition or in the duration of patients' stays. To test this, we replicated all of our regression analyses for numbers of labs used and daily lab costs, controlling for patient age, sex, Charlson score (a measure of comorbidity), and day of stay. Our results were unchanged. Similarly, the bootstrapped CIs demonstrate the degree to which our findings are robust to the presence of particular outliers. Further, it is conceivable that the greater number of interns might lead interns to have a greater fraction of the variance explained by chance alone. To test this, we replicated our analyses 1,000 times, each time randomly permuting the identity of the intern while holding the rest of the team structure and patient characteristics constant; there was no support for this alternative hypothesis.
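The permutation check in the last sensitivity analysis can be sketched as follows. This is a simplified illustration on simulated null data (intern labels truly carry no signal), with illustrative names and fewer replicates than the 1,000 used in the study:

```python
# Sketch of the permutation sensitivity analysis: shuffle intern labels
# while holding the rest of the team structure fixed, and ask how often
# a permuted assignment explains as much variation as the real one.
# Simulated null data; names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1500
df = pd.DataFrame({
    "intern": rng.integers(0, 15, n).astype(str),
    "resident": rng.integers(0, 8, n).astype(str),
})
df["labs"] = 6 + rng.normal(0, 6, n)  # null: interns have no effect

def intern_gain(d):
    """Gain in R-squared from adding intern indicators."""
    base = smf.ols("labs ~ C(resident)", data=d).fit().rsquared
    full = smf.ols("labs ~ C(resident) + C(intern)", data=d).fit().rsquared
    return full - base

observed = intern_gain(df)
gains = []
for _ in range(100):  # the study used 1,000 replicates
    d = df.copy()
    d["intern"] = rng.permutation(d["intern"].values)
    gains.append(intern_gain(d))
p = float(np.mean([g >= observed for g in gains]))
print(f"permutation P = {p:.2f}")
```

If the intern-attributable variance were explainable by chance alone, the observed gain would fall well within the permuted distribution; in the study's data it did not.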
Part of the expense of graduate medical education results from the heavier use of testing by clinically inexperienced trainees. Our findings demonstrate that attending faculty had relatively little residual impact on laboratory test ordering patterns—even on a hospitalist-run, teaching-intensive service at an academic medical center. Although supervision is a central component of training residents and interns to use tests wisely, in our study faculty contributed very little to the variation we observed in lab ordering. In addition, interns—who seem to have the most control over lab ordering—perceived themselves to have the least control, and in general neither interns nor residents recognized how their utilization compared with others'. These findings have implications for patient care, graduate medical education, and the evaluation of individual practitioners in an increasingly team-oriented environment.
Before explaining these findings, it is worth asking why they matter. Our study found that 4.1% of the variation in lab test use between patient-days could be explained by the composition of individual teams compared with one another. Past VA research from the early 1990s found a similar aggregate magnitude but could not subcategorize the effect to look inside the teams.17 Viewed exclusively from the perspective of explaining variation in health care utilization, our findings are modest. Patient factors appropriately account for far more of the observed variation than team composition. However, our research provides a perspective into important but difficult-to-measure dynamics and roles within teams. This perspective is quantitative, complementing past qualitative efforts.26 Lab testing is a model for the interaction between graded levels of physician supervision and clinical care more generally.
There are several possible explanations for attendings' seeming lack of impact. First, our findings address whether differences between one attending and another are reflected in differences in lab test ordering. Attendings might still have considerable influence. For example, no team lacked an attending, but we might wonder whether the variance in test ordering on such hypothetical attending-free teams would be considerably higher. In other words, one contribution of attendings might be to reduce the amount of variation attributable to the team composition. Second, different attendings might behave more similarly to one another than different interns or residents. However, individual faculty members do not seem to have distinctive effects on lab test ordering. Faculty might find it difficult or unnecessary to supervise such a relatively low-risk, diffuse activity; they may feel that autonomy in lab ordering is necessary for the refinement of clinical skills, they may not understand the impact of excessive test ordering,27 or they may feel that their team's test ordering has a negligible impact on the financial state of the health system. (Indeed, others have found that teaching principles of cost-appropriate care was the least well-done feature of attending supervision in a pediatric department.28)
The clinical effects of varying lab test utilization are uncertain.29 Underuse of lab testing can lead to missed diagnoses or delayed intervention, and overuse can lead to false-positives, further testing, and unnecessary management.10,30,31 Although variation driven by patient preferences might be desired, variation driven by trainees' preferences is harder to justify.
Perhaps it is not so surprising that teaching attendings seem to have such little influence on laboratory test use. Yet our findings suggest that this notion is indeed surprising to the interns and residents themselves, who believe their lab ordering decisions are substantially determined by others. In general, their perceptions of their lab test use compared with their peers were also different from their actual use. Interns and residents therefore made two related errors: They failed to recognize how their practices differ from those of others, and they failed to recognize how much control they have over those practices.
These are errors of calibration. Correcting those errors with feedback could benefit medical training. Electronic test reporting, order entry, and medical records have not been routinely harnessed for graduate medical education, particularly in the inpatient setting. The same infrastructure used for this research could be used to teach and assess practice-based learning and improvement. Measures could be benchmarked against peer performance with similar patients.32–38 As others have shown,3,4,7,8,28,39–45 interns and residents are not getting good information about their own practice patterns in the inpatient setting.
The systematic analysis of electronic medical record systems can suggest alternative approaches to feedback for practitioners (at many levels) and managers. Apportioning the explained variance in lab test utilization can help disentangle the patterns of influence and supervision that exist in teams. For example, the same techniques could be used to examine how patterns of supervision and influence vary across institutions and across medical specialties. Our findings do not provide a simple categorization into “good” or “bad.” Instead, they yield objective information that can provide a distinctive perspective for supervisors seeking to optimize the performance of their teams.
Our research has several limitations. First, we conducted our study at a single academic medical center. Although our findings may not be generalizable to other settings, it is notable that this teaching hospitalist service emphasizes the presence of and supervision by attendings; thus, it is possible that other settings would demonstrate even less influence by attendings. Second, we examined only laboratory test use and costs because they are frequent, important in aggregate, and reliably measured. Lab tests are often relatively low cost and low risk at the individual level, however. Had we examined the use of more invasive or more expensive procedures, we might have observed a greater attending effect. Third, although our survey response rate was typical of surveys in the medical literature,46 it was incomplete; in contrast, we have full information on lab test ordering. Finally, we have not examined a causal link between variation in medical education practice and outcomes for current or future patients.
Nevertheless, we have introduced a new model of measuring supervision of trainees in hierarchical clinical settings. That model demonstrates that variation in laboratory test use in our setting is driven almost entirely by residents and interns, suggesting little influence by attending physicians. Although this observation may reflect in part appropriate leeway that allows residents to develop autonomy and practice experience, we have also observed that interns and residents have little understanding of their own control and relative performance. Together, these observations suggest one new approach to measuring supervision in the inpatient setting as well as missed opportunities to reduce practice variation and improve practices.
The authors wish to thank Rich Urbani for his expert programming and Jessica Dine for her insightful comments on the manuscript.
This work was supported by a Pilot Grant from the Leonard Davis Institute of Health Economics of the University of Pennsylvania.
This study was approved by the institutional review board of the University of Pennsylvania.
1Lundberg GD. The need for an outcomes research agenda for clinical laboratory testing. JAMA. 1998;280:565–566.
2van Walraven C, Naylor CD. Do we know what inappropriate laboratory utilization is? A systematic review of laboratory clinical audits. JAMA. 1998;280:550–558.
3Billi JE, Hejna GF, Wolf FM, Shapiro LR, Stross JK. The effects of a cost-education program on hospital charges. J Gen Intern Med. 1987;2:306–311.
4Pugh JA, Frazier LM, DeLong E, Wallace AG, Ellenbogen P, Linfors E. Effect of daily charge feedback on inpatient charges and physician knowledge and behavior. Arch Intern Med. 1989;149:426–429.
5Blackstone ME, Miller RS, Hodgson AJ, Cooper SS, Blackhurst DW, Stein MA. Lowering hospital charges in the trauma intensive care unit while maintaining quality of care by increasing resident and attending physician awareness. J Trauma. 1995;39:1041–1044.
6Hampers LC, Cha S, Gutglass DJ, Krug SE, Binns HJ. The effect of price information on test-ordering behavior and patient outcomes in a pediatric emergency department. Pediatrics. 1999;103:877–882.
7Korn LM, Reichert S, Simon T, Halm EA. Improving physicians' knowledge of costs of common medications and willingness to consider costs when prescribing. J Gen Intern Med. 2003;18:31–37.
8Schroeder SA, Myers LP, McPhee SJ, et al. The failure of physician education as a cost containment strategy: Report of a prospective controlled trial at a university hospital. JAMA. 1984;252:225–230.
9Miyakis S, Karamanof G, Liontos M, Mountokalakis TD. Factors contributing to inappropriate ordering of tests in an academic medical department and the effect of an educational feedback strategy. Postgrad Med J. 2006;82:823–829.
10Neilson EG, Johnson KB, Rosenbloom ST, et al. The impact of peer management on test-ordering behavior. Ann Intern Med. 2004;141:196–204.
11Attali M, Barel Y, Somin M, et al. A cost-effective method for reducing the volume of laboratory tests in a university-associated teaching hospital. Mt Sinai J Med. 2006;73:787–794.
12Bates DW, Kuperman GJ, Rittenberg E, et al. A randomized trial of a computer-based intervention to reduce utilization of redundant laboratory tests. Am J Med. 1999;106:144–150.
13Calderon-Margalit R, Mor-Yosef S, Mayer M, Adler B, Shapira SC. An administrative intervention to improve the utilization of laboratory tests within a university hospital. Int J Qual Health Care. 2005;17:243–248.
14Hindmarsh JT, Lyon AW. Strategies to promote rational clinical chemistry test utilization. Clin Biochem. 1996;29:291–299.
15May TA, Clancy M, Critchfield J, et al. Reducing unnecessary inpatient laboratory testing in a teaching hospital. Am J Clin Pathol. 2006;126:200–206.
16Meng QH, Zhu S, Booth C, et al. Impact of the cardiac troponin testing algorithm on excessive and inappropriate troponin test requests. Am J Clin Pathol. 2006;126:195–199.
17Hayward RA, Manning WG, McMahon LF, Bernard AM. Do attending or resident physician practice styles account for variations in hospital resource use? Med Care. 1994;32:788–794.
18Sudman S, Bradburn NM, Schwarz N. Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco, Calif: Jossey-Bass; 1995.
19Weisberg S. Applied Linear Regression. New York, NY: John Wiley and Sons; 1985.
20Allison PD. Fixed Effects Regression Methods for Longitudinal Data Using SAS. Cary, NC: SAS Publishing; 2005.
21Good PI. Resampling Methods. Boston, Mass: Birkhauser; 2006.
22Mooney CZ, Duval R. Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury Park, Calif: Sage Publications; 1993.
23Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. J Chronic Dis. 1987;40:373–383.
24Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol. 1992;45:613–619.
25Zhang JX, Iwashyna TJ, Christakis NA. The impact of alternative lookback periods and sources of information on Charlson comorbidity adjustment in Medicare claims. Med Care. 1999;37:1128–1139.
26Mizrahi T. Getting rid of patients: Contradictions in the socialisation of internist to the doctor–patient relationship. Sociol Health Illn. 1985;7:214–235.
27Long MJ, Cummings KM, Frisof KB. The role of perceived price in physicians' demand for diagnostic tests. Med Care. 1983;21:243–250.
28Busari JO, Weggelaar NM, Knottnerus AC, Greidanus PM, Scherpbier AJ. How medical residents perceive the quality of supervision provided by attending doctors in the clinical setting. Med Educ. 2005;39:696–703.
29Daniels M, Schroeder SA. Variation among physicians in use of laboratory tests. II. Relation to clinical productivity and outcomes of care. Med Care. 1977;15:482–487.
30Sutton S, Saidi G, Bickler G, Hunter J. Does routine screening for breast cancer raise anxiety? Results from a three wave prospective study in England. J Epidemiol Community Health. 1995;49:413–418.
31Schwartz LM, Woloshin S, Sox HC, Fischhoff B, Welch HG. US women's attitudes to false positive mammography results and detection of ductal carcinoma in situ: Cross sectional survey. BMJ. 2000;320:1635–1640.
32Jamtvedt G, Young JM, Kristoffersen DT, O'Brien MA, Oxman AD. Audit and feedback: effects on professional practice and health care outcomes. Cochrane Database of Systematic Reviews 2006, Issue 2. Art No.: CD000259. DOI: 10.1002/14651858.CD000259.pub2.
33Wigder HN, Cohan Ballis SF, Lazar L, Urgo R, Dunn BH. Successful implementation of a guideline by peer comparisons, education, and positive physician feedback. J Emerg Med. 1999;17:807–810.
34Hux JE, Melady MP, DeBoer D. Confidential prescriber feedback and education to improve antibiotic use in primary care: A controlled trial. CMAJ. 1999;161:388–392.
35Kiefe CI, Allison JJ, Williams OD, Person SD, Weaver MT, Weissman NW. Improving quality improvement using achievable benchmarks for physician feedback: A randomized controlled trial. JAMA. 2001;285:2871–2879.
36Ferreira MR, Dolan NC, Fitzgibbon ML, et al. Health care provider–directed intervention to increase colorectal cancer screening among veterans: Results of a randomized controlled trial. J Clin Oncol. 2005;23:1548–1554.
37Horbar JD, Carpenter JH, Buzas J, et al. Collaborative quality improvement to promote evidence based surfactant for preterm infants: A cluster randomised trial. BMJ. 2004;329:1004.
38Ziemer DC, Doyle JP, Barnes CS, et al. An intervention to overcome clinical inertia and improve diabetes mellitus control in a primary care setting: Improving Primary Care of African Americans with Diabetes (IPCAAD) 8. Arch Intern Med. 2006;166:507–513.
39Skipper JK, Smith G, Mulligan JL, Garg ML. Medical students' unfamiliarity with the cost of diagnostic tests. J Med Educ. 1975;50:683–684.
40Robertson WO. Costs of diagnostic tests: Estimates by health professionals. Med Care. 1980;18:556–559.
41Hershey CO, Dawson NV, McLaren CE, Siciliano CJ, Cohen DI. Resident knowledge of charges: Are we asking the right questions? Am J Med Sci. 1987;293:182–186.
42Thomas DR, Davis KM. Physician awareness of cost under prospective reimbursement systems. Med Care. 1987;25:181–184.
43Shulkin DJ. Cost estimates of diagnostic procedures. N Engl J Med. 1988;319:1291.
44Allan GM, Innes G. Family practice residents' awareness of medical care costs in British Columbia. Fam Med. 2002;34:104–109.
45Allan GM, Innes G. Do family physicians know the costs of medical care? Survey in British Columbia. Can Fam Physician. 2004;50:263–270.
46Asch DA, Jedrziewski MK, Christakis NA. Response rates to mail surveys published in medical journals. J Clin Epidemiol. 1997;50:1129–1136.