The Study of Pediatric Sepsis

Physicians’ ability to diagnose sepsis in newborns and critically ill children

Fischer, Joachim E. MD MSc

Pediatric Critical Care Medicine: May 2005 - Volume 6 - Issue 3 - p S120-S125
doi: 10.1097/01.PCC.0000161583.34305.A0

Future studies regarding infections in premature infants or critically ill newborns and children will fall into two domains: first, randomized, controlled trials assessing the efficacy or cost-effectiveness of interventions and, second, surveillance studies aiming at elucidating risk factors, prognostic signs, or trends in outcomes and incidence rates (1, 2). Whatever objective the study pursues, the success hinges on the correct identification of infected infants (cases) and noncases. Adjudication takes place either when a decision about enrolment of a patient is made, at decision branches during the course of treatment, or, in cohort studies, when the underlying disease process must ultimately be determined.

The inherent difficulty in adjudicating the presence or absence of infection arises from the heterogeneity in presentation, symptoms, and changes in laboratory and other variables, and from the imperfect accuracy of the various diagnostic procedures (3, 4). Two examples illustrate the problem. The first is a hypothetical test accuracy study in premature infants. Assume a new marker yields an excellent sensitivity of 95% and a specificity of 95% for the detection of systemic bacterial infection. This test is employed in a cohort study of infants with suspected infection and compared against the gold standard of clinical signs plus a positive blood culture. Because a considerable proportion of blood cultures obtained from immature infants with systemic bacterial infection remain falsely negative (5), some truly infected infants with a correctly positive marker will be counted as false positives. The study will therefore be biased toward showing test performance inferior to the test’s true properties, unless investigators employ a case-control design by omitting ambiguous cases from the analysis (6). The latter type of research design, however, has been associated with severely inflated test accuracy measures (7).
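The direction of this bias can be sketched numerically. The following is a minimal illustration, not part of the original analysis: the 95% sensitivity and specificity of the marker come from the example above, whereas the prevalence of true infection (20%), the sensitivity of the blood culture (50%), a perfectly specific culture, and the independence of marker and culture errors given the true infection status are all assumptions chosen for illustration. The function name `apparent_accuracy` is hypothetical.

```python
def apparent_accuracy(prevalence, test_sens, test_spec, ref_sens, ref_spec=1.0):
    """Sensitivity and specificity of a marker as they would appear when it is
    judged against an imperfect reference standard (here: blood culture).

    Assumption: marker and reference errors are independent given the true
    infection status.
    """
    # Proportion of the cohort in each (true status, reference result) cell.
    infected_ref_pos = prevalence * ref_sens
    infected_ref_neg = prevalence * (1 - ref_sens)
    uninfected_ref_pos = (1 - prevalence) * (1 - ref_spec)
    uninfected_ref_neg = (1 - prevalence) * ref_spec

    ref_pos = infected_ref_pos + uninfected_ref_pos
    ref_neg = infected_ref_neg + uninfected_ref_neg

    # Marker-positive fraction among reference-positive episodes.
    apparent_sens = (infected_ref_pos * test_sens
                     + uninfected_ref_pos * (1 - test_spec)) / ref_pos
    # Marker-negative fraction among reference-negative episodes.
    apparent_spec = (infected_ref_neg * (1 - test_sens)
                     + uninfected_ref_neg * test_spec) / ref_neg
    return apparent_sens, apparent_spec


# Assumed values for illustration only: prevalence 20%, culture sensitivity 50%.
sens, spec = apparent_accuracy(prevalence=0.20, test_sens=0.95,
                               test_spec=0.95, ref_sens=0.50)
print(f"apparent sensitivity: {sens:.2f}")  # 0.95
print(f"apparent specificity: {spec:.2f}")  # 0.85
```

Under these assumed values, the marker's apparent specificity falls from 95% to 85% because culture-negative infected infants with a positive marker are counted as false positives. Different assumptions about prevalence or culture yield would change the size of the effect, not its direction.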

The second example of the difficulties arising from adjudication relates to a randomized trial in septic shock. Successful interventions should be implemented rapidly after the onset of the immune cascade that ultimately culminates in septic shock (8). If broad inclusion criteria are applied, many children will be included who would not have progressed to shock. This may bias the results toward a null difference between treatment and control, even if the treatment reduces morbidity and mortality. On the other hand, if strict inclusion criteria are enforced, the experimental treatment may not be initiated before the window of opportunity closes, again biasing the observed effect toward the null.

In the following, I provide a secondary review of two previously published studies (9, 10) and discuss their implications for case adjudication. In addition, a hypothetical simulation model illustrates the potential misclassification that results from applying adult consensus definitions in the real world of pediatric intensive care, where blood cultures have imperfect sensitivity. The review addresses these adjudication issues from three perspectives. The first relates to the reliability of post hoc adjudication of infectious episodes. Such post hoc adjudication is the usual procedure in most diagnostic accuracy studies and in surveillance studies (9). The second reviews data from a study on physicians’ ability to estimate disease probabilities at the bedside. Although arriving at probability estimates is an integral part of everyday decision making, such estimates are rarely put into numbers (10). The third illustrates the consequences of applying consensus conference definitions when the sensitivity of blood cultures is limited, using catheter-related sepsis as the example. The detailed methods and results of the studies underlying perspectives 1 and 2 have been published elsewhere (9, 10).

Challenge of Diagnosing Sepsis

Decision theory regards decision making in critical care as a process of quantitative reasoning based on the probability that an infection is present and on the correct interpretation of data from diagnostic tests (11, 12). The clinician also has to weigh the harms and benefits of treatment. Because uncertainty accompanies this decision process, the net result is a low threshold for antibiotic treatment, which leads to a high treatment rate among uninfected patients (2). It has been estimated that up to 30 otherwise healthy newborns and up to ten critically ill infants are treated for every patient in whom infection can ultimately be confirmed. Estimates from the United States suggest that a quarter of all healthcare expenditures in neonatal care are caused by the work-up of patients with suspected infection (13). The paradigm of early diagnosis rests on the assumption that infections may be detected by laboratory variables or by combinations of subtle signs that advance the diagnosis by 24 to 48 hrs. It is believed that this advancement opens a window of opportunity to institute appropriate treatment before infection progresses to the full clinical picture of septic shock (Fig. 1) (8). Various new diagnostic tests and scores have been suggested to facilitate early diagnosis (14–17). To date, however, no single biological infection marker has gained unanimous acceptance.

Figure 1.
Early diagnosis paradigm rests on the assumption that the increase in highly specific biomarkers sufficiently precedes the clinical changes occurring in untreated sepsis. The further assumption is that swift induction of appropriate treatment will abrogate the clinical course before the conditions worsen to septic shock.

Post Hoc Adjudication

The usual adjudication procedure in surveillance studies is the hindsight review of purpose-designed charts (18). The same applies to ward rounds, in which previous decisions to initiate or withhold antibiotic treatment are reviewed. In episodes of suspected infection in which classification remains ambiguous, senior clinicians base their decision making on the clinical signs, the history, the available laboratory data, the time course of events, and their own experience. To investigate the reliability and validity of such hindsight judgment, we conducted a prospective cohort study in the tertiary neonatal and pediatric intensive care unit of the Zurich University Children’s Hospital. In brief, the study population comprised all newborns, infants, and children admitted with medical or surgical conditions. It included few premature infants born before 32 wks of gestation because these patients are usually cared for in a separate unit in the Women’s hospital. About a third of the study patients were infants in postoperative care after cardiac surgery. In these patients, postsurgical inflammation is frequent and may occur simultaneously with nosocomial infections, rendering the differential diagnosis of clinical signs suggestive of infection particularly challenging.

We invited three senior clinicians who had worked alongside each other for several years to adjudicate ambiguous cases of suspected infection. These experts comprised the head of the division of infectious diseases and two senior consultants on the pediatric intensive care unit. Experts were blinded to each other’s judgment. The main purpose of the study was to investigate the accuracy of novel diagnostic markers. Therefore, experts were provided with all data from the patient record deemed relevant for case adjudication. This included the patient charts, laboratory and microbiological findings, and the physicians’ case records. All three experts were provided with the same set of instructions on adjudication of cases. Experts were asked to choose from five possible diagnoses: sepsis, probable sepsis with negative blood cultures, localized infection, viral infection, or absent infection. If experts believed more than one condition to be present (e.g., pneumonia in a child with a systemic inflammatory response to cardiac surgery), they were instructed to select the cause of the present episode most relevant to treatment.

Before case review by the experts, we provided a fifth-year medical student with simple criteria for identifying patients with confirmed sepsis and those without infection. These criteria were derived from published test accuracy studies (6, 8, 19, 20). All episodes that the student considered to be proven sepsis or absence of infection, and on which the experts arrived at the same adjudication, were deemed classifiable by junior physicians. The remaining episodes were considered to require expert judgment. Expert agreement was assessed using kappa statistics. Kappa is a measure of agreement beyond chance: a kappa of 0 is consistent with chance agreement, and a kappa of 1 indicates perfect agreement.
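To make the idea of agreement beyond chance concrete, the following sketch computes Cohen's kappa for two raters. It is an illustration only: the study involved three experts, the multi-rater variant actually used is not specified here, and the ten ratings below are invented for the example. The function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per episode."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of episodes")
    n = len(rater_a)

    # Observed proportion of episodes on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[cat] * freq_b[cat] for cat in freq_a) / n ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Invented ratings of ten episodes (not study data), using the five categories above.
expert_1 = ["sepsis", "sepsis", "probable", "none", "none",
            "localized", "viral", "none", "probable", "none"]
expert_2 = ["sepsis", "sepsis", "none", "none", "none",
            "localized", "none", "none", "localized", "probable"]

print(f"kappa = {cohens_kappa(expert_1, expert_2):.2f}")  # 0.44
```

In this invented example the raters agree on 6 of 10 episodes, but 28% agreement would be expected by chance alone, so kappa is (0.60 − 0.28) / (1 − 0.28) ≈ 0.44.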

During a 5-month period, 183 episodes of suspected infection occurred in the 19-bed multidisciplinary, tertiary, neonatal, and pediatric intensive care unit. Antibiotics were prescribed in 167 of these episodes. Overall agreement among experts was moderate (κ = 0.54), with almost perfect agreement for episodes of proven sepsis (κ = 0.92) and agreement slightly better than chance regarding episodes of probable sepsis (κ = 0.18). However, when the 48 episodes classifiable by the a priori defined criteria were removed, experts only fairly agreed on the remaining 119 episodes (71%, κ = 0.32). Summarizing the findings, only about one third of all episodes of suspected infection, in particular, those with positive blood cultures, could unambiguously be classified according to prespecified criteria. The remaining two thirds of all episodes remained ambiguous even after data from the clinical course had become available.

What are the clinical implications? In the real world, clinicians will exchange information and attempt to arrive at a common conclusion. In the study, we blinded each clinician to the adjudication of the others. Although the three clinicians used the five available classifications with similar frequency, the data suggest that in the absence of clear-cut criteria such as positive blood cultures, decision making may rapidly become arbitrary. These findings are consistent with other studies showing limited agreement in clinical judgment on potentially ambiguous outcomes (21, 22). In the absence of a gold-standard test allowing classification of all episodes, the true cause of the episode remains elusive. Therefore, it remains unknown whether the usual decision process, which resembles a Delphi process, would have arrived at valid adjudications. These data show that in the presence of uncertainty, experts differ in how they extract and weigh the relevant information. In any case, this introduces a misclassification bias, which may attenuate the observed treatment efficacy or diagnostic accuracy. It is possible that further sophistication of the adjudication process (e.g., intense pretraining of the experts) might have increased the agreement. It should also be noted that the usual procedure in clinical studies is to use a Delphi-type approach to arrive at a consensus classification. Because additional qualitative and quantitative data are available when expert review takes place at the bedside, the true clinical adjudication agreement may be higher than the level reported here. However, it is also conceivable that decisions made after a verbal report of the patient findings may result in even poorer agreement than found in this investigation. The study underscores the need to supplement definitions for research in sepsis and infection in newborns and critically ill children with categories representing the degree of certainty, such as probable, possible, or unlikely.

The next section deals with the accuracy of physicians’ adjudication when all information available to the treatment team at the time of decision making may be used. The study reported there differs from the one presented above in several respects: physicians were encouraged to discuss their adjudication with peers and nurses, they were asked to use any information available to them, and they were offered the option to express their adjudication as a disease probability rather than as exclusive categories.

Physicians’ Probability Estimates

Clinical decision making on the presence or absence of infection is an excellent example of the art of medicine: arriving at a decision when uncertainty prevails. The decision process always starts with an estimate of the probability that serious infection is impending. Clinicians will also have at least a vague idea of the disease probability above which awaiting further tests is riskier than immediately initiating antibiotic therapy. Naturally, this threshold probability for initiation of treatment may vary from patient to patient and may also be influenced by guidelines. Physicians will also have a perception of a very low disease probability below which infection is so unlikely that no further testing is warranted. Decision analysts label these three pieces of information the pretest probability, the treatment threshold, and the testing threshold.

According to decision theory, diagnostic tests are only warranted if the pretest probability is higher than the testing threshold and lower than the treatment threshold (12). Ordering additional tests when a decision to prescribe antibiotics has already been made does not provide any useful information, nor does screening of patients in whom infection is absent. Thus, only tests ordered from perceived uncertainty have the potential to add useful information, and only tests that move the pretest probability across either of the just mentioned thresholds provide clinically useful information (11). The mathematical formula for updating a previous probability estimate with new information is provided by Bayes’ theorem. It requires enumeration of the probability estimate before ordering the test and the likelihood ratio as an expression of the test accuracy. The resulting posttest probability is then compared with the treatment or testing threshold.

Bayes’ theorem works fine in textbook theory but has rarely been used in clinical practice for diagnosing infection. The failure to introduce this presumably straightforward algorithm into clinical practice arises from four problems. First, physicians are not used to putting a number on their best guess of a disease probability (23). Second, physicians do not know their treatment and testing thresholds. Third, very few test accuracy studies provide the multilevel likelihood ratios needed to interpret results from continuous laboratory variables. Fourth, more than one test result often must be considered simultaneously (e.g., an increase in immature neutrophils and elevated C-reactive protein), and the conditional likelihood ratios required for these calculations are usually not available (24). Thus, clinicians have shown a marked reluctance to embrace the decision analyst’s approach, despite ample publicity during the past decades.

In the study reviewed in the following section, we aimed to address the first two problems: obtaining pretest probability estimates from physicians during the ward round and delineating treatment and testing thresholds. Specifically, we undertook a multicenter, prospective cohort study to obtain daily predictions by physicians on the possible presence of serious bacterial infection. We also assessed the diagnostic accuracy of physicians’ predictions using the standard case-control design with infected case subjects and noninfected controls, treating the probability estimate as if it were a continuous laboratory variable for disease prediction.

The study included all consecutive patients admitted to the participating units during a 3-month period. The units were the 28-bed level III neonatal intensive care unit of the Brigham and Women’s Hospital, Boston, and the 19-bed level III pediatric intensive care unit of the Children’s Hospital in Zurich. The patient populations of the two units overlapped for premature infants and critically ill term newborns. Except for patients receiving extracorporeal membrane oxygenation, who were excluded, the patient population comprised the entire spectrum of neonatal and pediatric critical illness.

To obtain the probability estimates, three trained research fellows asked the clinicians responsible for the care of the patient (15 fellows, 12 attending intensive care physicians) to provide an estimate of the presence of serious untreated infection at every ward round. A subsequent study in the Zurich unit revealed that compliance plummeted rapidly when the research fellows did not personally pursue the estimates and the process was instead left to a paper-based procedure.

Of particular interest were the predictions at initiation of antibiotics. We asked physicians to quantify the probability of a serious untreated bacterial infection. The next morning, we asked them to provide an updated estimate, considering all information that had become available since the initiation of antibiotics, including possible early results from blood cultures. We also asked physicians to predict whether blood cultures obtained during sepsis workup would become positive; this served as an external validity criterion (10).

In the light of the previous section of this article, one wonders how we dealt with the case adjudication. As explicated in the article reporting the original data, one of the investigators and a senior clinician of the unit not involved in the decision making at initiation of antibiotics carried out the adjudication process. They resolved ambiguities by a Delphi-type consensus method. In some instances, we could not find a senior clinician sufficiently familiar with the case and at the same time not having been involved in the decision making at sepsis workup. The possible bias introduced by this is believed to be less severe than the bias introduced by a case-control design (7). For that reason, we used several definitions of cases in the original report: 1) only those patients with positive blood cultures (proven sepsis) and 2) proven sepsis plus patients with negative blood cultures but a high degree of clinical probability (probable sepsis).

The median predicted probability at inception of antibiotic therapy was 20%. The predictions discriminated between episodes later deemed to be culture-proven systemic bacterial infection (cases) and episodes classified as no infection with an area under the receiver operating characteristic curve of 0.88 (95% confidence interval [CI], 0.81–0.94), with good calibration of the model (Hosmer-Lemeshow chi-square, p = .63). With a predicted probability of 25% as the cut-off, the sensitivity was 0.87 (95% CI, 0.65–0.97) and the specificity 0.83 (95% CI, 0.73–0.90). The corresponding positive likelihood ratio was 5.1 (95% CI, 3.5–7.9), and the negative likelihood ratio was 0.16 (95% CI, 0.11–0.2). Sensitivity analysis using broadened definitions for cases yielded similar results. As expected, the further information that became available between initiation of antibiotic therapy and the next morning slightly increased the discriminative ability of the clinical judgment (area under the receiver operating characteristic curve, 0.91; 95% CI, 0.84–0.96). By contrast, the accuracy of predictions provided 24 hrs before initiation of antibiotics, at a time when infection was probably already incubating, did not differ from chance (area under the receiver operating characteristic curve, 0.49). This underscores the urgent need to identify new markers for infection that facilitate an early diagnosis. Regarding the external validity criterion of prediction of positive blood cultures, physicians again showed reasonable discrimination (model adjusted for age and unit: area under the receiver operating characteristic curve, 0.77; 95% CI, 0.70–0.83) with good calibration (Hosmer-Lemeshow chi-square, p = .28).
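The reported likelihood ratios follow directly from the sensitivity and specificity at the 25% cut-off. As a brief arithmetic check, using only the point estimates given above (the function name `likelihood_ratios` is hypothetical, and the confidence intervals are not reproduced):

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    if not (0 <= sensitivity <= 1 and 0 < specificity < 1):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity < 1")
    lr_positive = sensitivity / (1 - specificity)   # true-positive rate / false-positive rate
    lr_negative = (1 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_positive, lr_negative


# Point estimates reported for a predicted probability of 25% as the cut-off.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.87, specificity=0.83)
print(f"LR+ = {lr_pos:.1f}")  # 5.1
print(f"LR- = {lr_neg:.2f}")  # 0.16
```

A physician’s estimate of 25% or more thus raised the odds of culture-proven infection about fivefold, whereas an estimate below 25% reduced them to roughly one sixth.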

What are the lessons to be learned from this study? Experienced physicians perform remarkably well when asked about disease probabilities at the bedside, where all accessible information is available, including personal perceptions not documented in the patient charts (Fig. 2). Apparently, there is no single threshold for initiation of antibiotic treatment; rather, there is a nonlinear relationship between estimated disease probability and the proportion of patients receiving antibiotics. This finding underscores the already mentioned problems of introducing Bayes’ theorem into daily routine. However, for research purposes and post hoc adjudication of ambiguous episodes, researchers may want to supplement the recorded data with probability estimates, obtained either at randomization of patients or when sepsis workups are performed during test accuracy studies. Whether an assessment in Likert-scale format (e.g., proven, probable, likely, unlikely, absent) or an estimate expressed as a percentage provides superior reliability remains to be elucidated.

Figure 2.
Much information used by experienced physicians to adjudicate cases arises from data usually not documented in patient records (e.g., the qualitative information about looking pale, the rapidity of clinical deterioration). Other information is not recorded but could be obtained when looking for it (e.g., careful taking of the history, clinical examination). Hence, when decisions must be made, more information is potentially accessible than is found in the patient records. Post hoc adjudication hinges on the data recorded in the patient documentation. Adjudication into mutually exclusive categories introduces further misclassification error compared with a probability estimate provided using a graded scale.

Possible Misclassifications Resulting from Using the Gold Standard

Many attempts have been made to arrive at practical gold-standard definitions (25). In the adult literature, agreement has been reached on how to prove the presence of catheter-related bacterial infection (26). This consensus in the literature utilizes the technique of quantitative blood cultures. In cases of suspected catheter-related sepsis, it is recommended to obtain one culture through the existing catheter and one culture from a peripheral vein. If both blood cultures are positive and the culture obtained through the catheter becomes positive much faster, then and only then is catheter-related infection proven. This logic and the resulting diagnostic categories can be summarized in a straightforward 2 × 2 table (Fig. 3, top panel). It should be noted that this figure does not display the usual 2 × 2 table employed in test accuracy studies but serves to explore the various combinations of results from the two cultures. For example, if the peripheral culture yields a positive result and the culture obtained through the catheter is negative (lower left quadrant), the definitions suggest diagnosing the case as bloodstream infection of unknown origin. These definitions rest on the assumption of near-perfect sensitivity of blood cultures or, in other words, a negligible proportion of false-negative blood cultures. Under this condition, the definitions work well for adult patients, in whom obtaining sufficiently large blood volumes (10 mL per bottle) to minimize the risk of false-negative blood cultures is not an issue.

Figure 3.
Possible consequences of the limited sensitivity of blood cultures in children. The top panel displays the suggested definitions for catheter-related sepsis. These definitions imply that blood stream infection (BSI) due to catheter infections requires a positive (Pos) peripheral culture and a positive culture from the catheter. The middle and bottom panels illustrate the consequences of the realistic assumptions that a certain proportion of cultures may remain falsely negative (Neg), even in patients with true catheter-related sepsis. The middle and bottom panels serve to illustrate the multiplying effect if two criteria with imperfect sensitivity are used for case adjudication. In both middle and bottom panels, it is assumed that 100 premature infants or children have true catheter-related sepsis and that either 1 mL of blood or 3 mL of blood are drawn per bottle. Assuming a higher yield in the culture drawn from the catheter than from the periphery (50% vs. 30% for the 1-mL blood draw), the two panels show the resulting erroneous classifications. Note that this figure does not display the usual 2 × 2 table used in test accuracy studies, in which the columns would represent cases (left column) vs. controls and the rows positive vs. negative tests. SIRS, systemic inflammatory response syndrome; Cath C, catheter culture.

Unfortunately, the same definitions carry serious misclassification risks when used in newborns or critically ill children, in whom it is no longer disputed that a sizable proportion of blood cultures will fail to grow organisms despite the presence of bloodstream infection. In these patients, considerably smaller volumes of blood are collected for blood cultures. The positive blood culture no longer satisfies the criterion of an appropriate gold standard because blood cultures yield few false-positive but a considerable proportion of false-negative results (5). The middle panel of Figure 3 illustrates the effect of assuming a sensitivity of 30% for the peripheral blood culture and of 50% for the catheter culture when 1 mL of blood is obtained. Assuming that all patients have true catheter-related infection, the classification scheme would correctly identify only 15% of the infants. Disturbingly, about a third of patients with true catheter-related sepsis would be misclassified as having catheter colonization or catheter contamination because the peripheral culture remains falsely negative. Most clinicians will treat a contaminated catheter with a different regimen than catheter-related sepsis.

Even if 3 mL of blood are collected per bottle (Fig. 3, bottom panel), assuming an increase in the sensitivity to 70% and 80%, respectively, the correct classification only increases to about 56%. Because these examples assumed 100 patients with true catheter-related sepsis, the problem of false-positive cultures (specificity) was not considered. Therefore, in critically ill newborn or pediatric patients, the double-culture classification scheme may correctly classify only about half of the patients. Under any premise, about a quarter of all classifications will be potentially dangerous. Thus, researchers and clinicians should be particularly cautious when combining two criteria with imperfect sensitivity for case adjudication because this may aggravate the threat of possible misclassifications.
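The figures quoted for both scenarios can be reproduced with a short calculation. The sketch below is an illustration, not the original simulation: it assumes that the two cultures fail independently of each other (an assumption consistent with the percentages given in the text), and the function name and category labels are hypothetical. The label for the double-negative cell in particular is a placeholder.

```python
def classify_true_cases(n_cases, sens_peripheral, sens_catheter):
    """Expected classification of patients who ALL have true catheter-related
    sepsis, when each culture has limited sensitivity.

    Assumption: results of the peripheral and the catheter culture are
    independent of each other.
    """
    for value in (sens_peripheral, sens_catheter):
        if not 0 <= value <= 1:
            raise ValueError("sensitivities must lie between 0 and 1")
    p, c = sens_peripheral, sens_catheter
    return {
        # Both cultures positive: correctly classified.
        "catheter_related_sepsis": n_cases * p * c,
        # Catheter culture positive, peripheral culture falsely negative.
        "catheter_colonization_or_contamination": n_cases * (1 - p) * c,
        # Peripheral culture positive, catheter culture falsely negative.
        "bloodstream_infection_unknown_origin": n_cases * p * (1 - c),
        # Both cultures falsely negative.
        "both_cultures_negative": n_cases * (1 - p) * (1 - c),
    }


# Sensitivities as assumed in the text for 1 mL and 3 mL of blood per bottle.
for label, p, c in [("1 mL per bottle", 0.30, 0.50), ("3 mL per bottle", 0.70, 0.80)]:
    counts = classify_true_cases(100, sens_peripheral=p, sens_catheter=c)
    print(label, {name: round(count) for name, count in counts.items()})
```

For 100 true cases this yields 15 correct classifications and 35 apparent colonizations with 1 mL per bottle, and 56 correct classifications and 24 apparent colonizations with 3 mL per bottle, matching the proportions described above.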


This article reviewed some of the many sources of possible misclassification in therapeutic and surveillance studies. In the absence of a true gold standard, the final post hoc adjudication of ambiguous episodes of infection remains a daunting task for clinicians and researchers alike. Post hoc adjudication from case records is likely to fuel ongoing debates about the true status of the patient. Probability estimates incorporating all available information perform as well as the best currently available diagnostic markers. Researchers should include probability estimates in future trials at enrolment of patients. Serious consequences may arise from imperfect gold standards with limited sensitivity. The worst-case scenario presented for the workup of suspected catheter-related infection may lead to possibly dangerous therapeutic decisions. In the best case, misclassification will bias diagnostic test studies toward reporting lower test accuracy and therapeutic studies toward less treatment benefit. Power calculations for future trials must bear the effects of such misclassification biases in mind, as must clinicians treating patients. As illustrated in the example of the diagnostic workup of catheter-related sepsis in newborns, such misclassification may lead to contradictory findings from the definitions and from clinical judgment. If consensus definitions are to become helpful in clinical practice, they must incorporate this divergence by providing categories describing the degree of certainty about the diagnosis as definite, probable, possible, and absent.


1. Strait RT, Kelly KJ, Kurup VP: Tumor necrosis factor-alpha, interleukin-1 beta, and interleukin-6 levels in febrile, young children with and without occult bacteremia. Pediatrics 1999;104:1321–1326
2. Franz AR, Steinbach G, Kron M, et al: Reduction of unnecessary antibiotic therapy in newborn infants using interleukin-8 and C-reactive protein as markers of bacterial infections. Pediatrics 1999;104:447–453
3. Brun-Buisson C: The epidemiology of the systemic inflammatory response. Intensive Care Med 2000;26 (Suppl 1): S64–S74
4. Benitz WE, Gould JB, Druzin ML: Risk factors for early-onset group B streptococcal sepsis: Estimation of odds ratios by critical literature review. Pediatrics 1999;103:e77
5. Schelonka RL, Chai MK, Yoder BA, et al: Volume of blood required to detect common neonatal pathogens. J Pediatr 1996;129:275–278
6. Kuster H, Weiss M, Willeitner AE, et al: Interleukin-1 receptor antagonist and interleukin-6 for early diagnosis of neonatal sepsis 2 days before clinical manifestation. Lancet 1998;352:1271–1277
7. Lijmer JG, Mol BW, Heisterkamp S, et al: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–1066
8. Balk RA: Severe sepsis and septic shock: Definitions, epidemiology, and clinical manifestations. Crit Care Clin 2000;16:179–192
9. Fischer JE, Seifarth FG, Baenziger O, et al: Hindsight judgement on ambiguous episodes of suspected infection in critically ill children: Poor consensus amongst experts? Eur J Pediatr 2003;162:840–843
10. Fischer JE, Harbarth S, Agthe AG, et al: Quantifying uncertainty: Physicians’ estimates of infection in critically ill neonates and children. Clin Infect Dis 2004;38:1383–1390
11. Pauker SG, Kopelman RI: Interpreting hoofbeats: Can Bayes help clear the haze? N Engl J Med 1992;327:1009–1013
12. Pauker SG, Kassirer JP: The threshold approach to clinical decision making. N Engl J Med 1980;302:1109–1117
13. Escobar GJ: The neonatal “sepsis work-up”: Personal reflections on the development of an evidence-based approach toward newborn infections in a managed care organization. Pediatrics 1999;103:360–373
14. Castellanos-Ortega A, Delgado-Rodriguez M: Comparison of the performance of two general and three specific scoring systems for meningococcal septic shock in children. Crit Care Med 2000;28:2967–2973
15. Gendrel D, Raymond J, Coste J, et al: Comparison of procalcitonin with C-reactive protein, interleukin 6 and interferon-alpha for differentiation of bacterial vs. viral infections. Pediatr Infect Dis J 1999;18:875–881
16. Giamarellos-Bourboulis EJ, Mega A, Grecka P, et al: Procalcitonin: A marker to clearly differentiate systemic inflammatory response syndrome and sepsis in the critically ill patient? Intensive Care Med 2002;28:1351–1356
17. Isaacman DJ, Shults J, Gross TK, et al: Predictors of bacteremia in febrile children 3 to 36 months of age. Pediatrics 2000;106:977–982
18. Cook DJ, Walter SD, Cook RJ, et al: Incidence of and risk factors for ventilator-associated pneumonia in critically ill patients. Ann Intern Med 1998;129:433–440
19. Chiesa C, Panero A, Rossi N, et al: Reliability of procalcitonin concentrations for the diagnosis of sepsis in critically ill neonates. Clin Infect Dis 1998;26:664–672
20. Stoll BJ, Gordon T, Korones SB, et al: Early-onset sepsis in very low birth weight neonates: A report from the National Institute of Child Health and Human Development Neonatal Research Network. J Pediatr 1996;129:72–80
21. Clinical disagreement: II. How to avoid it and how to learn from one’s mistakes. Can Med Assoc J 1980;123:613–617
22. Walter SD, Cook DJ, Guyatt GH, et al: Outcome assessment for clinical trials: How many adjudicators do we need? Canadian Lung Oncology Group. Control Clin Trials 1997;18:27–42
23. Phelps MA, Levitt MA: Pretest probability estimates: A pitfall to the clinical utility of evidence-based medicine? Acad Emerg Med 2004;11:692–694
24. Fischer JE, Bachmann LM, Jaeschke R: A readers’ guide to the interpretation of diagnostic test properties: Clinical example of sepsis. Intensive Care Med 2003;29:1043–1051
25. Goldstein B, Giroir B, Randolph A: International pediatric sepsis consensus conference: Definitions for sepsis and organ dysfunction in pediatrics. Pediatr Crit Care Med 2005;6:2–8
26. Raad I, Hanna HA, Alakech B, et al: Differential time to positivity: A useful method for diagnosing catheter-related bloodstream infections. Ann Intern Med 2004;140:18–25

sepsis; diagnosis; test accuracy; sensitivity; prediction; outcome adjudication

© 2005 The Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies