A Modification of the Injury Severity Score That Both Improves Accuracy and Simplifies Scoring : Journal of Trauma and Acute Care Surgery

Osler, Turner MD; Baker, Susan P. MPH; Long, William MD

The Journal of Trauma: Injury, Infection, and Critical Care 43(6):p 922-926, December 1997.

Abstract

Objectives 

The Injury Severity Score (ISS) has served as the standard summary measure of anatomic injury for more than 20 years. Nevertheless, the ISS has an idiosyncrasy that both impairs its predictive power and complicates its calculation. We present here a simple modification of the ISS called the New Injury Severity Score (NISS), which significantly outperforms the venerable but dated ISS as a predictor of mortality.

Design 

Retrospective calculation of NISS and comparison of NISS with prospectively calculated ISS.

Materials and Methods 

The NISS is defined as the sum of the squares of the Abbreviated Injury Scale scores of each of a patient's three most severe Abbreviated Injury Scale injuries regardless of the body region in which they occur. NISS values were calculated for every patient in two large independent data sets: 3,136 patients treated during a 4-year period at the American College of Surgeons' Level I trauma center in Albuquerque, New Mexico, and 3,449 patients treated during a 4-year period at the American College of Surgeons' Level I trauma center at the Emanuel Hospital in Portland, Oregon. The power of NISS to predict mortality was then compared with previously calculated ISS values for the same patients in each of the two data sets.

Measurements and Main Results 

We find that NISS is not only simple to calculate but more predictive of survival as well (Albuquerque: receiver operating characteristic (ROC) ISS = 0.869, ROC NISS = 0.896, p < 0.001; Portland: ROC ISS = 0.896, ROC NISS = 0.907, p < 0.004). Moreover, NISS provides a better fit throughout its entire range of prediction (Hosmer Lemeshow statistic for Albuquerque ISS = 29.12, NISS = 8.88; Hosmer Lemeshow statistic for Portland ISS = 83.48, NISS = 19.86).

Conclusion 

NISS should replace ISS as the standard summary measure of human trauma.

Key Words: Mortality, Prediction of survival, Injury Severity Score, New Injury Severity Score, NISS, ROC analysis.

From the dawn of human record-keeping, humans have shown a penchant for prognostication in trauma. The great Egyptian physician and architect Imhotep published a listing of injuries and illnesses in 2000 BC, a listing that notably included an expected outcome for each injury. [1] One thousand years later, Homer carefully cataloged 147 injuries among his legendary combatants and noted an overall mortality of 77%, a mortality that climbed to 100% for heroes with head injuries. [2]

The modern scientific era of injury measurement began in 1952, when De Haven proposed a rudimentary classification of human injury to facilitate his study of light plane crashes. [3] In 1971, the Committee on Automotive Safety published the Abbreviated Injury Scale (AIS). [4] This listing of 73 different injuries referred only to blunt trauma and made no attempt to provide a comprehensive listing of all possible injuries. It did assign a severity to each injury, however, on a scale from 1 (minor injury) to 6 (fatal within 24 hours). Although the AIS provided a rudimentary dictionary of possible injuries, it failed to provide a mechanism to summarize a single patient's multiple injuries into a single score. This step was taken by Baker and colleagues in 1974 with the creation of the Injury Severity Score (ISS), [5] and this score has served as the standard summary measure of human trauma for more than 20 years.

The ISS was defined as the sum of the squares of the single highest AIS score in each of the three most severely injured body regions. This definition was arrived at by a combination of intuition, experimentation, and opportunity. Of particular note, the use of only the single most severe AIS injury per body region was driven by the form for the study, which recorded only one injury for each body region. Despite its limitations, the ISS has proved very robust in use in trauma centers around the world. We believe that the ubiquitous nature of the ISS makes any improvement in its predictive power very worthwhile.

The ISS has an idiosyncrasy that both diminishes its predictive power and complicates its calculation, however. As noted above, the ISS considers at most one injury per body region. In patients with multiple injuries confined to a single body region, the ISS thus considers only one of the injuries within that region. Moreover, in patients with injuries in several body regions, the ISS is often constrained to consider a second, perhaps less severe injury in a second body region rather than a second, more severe injury in the first body region. In effect, the ISS ignores all but the worst injury per body region, and often fails to consider more severe injuries in favor of less severe injuries that happen to occur in other body regions. The original intent of the ISS algorithm to consider the body as a whole is thus found to be in conflict with the more fundamental principle that more severe injuries should be considered over less severe injuries.

We therefore tested a simple modification of the ISS, a score that we call the New Injury Severity Score (NISS). The NISS is the sum of the squares of the AIS scores of a patient's three most severe injuries, regardless of body region.
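
The two definitions can be written side by side. The notation below is ours, introduced only for illustration, and is not taken from the original article; it is a sketch of the definitions as stated in the text.

```latex
% a_{(1)} \ge a_{(2)} \ge \dots : AIS severities of all of a patient's injuries, sorted
% m_{(1)} \ge m_{(2)} \ge \dots : highest AIS severity in each injured ISS body region, sorted
% Terms beyond the number of injuries (or injured regions) are taken as 0.
\mathrm{ISS}  = m_{(1)}^{2} + m_{(2)}^{2} + m_{(3)}^{2}
\qquad
\mathrm{NISS} = a_{(1)}^{2} + a_{(2)}^{2} + a_{(3)}^{2}
```

For a patient whose only injuries are three abdominal injuries of AIS severity 3, this gives ISS = 9 (one region, one injury counted) but NISS = 9 + 9 + 9 = 27.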

MATERIALS AND METHODS

The American College of Surgeons' Level I trauma center in Albuquerque, New Mexico, has maintained a trauma registry since 1991. Included in this registry is a complete listing of each patient's injuries in the AIS lexicon (1990 revision). This injury data base is meticulously maintained by an Association for the Advancement of Automotive Medicine-trained designated trauma registrar, with the mandatory participation of admitting resident physicians and operating attending surgeons, frequent chart reviews, and weekly service conferences during which all injuries in every patient are reviewed and if necessary corrected. This registry contains data on 3,136 patients admitted during the 4 years from 1991 through 1994. Patients were predominantly young (92% younger than 56 years) and bluntly injured (75%). Ninety-one percent survived to hospital discharge. This data set provided a "natural laboratory" with which to test the performance of ISS and NISS. ISS was computed according to Baker et al. [5] NISS was computed as the simple sum of the squares of the three most severe AIS (1990 revision) injuries regardless of body region.

A second data set consisting of 3,449 patients admitted to the Emanuel Hospital Trauma Service in Portland, Oregon, was also subjected to ISS and NISS scoring. This data set was also maintained by an Association for the Advancement of Automotive Medicine-trained designated trauma registrar. Patients in the Portland data set were also predominantly young (94% younger than 56 years) and bluntly injured (87%). The survival rate in the Portland data set was 93%.

Comparisons between ISS and NISS included misclassification rates ((false positives + false negatives)/total cases), receiver operating characteristic (ROC) curve analysis, [6] and Hosmer Lemeshow goodness of fit statistics, [7] all calculated by SAS statistical software (version 6.10) Proc Logistic. [8] The ordinal measures ISS and NISS were converted into binary outcome predictions by simply selecting the value that minimized misclassifications (ISS cut points: Albuquerque = 44, Portland = 46.5; NISS cut points: Albuquerque = 54, Portland = 55.5). Misclassifications were then defined as the sum of survivors with scores above the cut point and nonsurvivors with scores below the cut point. The ROC statistic is a general measure of the power of a test to separate two mutually exclusive subpopulations. It is defined as the area under a graph of sensitivity versus 1 minus specificity. A ROC value of 1 corresponds to a test that perfectly separates two subpopulations, whereas a ROC value of 0.5 corresponds to a perfectly useless test that performs no better than chance. The Hosmer Lemeshow statistic is a measure of how well calibrated a model is. It is calculated by dividing the data into deciles by score (ISS or NISS) and then comparing the predicted number of nonsurvivors with the actual number of nonsurvivors in each decile. The result is evaluated by a chi squared test. A high (>0.05) p value implies that there is no reason to believe that a model is not well calibrated, i.e., that the model is a good one. p values and 95% confidence intervals for misclassification rates and ROC statistics were calculated using a paired resampling approach [9] implemented in Paradox Application Language (Borland International, Scotts Valley, Calif) on a desktop IBM-compatible personal computer.
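
The cut-point search, the ROC area, and the paired resampling can be sketched in a few lines of Python. This is an illustration only, not the SAS or Paradox code used in the study: the function names and the toy scores are made up, candidate cut points are assumed to lie midway between adjacent observed scores, the ROC area is computed through its rank (Mann-Whitney) equivalent, and a simple percentile interval is assumed for the resampling step.

```python
import random
from typing import List, Sequence, Tuple


def misclassification_rate(scores: Sequence[float], died: Sequence[bool], cut: float) -> float:
    """(false positives + false negatives) / total, predicting death when score > cut."""
    errors = sum((s > cut) != d for s, d in zip(scores, died))
    return errors / len(scores)


def best_cut_point(scores: Sequence[float], died: Sequence[bool]) -> float:
    """Cut point with the fewest misclassifications (first one found on ties)."""
    values = sorted(set(scores))
    # Assumption: candidates are midpoints between adjacent observed scores,
    # plus one value below the minimum and one above the maximum.
    candidates = [values[0] - 0.5]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 0.5)
    return min(candidates, key=lambda c: misclassification_rate(scores, died, c))


def roc_area(scores: Sequence[float], died: Sequence[bool]) -> float:
    """Area under the ROC curve: the fraction of (nonsurvivor, survivor) pairs
    in which the nonsurvivor has the higher score, counting ties as one half."""
    dead = [s for s, d in zip(scores, died) if d]
    alive = [s for s, d in zip(scores, died) if not d]
    wins = sum((x > y) + 0.5 * (x == y) for x in dead for y in alive)
    return wins / (len(dead) * len(alive))


def paired_bootstrap_auc_diff(a: Sequence[float], b: Sequence[float], died: Sequence[bool],
                              n_boot: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Percentile 95% interval for roc_area(b) - roc_area(a). Patients are
    resampled with replacement, keeping each patient's two scores together."""
    rng = random.Random(seed)
    n = len(died)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        d = [died[i] for i in idx]
        if all(d) or not any(d):
            continue  # a resample needs both outcomes for the ROC area to exist
        diffs.append(roc_area([b[i] for i in idx], d) - roc_area([a[i] for i in idx], d))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Made-up toy data for eight patients (not from the study).
iss_scores = [9, 9, 16, 20, 25, 9, 16, 29]
niss_scores = [9, 18, 17, 22, 27, 27, 34, 43]
died = [False, False, False, False, False, True, True, True]

for name, scores in (("ISS", iss_scores), ("NISS", niss_scores)):
    cut = best_cut_point(scores, died)
    print(name, "cut point:", cut,
          "misclassification rate:", misclassification_rate(scores, died, cut),
          "ROC area:", round(roc_area(scores, died), 3))
print("95% interval for ROC(NISS) - ROC(ISS):",
      paired_bootstrap_auc_diff(iss_scores, niss_scores, died, n_boot=500))
```

Resampling patients rather than scores keeps the comparison paired: every resample evaluates both scores on exactly the same set of patients.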

RESULTS

The majority of patients (59% of the Albuquerque data set and 60% of the Portland data set) had NISS values that were different from their ISS values. These NISS values were uniformly higher than the corresponding ISS values. This result is expected, because the ISS picks and chooses among a given patient's AIS injuries based on body region, whereas the NISS simply considers a patient's three most severe AIS injuries regardless of body region. The median ISS and NISS values for the Albuquerque data set were 11 and 17, respectively. The median ISS and NISS values for the Portland data set were 9 and 11, respectively, indicating that injuries in Portland were somewhat less severe on average.
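
That the NISS can never fall below the ISS follows directly from the two definitions. In the sketch below the notation is ours and purely illustrative: a_(k) is the k-th highest AIS severity among all of a patient's injuries, and m_(k) is the k-th highest of the per-region maxima used by the ISS, with missing terms taken as 0. Because each regional maximum is itself the severity of one of the patient's injuries, the k-th largest regional maximum cannot exceed the k-th largest severity overall.

```latex
m_{(k)} \le a_{(k)} \quad (k = 1, 2, 3)
\;\Longrightarrow\;
\mathrm{ISS} = \sum_{k=1}^{3} m_{(k)}^{2} \;\le\; \sum_{k=1}^{3} a_{(k)}^{2} = \mathrm{NISS}
```

The two scores coincide when, for example, the three most severe injuries lie in three different body regions.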

When we examine the data sets graphically (Albuquerque, Figure 1; Portland, Figure 2), we see that NISS better separates survivors from nonsurvivors. This impression is confirmed by the near doubling, in both data sets, of the separation between the median values for fatalities and survivors under NISS compared with ISS (Albuquerque: ISS median fatalities - ISS median survivors = 19, NISS median fatalities - NISS median survivors = 36; Portland: ISS median fatalities - ISS median survivors = 18, NISS median fatalities - NISS median survivors = 34).

Figure 1: Albuquerque data set. Frequency distributions for survivors and nonsurvivors as coded by ISS (A) and NISS (B). Arrows indicate median values, which are twice as widely separated by NISS. Solid curve and arrow = survivors; dashed line and arrow = nonsurvivors.

Figure 2: Portland data set. Frequency distributions for survivors and nonsurvivors as coded by ISS (A) and NISS (B). Arrows indicate median values, which are twice as widely separated by NISS. Solid curve and arrow = survivors; dashed line and arrow = nonsurvivors.

A formal statistical analysis confirms the superior predictive power of NISS over ISS. Virtually every measure examined was statistically significantly better for NISS than for ISS: misclassification rates, ROC curve areas, and Hosmer Lemeshow statistics (Table 1). Only the misclassification rate in the Portland data set is not statistically significantly improved under NISS.

Table 1: Formal statistical comparison of ISS and NISS in two data sets

DISCUSSION

Although the ISS has served as the standard summary measure of human trauma for more than two decades, its division of the human body into regions seems unnatural and now appears to be unnecessary. The use of only a single injury per body region in the calculation of the original ISS [5] was simply the result of the design of Baker's original study and has never been tested or validated. Although the ISS's ability to consider as many as three different injuries in its final outcome score represented a considerable advance over the earlier practice of summarizing a patient's injuries based on the single worst injury (maximum AIS), today's modern trauma data bases routinely record all of the injuries that a patient sustains. It seemed likely to us that a more modern summary measure of trauma that could take advantage of this richer description of patients' injuries would more accurately predict outcome than the original ISS.

Two problems follow from the dependence of the ISS on body regions. First, the ISS often leaves some injuries out of the scoring process altogether, such as when a patient sustains multiple injuries to a single body region, in which case only the single worst injury contributes to the ISS. A second, related problem is that the ISS often ignores some more severe injuries in one body region in favor of less severe injuries to some other body region or regions, such as when multiple body regions are injured. NISS, by contrast, simply considers the three most severe injuries that a patient has sustained and thus avoids both of these shortcomings of the traditional ISS.

An example may make the differences between ISS and NISS scoring more clear. Suppose a patient involved in a motor vehicle crash sustains a steering wheel compression injury to the abdomen. At laparotomy, a small bowel perforation (AIS score = 3) is first discovered. The ISS is now 9, as is the NISS. Next, a moderate liver laceration is discovered (AIS score = 3). The ISS remains 9, but the NISS increases to 18. Next, a moderate pancreatic laceration with duct involvement is encountered (AIS score = 3). The ISS still remains 9, whereas the NISS increases again to 27. A bladder perforation is next discovered (AIS score = 4). The ISS now increases to 16, whereas the NISS continues its climb to 34. Next, a bimalleolar fibular fracture (AIS score = 2) is discovered. The ISS increases to 20, but the NISS remains unchanged at 34. The NISS thus behaves in a way that is more consistent with a trauma surgeon's instincts than does the ISS: as injuries increase in number, death becomes more likely, even if these injuries are accumulating in a single body region. Furthermore, adding a trivial injury (fibular fracture) to a different body region should not significantly affect the likelihood of death.
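
The running totals in this example can be reproduced with a short Python sketch. The function names and the region labels are illustrative choices; the four abdominal findings are assumed to fall in a single ISS body region and the fracture in a separate extremity region, and any special handling of AIS 6 injuries is not described in the text and is not modeled here.

```python
from typing import Dict, List, Tuple

Injury = Tuple[str, int]  # (ISS body region, AIS severity)


def iss(injuries: List[Injury]) -> int:
    """Sum of squares of the single worst AIS score in each of the
    three most severely injured body regions."""
    worst: Dict[str, int] = {}
    for region, ais in injuries:
        worst[region] = max(worst.get(region, 0), ais)
    top3 = sorted(worst.values(), reverse=True)[:3]
    return sum(a * a for a in top3)


def niss(injuries: List[Injury]) -> int:
    """Sum of squares of the three highest AIS scores, regardless of region."""
    top3 = sorted((ais for _, ais in injuries), reverse=True)[:3]
    return sum(a * a for a in top3)


# Findings in the order they are discovered in the example.
findings: List[Injury] = [
    ("abdomen", 3),      # small bowel perforation
    ("abdomen", 3),      # moderate liver laceration
    ("abdomen", 3),      # pancreatic laceration with duct involvement
    ("abdomen", 4),      # bladder perforation
    ("extremities", 2),  # bimalleolar fibular fracture
]

# Score the patient again after each new finding.
trace = [(iss(findings[:k]), niss(findings[:k])) for k in range(1, len(findings) + 1)]
for k, (i, n) in enumerate(trace, start=1):
    print(f"after finding {k}: ISS = {i}, NISS = {n}")
# after finding 1: ISS = 9, NISS = 9
# after finding 2: ISS = 9, NISS = 18
# after finding 3: ISS = 9, NISS = 27
# after finding 4: ISS = 16, NISS = 34
# after finding 5: ISS = 20, NISS = 34
```

The only difference between the two functions is the grouping step: `iss` first collapses each region to its worst injury, whereas `niss` sorts the raw severities directly.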

The price for the traditional ISS ignoring injuries or substituting less severe injuries in calculating its final outcome measure is, not surprisingly, a loss of predictive power. Additionally, this loss of predictive power is accompanied by substantially increased scoring complexity; not only must every injury be assigned to a body region before scoring, but these six scoring regions do not correspond to the nine anatomic body regions of the AIS lexicon. This complexity increases the likelihood of scoring errors and hinders "on the fly" mental estimation of ISS.

The NISS prediction of mortality is based solely on the anatomic information specified by a patient's AIS injury descriptors. Its predictive accuracy can be increased by the addition of other types of information to the scoring process, such as patient reserve (usually specified by the surrogate of patient age) and physiologic derangement (usually specified as the Revised Trauma Score). [10] We have chosen to keep these three types of information separate, but they can be easily combined using a variety of statistical techniques, such as logistic regression or tree analysis, should a single predictor be called for. Other outcome prediction approaches, such as A Severity Characterization of Trauma (ASCOT), [11] combine anatomic and physiologic data at the outset, but we believe that this is an error. Not only is the contribution of injury per se disguised, but the calculation of ASCOT is itself so complicated that a computer is required. We believe that part of the value of an injury summary score is that it can be calculated by clinicians. The popularity of ISS has stemmed in some measure from its ease of computation, relying as it does on the information contained in the AIS severity descriptors rather than on complex computation. NISS inherits and extends this advantage, relying as it does on the AIS severities for each injury, but simplifies the actual calculation by eliminating the need to consider body region.

This is a retrospective, nonconcurrent cohort study that compares NISS with ISS values calculated at the time of discharge. A concurrent cohort study would presumably yield identical results, but would nonetheless be of interest as further verification of our findings.

Although the ISS has seen stalwart service as the de facto standard of trauma scoring, it was developed 20 years ago in an environment very different from the information-rich, computer-dominated world of today. NISS is better suited to take advantage of the richer, more complete injury descriptions now available in trauma systems. Because NISS is simpler to calculate and better predicts outcome, it should replace ISS.

CONCLUSION

NISS better predicts survival and is easier to calculate than ISS. This difference is highly statistically significant and practically important, because NISS better separates survivors from nonsurvivors. We recommend that NISS replace ISS as the standard summary measure of human trauma.

Acknowledgments

The authors thank the following reviewers who read the manuscript in draft form and improved it greatly with their comments and observations: Edward Bedrick, Guohua Li, Ellen MacKenzie, and Brian O'Neill.

DISCUSSION

Dr. Carl A. Soderstrom (Baltimore, Maryland): Dr. Reath, Dr. Poole, members, and guests. In the beginning was AIS, which begat maximum AIS, which begat ISS. And their creators found it was good, but not perfect. Then there was TS (Trauma Score), RTS (Revised Trauma Score), TRISS (Trauma and Injury Severity Score), PATI (Penetrating Abdominal Trauma Index), ASCOT, and very recently a hierarchical network model using ICD-9 (International Classification of Diseases, Ninth Revision) codes. Now we add NISS to the alphabet soup.

ISS is an old friend, a criterion standard that has been with us for more than 20 years. This methodology, which allowed us to begin to compare apples to apples, has three obvious inherent pitfalls. First, it was designed for blunt trauma. Despite attempts to improve the penetrating trauma scoring, many feel that it is still limited in this regard. Second, it does not take into account physiologic variables, hence TRISS. Third, as has been well articulated today, the ISS methodology takes into account only one injury per body region; hence, the patient's overall anatomic injury severity is often underestimated.

The authors present a simple yet elegant modification of the ISS to overcome the third pitfall. By taking into account the patient's three most severe injuries, they were able to better discriminate patients who will die from those who will survive. This is demonstrated by almost a doubling of median scores among survivors and nonsurvivors in each group and 12-point increases in ROC-generated cut points that predict survival. The ISS is designed to evaluate live-or-die outcome from blunt trauma patients.

The authors have a combined data base of 6,585 patients, including more than 1,200 patients (19%) who were injured as a result of penetrating trauma. A much larger cohort, such as that of the Major Trauma Outcome Study, with almost 200,000 subjects, or even larger ones employed in other assessments, is needed to investigate the study question. Using two, T-W-O and T-O-O, small data bases, one with 3,000 blunt trauma victims and the other with fewer than 2,400, can make analysis dependent on individual cells with small numerators and denominators, resulting in obvious consequences.

Questions: Did you analyze your data bases only for blunt trauma? If so, what were the results? If not, why not? Do you have any plans to tap into larger data sets to verify these preliminary findings?

There were almost twice as many penetrating trauma victims in the Albuquerque group as in the Portland group. In the Albuquerque group, there was a significantly higher misclassification rate using the ISS compared with the NISS; however, there was no significant difference in ISS and NISS misclassification in the Portland group with its greater percentage of blunt trauma victims. Furthermore, the Hosmer Lemeshow statistics for each data set conflict with this misclassification trend and therefore are counterintuitive.

Questions: How do you explain these findings? Do they suggest that the new methodology is not much of an improvement over the older one? Abbreviated injury severity scoring and NISS calculation, which is needed for calculation of TRISS, generally required trauma center registrars.

Thus, if adopted, NISS would be limited to trauma center settings unless the MacKenzie ICD-9-to-AIS conversion is used, which is recommended only for very large sample sizes. Hence, studying all hospitalized trauma patients, the vast majority of whom are not treated in trauma centers, is not possible unless the MacKenzie conversion is used.

As noted, ISS and NISS do not take into account physiologic variables. Furthermore, it is by no means certain that a quadratic equation is the best way to take into account the effects of multiple injuries.

Questions: Considering these observations, why should we embrace the NISS? Shouldn't we focus our efforts elsewhere?

In any case, six days haven't passed, it is not time to rest. My apologies to King James. I congratulate the authors on a much-needed research effort, and I thank the Association for the privilege of the floor.

Dr. Turner Osler (closing): Thank you for your constructive suggestions and criticisms.

We too noticed that we had more penetrating trauma in Albuquerque and that the NISS seemed to offer greater improvement in lowering misclassification. And that is not surprising, because with penetrating trauma, injuries tend to be clustered. And the whole idea behind this is that you allow multiple injuries in a single body region to contribute to the final score.

So it is not surprising that it works better in the penetrating arena. NISS, however, also seems to work better in the blunt arena. We did separate out the blunt trauma, and the improvements are about half what they are for penetrating trauma, which seems, intuitively, just about right.

I agree that there is a problem in requiring AIS scoring for every trauma patient in the country because not every hospital has the zeal or the financial resources to hire a trauma nurse coordinator to look after a trauma registry. They are extremely expensive, which is why we also suggested, in our previous paper, the business of simply using the ICD-9 data from the hospital information system.

I do not think that there is ever going to be a best scoring system, or at least there will not be a final scoring system. But there is no question that the Injury Severity Score can easily be improved upon by simply calculating it based on the three worst injuries regardless of body region.

I think that as long as we persist in using the Injury Severity Score, we should move right along to the New Injury Severity Score and obtain those benefits right away. Thank you.

REFERENCES

1. Majno G. The Healing Hand: Man and Wound in the Ancient World. Cambridge, Mass: Harvard University Press; 1975:69-105.
2. Dunbar H. The medicine and surgery of Homer. BMJ. 1890;1:48-51.
3. De Haven H. The Site, Frequency, and Dangerousness of Injury Sustained by 800 Survivors of Light Plane Accidents. New York, NY: Crash Injury Research, Department of Public Health and Preventive Medicine, Cornell University Medical College; July 1952.
4. Committee on Medical Aspects of Automotive Safety. Rating the severity of tissue damage. JAMA. 1971;215:277-286.
5. Baker SP, O'Neill B, Haddon W, et al. The Injury Severity Score: a method for describing patients with multiple injuries and evaluation of emergency care. J Trauma. 1974;14:187-196.
6. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29-36.
7. Hosmer DW, Lemeshow S. Applied Logistic Regression. New York, NY: John Wiley & Sons; 1989:140-145.
8. SAS Institute Inc. Logistic Regression Examples Using the SAS System. Version 6. Cary, NC: SAS Institute; 1995:1-73.
9. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York, NY: Chapman & Hall; 1993.
10. Champion HR, Sacco WJ, Copes WS, Gann DS, Gennarelli TA, Flanagan ME. A revision of the Trauma Score. J Trauma. 1989;29:623-629.
11. Champion HR, Copes WS, Sacco WJ, et al. A new characterization of injury severity. J Trauma. 1990;30:539-546.
© Williams & Wilkins 1997. All Rights Reserved.