In vitro contracture tests (IVCT) have been used for the diagnosis of malignant hyperthermia (MH) susceptibility for more than 20 years. They were developed after the separate discoveries that cut muscle fascicles from survivors of MH reactions were more sensitive to the contracture-inducing effects of caffeine [1] and halothane [2]. Since then, they have been safely used as the basis for advice to anesthetists about the clinical management of patients at increased risk of susceptibility to MH, either through a family history of MH or through a reaction to a previous anesthetic.
The basis for the current protocol of the European Malignant Hyperthermia Group was agreed to and published in 1984 [3]. It is important to emphasize that this protocol and the thresholds used for diagnostic classification were not arbitrarily set but were based on consensus views of up to 12 years' study of proband cases and their families. Although some published [4] and some unpublished modifications have been made to this protocol (the complete protocol can be obtained from the Group's secretary (1)), its principles remain unchanged. Diagnosis of MH susceptibility is defined according to a threshold concentration of halothane or caffeine that induces an increase of ≥0.2 g in resting tension of excised muscle bundles. Patients are classified as MH susceptible (MHS), normal (MHN), or equivocal (MHE) depending on whether responses to the halothane and caffeine tests are both abnormal, both normal, or only one abnormal, respectively. Because MH is potentially lethal and false-negative diagnoses must therefore be avoided, the threshold levels for IVCT have been set so as not to classify as normal any individual whose response falls within the overlap between the true-normal and true-susceptible population responses. It is therefore envisaged that some MHE, and possibly some MHS, diagnoses represent false positives.
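The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Group's protocol: the function names are hypothetical, the example responses are invented, and the cut-off concentrations (2% halothane, 2 mM caffeine) are those mentioned in the Discussion of this paper.

```python
from typing import Optional

CONTRACTURE_THRESHOLD_G = 0.2   # from the text: increase of >= 0.2 g in resting tension
HALOTHANE_CUTOFF_PCT = 2.0      # cut-off concentration quoted in the Discussion
CAFFEINE_CUTOFF_MM = 2.0        # cut-off concentration quoted in the Discussion


def threshold_concentration(responses: dict) -> Optional[float]:
    """Lowest concentration at which the contracture reaches 0.2 g, else None.

    `responses` maps drug concentration -> contracture force in grams.
    """
    for concentration in sorted(responses):
        if responses[concentration] >= CONTRACTURE_THRESHOLD_G:
            return concentration
    return None


def is_abnormal(responses: dict, cutoff: float) -> bool:
    """A test is abnormal if the threshold is reached at or below the cut-off."""
    threshold = threshold_concentration(responses)
    return threshold is not None and threshold <= cutoff


def classify(halothane: dict, caffeine: dict) -> str:
    """Return MHS (both abnormal), MHN (both normal), or MHE (only one abnormal)."""
    h = is_abnormal(halothane, HALOTHANE_CUTOFF_PCT)
    c = is_abnormal(caffeine, CAFFEINE_CUTOFF_MM)
    if h and c:
        return "MHS"
    if not h and not c:
        return "MHN"
    return "MHE"


# Invented example: halothane threshold reached at 2%, caffeine only at 3 mM.
example_halothane = {0.5: 0.0, 1.0: 0.1, 2.0: 0.3}
example_caffeine = {0.5: 0.0, 1.0: 0.0, 1.5: 0.05, 2.0: 0.1, 3.0: 0.4, 4.0: 1.0}
example_label = classify(example_halothane, example_caffeine)
```

Only the threshold concentration enters this rule, which is why the study goes on to use the contracture force at every concentration as a predictor instead.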
(1) Dr Helle Ording, Department of Anaesthesia, Herlev University Hospital, DK-2730 Herlev, Denmark.
Although this approach is satisfactory for clinical diagnostic purposes, the complexity of research into the genetics of MH [5] has reached a stage at which accurate estimates of the true likelihood of MH susceptibility of individuals within a family are needed if progress is to be made in determining the underlying causative genetic abnormality in that particular family. There were hopes that new contracture tests using ryanodine [6] would prove more specific than the halothane and the caffeine tests, but, again, there appears to be a degree of overlap [7]. The aim of this study was to generate and test statistical models that could be used to predict the probability of susceptibility of an individual to MH from the results of their contracture tests.
Preliminary analyses using the discrete variables given by the conventional threshold concentrations for caffeine and halothane tests (the lowest concentrations at which a 0.2-g contracture developed) failed to generate any useful models, as no unique solution could be found. In this paper, we have therefore extended the information used to generate the statistical models by utilizing the force of contracture developed at each concentration of caffeine and halothane and also by incorporating data from a ryanodine contracture test. Models were generated using logistic regression analysis of data from each of the four contracture test protocols currently in use among European MH testing centers to indicate which endpoints were the best predictors of MH status. Two methods of logistic model generation that incorporate data from each of the four tests (the best predictor variables from each test) were explored with the aim of improving upon the discriminatory ability of the individual tests. The discriminatory ability of each logistic regression model was evaluated using a receiver operating characteristic (ROC) curve. The reproducibility and generalizability of this approach were examined using two further groups of patients.
Methods
Static and dynamic halothane and static caffeine tests were done according to the current European MH Group protocol (1). For contracture testing, an individual muscle fascicle excised from the vastus internus muscle under regional anesthesia is mounted in a 3-mL perfused and carbogenated tissue bath, and the muscle tension is transduced onto a chart recorder. Prior to the addition of any drug, a baseline tension of 2 g is applied to the fascicle, and a 10-min period of stabilization is allowed. The ryanodine test followed the protocol currently under evaluation by the European MH Group. In this test, a muscle fascicle is prepared in the muscle bath as for halothane and caffeine tests. After the stabilization period, the bath is continuously perfused with Krebs-Ringer solution containing 1 μM ryanodine (98% pure, Calbiochem-Novabiochem Corp., Nottingham, UK). In all tests, an electrically evoked twitch response of ≥1 g was taken to indicate adequate viability of the fascicle.
For the static halothane test, contractures were measured at each drug concentration as the increase in tension above the minimum baseline tension seen at any point on the trace prior to the response to that concentration. A negative contracture implied that the muscle was continuing to relax. Contracture force was recorded at 0.5%, 1%, and 2% halothane. Contractures were recorded in a similar way for the static caffeine test at 0.5, 1, 1.5, 2, 3, and 4 mM caffeine. In the dynamic halothane test, in which the muscle is stretched and relaxed three times before halothane is added and then once more at each concentration of halothane (0.5%, 1%, and 2%), the response was measured as the tension at the end of the timed stretch above or below the minimum tension reached with any of the preceding stretches. The responses to ryanodine were recorded as time (in minutes) from addition of the drug to (a) the onset of contracture, i.e., the initial increase in baseline tension, (b) an increase in baseline tension of 0.2 g, and (c) an increase in baseline tension of 1 g.
Data were collected from 250 patients investigated since the introduction of the current ryanodine test protocol. The results of 50 of these patients were used as the index group to generate the logistic regression models. These were the first 50 patients (age range 9-73 years; MHS, n = 13; MHN, n = 32; MHE, n = 5) who fit one of two sets of criteria defining either positive or negative status according to the nature of their clinical reaction (n = 35), or negative status based on the patients' being at low risk of having MH (n = 15). The former (proband) cases were classified positive or negative depending on whether their clinical reaction gave them a probability of greater than or less than 50% of being susceptible: the probability was taken from the findings of a previous study [8], which analyzed over 400 referrals of potential MH reactions. Thus, categories of clinical presentation leading to the classification of positive were fulminant or moderate reactions with metabolic and muscle signs; masseter muscle spasm with generalized rigidity, rhabdomyolysis, or metabolic signs; and unexplained cardiac arrest or death. Categories of clinical presentation leading to the classification of negative were mild degrees of metabolic and muscle signs; masseter muscle spasm as the only feature; and other miscellaneous indications, which in all cases in this study were postoperative pyrexia. The categorizations were made by a single investigator who was blinded to the outcome of the IVCT, on the basis of the clinical information provided by the referring physician. The low-risk patients were negatively tested parents of probands too young to be tested themselves, where the spouse plus one other relative of the spouse were MHS. A second set of 47 patients (age range 10-62 years; MHS, n = 15; MHN, n = 28; MHE, n = 4) who could be similarly classified was used to test the reproducibility of the models generated using combinations of the contracture tests.
The remaining 153 patients (age range 9-74 years; MHS, n = 44; MHN, n = 92; MHE, n = 17) were consecutively tested relatives of tested susceptible patients: the generalizability of the models was examined on this group of patients.
Statistical Analysis
The statistical package SPSS for Windows 6.0 (SPSS Inc., Chicago, IL) was used for the analyses.
Logistic Regression Analysis. Logistic regression attempts to produce the best-fit model for a data set having a dichotomous outcome variable and one or more predictor variables where the probability of having the outcome characteristic is expressed as a function of a combination of the predictor variables (the logit function, z). In this case, the outcome variable is the presence of MH, and the predictor variables are the contracture test responses. The predicted probability of the presence of MH, P, is related to the log of the odds (logit), z, as follows:

$$P = \frac{1}{1 + e^{-z}} \qquad (1)$$

When analyzing the individual contracture tests, the construction of the logistic regression models used a stepwise forward conditional entry, log likelihood removal method with the minimum entry requirement being a score statistic with significance <0.05, and the removal criterion being a twice log likelihood ratio with significance >0.1. This means that each of these models was generated in several stages, during each of which only the predictor variable that would have the greatest influence on the model was incorporated into the model. Variables were added in further steps until there were no more variables whose addition would significantly improve the model. This method of model construction was also used when analyzing results from the four tests together. In addition to this, the endpoints selected by the models for the individual tests were entered together to construct a single-step model, which incorporated all the simultaneously entered predictor variables in the final model. Histograms showing the distribution of predicted probabilities for the presence of MH indicate the degree of separation of MH-negative and MH-positive groups by the logistic regression model.
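As an illustration of how a fitted logit function is turned into a predicted probability, the standard logistic relation P = 1/(1 + e^-z) can be sketched as follows. The function names and the example coefficients are hypothetical and are not taken from Table 1 or Table 2.

```python
import math


def logit_value(intercept: float, coefficients: list, values: list) -> float:
    """Linear predictor z = b0 + b1*x1 + ... + bk*xk."""
    if len(coefficients) != len(values):
        raise ValueError("one coefficient is needed per predictor value")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


def predicted_probability(z: float) -> float:
    """Logistic function P = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical two-predictor model (coefficients invented for illustration).
z_example = logit_value(-1.0, [2.0, 0.5], [0.4, 0.4])   # -1.0 + 0.8 + 0.2 = 0.0
p_example = predicted_probability(z_example)            # z = 0 gives P = 0.5
```

A logit of 0 corresponds to a predicted probability of 0.5, and each unit increase in z multiplies the odds P/(1 - P) by e.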
ROC Curves
An ROC curve is a plot of sensitivity against 1 - specificity where the data points are calculated for different cut-off points for a test or model. In this case, the cut-off points for an individual to be assigned as positive or negative by the models were predicted probabilities for the presence of MH. Sensitivity and specificity were calculated using predicted probabilities of 1%, 10%, 50%, and 90% as the cut-off points for all models. Additional calculations for further cut-off points were made for some models when it was thought that the extra data points might yield more information on the discriminatory ability of the model. For the purposes of the sensitivity and specificity calculations, the true status for proband cases was defined according to the clinical reaction as before: the low-risk cases were considered true negatives, and in the remaining group of 153 consecutively tested relatives, MHS and MHE patients were used as true positives, while MHN patients were used as true negatives. An ROC curve demonstrating a linear association between sensitivity and 1 - specificity indicates a nondiscriminatory test or model. An ROC curve having a value for 1 - specificity of <0.1 and/or a value for sensitivity of >0.9 was taken to indicate good discrimination with the model.
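The calculation of ROC data points at fixed cut-off probabilities can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the probabilities and true statuses are invented, a case is called positive when its predicted probability is at or above the cut-off (the text does not say how ties were handled), and the "good discrimination" check applies the criterion to every plotted point, which is one reading of the rule given above.

```python
def roc_point(probabilities, is_positive, cutoff):
    """Return (sensitivity, 1 - specificity) for one cut-off probability."""
    tp = fn = tn = fp = 0
    for p, truth in zip(probabilities, is_positive):
        called_positive = p >= cutoff   # assumption: ties count as positive
        if truth and called_positive:
            tp += 1
        elif truth:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both true positives and true negatives are required")
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)   # equals 1 - specificity
    return sensitivity, false_positive_rate


def discriminates_well(points):
    """Every point has 1 - specificity < 0.1 and/or sensitivity > 0.9."""
    return all(fpr < 0.1 or sens > 0.9 for sens, fpr in points)


# Invented data: predicted probabilities and true MH status for six cases.
example_probs = [0.02, 0.05, 0.30, 0.60, 0.95, 0.99]
example_truth = [False, False, False, True, True, True]
cutoffs = [0.01, 0.10, 0.50, 0.90]   # the four cut-offs used for all models
example_points = [roc_point(example_probs, example_truth, c) for c in cutoffs]
```

Plotting sensitivity against 1 - specificity for these points gives the ROC curve; points on the diagonal would indicate a nondiscriminatory model.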
Results
Generation of the Logistic Regression Models
Stepwise conditional entry logistic regression of the separate contracture tests produced models incorporating a single predictor variable for all tests except the caffeine contracture test. For both the static and dynamic halothane tests, the most discriminant predictor variable was the force of contracture at 2% halothane, while that for the ryanodine contracture test was the time to initial contracture formation. The logit expression for the caffeine test model was a function of the contracture strength at both 0.5 mM caffeine and 2 mM caffeine. The coefficients for these two endpoints had the effect of reducing the predicted probability for positivity with increasing response at 0.5 mM caffeine and increasing the predicted probability for positivity with increasing response at 2 mM caffeine. The logit functions for each model and the model statistics are displayed in Table 1. ROC curves for each of the four models are illustrated in Figure 1, from which it can be seen that only the curve for the dynamic halothane test deviated from the criteria for a model that discriminates well.
Table 1: Logit Functions and Model Statistics of Logistic Regression Models Generated Using Data from Individual Contracture Tests
Figure 1: Receiver operating characteristic (ROC) curves for logistic regression models generated using data from separate contracture tests. The diagonal dashed line is the theoretical ROC curve for a completely nondiscriminatory test. A model can be said to discriminate well when all the points of the ROC curve lie to the left of the vertical dotted line and/or above the horizontal dotted line. The data points were calculated by assigning the individual patients from the group of 50 patients (low-risk and proband cases) used to generate the model as negative or positive for malignant hyperthermia (MH) according to the predicted probability for the presence of MH derived from the logistic regression model using a range of cut-off points. The data point labels are the cut-off predicted probabilities used in each case.
When the contracture strengths at 2% halothane in static and dynamic halothane tests, contracture strengths at 0.5 mM and 2 mM caffeine, and the time to initial increase in tension after ryanodine were entered together in stepwise conditional entry, the model generated was identical to that for the ryanodine contracture test alone (see Table 1). When the same variables were entered in single-step mode, the logit function and model statistics were as indicated in Table 2. The logit function and statistics for another single-step model are also shown in Table 2. This latter model was generated as before except that the contracture strength at 2% halothane in the static halothane test was not included. This was because the coefficient for this variable in the previous model indicated that an increase in the variable would tend to decrease the predicted probability for the presence of MH, which is a potentially major flaw in the model. The influence of each of the predictor variables in this model was examined by determining the effect on the model statistics of removing each variable in turn. Removal of any of the variables significantly reduced the goodness of fit of the model. The ROC curve for the single-step model that incorporates variables from the dynamic halothane, caffeine, and ryanodine tests only is shown in Figure 2.
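The effect of removing one variable on goodness of fit is commonly judged with a likelihood ratio statistic, that is, twice the difference in log likelihood between the full and the reduced model. The sketch below assumes this statistic is referred to a chi-square distribution with 1 degree of freedom (one variable removed); the paper does not spell out the exact calculation, and the function name and log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float):
    """Return (statistic, p_value) for dropping ONE variable from a model.

    statistic = 2 * (logL_full - logL_reduced); the p-value is the upper tail
    of a chi-square distribution with 1 degree of freedom, which equals
    erfc(sqrt(statistic / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the reduced model cannot fit better than the full model")
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented log likelihoods: full model -10.0, model without one variable -13.0.
example_statistic, example_p = likelihood_ratio_test(-10.0, -13.0)
variable_matters = example_p < 0.05   # removal significantly worsens the fit
```

Repeating this comparison with each predictor omitted in turn shows whether every variable contributes to the fit.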
Table 2: Logit Functions and Model Statistics of Logistic Regression Models Generated Using Data from Combinations of Contracture Tests
Figure 2: Receiver operating characteristic (ROC) curves for the logistic regression model generated using data from combinations of contracture tests (dynamic halothane, caffeine, and ryanodine tests). The diagonal dashed line is the theoretical ROC curve for a completely nondiscriminatory test. A model can be said to discriminate well when all the points of the ROC curve lie to the left of the vertical dotted line and/or above the horizontal dotted line. The three curves apply to: (a) the group of 50 patients (low-risk and proband cases) used to generate the model; (b) a group of 47 further low-risk and proband cases; (c) a group of 153 consecutively tested relatives of malignant hyperthermia (MH) probands. The data points were calculated by assigning the individual patient as negative or positive to MH according to the predicted probability for the presence of MH derived from the logistic regression model using a range of cut-off points. The error bars represent the 95% confidence intervals for the data points. The insets show the same curves without the error bars but with data point labels that are the cut-off predicted probabilities used in each case.
Reproducibility of Appropriate Predicted Probabilities
Data from the second group of probands and low-risk patients (n = 47) were entered into the equation for predicted probability for the presence of MH generated by the single-step model, which excluded the static halothane variable and was considered to be the best model that we obtained. In comparing these probabilities with those derived from the nature of the clinical reaction, there were only two cases that would have been differently classified, assuming a cut-off probability of 50%. These cases were both probands with clinical reactions consisting of masseteric muscle spasm as the sole feature (associated with a 28% probability of MH [8]) but whose respective IVCT results were 1) static halothane test, 1.65-g and 1.35-g contractures at 2% halothane; 2) dynamic halothane test, 0.95-g and 0.85-g contractures at 2% halothane; 3) static caffeine test, 0.2-g and 0.45-g contractures at 2 mM caffeine. The ROC curve demonstrating the discriminatory power of the model with this second group of patients is shown in Figure 2.
Generalizability of the Logistic Regression Model
Figure 2 also shows the ROC curve for our best model (using the combination of test variables) applied to the data from 153 consecutively tested relatives. This curve fits the criteria for a useful discriminatory test. Histograms of the probability of the presence of MH predicted by the model for those of the 153 patients classified as MHN, MHS, and MHE according to the European MH protocol are illustrated in Figure 3. From these histograms, it can be seen that the model separates the positive and negative groups well and that the majority of MHS and MHN individuals can be so designated with a high level of confidence. All the MHN individuals had a predicted probability for the presence of MH of less than 0.5, whereas two MHS individuals also had predicted probabilities of less than 0.5. Of the 17 MHE individuals, 9 could be assigned positive or negative with a probability for the diagnosis of 0.95 or greater.
Figure 3: Histograms of probability of the presence of malignant hyperthermia (MH) as predicted by the logistic regression model generated using data from combinations of contracture tests (dynamic halothane, caffeine, and ryanodine tests) in 153 consecutively tested relatives of MH probands classified according to the European MH Group protocol as (a) MH normal, (b) MH susceptible, or (c) MH equivocal.
Discussion
The main aim of this study was to provide an objective means of determining the likelihood of an individual patient being susceptible to MH from a set of IVCT results. In genetic linkage calculations, it is necessary to define the phenotype misclassification probability. Previously, this has been estimated from the results of IVCTs done on control patients and probands whose clinical reactions were considered to be fulminant. Based (rather loosely) on these results, a global misclassification probability of 2% for positive and negative results was agreed to by members of the genetics section of the European MH Group. This approach would be satisfactory if only one gene were implicated in MH and IVCT results within a family were similar in degree, but this is not the case. Instead, the IVCT results of individual patients can take on great importance, especially with the possibility of separate associated and causative genes being present in the same family [5].
In this study, we have not attempted to validate the European MH Group protocol for IVCT: this requires a different approach, such as that used by the North American MH workers [9]. In generating predictive models and identifying useful predictor variables from the results of IVCT with logistic regression analysis, we have attempted to minimize the influence of an individual's IVCT result on their initial ("true") categorization. Identification of the low-risk individuals was based on results of IVCT tests, but in addition to their own negative result, the spouse and at least one blood relative of the spouse had to be clearly positive. Our approach is strengthened in that the endpoints used as predictor variables were determined in the course of the regression analysis and were not those used for diagnostic purposes. The predictive models are functions of continuously distributed variables (force of contracture or time to attain a specified force of contracture) rather than the categorical threshold concentration variables used in conventional diagnostic interpretation of IVCT results.
Logistic regression provided models of best fit to explain mathematically the entered diagnosis as a function of the predictor variables from the relevant IVCT or combination of IVCTs for the initial group of 50 patients. As such, the logistic regression has been used to generate, rather than test, hypotheses. To test the hypothesis that the models were useful, further steps were necessary. The initial step was to decide whether the models are plausible, or whether, in attempting to fit as many of the index cases as possible, they incorporate spurious coefficients that are at odds with accepted rationale. Such a situation arose with one of our models, which combined predictor variables from each of the four contracture tests but which had the paradoxical effect of reducing likelihood of positivity to MH with increasing contracture response in the static halothane test (Table 2). For those models that were plausible, we assessed their discriminatory ability using ROC curves. Of these, the model combining predictor variables from the dynamic halothane, caffeine, and ryanodine tests correctly classified the greatest proportion of the index group (Table 1 and Table 2). Using the combination of test results also appeared to increase the confidence with which a case could be classified. This is illustrated in Figure 2a, where it can be seen from the ROC curve for the index group that the sensitivity is maintained above 90% even when the specificity reaches 100%. This is not the case for the models using predictor variables from individual contracture tests (Figure 1).
We therefore considered the model combining predictor variables from the dynamic halothane, caffeine, and ryanodine tests as our best model. Even though this model fit the data well (Table 2) and it was a good discriminator (Figure 2), the logit function coefficients were not statistically significant. This is a recognized failing of significance tests of logistic regression coefficients for correlated predictor variables [10]. We therefore examined the influence of each of the predictor variables by observing the effect of omitting each variable in turn on the goodness of fit of the model [11]. Omission of any of the predictor variables significantly reduced the goodness of fit of the model, indicating that each variable contributed independently to the model. It was also important, however, to verify that the model reproducibly and appropriately discriminated among a group of patients similar to that used to generate the model (Figure 2b) and that it was generalizable in that plausible discriminatory predicted probabilities were produced for patients who could only be classified on the basis of IVCT results (Figure 2c). The model fit these criteria, and, although other models might produce similar results, it appears to be useful for predicting the likelihood of the presence of MH from IVCT results. Our data support the contention that current diagnostic criteria are more likely to produce false-positive than false-negative results (Figure 3).
It is interesting that the variables selected in the models generated from the individual halothane tests were the contracture strengths at 2% halothane, which is defined as the cut-off concentration by the European MH Group protocol [3]. In addition to the contracture strength at the cut-off concentration used in the European MH Group protocol for the caffeine test (2 mM), the model generated from the caffeine test selected the contracture strength at 0.5 mM caffeine, this latter variable having a negative influence on the predicted probability. One interpretation of this is that the rate of relaxation of the muscle specimen when the drug is added influences the contracture formation. This may partly explain the relative lack of sensitivity of the caffeine contracture test as described in the European MH Group protocol [7], which utilizes information from a single threshold concentration (2 mM). An alternative explanation is that the combination of the two predictor variables for the caffeine test is a chance finding peculiar to the index group of patients used in this study. Whichever is the case, inclusion of the two predictor variables from the caffeine contracture test in the single-entry regression models was compatible with the production of a model that discriminated well and that was both reproducible and generalizable.
There is, as yet, no consensus on the best variable to use in reporting results of the ryanodine contracture test, but our data indicate that the onset time is the most discriminatory. From the logistic regression model for the ryanodine test, it can be estimated that an onset time of 12 minutes gives a predicted probability for the presence of MH of 0.5; an onset time of 7 minutes gives a probability of 0.9; an onset time of 17 minutes gives a probability of 0.1. The stepwise conditional entry method produced a model identical to that for the ryanodine test alone, indicating that the results of this test are the most closely correlated with the true diagnosis. Examination of the partial correlation statistics derived in the generation of this model reveals a rank order of correlation of ryanodine > dynamic halothane > static halothane > caffeine test. As the ryanodine and dynamic halothane tests are the contracture tests that best correlate with diagnosis, we suggest that all MH diagnostic centers use them. Currently, only six European MH Group centers use the dynamic halothane test, which is an optional part of the protocol. This is probably because it takes longer to do than the static halothane test.
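The three onset times quoted above are mutually consistent with a single-variable logit function that is linear in onset time. The sketch below back-calculates an approximate intercept and slope from two of the quoted points and checks the third. These are illustrative values derived from the rounded figures in the text, not the coefficients reported in Table 1, and the function names are hypothetical.

```python
import math


def log_odds(p: float) -> float:
    """Logit of a probability: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def line_through(t1: float, p1: float, t2: float, p2: float):
    """Intercept and slope of z = intercept + slope * t through two points."""
    slope = (log_odds(p2) - log_odds(p1)) / (t2 - t1)
    intercept = log_odds(p1) - slope * t1
    return intercept, slope


# Two quoted points: onset 12 min -> P = 0.5, onset 7 min -> P = 0.9.
approx_intercept, approx_slope = line_through(12.0, 0.5, 7.0, 0.9)


def probability_from_onset(minutes: float) -> float:
    """Predicted probability of MH from ryanodine onset time (approximate)."""
    z = approx_intercept + approx_slope * minutes
    return 1.0 / (1.0 + math.exp(-z))


# Third quoted point as a consistency check: onset 17 min should give about 0.1.
check_17 = probability_from_onset(17.0)
```

Under these assumptions the slope is -ln(9)/5, roughly -0.44 per minute, so each additional minute of onset time lowers the odds of MH by a factor of about 0.64.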
We now intend to apply predicted probabilities from our best model to individuals included in genetic linkage analysis studies.
We thank Mr. P. L. Allam and Miss A. Clough for their expert technical assistance.
REFERENCES
1. Kalow W, Britt BA, Terreau ME, Haist C. Metabolic error of muscle metabolism after recovery from malignant hyperthermia. Lancet 1970;2:895-8.
2. Ellis FR, Keaney NP, Harriman DGF, et al. Screening for malignant hyperthermia. BMJ 1972;3:559-61.
3. European Malignant Hyperpyrexia Group. A protocol for the investigation of malignant hyperpyrexia (MH) susceptibility. Br J Anaesth 1984;56:1267-9.
4. European Malignant Hyperthermia Group. Laboratory diagnosis of malignant hyperthermia susceptibility (MHS) [letter]. Br J Anaesth 1985;57:1038.
5. Hopkins PM, Halsall PJ, Ellis FR. Diagnosing malignant hyperthermia susceptibility. Anaesthesia 1994;49:373-5.
6. Hopkins PM, Ellis FR, Halsall PJ. Ryanodine contracture: a potentially specific in vitro diagnostic test for malignant hyperthermia. Br J Anaesth 1991;66:611-3.
7. Hopkins PM, Ellis FR, Halsall PJ. Comparison of in vitro contracture testing with halothane, caffeine and ryanodine in patients with malignant hyperthermia and other neuromuscular disorders. Br J Anaesth 1993;70:397-401.
8. Ellis FR, Halsall PJ, Christian AS. Clinical presentation of suspected malignant hyperthermia during anaesthesia in 402 probands. Anaesthesia 1990;45:838-41.
9. Larach MG, Landis JR, Bunn JS, et al. Prediction of malignant hyperthermia susceptibility in low-risk subjects. Anesthesiology 1992;76:16-27.
10. Bland M. An introduction to medical statistics. 2nd ed. Oxford: Oxford University Press, 1995:306-30.
11. Hauck WW, Donner A. Wald's test as applied to hypotheses in logit analysis. J Am Stat Assoc 1977;72:851-3.