In this issue of The Journal of Bone & Joint Surgery, Gagnier and Morgenstern report on a recent statement by the American Statistical Association (ASA) about the use and misuse of the p value in research reports1,2. The p value provides the probability that an effect as extreme as, or more extreme than, the observed effect could have arisen if the null hypothesis of no effect were true1-4. We provide context for the ASA report and the article by Gagnier and Morgenstern in the form of 2 (fictitious) research abstracts that base their conclusions solely on a p value.
The first abstract is written by investigators examining whether age influences the risk of readmission 30 days following lumbar arthrodesis. The investigators pursue this question by gathering data from patients treated in their own medical center. They produce Abstract 1.
Background: The effect of age on 30-day readmission following lumbar arthrodesis has received little study.
Methods: We investigated the effect of age on 30-day readmission after lumbar arthrodesis in a cohort of 120 subjects enrolled in our medical center. We collected data on age and readmission from hospital discharge records. We assessed the association between age (<65 versus ≥65 years) and 30-day readmission using the chi-square test.
Results: Forty-two percent of the subjects were ≥65 years old. Twenty-four subjects (20%) were readmitted within 30 days. An age of ≥65 years was not associated with 30-day readmission (p > 0.05).
Conclusions: Age is not associated with 30-day readmission following lumbar arthrodesis.
A second group of investigators examines whether age is associated with an increased risk of pulmonary embolism following total hip or knee arthroplasty. These investigators use the U.S. National Inpatient Sample, a large probability sample of hospitals throughout the U.S., and summarize their findings in Abstract 2.
Background: The effect of age on the risk of pulmonary embolism following lower-extremity total joint arthroplasty (TJA) has received little study.
Methods: We investigated the effect of age on the 30-day risk of pulmonary embolism among persons in the U.S. National Inpatient Sample who had knee or hip TJA in 2014. We assessed the association between age (<65 versus ≥65 years) and pulmonary embolism within 30 days of surgery using the chi-square test.
Results: In 2014, 1,009,866 persons in the U.S. had hip or knee TJA; 55% were ≥65 years old. Of those total hip or knee TJA recipients, 9,560 (0.95%) had a pulmonary embolism within 30 days of surgery. An age of ≥65 years was strongly associated with the development of pulmonary embolism (p < 0.001).
Conclusions: Older age is a strong risk factor for pulmonary embolism following hip or knee total joint arthroplasty.
In both abstracts, the conclusions were based on whether the association between the binary age indicator and the outcome of interest met a p value criterion of 0.05. To understand the findings of each abstract more fully, we examine the data in detail.
Table I shows the association between age and readmission following lumbar arthrodesis. It reveals a risk ratio of 1.96 (14/50 ÷ 10/70), with a 95% confidence interval (CI) of 0.95 to 4.05 and p = 0.07. On the basis of the p value of 0.07, the investigators concluded that older age was not associated with readmission despite their finding that the risk of readmission was 96% greater for the older patients.
TABLE I - Readmission Following Lumbar Arthrodesis
| |Readmitted (no. of patients)|Not Readmitted (no. of patients)|Total (no. of patients)|
|Age of ≥65 yr|14|36|50|
|Age of <65 yr|10|60|70|
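The risk ratio and confidence interval quoted for the readmission example can be checked with a short calculation. The following is a minimal Python sketch, assuming the counts given in the text (14 of 50 older patients and 10 of 70 younger patients readmitted) and a Wald interval computed on the log scale. The function name `risk_ratio_ci` is hypothetical, and the interval method is an assumption, since the abstract does not say how its CI was obtained.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, level=0.95):
    """Risk ratio with a Wald confidence interval computed on the log scale.

    Hypothetical helper for illustration; not taken from the article.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed

    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )

    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts from the lumbar arthrodesis example: 14/50 (age >= 65) vs 10/70 (age < 65).
rr, lower, upper = risk_ratio_ci(14, 50, 10, 70)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# prints: RR = 1.96, 95% CI 0.95 to 4.05
```

Under these assumptions the sketch reproduces the reported estimate. The wide interval, not the p value, is what conveys how little 24 readmissions can say about the size of the effect.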
Table II shows the association between age and the risk of pulmonary embolism following total hip or knee arthroplasty in the National Inpatient Sample. The risk ratio is 1.12 (5,545/557,200 ÷ 4,015/452,666), with a 95% CI of 1.08 to 1.17 and a p value of <0.0001. On the basis of the p value of <0.0001, the investigators concluded that age was strongly associated with pulmonary embolism despite finding that the risk was just 12% higher in the older patients.
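TABLE II - Pulmonary Embolism Following Lower-Extremity Total Joint Arthroplasty
| |Pulmonary Embolism (no. of patients)|No Pulmonary Embolism (no. of patients)|Total (no. of patients)|
|Age of ≥65 yr|5,545|551,655|557,200|
|Age of <65 yr|4,015|448,651|452,666|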
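To see how a very large sample produces a very small p value for a modest effect, the chi-square test named in the abstract can be applied to the pulmonary embolism counts given in the text (5,545 of 557,200 older patients and 4,015 of 452,666 younger patients). This is a minimal Python sketch that assumes an uncorrected Pearson chi-square test on the 2x2 table; the function name `pearson_chi_square_2x2` is hypothetical, and the abstract does not state which variant of the test was used.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration. Returns (chi2, p_value).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from the total joint arthroplasty example.
events_older, n_older = 5545, 557200
events_younger, n_younger = 4015, 452666

chi2, p_value = pearson_chi_square_2x2(
    events_older, n_older - events_older,
    events_younger, n_younger - events_younger,
)
risk_ratio = (events_older / n_older) / (events_younger / n_younger)

print(f"chi-square = {chi2:.1f}, p = {p_value:.1e}, RR = {risk_ratio:.2f}")
```

The p value falls far below 0.0001 even though the risk ratio is only about 1.12, which is the pattern the example is meant to illustrate.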
Both groups of investigators would have benefited from reading the article in this issue of The Journal by Gagnier and Morgenstern and the report by the ASA1,2. The abstracts illustrate several misapplications of the p value noted by Gagnier and Morgenstern and the ASA, including (1) using the p value to assess the magnitude of an effect or the importance of a result, (2) using the p value as the basis for a scientific conclusion, and (3) reliance on a particular p value cutoff (such as 0.05) to assess the strength of association.
Most clinicians and policy experts would regard the nearly twofold greater risk of readmission following lumbar arthrodesis (risk ratio, 1.96) in older patients as meaningful from clinical and policy standpoints. Because the 95% CI (0.95 to 4.05) includes 1.0, we should be cautious, as the observed data remain consistent with the null hypothesis of no effect. In the second abstract, the 95% CI (1.08 to 1.17) suggests that the data are compatible with an excess risk ranging between 8% and 17%. Because the 95% CI does not contain 1.0, the null hypothesis of no excess risk, tested at a 5% significance level, would be rejected. But the magnitude of the increase in risk (12%) is quite small. In both abstracts, the effect measure estimate (risk ratio) and 95% CI are much more informative than the p value, as they provide information about both the magnitude of the effect and the amount of uncertainty that the data carry.
The p value provides little information in studies that are not explicitly powered to detect clinically meaningful effects. The first study, with just 24 readmissions among 120 patients, was at risk for finding a clinically meaningful effect that did not reach significance. The second study, with 9,560 pulmonary emboli among 1,009,866 patients, was at risk for finding significant effects that may not necessarily be clinically meaningful.
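The role of study size can be made concrete with an approximate power calculation. The sketch below is illustrative only and rests on several assumptions that are not in the abstracts: the baseline risk is taken to be the observed risk in the younger group, the hypothesized true risk ratios (2.0 and 1.12) are chosen to mirror the observed estimates, the test is two-sided at the 5% level, and power is approximated with a normal approximation for the difference in proportions, ignoring the negligible opposite tail. The function name `approx_power` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(risk_unexposed, risk_ratio, n_exposed, n_unexposed, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Hypothetical helper for illustration; normal approximation with an
    unpooled standard error, ignoring the far tail of the rejection region.
    """
    risk_exposed = risk_unexposed * risk_ratio
    se = math.sqrt(
        risk_exposed * (1 - risk_exposed) / n_exposed
        + risk_unexposed * (1 - risk_unexposed) / n_unexposed
    )
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(risk_exposed - risk_unexposed) / se - z_crit)


# Study 1: assumed true RR of 2.0, baseline risk 10/70, groups of 50 and 70.
power_small = approx_power(10 / 70, 2.0, 50, 70)

# Study 2: assumed true RR of 1.12, baseline risk 4,015/452,666, very large groups.
power_large = approx_power(4015 / 452666, 1.12, 557200, 452666)

print(f"Study 1 approximate power: {power_small:.2f}")
print(f"Study 2 approximate power: {power_large:.4f}")
```

Under these assumptions, the first study has less than a 50% chance of declaring a true doubling of risk significant, whereas the second is almost certain to declare a 12% increase significant.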
As articulated by Gagnier and Morgenstern, the p value has long been misused as the single most important criterion for deciding whether a study is “positive” or “negative.” Accordingly, investigators tend to place undue emphasis on whether results are statistically significant, with insufficient attention to whether they are clinically meaningful. The ASA statement and the article by Gagnier and Morgenstern recommend greater emphasis on the effect measure (risk ratio, odds ratio, or difference in means or in proportions) along with the 95% CI1,2. Gagnier and Morgenstern, as well as other authors3,4, emphasize numerous factors that should be considered in the assessment of whether a particular association is compelling. These include the effect measure, the 95% CI, the p value, and contextual factors, such as the mechanistic plausibility of the finding and its compatibility with prior research.
Statistical significance is an important aspect of the assessment of research findings4. However, by itself, the p value does not convey the strength of association. Using the p value as a primary measure of association may lead to problematic conclusions, particularly with especially large or small samples, as the examples presented here illustrate. The JBJS editorial team will work with JBJS authors to ensure that research reports include effect estimates with 95% CIs and that authors discuss the range of factors bearing on the plausibility of their findings.
1. Gagnier JJ, Morgenstern H. Misconceptions, misuses, and misinterpretations of p values and significance testing. J Bone Joint Surg Am. 2017 Sep 20;99(18):1598-603.
2. Wasserstein RL, Lazar NA. The ASA’s statement on p-values: context, process and purpose. Am Stat. 2016;70(2):129-33.
3. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, Altman DG. Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016 Apr;31(4):337-50. Epub 2016 May 21.
4. Rothman K, Greenland S. Modern epidemiology. 2nd ed. Philadelphia: Lippincott-Raven; 1998.