There is a problem with p. Specifically, there is a problem with the declaration of statistical significance when p ≤ .05. Some time ago, the American Statistical Association, among others, called for changing the way researchers report results: p values should not be reported without context for understanding what those values mean, and the claim of statistical significance should be eliminated from research papers. Recently, the American Statistical Association has become more insistent in its call for us to change not only our language but our practice as well.
There are many good reasons for this insistence, and in case you missed the discussion, please see the 43 articles published in The American Statistician in March 2019 for details. The editorial in that issue provides excellent detail about the problems with p values and statements of statistical significance (Wasserstein, Schirm, & Lazar, 2019). The editors also, helpfully, introduce some potential solutions to p-value use. These solutions, which are further detailed in the collected articles, include the use of minimal important effect size (Amrhein, Trafimow, & Greenland, 2019) and second-generation p value, which takes practical significance into account (Greevy, Welty, Blume, DuPont, & Smith, 2019). Other suggestions are available; most of us will need to work closely with statisticians to make sure we choose the most appropriate approach for our work.
Briefly, because some of you have asked us what to do about reporting statistical results in papers submitted to Nursing Research, we support the recommendation of Wasserstein et al. (2019) to accept uncertainty and to be thoughtful, open, and modest in reporting your statistical results. More specifically, and as Hayat (2010) noted almost a decade ago in this Journal, significance testing is a subjective procedure; tests of significance do not provide an objective measure of scientific evidence, nor do p values have any clinical meaning. A specific p value depends on many factors, including statistical power, effect size, and sample size. Thus, a reported p value needs to be accompanied by context for understanding it, such as confidence intervals, odds ratios, hazard ratios, or regression coefficients, all of which provide an estimate of the magnitude of an effect. Matthews (2019) suggests clarifying the width of the confidence interval as another way to contextualize p values. The Publication Manual of the American Psychological Association, which is used by this Journal for manuscript style, will publish a new edition late in 2019. The revision will reportedly provide strengthened guidance to researchers about ways to report statistical results (American Psychological Association, 2010).
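To make concrete the point that a p value depends on sample size as well as effect size, the following minimal sketch may help. It is an illustration under stated assumptions, not a recommended analysis: the function name z_test_summary, the mean difference of 2.0, the standard deviation of 10.0, and the group sizes are all hypothetical, and the calculation assumes two equal-sized groups with a known common standard deviation (a simple z test). The observed difference is held fixed while only the sample size changes.

```python
from math import sqrt
from statistics import NormalDist


def z_test_summary(mean_diff, sd, n_per_group, level=0.95):
    """Two-sided z test and confidence interval for a difference in two means.

    Illustrative assumptions: equal group sizes and a known common
    standard deviation, so the normal distribution applies directly.
    """
    std_normal = NormalDist()
    # Standard error of the difference between two independent group means.
    se = sd * sqrt(2.0 / n_per_group)
    z = mean_diff / se
    # Two-sided p value for the null hypothesis of no difference.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Confidence interval around the observed difference.
    z_crit = std_normal.inv_cdf(1.0 - (1.0 - level) / 2.0)
    ci = (mean_diff - z_crit * se, mean_diff + z_crit * se)
    return p_value, ci


# Same hypothetical effect (difference = 2.0, SD = 10.0), different sample sizes.
for n in (20, 80, 320):
    p, (lower, upper) = z_test_summary(mean_diff=2.0, sd=10.0, n_per_group=n)
    print(f"n per group = {n:3d}: difference = 2.0, "
          f"95% CI [{lower:.2f}, {upper:.2f}], p = {p:.3f}")
```

Under these assumptions, the estimated effect is identical in every row; only the p value and the width of the confidence interval change with the sample size. Reporting the estimate and its interval alongside the exact p value lets a reader judge the magnitude and precision of the effect rather than whether p fell on one side of .05.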
What is abundantly clear is that we must stop using the language of statistical significance (Hurlbert, Levine, & Utts, 2019). Reporting exact p values is acceptable as long as there is context provided about the meaning of the values; p values themselves are not the problem. The problem is the claim of statistical significance (or lack thereof) with the failure to make sense of what that claim means. As Hayat et al. (2019) have noted, it is unfortunate that the term “significant” was ever attached to p < .05; it should not have been. Rather, in order for science to progress and provide the basis for clinical practice, which nursing science aims to do, we need to thoughtfully and carefully present our research results, along with a clear context for meaning.
The Editorial Board at Nursing Research is revising author guidelines to reflect current recommendations for best practice in reporting statistical findings. In the meantime, we encourage you to read the many references currently available and to consult with statistician colleagues. We will work with you to revise papers to reflect the best practices. We know you want your research papers to convey information in the most transparent and honest way possible. We share that goal and will do all we can to help you achieve it.
American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.
Amrhein V., Trafimow D., & Greenland S. (2019). Inferential statistics as descriptive statistics: There is no replication crisis if we don’t expect replication. The American Statistician, 73(suppl), 262–270.
Greevy R., Welty V., Blume J., DuPont W., & Smith J. (2019). An introduction to second generation p-value. The American Statistician, 73(suppl), 157–167.
Hayat M. J. (2010). Understanding statistical significance. Nursing Research, 59, 219–223. PMID: 20445438.
Hayat M. J., Staggs V. S., Schwartz T. A., Higgins M., Azuero A., Budhathoki C., … Ye S. (2019). Moving nursing beyond p < .05. Nursing Outlook. Advance online publication. pii: S0029-6554(19)30357-4. doi:10.1016/j.outlook.2019.06.010. PMID: 31375344.
Hurlbert S., Levine R., & Utts J. (2019). Coup de grâce for a tough old bull: ‘Statistically significant’ expires. The American Statistician, 73(suppl), 352–357.
Matthews R. (2019). Moving toward the post p < 0.05 era via the analysis of credibility. The American Statistician, 73(suppl), 202–212.
Wasserstein R. L., Schirm A. L., & Lazar N. A. (2019). Moving to a world beyond p < 0.05. The American Statistician, 73(suppl), 1–19.