
Statistics in Endodontic Research

Are P values given more importance than they deserve?

Aggarwal, Vivek; Singla, Mamta

Endodontology 35(1):p 15-17, Jan–Mar 2023. | DOI: 10.4103/endo.endo_235_22


INTRODUCTION

There has been an increase in the reporting of randomized clinical trials in dentistry. These trials usually evaluate one or more experimental groups against an active or passive control group, with the aim of testing a null hypothesis. Unfortunately, the results are often judged using only a P value. The P value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true, and the level of significance is conventionally set at 0.05.[1] In simple words, if P < 0.05, a difference as large as the observed one would occur by chance in fewer than 5% of identical trials under a true null hypothesis. This practice has led to categorizing data as “significant” or “nonsignificant” without considering clinical significance.[2] For example, a new local anesthetic drug may provide statistically better results than 2% lidocaine, yet the improvement may be clinically insignificant. Let us understand this with a hypothetical clinical scenario.

Consider a clinical study evaluating two different local anesthetic drugs during the endodontic management of symptomatic teeth. The drugs are tested on a total of 500 human subjects (250 in each group). Drug A was effective in 56% of cases (140 out of 250), while drug B was effective in 47% of cases (117 out of 250). Since the outcome is categorical, a Chi-square test was used to evaluate the level of significance, and drug A was found to be statistically better than drug B (χ² = 3.88, P = 0.0489) [Figure 1]. On a cursory reading of the results, one may conclude that drug A is better than drug B. However, can an absolute increase of 9% in success be considered clinically significant? In other words, the success rate of the new drug may be statistically better but clinically insignificant. Another point to note is the absolute dependence on the P value. The P value in this case is 0.0489, which is very close to the cutoff of 0.05. Had drug A been successful in 139 patients instead of 140, the P value would have been 0.060 [Figure 1], making the difference statistically nonsignificant.

Figure 1: Chi-square distribution graphs for 500 patients with different success rates (prepared from www.geogebra.org)
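
The sensitivity of this verdict to a single patient can be checked directly. The following Python sketch reproduces the two Chi-square tests with scipy; it is a minimal illustration, with the 2 × 2 tables taken from the hypothetical trial above, and it assumes scipy's default Yates continuity correction for 2 × 2 tables, which reproduces the reported χ² = 3.88:

```python
from scipy.stats import chi2_contingency

# 2 x 2 contingency table: rows = drug A / drug B, columns = success / failure
trial = [[140, 110],   # drug A: 140 of 250 successes (56%)
         [117, 133]]   # drug B: 117 of 250 successes (47%)

# chi2_contingency applies Yates' continuity correction to 2 x 2 tables by default
chi2, p, dof, expected = chi2_contingency(trial)
print(f"140 successes: chi2 = {chi2:.2f}, P = {p:.4f}")   # chi2 = 3.88, P = 0.0489

# Shift a single outcome: 139 successes instead of 140 for drug A
shifted = [[139, 111],
           [117, 133]]
chi2, p, dof, expected = chi2_contingency(shifted)
print(f"139 successes: chi2 = {chi2:.2f}, P = {p:.4f}")   # P rises to about 0.060
```

A one-patient change flips the verdict across the 0.05 line, which illustrates how fragile a bare "significant/nonsignificant" dichotomy can be.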

In the above example, we can see that sole reliance on P values can lead to the adoption of erroneous conclusions. P values have been reported so reflexively that the actual differences, or the anticipated clinical differences, are omitted from many pieces of research. Instead of P values alone, confidence intervals can give a better picture of the actual difference between the groups.

Confidence intervals provide a picture of the applicability of the data to the true population.[3] When a new treatment is tested, it is not possible to evaluate it on the whole population. A randomly selected sample is used to test the treatment, and the results are then statistically generalized to the whole population. For this purpose, the standard error of the mean (SEM) is first calculated. The SEM quantifies how far the sample mean is likely to deviate from the true population mean. For example, suppose the effect of a new drug on heart rate is tested in 50 patients [Table 1]. The mean heart rate is 74.5 with a standard deviation of 5.74, giving an SEM of 5.74/√50 = 0.81. This implies that if the sample is randomly selected from the whole population, there is a 95% chance that the true population mean lies within 74.5 ± 1.96 × 0.81, that is, between 72.9 and 76.1 [Figure 2]. This is the 95% confidence interval for these data: in simple words, we can be 95% confident that the true population mean lies between 72.9 and 76.1.

Table 1: Descriptive statistics for the heart rate data for a sample size of 50
Figure 2: Confidence intervals for a sample size of 50 patients (prepared from https://measuringu.com/calculators/ci-calc/)
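
A minimal Python sketch of this calculation, assuming the normal-approximation interval (mean ± 1.96 × SEM) used above:

```python
import math

def ci_95(mean, sd, n):
    """95% confidence interval for a mean via the normal approximation."""
    sem = sd / math.sqrt(n)     # standard error of the mean
    margin = 1.96 * sem         # 1.96 = two-sided z value for 95% confidence
    return mean - margin, mean + margin

low, high = ci_95(74.5, 5.74, 50)
print(f"SEM = {5.74 / math.sqrt(50):.2f}")     # 0.81
print(f"95% CI: {low:.1f} to {high:.1f}")      # 72.9 to 76.1
```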

An important variable in determining the width of the confidence interval is the sample size: the SEM is inversely proportional to the square root of the sample size. In the above example, if the sample size is doubled to 100, keeping the mean and standard deviation the same, the SEM falls to 0.57 [Table 2]. Accordingly, the confidence interval narrows to 74.5 ± 1.96 × 0.57, that is, 73.4 to 75.6 [Figure 3]; a short sketch comparing the two sample sizes follows Figure 3.

Table 2: Descriptive statistics for the heart rate data for a sample size of 100
Figure 3: Confidence intervals when the sample size was increased to 100 (prepared from https://measuringu.com/calculators/ci-calc/)
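
The narrowing with sample size can be verified directly; this sketch uses the same assumed normal-approximation interval as above:

```python
import math

def ci_95(mean, sd, n):
    """95% confidence interval for a mean via the normal approximation."""
    sem = sd / math.sqrt(n)
    return mean - 1.96 * sem, mean + 1.96 * sem

for n in (50, 100):
    low, high = ci_95(74.5, 5.74, n)
    print(f"n = {n:3d}: SEM = {5.74 / math.sqrt(n):.2f}, "
          f"95% CI = {low:.1f} to {high:.1f}, width = {high - low:.2f}")
# n =  50: SEM = 0.81, 95% CI = 72.9 to 76.1, width = 3.18
# n = 100: SEM = 0.57, 95% CI = 73.4 to 75.6, width = 2.25
```

Doubling the sample size shrinks the interval width by a factor of √2, not 2, which is why very tight intervals demand disproportionately large samples.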

The width of the confidence interval also depends on the confidence level chosen. In the above example, a 95% level was used. In statistical analysis, confidence levels of 90%, 95%, or 99% are commonly used; levels above 99% are rarely used. A 95% confidence level means that if the same experiment were carried out on repeated samples of the same population, 95% of the intervals so constructed would contain the true population mean.[4,5] Thus, confidence intervals convey the trial’s precision in estimating the true effect size; a wider interval suggests that further data should be collected.
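
The multiplier applied to the SEM changes with the chosen level. A quick check of the usual two-sided z values, sketched here with scipy's normal quantile function:

```python
from scipy.stats import norm

# Two-sided critical z value for a given confidence level:
# half of (1 - level) sits in each tail of the normal distribution
for level in (0.90, 0.95, 0.99):
    z = norm.ppf(1 - (1 - level) / 2)
    print(f"{level:.0%} confidence: mean ± {z:.3f} × SEM")
# 90% confidence: mean ± 1.645 × SEM
# 95% confidence: mean ± 1.960 × SEM
# 99% confidence: mean ± 2.576 × SEM
```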

Confidence intervals also help in finding a statistical difference between two groups.[1–3] We can plot the confidence interval for each group; if the intervals overlap, the absence of any real difference between the groups remains a possibility. Traditionally, the difference between groups is evaluated using hypothesis testing: we ask whether the observed difference is due to the treatment or to pure chance. For this purpose, we initially assume that there is no difference between the two groups.[3] This assumption is known as the null hypothesis. We then calculate the probability of observing a difference at least as large as the one found, assuming the null hypothesis is true; this probability is the P value. Since it is a probability, its value lies between 0 and 1, and 0.05 is traditionally kept as the cutoff. A P value above 0.05 means the observed difference would not be unusual under a true null hypothesis, so the null hypothesis is retained and the difference may be attributed to chance. Conversely, P < 0.05 suggests the data are improbable under the null hypothesis, which is therefore rejected. The P value can thus convey some information about the strength of evidence for or against the null hypothesis. However, the 0.05 level is arbitrary and does not correspond to clinical significance. As the earlier example showed, there is little practical difference between P = 0.04 and P = 0.06, and shifting a single case can push the P value across the cutoff.

Confidence intervals present more information than P values.[6] They provide a range of values within which the population mean is expected to lie, together with the precision of that estimate. They can also be used to test the null hypothesis, in other words, to find a significant difference between groups. Consider again the example of a drug affecting heart rate. In the drug A group, the heart rate was 74.5 ± 5.62; in the drug B group, it was 78.5 ± 5.62. With a sample size of 50 and the same standard deviation in both groups, the SEM is the same in both. The 95% confidence interval for drug A would be 72.9–76.1 and that for drug B would be 76.9–80.1. The upper limit for drug A (76.1) lies below the lower limit for drug B (76.9), so the intervals do not overlap and there is a significant difference between the two groups. We can also calculate the confidence interval for the mean difference itself: if this interval includes 0, there is no statistical difference between the groups, and vice versa.
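
A sketch of the mean-difference interval for this two-group example, assuming equal-variance independent samples and a t-based interval, which reproduces the 1.76–6.23 range quoted below up to rounding:

```python
import math
from scipy.stats import t

mean_a, mean_b, sd, n = 74.5, 78.5, 5.62, 50

diff = mean_b - mean_a                       # 4.0 beats per minute
se_diff = math.sqrt(sd**2 / n + sd**2 / n)   # standard error of the difference
t_crit = t.ppf(0.975, df=2 * n - 2)          # two-sided 95% critical value, df = 98

low = diff - t_crit * se_diff
high = diff + t_crit * se_diff
print(f"95% CI for the mean difference: {low:.2f} to {high:.2f}")
# 1.77 to 6.23 -- the interval excludes 0, so the difference is statistically significant
```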

Confidence intervals also convey clinical relevance.[7] In the above example, the 95% confidence interval for the mean difference is 1.76–6.23. This means that if the experiment were repeated over the whole population, there is a 95% chance that drug B would raise the heart rate by between 1.76 and 6.23 beats per minute more than drug A. The clinician can judge clinical significance against these values. If the minimum clinically important difference is set a priori at 3, the result is statistically significant, but because the interval extends below 3, clinical significance cannot be confirmed.

CONCLUSIONS

Confidence intervals provide more useful information than P values and should be reported along with the exact P values.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

REFERENCES

1. Altman DG. Why we need confidence intervals. World J Surg 2005;29:554–6.
2. Gardner MJ, Altman DG. Confidence intervals rather than P values: Estimation rather than hypothesis testing. Br Med J (Clin Res Ed) 1986;292:746–50.
3. Akobeng AK. Confidence intervals and p-values in clinical decision making. Acta Paediatr 2008;97:1004–7.
4. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. Eur J Epidemiol 2016;31:337–50.
5. Bland JM, Peacock JL. Interpreting statistics with confidence. Obstet Gynaecol 2002;4:176–80.
6. Bland JM, Peacock JL. Interpreting statistics with confidence. Obstet Gynaecol 2002;4:176–80.
7. du Prel JB, Hommel G, Röhrig B, Blettner M. Confidence interval or p-value? Part 4 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009;106:335–9.
Keywords:

Confidence interval; P value; statistical significance

Copyright: © 2023 Endodontology