A broad purpose of health research could be understood as advancing science by answering important questions about how people can live healthier, happier lives. The scientific process has been described as one of civilization's greatest accomplishments.1 It is, by nature, a process that continually evolves and improves.2
Contemporary developments are embedded in a widespread recognition that there is no singular scientific method3 as well as a growing recognition that no single research methodology should be accorded a privileged, gold standard position. Rather, the gold standard for producing high-quality evidence is matching important research questions to appropriate methodologies.4 The recognition that synthesizing high-quality evidence advances the scientific process is also a relatively recent innovation.
While the value of methodological heterogeneity is increasingly appreciated, the defining feature of scientific activity is an attitude that demands both an allegiance to evidence as well as a willingness to modify theories on the basis of evidence.3 The quality of evidence, therefore, is critical to science. Evidence quality will also have an important influence on the conclusions that can be derived from the synthesis of this evidence.
In terms of improving the quality of the available evidence, it is essential that all aspects of the research process be valued. There is an increasing and concerning trend, however, for some aspects of the research process to be considered more important than others. In particular, the attention being afforded to sample size calculations seems disproportionate to the purpose of these calculations. Grant applications, for example, frequently require sample size calculations, and we have experienced lengthy and detailed conversations in funding review meetings regarding the accuracy of the calculations being offered. It is not unusual for research proposals to be penalized for what might be regarded as inaccurate or incorrect sample size calculations. By contrast, there is rarely a requirement by funders for researchers to specify what the impact of their research will be for individuals, communities, or society at large. Nor is there a request for researchers to indicate how the research will contribute to modifying and improving existing theories.
It is important to ensure that the sample for any study is of sufficient size to robustly answer the research question under investigation. But how big is big enough? The relentless pursuit of statistical significance, together with researchers' increasingly skilled power calculations, has promoted a "paint by numbers" approach to research in which trivially small effects can achieve statistical significance with large enough samples. Although statistical significance might help achieve a journal publication, it does not guarantee practical significance or impact of the research.
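The arithmetic behind this point is straightforward. As a rough sketch (using the standard normal approximation for a two-sided, two-sample comparison, with the conventional alpha of 0.05 and 80% power, both assumptions for illustration), the required sample size scales with the inverse square of the standardized effect size, so a sufficiently large sample will render almost any nonzero effect statistically significant:

```python
import math

def two_sample_n_per_group(d, z_alpha=1.959963984540054, z_beta=0.8416212335729143):
    """Normal-approximation sample size per group for a two-sided,
    two-sample test of a standardized effect d.
    Defaults correspond to alpha = 0.05 (two-sided) and 80% power."""
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A moderate effect needs a modest sample...
print(two_sample_n_per_group(0.50))   # 63 per group
# ...but a trivially small effect can still reach "significance"
# if the sample is made large enough.
print(two_sample_n_per_group(0.02))   # 39,245 per group
```

The inverse-square scaling is the key point: halving the effect size of interest quadruples the required sample, which is why very large studies can detect effects too small to matter in practice.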
The ideal for any research is to collect data from the entire population under consideration. Given constraints of time and money, however, this ideal is rarely achievable, so sampling is the default option. Researchers, therefore, work backward from the entire population to collect data from the largest and most representative sample that time and money will allow. Having some sense of what sample size might be reasonable for a particular research program is obviously pertinent so that resources are not wasted. However, other considerations should be regarded as at least equally crucial. If researchers were required to demonstrate a genuine and tangible impact of their research, then the size of the sample used to produce that impact might be less relevant. Similarly, if researchers could stipulate the way in which our knowledge of important theories and mechanisms would be legitimately enhanced, the size of the sample might be a secondary consideration.
Replication should be a standard feature of the scientific process,2 but it is not routine in the health field to find studies independently replicated. It is certainly the case that similar studies are conducted repeatedly but, without attention to replication, important details might be overlooked. Replication is not the same as duplication.5 Replicating studies for the purpose of advancing theory or purported mechanisms, and conducting power analyses at the end of a study, rather than the beginning, would enhance the quality of the evidence available to be synthesized. Of particular interest would be studies that did not produce statistically significant results but, through a post hoc power analysis, were demonstrated to have had adequate power to detect an effect if an important effect was, in fact, present. Such results might provide a useful and novel contribution to the body of knowledge being gathered. These results might also make the conclusions of meta-analyses more robust and broadly applicable.
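To illustrate what such a post hoc power analysis can (and cannot) tell us, here is a minimal sketch using the same normal approximation as above; the sample sizes and effect sizes are hypothetical, chosen purely for illustration:

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def achieved_power(n_per_group, d, z_alpha=1.959963984540054):
    """Normal-approximation power of a two-sided, two-sample test
    (alpha = 0.05) to detect a standardized effect d."""
    ncp = d * math.sqrt(n_per_group / 2.0)  # noncentrality under the alternative
    # Two-sided power; the second (opposite-tail) term is usually negligible.
    return (1 - norm_cdf(z_alpha - ncp)) + norm_cdf(-z_alpha - ncp)

# A null result from 100 participants per group is informative about
# effects of d = 0.5 or larger (power ~0.94), but says little about
# smaller effects (power ~0.29 for d = 0.2).
print(round(achieved_power(100, 0.5), 2))
print(round(achieved_power(100, 0.2), 2))
```

In other words, a nonsignificant result is only evidentially useful for ruling out effects the study was well powered to detect; the same data remain uninformative about smaller effects.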
While increasing the quality of primary studies will undoubtedly also improve meta-analyses, even with existing studies, meta-analyses can provide more precise and accurate estimates of the relationships between variables because, by combining primary research studies, they have much larger sample sizes than the original studies.6 The value of meta-analyses in advancing knowledge in a field is clear,6 although the findings of meta-analyses do not address research impact directly. Moreover, it could be argued that, ironically, with the development, growth, and widespread acceptance of meta-analytic methods, sample size calculations of primary studies are even less important than they were before. There is simply very little to no scientific justification for anything more than a "ballpark" estimate of what an appropriate sample size for a primary study should be.
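The precision gain from pooling can be sketched with a simple inverse-variance (fixed-effect) calculation; the five study estimates and standard errors below are hypothetical, invented purely to show why a pooled estimate is more precise than any contributing study:

```python
import math

def pool_fixed_effect(estimates, ses):
    """Inverse-variance (fixed-effect) pooling of study estimates.
    Each study is weighted by 1/SE^2, so precise studies count more."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Five hypothetical small studies, each too imprecise on its own.
estimates = [0.30, 0.10, 0.25, 0.15, 0.20]
ses = [0.15, 0.12, 0.18, 0.14, 0.16]
pooled, pooled_se = pool_fixed_effect(estimates, ses)
print(round(pooled, 3), round(pooled_se, 3))
# The pooled standard error is smaller than that of any single study.
```

Because the pooled variance is the reciprocal of the summed weights, every added study shrinks the standard error of the combined estimate, which is the mechanism behind the precision advantage described above.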
Large samples are not essential for demonstrable and monumental impact. The story of Nobel Prize winners Marshall and Warren,7 who demonstrated that ulcers were caused by bacteria, is relevant here. Through their efforts, they "overturned established medical dogma and revolutionised the treatment of peptic ulcers."7(p.1429) A key element in eventually having the credibility of their results acknowledged was when Marshall, using himself as a sample size of 1, deliberately infected himself with the bacterium.7 The results were game-changing in the most profound and absolute sense.
To ensure the highest-quality evidence is being synthesized so that, at any point in time, we can be confident that we are making decisions based on the best available evidence, it is clearly important to only include studies that are sufficiently powered to detect important effects. It should be just as imperative, however, that research be planned from the outset to have a genuine and tangible impact and also to contribute in meaningful ways to the improvement of authoritative theories. Through this more balanced approach to the scientific process, we may see far greater progress in the eradication of health inequities and the ability of all people to live healthy and contented lives.
1. Novella S. The skeptics' guide to the universe. London: Hodder & Stoughton Ltd; 2018.
2. Zimring JC. What science is and how it really works. Cambridge, UK: Cambridge University Press; 2019.
3. McIntyre LC. The scientific attitude: defending science from denial, fraud, and pseudoscience. Cambridge, MA: MIT Press; 2019.
4. Carey TA, Tai SJ, Mansell W, Huddy V, Griffiths R, Marken RS. Improving professional psychological practice through an increased repertoire of research methodologies: illustrated by the development of MOL. Prof Psychol Res Pr 2017; 48(3):175–182.
5. Tugwell P, Welch VA, Karunananthan S, Maxwell LJ, Akl EA, Avey MT, et al. When to replicate systematic reviews of interventions: consensus checklist. BMJ
6. Chan MLE, Arvey RD. Meta-analysis and the development of knowledge. Perspect Psychol Sci 2012; 7(1):79–92.
7. Pincock S. Nobel Prize winners Robin Warren and Barry Marshall. Lancet 2005; 366(9495):1429.