Introduction
Assessing the methodological quality of primary research is a crucial step in evidence synthesis to evaluate the validity of the evidence base on a given research topic. The methodological quality of primary research is commonly assessed prior to synthesis using a methodological quality assessment tool. Such tools consider the extent to which safeguards against bias (listed as items in a tool) have been implemented in the design, conduct, and analysis of a study to protect against biased results.1,2 Safeguards are key methodological aspects of design or conduct of a study that protect against bias (eg, randomization, blinding). Valid methodological quality assessment tools aim to assess only the internal validity of a study and should therefore not include items related to reporting quality, although poor reporting makes it more difficult to undertake methodological quality assessment.
A multitude of methodological quality assessment tools are available and extensively used, which makes the selection of a suitable tool a priority. Such tools continue to proliferate rapidly,3–6 mainly owing to differing views among researchers regarding what constitutes a valid quality assessment and how quality safeguards are implemented in different research designs and for different research questions.3,5–8 Nevertheless, all such tools draw on a similar pool of safeguards.
We have previously proposed a unified framework for bias assessment based on evidence from a systematic review of methodological quality assessment tools, and subsequently developed the MethodologicAl STandard for Epidemiological Research (MASTER) scale.6 This scale was created to be used across analytic study designs included in an evidence synthesis (ie, randomized controlled trial [RCT], quasi-experimental, cohort, case-control, analytic cross-sectional); therefore, it is crucial that it be compared against tools for different designs. Our aims were to: i) map the MASTER scale to several key critical appraisal tools for analytic designs9,10 and assess the completeness of safeguards in design-specific tools vis-à-vis the comprehensive list of safeguards in the MASTER scale; and ii) assess the extent of duplication of safeguards and redundancy across design-specific tools when compared with this unified scale.
Methods
Tools used in this study
We chose 3 toolsets for comparison. First, the JBI toolset was selected because JBI has clearly defined tools across different designs and JBI is well known, giving these tools a wide audience. Second, we chose 3 of the Scottish Intercollegiate Guidelines Network (SIGN) tools. Finally, we selected the Newcastle-Ottawa Scale (NOS) for analytic study designs; both the SIGN tools and the NOS have a long history and are widely used.
MASTER scale
The MASTER scale6 was developed iteratively by our research team through a systematic review of 393 methodological quality assessment tools delivering a total of 6295 methodological safeguards. We iteratively classified safeguards according to which methodological standard each safeguard aimed to fulfill. Details regarding this scale are published elsewhere.6 The MASTER scale adopts a relative framework for quality assessment that uses the count of safeguards present (quality score) to rank studies relative to the best study in the evidence synthesis.11 This works by dividing the quality score of the assessed study by the quality score of the highest-scoring study in the synthesis to calculate a rank between 0 and 1 (1 is always the highest rank because it is the highest score divided by itself).11 This tool is not specific to any study design and can be used to assess multiple analytic study designs within a synthesis.
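As a minimal sketch of this calculation (the study labels and safeguard counts below are hypothetical, not taken from any particular synthesis), the relative rank is simply each study's quality score divided by the highest score in the set:

```python
# Hypothetical safeguard counts (quality scores) for studies in one synthesis.
quality_scores = {"Study A": 30, "Study B": 24, "Study C": 12}

# Relative rank: each score divided by the highest score in the synthesis,
# so the best study always ranks 1 and the others fall between 0 and 1.
best_score = max(quality_scores.values())
relative_ranks = {study: score / best_score for study, score in quality_scores.items()}

print(relative_ranks)  # {'Study A': 1.0, 'Study B': 0.8, 'Study C': 0.4}
```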
JBI tools for analytic study designs
JBI is an international research and development organization that specializes in promoting and supporting evidence-based health care.10,12 JBI established a working group to develop critical appraisal tools by consensus.9,13,14 The pilot versions of these tools were tested internally and then approved by the JBI Scientific Committee. The tools are disseminated through the JBI website (https://jbi.global) and the JBI Manual for Evidence Synthesis.10 Each of the JBI tools consists of a number of safeguards that can be checked against a research report of interest, and each tool is specifically designed for appraisal of a particular study design.9,10 Of the 13 design-specific checklists, 8 were excluded because they were either for nonquantitative studies (2 checklists: text and opinion, qualitative studies), reports of cases (2 checklists: case series,13 case reports), designs that require consideration of safeguards different from those of typical analytic designs (3 checklists: diagnostic studies, economic evaluations, studies of incidence or prevalence), or systematic reviews (1 checklist). The 5 remaining checklists were included in this study: case-control studies, cohort studies, analytical cross-sectional studies, RCTs, and quasi-experimental studies.
Scottish Intercollegiate Guidelines Network
SIGN15 is a national guideline collaboration focused on improving health care for patients in Scotland. Its aim is to network with clinicians, health and social care providers, patient organizations, and individuals to produce and disseminate guidelines based on current best evidence and to ultimately reduce the variation in clinical practice and patient outcomes. SIGN produces guidelines based specifically on systematic reviews and provides a range of critical appraisal tools for different study designs. These checklists assess the methodological limitations of a study that have the potential to cause bias in the results. The checklist items differ depending on the study design.
Of the 6 design-specific checklists, 3 were excluded because they were either for systematic reviews and meta-analyses (1 checklist), diagnostic studies (1 checklist), or economic studies (1 checklist). The 3 remaining checklists were included in this study: case-control studies, cohort studies, and RCTs.
Newcastle-Ottawa Scale
The NOS16 was developed collaboratively between the Universities of Newcastle, Australia, and Ottawa, Canada, for the methodological quality assessment of non-randomized studies. The NOS is a single instrument divided into 2 study designs, case-control and cohort studies, and has been validated for use based on evaluation by experts in the field.16 The NOS uses a star system to count safeguards implemented in 3 domains: the selection of the study groups, the comparability of the groups, and the ascertainment of the exposure or outcome of interest for case-control or cohort studies, respectively. The NOS is a popular tool with a wide audience due to its ease of use. Both the case-control and the cohort study scales were included in this review.
Analytic strategy
The safeguards from each design-specific checklist were compiled into a list and compared to the list of safeguards from the MASTER scale. Each safeguard was mapped 1:1 to the MASTER scale safeguard to which it most closely aligned. These were not exact matches but aimed to map conceptually similar safeguards across the tools of interest. For example, the safeguard “length of follow-up was not too long or too short in relation to the outcome assessment” (question 36 in the MASTER scale) was mapped to “was the follow-up time reported and sufficient to be long enough for outcomes to occur?” (question 8 in the JBI cohort study checklist). The best match was chosen for each safeguard; however, more than 1 match may have been possible. Therefore, some safeguards in the design-specific tools may encompass more than 1 safeguard in the MASTER scale.
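Conceptually, the mapping is a simple lookup from each design-specific item to the single MASTER safeguard it best matches; a minimal sketch is shown below, in which only the first entry reflects the worked example above and the rest of the structure is illustrative:

```python
# Illustrative 1:1 mapping from JBI cohort checklist item numbers to MASTER
# scale safeguard numbers. Only the entry 8 -> 36 comes from the worked
# example in the text; the remaining items would be filled in the same way.
jbi_cohort_to_master = {
    8: 36,  # follow-up reported and sufficient -> MASTER Q36 (length of follow-up)
    # ...remaining checklist items mapped in the same way
}

def is_mapped(item_number: int) -> bool:
    """Return True if a design-specific item maps to a MASTER safeguard."""
    return jbi_cohort_to_master.get(item_number) is not None
```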
To ensure a high degree of accuracy, this process was completed by 2 authors (JS and SD) and sent to another member of the JBI Evidence-based Healthcare Research Division (ZM), as well as to a research assistant, for checking. Mapping of the safeguards was also discussed by the authors to ensure consensus. Stacked bar charts were used to show the distributions of methodological standards from the MASTER scale (Figures 1–3) based on the mapped safeguards across the JBI checklists, SIGN tools, and the NOS. The safeguards mapped from each of the design-specific checklists to the MASTER scale were also presented as raw data (Supplemental Digital Content 1: https://links.lww.com/SRX/A4). Design-specific safeguards that remained unmapped (Supplemental Digital Content 2: https://links.lww.com/SRX/A5), safeguards that mapped to the same MASTER safeguard, and safeguards deemed unsuitable for methodological quality assessment were described. Agreement metrics for this process were not calculated because a single set of consensus results was used and all disagreements were resolved.
Figure 1: Safeguards within the JBI checklists mapped against methodological standards from the MASTER scale.
Figure 2: Safeguards within the Scottish Intercollegiate Guidelines Network checklists mapped against methodological standards from the MASTER scale.
Figure 3: Safeguards within the Newcastle-Ottawa Scale mapped against methodological standards from the MASTER scale.
Results
Assessment of safeguards within the tools
The MASTER scale consists of 36 safeguards aiming to fulfill 7 methodological standards. The JBI tools include a total of 51 safeguards, with a minimum of 8 and a maximum of 13 safeguards per tool. The SIGN tools include a total of 35 safeguards, with a minimum of 10 and a maximum of 14 safeguards per tool. Finally, the NOS tool includes a total of 16 safeguards, with 8 safeguards in each scale. All of the safeguards across all of the tools (JBI, SIGN, and NOS) were accounted for by the MASTER scale.
One design-specific JBI critical appraisal tool included a safeguard that was not considered a methodological safeguard against bias (2% of all safeguards). This was safeguard #2 in the JBI analytical cross-sectional study tool (Were the study subjects and the setting described in detail?). While differences between study subjects (eg, age, severity of condition, previous therapies) can bias a study, what matters is the balance of these characteristics between groups, not whether they were described; the latter is a matter of reporting quality.
All 3 design-specific SIGN tools included items that were not considered methodological safeguards against bias (23% of all safeguards). These were safeguards #1, #5, and #11 in the SIGN case-control study checklist (#1: The study addresses an appropriate and clearly focused question; #5: Comparison is made between participants and non-participants to establish their similarities or differences; and #11: Confidence intervals are provided) and safeguards #1, #9, and #14 in the SIGN cohort study checklist (#1: The study addresses an appropriate and clearly focused question; #9: Where blinding was not possible, there is some recognition that knowledge of exposure status could have influenced the assessment of outcome; and #14: Have confidence intervals been provided?). One safeguard in the SIGN RCT checklist was not considered a methodological safeguard against bias (The study addresses an appropriate and clearly focused question).
Both design-specific NOS scales included an item that was not considered a methodological safeguard against bias (13% of all safeguards). These were item #2 in the NOS for case-control studies (Representativeness of the cases) and item #1 in the NOS for cohort studies (Representativeness of the exposed cohort).
Comparison to methodological standards in the MASTER scale
Equal ascertainment and equal prognosis were the most common MASTER scale methodological standards addressed by safeguards in the 5 JBI design-specific tools (25% and 24% of safeguards in the design-specific toolset, respectively). Safeguards falling under the standards of equal recruitment, temporal precedence, equal retention, equal implementation, and sufficient analysis were least common (8%, 8%, 10%, 12%, and 12%, respectively) (Figure 1). The JBI cohort study and quasi-experimental design tools included safeguards from all methodological standards from the MASTER scale, whereas the JBI case-control tool did not have a safeguard from 1 methodological standard (equal retention). Equal retention may not be relevant to case-control designs given that cases and controls are often selected retrospectively. The JBI RCT tool did not have safeguards from 2 methodological standards in the MASTER scale (equal recruitment and temporal precedence), and the JBI analytic cross-sectional tool had no safeguards that addressed 3 of the methodological standards in the MASTER scale (equal retention, equal implementation, and temporal precedence). The reasons for this were unrelated to the tools’ corresponding study designs. Further information regarding the mapping of the design-specific tools to the MASTER scale’s methodological standards can be found in the supplementary materials.
Equal ascertainment, equal prognosis, and equal retention were the most common MASTER scale methodological standards addressed by safeguards in the 3 design-specific SIGN tools (26%, 17%, and 15% of safeguards in the design-specific toolset, respectively). Safeguards falling under the standards of equal recruitment, equal implementation, sufficient analysis, and temporal precedence were least common (8%, 6%, 3%, and 3%, respectively) (Figure 2). None of the SIGN checklists included safeguards from all methodological standards of the MASTER scale. The SIGN case-control study checklist had no safeguards belonging to 2 methodological standards (sufficient analysis and temporal precedence). The SIGN checklist for cohort studies did not have safeguards falling within 2 methodological standards (equal implementation and sufficient analysis), and the SIGN checklist for RCTs had no safeguards that addressed 3 of the methodological standards in the MASTER scale (equal recruitment, equal implementation, and temporal precedence).
Equal ascertainment and equal recruitment were the most common MASTER scale methodological standards addressed by safeguards in the 2 design-specific NOS tools (25% and 19% of safeguards in the design-specific toolset, respectively), while safeguards falling under the standards of equal retention, equal implementation, equal prognosis, and temporal precedence were least common (6%, 13%, 13%, and 13%, respectively; Figure 3). Sufficient analysis was not addressed by any safeguards in the NOS. None of the scales included safeguards from all methodological standards of the MASTER scale. The NOS case-control study scale had no safeguards belonging to 3 methodological standards (equal retention, sufficient analysis, and temporal precedence). The NOS for cohort studies had no safeguards from 2 methodological standards (equal implementation and sufficient analysis) in the MASTER scale.
Design-specific tools that included safeguards with conceptual overlap when compared with the MASTER scale
JBI tools
Conceptual overlap was found in 3 of the design-specific JBI critical appraisal tools: the case-control, quasi-experimental, and RCT tools.
Case-control:
Question 2. Were cases and controls matched appropriately?
Question 7. Were strategies to deal with confounding factors stated?
Quasi-experimental:
Question 5. Were there multiple measurements of the outcome both pre and post the intervention/exposure?
Question 8. Were outcomes measured in a reliable way?
RCT:
Question 12. Was appropriate statistical analysis used?
Question 13. Was the trial design appropriate, and any deviations from the standard RCT design (individual randomization, parallel groups) accounted for in the conduct and analysis of the trial?
SIGN tools
Conceptual overlap was found in all 3 of the design-specific SIGN tools.
Case-control:
Question 2. The cases and controls are taken from comparable populations.
Question 3. The same exclusion criteria are used for both cases and controls.
Question 6. Cases are clearly defined and differentiated from controls.
Question 7. It is clearly established that controls are non-cases.
Cohort:
Question 7. The outcomes are clearly defined.
Question 11. Evidence from other sources is used to demonstrate that the method of outcome assessment is valid and reliable.
Question 10. The method of assessment of exposure is reliable.
Question 12. Exposure level or prognostic factor is assessed more than once.
RCT:
Question 5. The treatment and control groups are similar at the start of the trial.
Question 6. The only difference between groups is the treatment under investigation.
Such overlap most likely arises because each of the overlapping safeguards addresses a different nuance of the same concept, and these nuances are unified within the MASTER scale. There were no overlapping safeguards within the NOS design-specific tools.
MASTER safeguards not used
The safeguards in the JBI design-specific tools represented 22 (61%) unique safeguards in the MASTER scale. This means that across JBI design-specific tools, there were 14 MASTER scale safeguards not implemented by any of the 5 design-specific JBI critical appraisal tools. These can be found in the supplementary material.
The SIGN design-specific tools had safeguards that represented 13 (36%) unique safeguards assembled in the MASTER scale. This means that there were 23 safeguards in the MASTER scale that were not utilized by the 3 design-specific SIGN critical appraisal tools. These can be found in the supplementary material.
There were 16 safeguards in the design-specific NOS that represented 10 (28%) unique safeguards assembled in the MASTER scale. This means that across design-specific NOS tools, there were 26 safeguards in the MASTER scale that were not utilized. Several of these were related to RCTs, a design that the NOS does not aim to assess (see supplementary material for details).
Comparison across scales of the same design
MASTER scale questions #2, #12, and #23 were incorporated across all 3 case-control tools from JBI, SIGN, and NOS (see Table 1 for frequencies). MASTER scale questions #2, #5, #11, #12, #23, and #32 were incorporated within all 3 tools for cohort studies (Table 2). MASTER scale questions #5, #9, #11, #13, #25, #26, #27, and #29 were incorporated within both JBI and SIGN tools for RCTs (Table 3).
Table 1 - MASTER scale question captured in each of the JBI, Scottish Intercollegiate Guidelines Network, and Newcastle-Ottawa Scale tools for case-control studies

| MASTER scale safeguard number | MASTER scale safeguard | JBI | SIGN | NOS | Total |
|---|---|---|---|---|---|
| 0 | Non-safeguard | 0 | 3 | 1 | 4 |
| 1 | Data collected after the start of the study was not used to exclude participants or to select them into the analysis | 0 | 0 | 0 | 0 |
| 2 | Participants in all comparison groups met the same eligibility requirements and were from the same population and timeframe | 1 | 2 | 1 | 4 |
| 3 | Determination of eligibility and assignment to treatment group/exposure strategy were synchronised | 0 | 0 | 0 | 0 |
| 4 | None of the eligibility criteria were common effects of exposure and outcome | 0 | 0 | 1 | 1 |
| 5 | Any attrition (or exclusions after entry) was less than 20% of total participant numbers | 0 | 1 | 0 | 1 |
| 6 | Missing data was less than 20% | 0 | 0 | 0 | 0 |
| 7 | Analysis accounted for missing data | 0 | 0 | 0 | 0 |
| 8 | Exposure variations/treatment deviations were less than 20% | 0 | 0 | 0 | 0 |
| 9 | Variations in exposure or withdrawals after start of the study were addressed by the analysis | 0 | 0 | 0 | 0 |
| 10 | Procedures for data collection of covariates were reliable and the same for all participants | 0 | 0 | 0 | 0 |
| 11 | The outcome was objective and/or reliably measured | 1 | 0 | 1 | 2 |
| 12 | Exposures/interventions were objectively and/or reliably measured | 1 | 1 | 1 | 3 |
| 13 | Outcome assessor(s) were blinded | 0 | 1 | 0 | 1 |
| 14 | Participants were blinded | 0 | 0 | 0 | 0 |
| 15 | Caregivers were blinded | 0 | 0 | 0 | 0 |
| 16 | Analyst(s) were blinded | 0 | 0 | 0 | 0 |
| 17 | Care was delivered equally to all participants | 0 | 0 | 0 | 0 |
| 18 | Cointerventions that could impact the outcome were comparable between groups or avoided | 0 | 0 | 0 | 0 |
| 19 | Control and active interventions/exposures were sufficiently distinct | 0 | 2 | 1 | 3 |
| 20 | Exposure/intervention definition was consistently applied to all participants | 1 | 0 | 0 | 1 |
| 21 | Outcome definition was consistently applied to all participants | 0 | 0 | 1 | 1 |
| 22 | The time period between exposure and outcome was similar across patients and between groups or the analyses adjusted for different lengths of follow-up of patients | 0 | 0 | 0 | 0 |
| 23 | Design and/or analysis strategies were in place that addressed potential confounding | 2 | 1 | 1 | 4 |
| 24 | Key confounders addressed through design or analysis were not common effects of exposure and outcome | 1 | 0 | 0 | 1 |
| 25 | Key baseline characteristics/prognostic indicators for the study were comparable across groups | 1 | 0 | 0 | 1 |
| 26 | Participants were randomly allocated to groups with an adequate randomization process | 0 | 0 | 0 | 0 |
| 27 | Allocation procedure was adequately concealed | 0 | 0 | 0 | 0 |
| 28 | Conflict of interests were declared and absent | 0 | 0 | 0 | 0 |
| 29 | Analytic method was justified by study design or data requirements | 1 | 0 | 0 | 1 |
| 30 | Computation errors or contradictions were absent | 0 | 0 | 0 | 0 |
| 31 | There was no discernible data dredging or selective reporting of the outcomes | 0 | 0 | 0 | 0 |
| 32 | All subjects were selected prior to intervention/exposure and evaluated prospectively | 0 | 0 | 0 | 0 |
| 33 | Carry-over or refractory effects were avoided or considered in the design of the study or were not relevant | 0 | 0 | 0 | 0 |
| 34 | The intervention/exposure period was long enough to have influenced the study outcome | 1 | 0 | 0 | 1 |
| 35 | Dose of intervention/exposure was sufficient to influence the outcome | 0 | 0 | 0 | 0 |
| 36 | Length of follow-up was not too long or too short in relation to the outcome assessment | 0 | 0 | 0 | 0 |
| Total | | 10 | 11 | 8 | 29 |

NOS indicates Newcastle-Ottawa Scale; SIGN, Scottish Intercollegiate Guidelines Network.
Table 2 - MASTER scale question captured in each of the JBI, Scottish Intercollegiate Guidelines Network, and Newcastle-Ottawa Scale tools for cohort studies

| MASTER scale safeguard number | MASTER scale safeguard | JBI | SIGN | NOS | Total |
|---|---|---|---|---|---|
| 0 | Non-safeguard | 0 | 4 | 1 | 5 |
| 1 | Data collected after the start of the study was not used to exclude participants or to select them into the analysis | 0 | 0 | 0 | 0 |
| 2 | Participants in all comparison groups met the same eligibility requirements and were from the same population and timeframe | 1 | 1 | 1 | 3 |
| 3 | Determination of eligibility and assignment to treatment group/exposure strategy were synchronised | 0 | 0 | 0 | 0 |
| 4 | None of the eligibility criteria were common effects of exposure and outcome | 0 | 0 | 0 | 0 |
| 5 | Any attrition (or exclusions after entry) was less than 20% of total participant numbers | 1 | 1 | 1 | 3 |
| 6 | Missing data was less than 20% | 0 | 0 | 0 | 0 |
| 7 | Analysis accounted for missing data | 0 | 0 | 0 | 0 |
| 8 | Exposure variations/treatment deviations were less than 20% | 0 | 0 | 0 | 0 |
| 9 | Variations in exposure or withdrawals after start of the study were addressed by the analysis | 1 | 1 | 0 | 2 |
| 10 | Procedures for data collection of covariates were reliable and the same for all participants | 0 | 0 | 0 | 0 |
| 11 | The outcome was objective and/or reliably measured | 1 | 2 | 1 | 4 |
| 12 | Exposures/interventions were objectively and/or reliably measured | 1 | 2 | 1 | 4 |
| 13 | Outcome assessor(s) were blinded | 0 | 1 | 0 | 1 |
| 14 | Participants were blinded | 0 | 0 | 0 | 0 |
| 15 | Caregivers were blinded | 0 | 0 | 0 | 0 |
| 16 | Analyst(s) were blinded | 0 | 0 | 0 | 0 |
| 17 | Care was delivered equally to all participants | 0 | 0 | 0 | 0 |
| 18 | Cointerventions that could impact the outcome were comparable between groups or avoided | 0 | 0 | 0 | 0 |
| 19 | Control and active interventions/exposures were sufficiently distinct | 0 | 0 | 0 | 0 |
| 20 | Exposure/intervention definition was consistently applied to all participants | 1 | 0 | 0 | 1 |
| 21 | Outcome definition was consistently applied to all participants | 0 | 0 | 0 | 0 |
| 22 | The time period between exposure and outcome was similar across patients and between groups or the analyses adjusted for different lengths of follow-up of patients | 0 | 0 | 0 | 0 |
| 23 | Design and/or analysis strategies were in place that addressed potential confounding | 1 | 1 | 1 | 3 |
| 24 | Key confounders addressed through design or analysis were not common effects of exposure and outcome | 1 | 0 | 0 | 1 |
| 25 | Key baseline characteristics/prognostic indicators for the study were comparable across groups | 0 | 0 | 0 | 0 |
| 26 | Participants were randomly allocated to groups with an adequate randomisation process | 0 | 0 | 0 | 0 |
| 27 | Allocation procedure was adequately concealed | 0 | 0 | 0 | 0 |
| 28 | Conflict of interests were declared and absent | 0 | 0 | 0 | 0 |
| 29 | Analytic method was justified by study design or data requirements | 1 | 0 | 0 | 1 |
| 30 | Computation errors or contradictions were absent | 0 | 0 | 0 | 0 |
| 31 | There was no discernible data dredging or selective reporting of the outcomes | 0 | 0 | 0 | 0 |
| 32 | All subjects were selected prior to intervention/exposure and evaluated prospectively | 1 | 1 | 1 | 3 |
| 33 | Carry-over or refractory effects were avoided or considered in the design of the study or were not relevant | 0 | 0 | 0 | 0 |
| 34 | The intervention/exposure period was long enough to have influenced the study outcome | 0 | 0 | 0 | 0 |
| 35 | Dose of intervention/exposure was sufficient to influence the outcome | 0 | 0 | 0 | 0 |
| 36 | Length of follow-up was not too long or too short in relation to the outcome assessment | 1 | 0 | 1 | 2 |
| Total | | 11 | 14 | 8 | 33 |

NOS indicates Newcastle-Ottawa Scale; SIGN, Scottish Intercollegiate Guidelines Network.
Table 3 - MASTER scale question captured in each of the JBI and Scottish Intercollegiate Guidelines Network tools for randomized controlled trials

| MASTER scale safeguard number | MASTER scale safeguard | JBI | SIGN | Total |
|---|---|---|---|---|
| 0 | Non-safeguard | 0 | 1 | 1 |
| 1 | Data collected after the start of the study was not used to exclude participants or to select them into the analysis | 0 | 0 | 0 |
| 2 | Participants in all comparison groups met the same eligibility requirements and were from the same population and timeframe | 0 | 0 | 0 |
| 3 | Determination of eligibility and assignment to treatment group/exposure strategy were synchronised | 0 | 0 | 0 |
| 4 | None of the eligibility criteria were common effects of exposure and outcome | 0 | 0 | 0 |
| 5 | Any attrition (or exclusions after entry) was less than 20% of total participant numbers | 1 | 1 | 2 |
| 6 | Missing data was less than 20% | 0 | 0 | 0 |
| 7 | Analysis accounted for missing data | 0 | 0 | 0 |
| 8 | Exposure variations/treatment deviations were less than 20% | 0 | 0 | 0 |
| 9 | Variations in exposure or withdrawals after start of the study were addressed by the analysis | 1 | 1 | 2 |
| 10 | Procedures for data collection of covariates were reliable and the same for all participants | 0 | 0 | 0 |
| 11 | The outcome was objective and/or reliably measured | 1 | 1 | 2 |
| 12 | Exposures/interventions were objectively and/or reliably measured | 0 | 0 | 0 |
| 13 | Outcome assessor(s) were blinded | 1 | 1 | 2 |
| 14 | Participants were blinded | 1 | 0 | 1 |
| 15 | Caregivers were blinded | 1 | 0 | 1 |
| 16 | Analyst(s) were blinded | 0 | 0 | 0 |
| 17 | Care was delivered equally to all participants | 1 | 0 | 1 |
| 18 | Cointerventions that could impact the outcome were comparable between groups or avoided | 0 | 0 | 0 |
| 19 | Control and active interventions/exposures were sufficiently distinct | 0 | 0 | 0 |
| 20 | Exposure/intervention definition was consistently applied to all participants | 0 | 0 | 0 |
| 21 | Outcome definition was consistently applied to all participants | 1 | 0 | 1 |
| 22 | The time period between exposure and outcome was similar across patients and between groups or the analyses adjusted for different lengths of follow-up of patients | 0 | 0 | 0 |
| 23 | Design and/or analysis strategies were in place that addressed potential confounding | 0 | 0 | 0 |
| 24 | Key confounders addressed through design or analysis were not common effects of exposure and outcome | 0 | 0 | 0 |
| 25 | Key baseline characteristics/prognostic indicators for the study were comparable across groups | 1 | 2 | 3 |
| 26 | Participants were randomly allocated to groups with an adequate randomisation process | 1 | 1 | 2 |
| 27 | Allocation procedure was adequately concealed | 1 | 1 | 2 |
| 28 | Conflict of interests were declared and absent | 0 | 0 | 0 |
| 29 | Analytic method was justified by study design or data requirements | 2 | 1 | 3 |
| 30 | Computation errors or contradictions were absent | 0 | 0 | 0 |
| 31 | There was no discernible data dredging or selective reporting of the outcomes | 0 | 0 | 0 |
| 32 | All subjects were selected prior to intervention/exposure and evaluated prospectively | 0 | 0 | 0 |
| 33 | Carry-over or refractory effects were avoided or considered in the design of the study or were not relevant | 0 | 0 | 0 |
| 34 | The intervention/exposure period was long enough to have influenced the study outcome | 0 | 0 | 0 |
| 35 | Dose of intervention/exposure was sufficient to influence the outcome | 0 | 0 | 0 |
| 36 | Length of follow-up was not too long or too short in relation to the outcome assessment | 0 | 0 | 0 |
| Total | | 13 | 10 | 23 |
Discussion
The aim of this study was to assess whether several analytic, design-specific, methodological quality assessment tools were covered by the unified MASTER scale. We found complete coverage by the MASTER scale of the design-specific critical appraisal tools, with all safeguards accounted for. However, 14 out of the 36 MASTER scale safeguards were not included in any of the design-specific tools. For example, safeguard #28 “conflict of interests were declared and absent” from the MASTER scale was not included in any of the 5 design-specific critical appraisal tools. It is well documented that conflicts of interest, such as sponsorship, tend to influence the design, conduct, and analysis of a study in such a way that study results may support the interests of the study’s financial sponsor.17–19 As such, this safeguard is included in many quality-appraisal tools in the epidemiological literature to account for the potential that a conflicting interest has caused bias. However, there is contention surrounding the inclusion of this safeguard, as it overlooks the type of bias caused by conflicting interests.20 In addition, the problem of focusing only on financial conflicts of interest when others (eg, personal, academic) exist has led many tools to exclude this safeguard. The MASTER scale safeguards not included in the design-specific tools have a theoretical or empirical basis for inclusion in a quality-appraisal tool; further investigation of their place in any methodological quality assessment tool, for example through meta-epidemiological evidence that accounts for each such safeguard individually, is warranted.
While some study designs are better suited to answer a given research question, the same research question may be addressed using different study designs whose features are susceptible to different biases that need to be addressed when assessing study quality. Typically, methodological quality assessment tools are classified by research design, as in the toolsets described above. However, this may not be necessary if the design feature itself is included as a quality safeguard wherever the design protects against an element of bias, as has been achieved in the MASTER scale. Moreover, an advantage of emphasizing study design features in a unified tool (eg, the MASTER scale) over design-specific appraisal tools (eg, the JBI, SIGN, and NOS tools) is that study design labels are applied inconsistently in practice, so selecting the appropriate design-specific tool is not always straightforward. Whether design-specific tools are used as a set to assess multiple designs within a synthesis or a single tool is used across designs, a comprehensive list of safeguards across designs is required so that all aspects of the different designs can be assessed comparatively. Several safeguards from the MASTER scale were not included in the design-specific tools, and these spanned all 7 methodological standards in the MASTER scale, suggesting that the methodological quality assessment of a given study could be improved if the MASTER scale were used instead of the design-specific tools.
The MASTER scale was informed by a systematic review of existing tools in the epidemiological literature and performed well when mapped against the individual toolsets. This study provides a back translation, confirming adequate coverage by the MASTER scale and supporting the validity of its development process. Given that the MASTER scale was developed through an extensive systematic review of risk of bias tools, our findings are expected to be representative of what is commonly considered bias assessment: the tools used for comparison in this study are popular, and risk of bias assessment tools tend to be similar in content while differing in their choice of safeguards.
Two important points need to be raised in relation to the MASTER scale. First, the MASTER scale categorizes safeguards into 7 methodological standards based on type of equivalence.7 This differs from the classic “mechanism of bias” domains used in existing tools, where many safeguards do not address only a single traditional domain. Categorizing safeguards by type of equivalence makes the categories mutually exclusive and gives the grouped safeguards meaning by unifying them under a common platform. The MASTER scale therefore names 7 groups of methodological standards in lieu of the conventional bias domains in existing tools.
Second, the MASTER scale advocates a relative assessment framework when quantifying the results of a methodological quality assessment. This relies on converting the count of implemented safeguards into a relative rank (ie, relative to the other studies in the systematic review). The relative framework works well in a unified tool because the epidemiological design itself carries safeguards that are omitted from a design-specific tool. Indeed, levels (strength) of evidence are organized by design, which indicates the usefulness of research design in quality assessments.21,22 Inapplicable safeguards then contribute to a lower quality ranking due to design deficits, and studies of different designs can be assessed simultaneously.6 For example, if a case-control study and an RCT are assessed together, the safeguards inapplicable to the case-control design will lower that study’s rank by virtue of its design, even if all non-design-related safeguards are present. Design-specific tools cannot be used in this fashion, even when considered a set, because each study would not be assessed on the same scale as all other studies, even if the tools were combined.
It is important to note that the count of safeguards across a tool is study-specific (ie, belongs to the study) and scale-specific (ie, belongs to the tool). The only difference between tools is the number of safeguards. Therefore, if each study has a count of safeguards within Tool A, it can easily be rescaled to a count of safeguards within Tool B. This means that once a study has been assessed with Tool A, reassessing with Tools B, C, or D is not required (the counts can simply be rescaled). Of course, this assumes that important safeguards retain coverage across different tools.
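As a minimal sketch of this rescaling (hypothetical numbers, and assuming, as stated above, that the tools cover the same safeguards and differ only in how many they contain), a count observed on one tool can be converted to the scale of another by simple proportion:

```python
def rescale_count(count: float, n_items_source: int, n_items_target: int) -> float:
    """Rescale a safeguard count from a source tool to a target tool, assuming
    the tools cover the same safeguards and differ only in item count."""
    return count * n_items_target / n_items_source

# Hypothetical example: 9 of 12 safeguards on Tool A corresponds to
# 27 of 36 safeguards on Tool B.
print(rescale_count(9, 12, 36))  # 27.0
```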
Counts of safeguards are not useful for meta-analysis because there is no reason why a specific count in Tool A should serve as a threshold below which bias is deemed greater; such a threshold would vary across tools based on the number of safeguards. Even when the same tool is used across studies in a meta-analysis, the counts are still not useful because they are benchmarked to the number of safeguards in the tool, and there is no reason to benchmark each study to this arbitrary number. To avoid this issue, we should instead compare studies relative to the best study in the meta-analysis to assess whether they differ in quality. Even if all the studies are of poor quality and have a single safeguard each, when looked at relatively they would all have the same relative rank, which is all that matters (ie, they cannot be differentiated based on quality assessment). The purpose of a methodological quality assessment is therefore to compare better studies with poorer studies or to bias-adjust the meta-analysis, not to pass judgment on a particular study (as is done in the risk of bias framework). Qualitatively, however, a general statement can be made about the safeguards implemented, as depicted in Figures 1–3.
Although not all safeguards may be equally important, we still do not know the optimal weight for each safeguard, and determining this will require future study. Those who make risk of bias judgments nevertheless assume such weights implicitly, based on the personal assessments of the team that built the tool. Many risk of bias tools group safeguards into domains and rename the safeguards as signaling questions, the purpose being to feed each safeguard into an algorithm that produces a judgment. One of the reasons signaling questions are not used in the MASTER scale is that we do not recommend making absolute judgments.11
With methodological quality assessment tools, each study receives a quality count/score that belongs to the study/outcome combination and not to the synthesis. However, each synthesis uses its own relative rank based on these quality scores. In other words, the comparison of studies based on their assessment within a synthesis is not based on counts of safeguards but rather on the rank of the study within each data synthesis vis-à-vis these scores. This is why we consider the assessment through the MASTER scale as transferable across methodological quality assessment tools, but the implementation derived from this assessment is specific to each synthesis. We therefore advocate ranking over the use of raw scores to compare studies in terms of methodological quality within each synthesis of interest.
Study limitations
A potential limitation of this study is that safeguards in each tool were mapped 1:1 to the best-matching safeguard in the MASTER scale even though more than 1 match was sometimes possible, which may have exaggerated the percentage of missing safeguards we report. Nevertheless, this was infrequent and does not materially affect our conclusions. A limitation (by intention) of the MASTER scale is that it does not map to safeguards from tools developed for noncomparative and prevalence designs, including the other tools within the JBI toolset that were excluded from this study; examples include tools for nonquantitative studies, reports of cases, nonconventional designs, and systematic reviews. Reviewers assessing such study types will therefore not be able to use the MASTER scale. Further research is required to evaluate users’ experiences in terms of time to complete the scale and understanding of the safeguards within the tool, as well as its extension beyond analytical studies.
Conclusion
The results of the current study have implications for the move from design-specific tools toward a unified system for bias assessment. The main advantages of the MASTER scale are that it avoids redundancy and the problems that arise when comparing across multiple study designs, making it an attractive alternative to design-specific tools. Switching to the MASTER scale can mitigate the problems we have flagged with design-specific tools and presents an opportunity to initiate a unified framework for bias assessment in health research.
Funding
JCS is supported by the Australian National University Higher Degree by Research Scholarship and Erasmus+ programme (Radboud University) scholarship.
ZM is supported by an NHMRC Investigator grant, APP1195676.
References
1. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S, et al. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials 1995;16(1):62–73.
2. Sackett DL. Bias in analytic research. J Chronic Dis 1979;32(1-2):51–63.
3. Olivo SA, Macedo LG, Gadotti IC, Fuentes J, Stanton T, Magee DJ. Scales to assess the quality of randomized controlled trials: a systematic review. Phys Ther 2008;88(2):156–75.
4. Sanderson S, Tatt ID, Higgins JPT. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Int J Epidemiol 2007;36(3):666–76.
5. Wang Y, Ghadimi M, Wang Q, Hou L, Zeraatkar D, Iqbal A, et al. Instruments assessing risk of bias of randomized trials frequently included items that are not addressing risk of bias issues. J Clin Epidemiol 2022;152:218–25.
6. Stone JC, Glass K, Clark J, Ritskes-Hoitinga M, Munn Z, Tugwell P, et al. The MethodologicAl STandards for Epidemiological Research (MASTER) scale demonstrated a unified framework for bias assessment. J Clin Epidemiol 2021;134:52–64.
7. Stone JC, Glass K, Clark J, Munn Z, Tugwell P, Doi SAR. A unified framework for bias assessment in clinical research. Int J Evid Based Healthc 2019;17(2):106–20.
8. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials 1995;16(1):62–73.
9. JBI. Critical appraisal tools [internet]. Adelaide: JBI; 2020 [cited 2022 Feb 11]. Available from: https://jbi.global/critical-appraisal-tools.
10. Aromataris E, Munn Z. JBI Manual for Evidence Synthesis [internet]. Adelaide: JBI; 2020 [cited 2022 Feb 11]. Available from: https://synthesismanual.jbi.global.
11. Stone JC, Gurunathan U, Aromataris E, Glass K, Tugwell P, Munn Z, et al. Bias assessment in outcomes research: the role of relative versus absolute approaches. Value Health 2021;24(8):1145–9.
12. Jordan Z, Lockwood C, Aromataris E, Pilla B, Porritt K, Klugar M, et al. JBI series paper 1: introducing JBI and the JBI Model of EHBC. J Clin Epidemiol 2022;150:191–5.
13. Munn Z, Barker TH, Moola S, Tufanaru C, Stern C, McArthur A, et al. Methodological quality of case series studies: an introduction to the JBI critical appraisal tool. JBI Evid Synth 2020;18(10):2127–33.
14. Aromataris E, Stern C, Lockwood C, Barker TH, Klugar M, Jadotte Y, et al. JBI series paper 2: tailored evidence synthesis approaches are required to answer diverse questions: a pragmatic evidence synthesis toolkit from JBI. J Clin Epidemiol 2022;150:196–202.
15. Scottish Intercollegiate Guidelines Network. Methodology checklists [internet]. SIGN; 2020 [cited 2022 Feb 11]. Available from: https://www.sign.ac.uk/what-we-do/methodology/checklists/.
16. Wells GA, Shea B, O’Connell D, Peterson J, Welch V, Losos M, et al. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses [internet]. The Ottawa Hospital; n.d. [cited 2022 Feb 11]. Available from: https://www.ohri.ca/programs/clinical_epidemiology/oxford.asp.
17. Bastian H. “They would say that, wouldn’t they?” A reader’s guide to author and sponsor biases in clinical research. J R Soc Med 2006;99(12):611–4.
18. Lexchin J, Bero LA, Djulbegovic B, Clark O. Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ 2003;326(7400):1167–70.
19. Lundh A, Lexchin J, Mintzes B, Schroll JB, Bero L. Industry sponsorship and research outcome. Cochrane Database Syst Rev 2017;2(2):MR000033.
20. Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.3 [internet]. Cochrane; 2022 [cited 2022 Apr 10]. Available from: http://training.cochrane.org/handbook.
21. Feinstein AR. Clinical biostatistics. XLVIII. Efficacy of different research structures in preventing bias in the analysis of causation. Clin Pharmacol Ther 1979;26(1):129–41.
22. National Health and Medical Research Council. NHMRC levels of evidence and grades for recommendations for developers of guidelines. NHMRC; 2009 [cited 2022 Feb 11]. Available from: https://www.nhmrc.gov.au/sites/default/files/images/NHMRC%20Levels%20and%20Grades%20(2009).pdf.