A panel of 6 internationally recognised experts from different institutions was established to critique the CATRAS. The expert panel comprised 6 of the authors (T.B., J.T.B., B.D.X.L., S.A.R., P.V.M.S., and P.M.T.). Panel members were selected based on their professional certifications and credentials, clinical experience, and publication profile in translational research, veterinary pain management, and PAT construction.
The study used Delphi methodology to develop consensus from the panel of experts by means of surveys conducted over 3 rounds, to ensure that the 3 domains and items generated in the development of the CATRAS were not merely a function of the smaller working group.22,29 The objective of the first round was to gauge the completeness of the domains (and the items within each) for adequately assessing the quality of analgesia studies. Definitions for the domains and the items within each were provided to enable comparison of each domain and item against its definition. Members of the expert panel were invited to contribute as many ideas as they wished in response to 2 open-ended questions regarding quality in analgesia studies: (1) “Are there additional domains beyond those already encompassed (ie, LOE, methodological soundness, and grading of the PAT), which you consider integral in comprehensively assessing the quality of analgesia studies? If YES, please list and explain your answer.” (2) “Within the existing 3 domains, what factors not already encompassed by existing items (if any) do you consider important for assessing the quality of analgesia studies?”
In the second round, the responses obtained in round 1 were collated into one document by the working group and redistributed to the panel for individual rating of relevance as well as evaluation of the clarity of item construction and wording. Participants were asked to accept, reject, or suggest modifications to points arising from round 1 relating to existing items, or to suggest additional items within each domain.
Information acquired from round 2 was then incorporated into the round 3 questionnaires with the addition of the participant's own ratings and comments for each item as a reminder. Thus, separate round 3 questionnaires were developed for each member of the expert panel. Panel members reviewed and rerated the items in the light of new information from the opinion of the group as a whole. Consensus for inclusion of each item was predefined as acceptance by 4 or more members of the expert panel (≥4/6, ≥66%) without any further modification being recommended by any of the endorsing members. Any modifications were to be rated individually in a subsequent round, with consensus for inclusion being predefined as previously described.
The panel of experts reviewed the content and relevance of the CATRAS and evaluated the appropriateness of each item of the tool as well as the tool's relevance as a whole using the following Likert-type scheme: 1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, and 4 = very relevant.28
For each item of the CATRAS, content validity (item-level content validity index [I-CVI]) was calculated by dividing the number of experts assigning a rating of either 3 or 4 by the total number of experts—that is, the proportion of experts in agreement concerning relevance. For example, an item rated as “quite relevant” or “very relevant” by 4 of 6 experts would have an I-CVI of 0.67.37 A kappa coefficient for each item was also calculated using previously described methodology.50 Kappa was interpreted using the criteria described by Cicchetti and Sparrow (1981) and Fleiss et al. (2013): fair = kappa of 0.40 to 0.59; good = kappa of 0.60 to 0.74; and excellent = kappa >0.74.8,15 Based on published recommendations, an item was predefined as having adequate content validity for retention in the CATRAS if it achieved an I-CVI of 0.83 or greater (reflecting at most one disagreement among the 6 experts) and a kappa coefficient of 0.81 or greater.30,37
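For illustration, the I-CVI and the chance-adjusted kappa described above can be sketched as follows. The function names and example ratings are ours, not part of the CATRAS; the chance-agreement correction follows the modified kappa formula attributed to Polit et al., in which the probability of chance agreement is the binomial probability of the observed number of “relevant” ratings under a 50/50 split.

```python
from math import comb

def i_cvi(ratings, relevant=(3, 4)):
    """Item-level content validity index: the proportion of experts
    rating the item 3 ('quite relevant') or 4 ('very relevant')."""
    agree = sum(1 for r in ratings if r in relevant)
    return agree / len(ratings)

def modified_kappa(icvi, n_experts):
    """Kappa-like index adjusting the I-CVI for the probability of
    chance agreement (pc), per the modified formula cited in the text."""
    agree = round(icvi * n_experts)
    pc = comb(n_experts, agree) * 0.5 ** n_experts
    return (icvi - pc) / (1 - pc)

# An item rated relevant (3 or 4) by 5 of the 6 panel members:
ratings = [4, 4, 3, 4, 3, 2]
icvi = i_cvi(ratings)            # 5/6 ≈ 0.83 -> meets the 0.83 cutoff
kappa = modified_kappa(icvi, 6)  # ≈ 0.82 -> meets the 0.81 cutoff
```

With one dissenting rater among 6, the I-CVI of 0.83 and kappa of roughly 0.82 both sit exactly at the retention thresholds predefined for the CATRAS, which is why one disagreement was the maximum tolerated.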
The content validity of the tool as a whole (scale-level CVI [S-CVI]) was evaluated using previously described methodology, whereby the S-CVI is calculated as the average I-CVI across all items of the tool.37 Based on published recommendations, the minimum S-CVI required for the CATRAS to achieve content validity for the tool as a whole was predefined as 0.90.37,49
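A minimal sketch of the scale-level calculation, using the item-level agreement profile reported later in this article (57 items with unanimous agreement and 10 items with 5/6 agreement); the function name is illustrative:

```python
def s_cvi_ave(i_cvis):
    """Scale-level content validity index (averaging method):
    the mean of the item-level CVIs across all items of the tool."""
    return sum(i_cvis) / len(i_cvis)

# 57 items with I-CVI = 1.0 plus 10 items with I-CVI = 5/6:
i_cvis = [1.0] * 57 + [5 / 6] * 10
scvi = s_cvi_ave(i_cvis)  # comfortably above the predefined 0.90 minimum
```

Depending on how the item-level values are rounded before averaging, this yields approximately 0.97, consistent with the S-CVI reported for the CATRAS.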
A final list of quality items that achieved consensus agreement from the panel of experts was collated by the working group. Items were grouped within their respective domains and weighted to reflect their importance within each domain.
The sum total of all weighted items within each individual domain was converted to a percentage by dividing the attributed score by the maximum possible score of each respective domain. Conversion of the score to a percentage allowed for standardisation between the 3 domains of the CATRAS. Each domain was assigned equal weighting. Members of the expert panel were then asked to accept, reject, or suggest modification to the allocation of item scores and weightings. Consensus for inclusion of item scores and weightings was predefined as acceptance by 4 or more members of the expert panel without any further modification being recommended by any of the endorsing members. Any modifications were to be rated individually in a subsequent round, with consensus for inclusion being predefined as previously described.
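The percentage conversion can be sketched as follows. The weights shown are hypothetical, invented for illustration only; they are not the actual CATRAS item weightings.

```python
def domain_percentage(item_scores, max_scores):
    """Convert the sum of weighted item scores attained in a domain
    to a percentage of that domain's maximum possible score."""
    return 100 * sum(item_scores) / sum(max_scores)

# Hypothetical domain with 4 weighted items:
attained = [2, 3, 0, 4]  # scores awarded to the study under review
maxima   = [2, 4, 2, 4]  # maximum possible score per item
pct = domain_percentage(attained, maxima)  # 9/12 of maximum -> 75.0
```

Because each domain is expressed on the same 0-to-100 scale, the 3 domain results can be compared directly, which is what gives each domain equal weighting in the overall appraisal.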
During the round 1 review of the draft CATRAS, no additional domains were deemed necessary by any member of the expert panel. The following item was added to the methodological soundness domain in each of the 5 possible categories (A–E): “Was ethical/institutional review board approval of the study stated?” This addition was unanimously accepted during rounds 2 and 3 of the review process, and no further items were modified or excluded.
Content validation of the final 67 items of the CATRAS resulted in 97% agreement (S-CVI = 0.97), indicating that the tool achieved excellent content validity.
After content validation, the working group assigned scores and weightings to the 67 quality items and the associated domains; these were unanimously accepted without modification by the expert panel. The final results derived from application of the CATRAS are 3 percentage scores (ie, one for each domain). The final version of the CATRAS is shown in the Supplemental Table (available at http://links.lww.com/PR9/A29). An example of the application of the CATRAS can be found in Supplemental Appendix 1 (http://links.lww.com/PR9/A22).
In 2014, a working group (L.N.W. and S.H.B.) found no evidence of a published CAT designed specifically to evaluate the quality of published analgesia studies in any species. To address this absence, the authors designed and validated a 67-item CAT (Supplemental Table, available at http://links.lww.com/PR9/A29) that incorporates 3 domains (LOE, methodological soundness, and grading of the PAT).
Level of evidence is used by many review processes to create order and simplicity from the heterogeneity of published studies, and is assigned according to the study type and its inherent likelihood to exclude bias.4,31 The methodological quality and transparent reporting of an analgesia study are key factors to consider when assessing its translational value.36,41 The quality items listed within the CATRAS to assess methodological soundness are primarily based on those used in the RECOVER initiative process, which were originally derived from CATs designed by the OCEBM.4,6 In addition, all the quality items described in methodological soundness category A of the CATRAS are part of the Consolidated Standards of Reporting Trials (CONSORT) 2010 “checklist of information to include when reporting a randomised trial.”35 The CONSORT statement was developed to improve the standard of reporting of randomised controlled trials for medical interventions.3 Furthermore, the methodological soundness domain of the CATRAS also complies with the methodology section of the Animal Research Reporting In Vivo Experiments (ARRIVE) guidelines, which highlights details of bias reduction tactics such as sample size calculation, random allocation to groups, and observer blinding.25,41 After recommendations from the panel of experts, the item “Was ethical/institutional review board approval of the study stated?” was added to the methodological soundness domain (categories A, B, C, D, and E) both to strengthen the tool and to promote ethical research.
Content validity concerns the degree to which a scale has an appropriate sample of items to represent the construct of interest; that is, whether the domain(s) of content for the construct is adequately represented by the items.37 Content validity of the CATRAS was reviewed using a panel of 6 experts selected according to previously defined criteria.12,18 In addition, widespread geographical distribution of the panel members (Australia, Brazil, Canada, United Kingdom, and United States) allowed for differences in colloquial terms that could affect instrument comprehension by many diverse groups.18
There are potential biases in the methodology used. First, the statements are not an inventory of every aspect of methodology that could impact on trial quality. In an attempt to reduce the likelihood of this bias, the working group obtained consensus opinion from individuals with direct experience of conducting studies involving assessment of pain; as knowledge of the subject matter is considered the most significant assurance of a valid outcome using the Delphi methodology.47 Second, the reliability of the findings relating to validity coefficients may have been influenced by including the same individuals in both the content consensus and subsequently also as raters during the content validity process. The working group attempted to minimise this bias by both sequencing the order of the tasks and by their temporal separation: the consensus process occurred approximately 10 months before the content validity process.
A widely accepted method of quantifying content validity for multi-item tools such as the CATRAS is the CVI based on expert rating of relevance.37 The CVI is an index of consensus and of the extent to which experts share a common interpretation of the construct of a tool.46 A CVI was calculated for each quality item of the 3 domains (I-CVI) as well as for the CATRAS as a whole tool (S-CVI), thereby providing an index of interrater agreement. Critics of the CVI cite concerns about the possibility of inflated values because of the risk of chance agreement.50 In an attempt to address this issue, the current study used a previously described modified kappa-like index that adjusts each I-CVI for chance agreement or disagreement.37 Fifty-seven items received 100% agreement (I-CVI = 1), and 10 items received 83% agreement (I-CVI = 0.83); no consistent pattern was observed in which particular members of the expert panel (including by geographical location) rejected items of the CATRAS. Based on previously published guidelines citing the acceptable I-CVI in relation to the number of expert raters, these items of the CATRAS achieved adequate item-level content validity. In addition, using previously described evaluation criteria for kappa, these items were considered to have excellent agreement on relevance.8,15
Assessment of the S-CVI for the combined final 67 items of the CATRAS resulted in 97% agreement (0.97), indicating that the tool achieved excellent content validity. This result is considerably higher than published recommendations of the minimum S-CVI required for validation. Critical appraisal tool developers often set a criterion of 80% (0.80) or better agreement among expert reviewers as the lower limit of acceptability for an S-CVI.11 However, we chose to adopt the more stringent recommendations of Waltz et al.,49 who set the lower limit of acceptable agreement at 90% (0.90).
It is widely accepted that interpretation of the results of a particular study should be informed by the quality of all aspects of the trial: the higher the quality, the greater the confidence in the validity and utility of the findings.51 When evaluating analgesia studies, the quality of the PAT used must be considered as a significant factor determining the quality of the report. Previously, consideration of the impact of the PAT on the strength of evidence has not been possible. Development of the CATRAS may now enable evaluation of the strength of findings from published analgesia studies based on a more thorough assessment of quality. The need to assess the original or revised literature for the purpose of grading the PAT may decrease the time efficiency of the CATRAS and might be considered a limitation by some users. Future work could streamline this process by establishing precalculated grades for commonly used PATs. Another potential limitation of the CATRAS could be that, because of the nonlinearity of both visual analogue scales and numeric rating scales, calculation of sensitivity and specificity (domain 3, items 3.3 and 3.4) may not be relevant for assessing these scales, and as such they may be downgraded by up to 4% of the overall possible score for domain 3. An interpretation of the results obtained from the grading of a PAT can be found in Supplemental Appendix 2 (http://links.lww.com/PR9/A23).
The final results derived from application of the CATRAS are 3 percentage scores (ie, one for each domain). Illustration of these results should be at the discretion of the investigators; however, a radar chart would allow for a two-dimensional representation of the 3 percentage scores on axes starting from the same point (Supplemental Appendix 1, http://links.lww.com/PR9/A22). Charting of the data in this way enables clear visual representation of the quality of a specific analgesia study in relation to the LOE, methodological soundness, and grading of the PAT.
Central to the practice of evidence-based medicine (EBM) is the process of asking well-focused questions, searching for the best available evidence, critically appraising that evidence for quality and validity, then applying the results to improve patient outcomes.20,42 The CATRAS is designed to facilitate the practice of EBM by enabling a quantitative quality assessment of an individual published study's evidence supporting (or rejecting) the clinical question being investigated. The CATRAS can be used to explore the influence of study quality or design methodology on the strength of the results and conclusions. We endorse the use of a systematic EBM approach that provides an explicit framework for formulating the clinical question or statement under investigation in terms of its 4 key parts—Problem/Population, Intervention, Comparison, and Outcome (PICO).1
The CAT developed in this study offers several benefits for assessing the quality of analgesia studies involving subjects incapable of self-reporting pain: its content was developed through the consensus of experts; it captures features of study design methodology, which are widespread in this field; and content validation has been established. The next step in the development of this important tool would be to apply the CATRAS in a systematic review of the literature focused on questions arising from analgesia studies.
The authors have no conflict of interest to declare.
The authors thank Associate Professor Graham Hepworth of The University of Melbourne, Statistical Consulting Centre, for his statistical advice.
Supplemental digital content
Supplemental digital content associated with this article can be found online at http://links.lww.com/PR9/A29, http://links.lww.com/PR9/A22, and http://links.lww.com/PR9/A23.
. Armstrong EC. The well-built clinical question: the key to finding the best evidence efficiently. WMJ 1999;98:25–8.
. Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, Liberati A, O'Connell D, Oxman AD, Phillips B, Schünemann H, Edejer TTT, Vist GE, Williams JW; GRADE Working Group. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches the GRADE Working Group. BMC Health Serv Res 2004;4:38.
. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, Pitkin R, Rennie D, Schulz KF, Simel D, Stroup DF. Improving the quality of reporting of randomized controlled trials. JAMA 1996;276:637–9.
. Boller M, Fletcher DJ. RECOVER evidence and knowledge gap analysis on veterinary CPR. Part 1: evidence analysis and consensus process: collaborative path toward small animal CPR guidelines. J Vet Emerg Crit Care 2012;22:S4–S12.
. Brondani JT, Mama KR, Luna SPL, Wright BD, Niyom S, Ambrosio J, Vogel PR, Padovani CR. Validation of the English version of the UNESP-Botucatu multidimensional composite pain scale for assessing postoperative pain in cats. BMC Vet Res 2013;9:143.
. Cicchetti DV, Sparrow SA. Developing criteria for establishing interrater reliability of specific items: applications to assessment of adaptive behavior. Am J Ment Defic 1981;86:127–37.
. Cooper S, Desjardins P, Turk D, Dworkin R, Katz N, Kehlet H, Ballantyne J, Burke L, Carragee E, Cowan P, Croll S, Dionne R, Farrar J, Gilron I, Gordon D, Iyengar S, Jay G, Kalso E, Kerns R, McDermott M, Raja S, Rappaport B, Rauschkolb C, Royal M, Segerdahl M, Stauffer J, Todd K, Vanhove G, Wallace M, West C, White R, Wu C. Research design considerations for single-dose analgesic clinical trials in acute pain: IMMPACT recommendations. PAIN
. Corbett A, Achterberg W, Husebo B, Lobbezoo F, de Vet H, Kunz M, Strand L, Constantinou M, Tudose C, Kappesser J, de Waal M, Lautenbacher S. An international road map to improve pain assessment in people with impaired cognition: the development of the Pain Assessment in Impaired Cognition (PAIC) meta-tool. BMC Neurol 2014;14:229.
. Davis L. Instrument review: getting the most from a panel of experts. Appl Nurs Res 1992;5:194–97.
. Davis L, Grant J. Guidelines for using psychometric consultants in nursing studies. Res Nurs Health 1993;16:151–55.
. de Grauw JC, van Loon JPAM. Systematic pain assessment in horses. Vet J 2016;209:14–22.
. Dixon-Woods M, Booth AJ, Sutton AJ. Synthesizing qualitative research: a review of published reports. Qual Res 2007;7:375–422.
. Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions. New Jersey: John Wiley & Sons, 2013.
. Gélinas C. A validated approach to evaluating psychometric properties of pain assessment tools for use in nonverbal critically ill adults. Semin Respir Crit Care Med 2013;34:153.
. Gordon HG, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schünemann HJ. Rating quality of evidence and strength of recommendations: GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924–26.
. Grant J, Davis L. Selection and use of content experts for instrument development. Res Nurs Health 1997;20:269–74.
. GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ 2004;328:1490–94.
. Guyatt G, Cairns J, Churchill D, Cook D, Haynes B, Hirsh J, Irvine J, Levine M, Levine M, Nishikawa J, Sackett D, Brill-Edwards P, Gerstein H, Gibson J, Jaeschke R, Kerigan A, Neville A, Panju A, Detsky A, Enkin M, Frid P, Gerrity M, Laupacis A, Lawrence V, Menard J, Moyer V, Mulrow C, Links P, Oxman A, Sinclair J, Tugwell P. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA 1992;268:2420–25.
. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck Ytter Y, Alonso Coello P, Schünemann HJ. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924–26.
. Hasson F, Keeney S, McKenna H. Research guidelines for the Delphi survey technique. J Adv Nurs 2000;32:1008–15.
. Hellyer PW. Treatment of pain in dogs and cats. J Am Vet Med Assoc 2002;221:212–15.
. Jüni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ 2001;323:42–6.
. Kilkenny C, Browne W, Cuthill I, Emerson M, Altman D. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol 2010;8:e1000412.
. Langford D, Bailey A, Chanda M, Clarke S, Drummond T, Echols S, Glick S, Ingrao J, Klassen Ross T, LaCroix Fralish M, Matsumiya L, Sorge R, Sotocinal S, Tabaka J, Wong D, van den Maagdenberg AMJM, Ferrari M, Craig K, Mogil J. Coding of facial expressions of pain in the laboratory mouse. Nat Methods 2010;7:447–49.
. Lichtner V, Dowding D, Esterhuizen P, Closs SJ, Long A, Corbett A, Briggs M. Pain assessment for people with dementia: a systematic review of systematic reviews of pain assessment tools. BMC Geriatr 2014;14:138.
. Likert R. A technique for the measurement of attitudes. Arch Psychol 1932;140:1–55.
. Linstone HA, Turoff M. The Delphi method: techniques and applications. Reading: Addison-Wesley Publishing Company, 1975.
. Lynn MR. Determination and quantification of content validity. Nurs Res 1986;35:382–5.
. Martín-Hernández H, López-Messa JB, Pérez-Vela JL, Herrero-Ansola P. ILCOR 2010 recommendations. The evidence evaluation process in resuscitation. Med Intensiva 2011;35:249–55.
. Marzinski LR. The tragedy of dementia: clinically assessing pain in the confused, nonverbal elderly. J Gerontol Nurs 1991;17:25–8.
. McGrath P, Walco G, Turk D, Dworkin R, Brown M, Davidson K, Eccleston C, Finley GA, Goldschneider K, Haverkos L, Hertz S, Ljungman G, Palermo T, Rappaport B, Rhodes T, Schechter N, Scott J, Sethna N, Svensson O, Stinson J, von Baeyer C, Walker L, Weisman S, White R, Zajicek A, Zeltzer L. Core outcome domains and measures for pediatric acute and chronic/recurrent pain clinical trials: PedIMMPACT recommendations. J Pain
. Mignini LE, Khan KS. Methodological quality of systematic reviews of animal studies: a survey of reviews of basic research. BMC Med Res Methodol 2006;6:10.
. Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux P, Elbourne D, Egger M, Altman DG. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c869.
. Percie du Sert N, Rice ASC. Improving the translation of analgesic drugs to the clinic: animal models of neuropathic pain. Br J Pharmacol 2014;171:2951–63.
. Polit D, Beck C, Owen S. Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Res Nurs Health 2007;30:459–67.
. Pudas-Tähkä SM, Axelin A, Aantaa R, Lund V, Salanterä S. Pain assessment tools for unconscious or sedated intensive care patients: a systematic review. J Adv Nurs 2009;65:946–56.
. Resnik D, Rehm M. The undertreatment of pain: scientific, clinical, cultural, and philosophical factors. Med Health Care Philos 2001;4:277–88.
. Rice AS, Cimino-Brown D, Eisenach JC, Kontinen VK, Lacroix-Fralish ML, Machin I, Mogil JS, Stöhr T, Consortium PP. Animal models and the prediction of efficacy in clinical trials of analgesic drugs: a critical appraisal and call for uniform reporting standards. PAIN
. Rice ASC, Morland R, Huang W, Currie G, Sena E, Macleod M. Transparency in the reporting of in vivo pre-clinical pain research: the relevance and implications of the ARRIVE (animal research: reporting in vivo experiments) guidelines. Scand J Pain
. Schardt C, Adams MB, Owens T, Keitz S, Fontelo P. Utilization of the PICO framework to improve searching PubMed for clinical questions. BMC Med Inform Decis Mak 2007;7:16.
. Schünemann HJ, Jaeschke R, Cook DJ, Bria WF, El-Solh AA, Ernst A, Fahy BF, Gould MK, Horan KL, Krishnan JA, Manthous CA, Maurer JR, McNicholas WT, Oxman AD, Rubenfeld G, Turino GM, Guyatt G. An official ATS statement: grading the quality of evidence and strength of recommendations in ATS guidelines and recommendations. Am J Respir Crit Care Med 2006;174:605–14.
. Sengstaken EA, King SA. The problems of pain and its detection among geriatric nursing home residents. J Am Geriatr Soc 1993;41:541–44.
. Sotocinal SG, Sorge R, Zaloum A, Tuttle A, Martin L, Wieskopf J, Mapplebeck J, Wei P, Zhan S, Zhang S, McDougall J, King O, Mogil J. The Rat Grimace Scale: a partially automated method for quantifying pain in the laboratory rat via facial expressions. Mol Pain
. Stemler SE. A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Pract Assess Res Eval 2004;9:1.
. Stone F, Busby D. The Delphi research methods in family therapy. New York: Guildford, 1996.
. Turk D, Dworkin R, Burke L, Gershon R, Rothman M, Scott J, Allen R, Atkinson JH, Chandler J, Cleeland C, Cowan P, Dimitrova R, Dionne R, Farrar J, Haythornthwaite J, Hertz S, Jadad A, Jensen M, Kellstein D, Kerns R, Manning D, Martin S, Max M, McDermott M, McGrath P, Moulin D, Nurmikko T, Quessy S, Raja S, Rappaport B, Rauschkolb C, Robinson J, Royal M, Simon L, Stauffer J, Stucki G, Tollett J, von Stein T, Wallace M, Wernicke J, White R, Williams A, Witter J, Wyrwich K. Developing patient-reported outcome measures for pain clinical trials: IMMPACT recommendations. PAIN
. Waltz CF, Strickland OL, Lenz ER. Measurement in nursing and health research. New York: Springer Publishing Company, 2010.
. Wynd CA, Schmidt B, Schaefer MA. Two quantitative approaches for estimating content validity. West J Nurs Res 2003;25:508–18.
. Yates S, Morley S, Eccleston C, de C Williams AC. A scale for rating the quality of psychological trials for pain
. Zwakhalen SM, Hamers JP, Abu-Saad HH, Berger MP. Pain in elderly people with severe dementia: a systematic review of behavioural pain assessment tools. BMC Geriatr 2006;6:3.