Physical inactivity in adults, a leading contributor to multiple noncommunicable diseases (e.g., metabolic disorders, certain cancers, cardiovascular disease), increases risk of premature mortality in the United States (1). Physical activity is defined as any bodily movement produced by skeletal muscle that increases energy expenditure above rest, whereas exercise (a subset of physical activity) refers to structured, repetitive activity conducted with the goal of improving one or more components of physical fitness (2). Evidence-based guidelines promoted by the American College of Sports Medicine and the United States Office of Disease Prevention and Health Promotion state that adults should obtain the metabolic equivalent of 150 min of moderate-intensity aerobic exercise and physical activity (ExPA) per week and engage in resistance exercises of all major muscle groups two or more times per week (3,4). Although easy to interpret, these guidelines do not explicitly state how to reach these goals or how to maintain them once achieved. Presently, only 20% of American adults meet both aerobic and strengthening recommendations (5). Commonly cited factors that prevent regular engagement in ExPA include internal barriers (e.g., lack of motivation, boredom, and time constraints), environmental barriers (e.g., weather conditions and lack of exercise equipment), lack of support from family or friends, and physical or health limitations (6). Researchers have used different methods to improve ExPA levels in inactive populations, including educating individuals on practices and benefits (7,8), incorporating material/monetary incentives (9,10), and applying cognitive behavioral strategies (8,11). However, although various types of behavioral interventions are often successful in initiating ExPA adoption, poor long-term adherence poses a major concern (12,13).
To date, behavior researchers have relied heavily on randomized controlled trials (RCT) for intervention development, as this method represents the gold standard in testing causal relationships (14,15). However, the relatively slow pace and high cost of conducting RCT may place substantial barriers in translating research from basic biological/psychological testing to real-world practice (16). Further, RCT outcomes tend to focus on differences between group means, limiting researchers’ understanding of potentially important factors between and within participants’ responses to treatment (16,17). These limitations sometimes cause promising behavioral treatments to be abandoned, rather than refined, if they do not achieve statistically significant outcomes early on in their development (18). In addition, treatments not tested rigorously in preefficacy and efficacy trials can ultimately fail if prematurely translated to patient and community populations. Different frameworks, such as the Medical Research Council guidelines for developing and evaluating complex interventions (19) and the Obesity-Related Behavioral Interventions Trials model (18), have been established to guide researchers in rigorously testing aspects of health-related interventions early on in development before dissemination at the clinical and community level. These frameworks provide a basis for the intervention refinement process; they are general, making them adaptable to a wide variety of health and behavior-related interventions, but do not provide specific methods to use in the various stages of refinement, leaving much up to interpretation by researchers. There is a critical need for researchers to develop, test, and refine behavioral interventions in preefficacy stages using methods that are efficient and rigorous but also flexible (12). In doing so, more successful, efficacious health-based interventions may be translated to the general population.
The purpose of preefficacy designs is to test and define appropriate intervention components based on preliminary measurements of causation. Different types of preefficacy designs exist, each with its own goals and standards to meet. For example, experimental and observational studies in a laboratory setting or in the field enable researchers to identify and define potential treatment components necessary to affect behavior (18). Quasi-experimental studies, where participants act as their own control and pre-/postmeasurement means are analyzed, help researchers determine proof of concept and whether a design warrants more rigorous testing (18). Pilot studies allow the protocol to be implemented at a small scale (e.g., one person or group) and allow researchers to ascertain whether clinically significant outcomes can be replicated in a larger sample (18). Feasibility testing lets researchers assess the practicality of design protocols and provides estimates for future efficacy trials (18). These designs can be built on one another to define and refine intervention components to be tested in future randomized trials (18,19). Although the use of these various preefficacy designs is common in ExPA research, experts behind the Medical Research Council guidelines and the Obesity-Related Behavioral Interventions Trials model, as well as several behavioral researchers, endorse the increased use of single-case designs (SCD) in preefficacy stages of intervention development (12,18–20).
Although the terminology for SCD can vary (e.g., single-case experimental design, small-case design, and single-subject design), the primary purpose of these designs is to make causal inferences using relatively small sample sizes (~6–20 participants or cases). Multiple SCD approaches exist that use the methods of delayed-treatment onset (multiple baseline design), treatment reversal (ABAB design), treatment progression (changing criterion design), or combined methodologies in the intervention (12). Because of the small sample sizes, each of these different methods relies on participants serving as their own controls to enhance internal validity (12). Further, because they are underpowered for traditional parametric statistics, researchers use visual analysis to assess clinical relevance regarding primary outcomes. A unique characteristic of SCD, unlike traditional pre-/postdesigns, is the requirement of multiple measurements taken within baseline and intervention phases. This intensive assessment of participants’ behavior over time accommodates the internal “idiosyncratic and dynamic” behavioral changes individuals inevitably experience within the intervention (12). More in-depth data per participant can yield useful insight into inter-/intraindividual responses to treatment in preefficacy stages of intervention design (12). For these reasons, SCD could be an insightful and cost-effective approach to be used in early stages of intervention development before translation to larger preefficacy designs (e.g., quasi-experimental studies, pilot studies) and eventual randomized trials. Unfortunately, SCD methodologies appear to be relatively underutilized in ExPA research (20), potentially because of the lack of awareness or misconceptions about perceived lack of rigor in these designs among those trained in conventional RCT methodology (21,22).
In a previous systematic review, Gorczynski (20) examined available ExPA research literature to identify studies that used SCD methodology. Ten studies were summarized based on their specific method (AB, ABA, and ABCA), with sample populations including children and adults. Seven of these studies took place with the goal of rehabilitation for injury, disease, or disability through ExPA, with researchers measuring improvement through functional mobility or performance outcomes. Gorczynski noted that within this small body of literature, study designs were sufficient regarding three factors necessary to increase rigor of SCD implementation, and he emphasized that exercise researchers should focus on these factors when implementing SCD: the use of 1) validated ExPA/fitness measures, 2) appropriate statistical analyses to determine phasic differences, and 3) establishment of stable baseline periods before treatment (20). Although important, these factors represent a small portion of criteria listed in specific tools (12,23) designed to objectively quantify methodological quality of SCD. Further, only three studies in this review included interventions designed to increase long-term ExPA behavior. Given the call to use these designs in behavioral interventions (12,18–20), and to expand on Gorczynski’s (20) focus on rigor, the purpose of this review was to use the aforementioned established tools to assess the quality of SCD specifically testing the impact of behavioral interventions to promote ExPA in adults.
A systematic search following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (24) was conducted between July and October 2017 (Fig. 1). Figure 1 shows the information flow during the literature search pertaining to the PRISMA guidelines. Peer-reviewed articles were retrieved from PubMed, Web of Science, and PsycINFO. Eight terms describing types of SCD (e.g., “single-case design” and “small-case design”) were individually paired with five terms relating to ExPA (e.g., “exercise” and “physical activity”), resulting in 120 individual searches that identified 1227 publications with any combinations of the terms (see Table, Supplemental Content 1, to view search outcomes by search term combination in each database, http://links.lww.com/TJACSM/A92).
Inclusion and Exclusion Criteria
Inclusion and exclusion criteria were determined with the goal of including articles that implemented SCD on at-risk but otherwise healthy adults capable of volitional ExPA. Articles included for analysis met the following criteria: 1) written in English, 2) included adult participants (≥18 yr), 3) used an SCD to test hypotheses, 4) applied a behavioral intervention as the independent variable with the intent to improve volitional ExPA behavior, 5) measured ExPA behavior as a primary dependent variable, and 6) reported ExPA behavioral outcomes using validated measures (e.g., accelerometers, self-report questionnaires). To assure that the members of the sample populations in studies were capable of volitional ExPA, SCD methodologies conducted in special populations where participants were reliant on a caretaker (e.g., children and adolescents, those with mental/physical disability) were excluded. Further, because the target for this review was ExPA interventions striving for behavior change, studies where activity was implemented to meet other goals (e.g., testing exercise equipment and improving sport performance) or where ExPA was expected to end after the intervention (e.g., injury rehabilitation) were also excluded. No constraints were set for publication date.
The first author conducted the initial search of PubMed and Web of Science, and the second author repeated this process and additionally searched PsycINFO to ascertain if any further articles could be added to the sample. Both authors completed the screening of titles and abstracts independently and compared their results. If discrepancies arose in which the first and the second authors could not reach consensus, the third author was designated as a tie breaker during this process; she was not consulted at any point during screening, as no discrepancies occurred between the first and the second authors. Articles that remained after the screening process were then distributed evenly among all three authors, who read them in full independently, then came together to determine which would be included in the final analysis based on inclusion and exclusion criteria.
The search yielded 1227 articles, 621 of which were excluded as duplicates. In accordance with the PRISMA guidelines, another 530 articles were excluded after title and abstract screenings (Fig. 1). An article was rejected if the reviewer was able to determine from the title or abstract that it did not meet inclusion criteria (e.g., inclusion of children or adolescents and volitional ExPA not a primary dependent variable). As a result, 76 articles were read in full. Of these, 67 were eliminated for the following reasons: participants were younger than 18 yr of age (26 articles); ExPA was rehabilitative and expected to end after the intervention (e.g., acute care via physical therapy) (23 articles); participants had intellectual/neurological disabilities, were dependent on a caretaker, and/or were incapable of volitional ExPA (10 articles); researchers were assessing technology (e.g., robotics and exergames) for functional mobility improvement (5 articles); or ExPA was used as the independent variable to improve functional mobility/sports performance (e.g., walking speed) (3 articles). Three studies were identified after additional screening of references from studies included in the initial search. One of these articles (25) was included in the final analysis and was composed of two distinct studies. For this reason, it is counted as two studies in the search strategy (Fig. 1). The final analysis includes 10 research studies (see Table, Supplemental Content 2, which describes included articles, http://links.lww.com/TJACSM/A93) that met all inclusion criteria.
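The counts reported above can be checked with simple arithmetic. The following is a minimal sketch in Python; the variable names are illustrative, and the breakdown of the 120 searches as eight SCD terms by five ExPA terms by three databases is an assumption inferred from the search description rather than a calculation stated by the authors. The articles later added through reference screening are not modeled here.

```python
# Counts are taken from the text; names and structure are illustrative only.
searches = 8 * 5 * 3  # SCD terms x ExPA terms x databases (assumed breakdown)

identified = 1227
duplicates = 621
excluded_title_abstract = 530
read_in_full = identified - duplicates - excluded_title_abstract

# Reasons for exclusion after full-text review, with article counts.
full_text_exclusions = {
    "participants younger than 18 yr": 26,
    "rehabilitative ExPA expected to end": 23,
    "caretaker dependent or incapable of volitional ExPA": 10,
    "technology assessed for functional mobility": 5,
    "ExPA used as the independent variable": 3,
}
excluded_full_text = sum(full_text_exclusions.values())

print(searches, read_in_full, excluded_full_text)
```

Under these assumptions, the three values agree with the 120 searches, 76 full-text articles, and 67 full-text exclusions reported in the text.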
Two tools were used to assess study quality. The 14-item scale created by Logan et al. (23) outlines standards for implementing SCD in research and then classifies studies based on met criteria (<7 = weak, 7–10 = moderate, 11–14 = strong). The 16-item checklist by Dallery et al. (12) describes criteria set specifically for SCD used in behavioral health interventions. The 14-item scale was created by compiling criteria from studies evaluating group designs (26–28) and a scale by Horner et al. (29) describing the use of single-subject research in identifying evidence-based practices in special education. The 16-item checklist, derived from the same tool by Horner et al. (29), condenses seemingly redundant criteria (e.g., describing variables with replicable precision) and adapts obsolete criteria for more current research methodologies (e.g., remote capture of data). Both tools analyze SCD components (i.e., participant and research setting descriptions, independent and dependent variables, type of design, appropriate phases, and visual analyses) and were chosen because of the unique items they contain: assessor blinding, inter-/intrarater reliability, statistical analyses (23), experimental control, fidelity, and social validity (12). Brief descriptions of the individual items included in the 14- and 16-item tools are summarized in the footnotes of Tables 1 and 2, respectively.
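The classification bands of the 14-item scale can be expressed as a short function. This is a minimal sketch in Python, assuming only the cutoffs stated above; the function name is hypothetical and is not part of the tool by Logan et al. (23).

```python
def classify_14_item_score(criteria_met: int) -> str:
    """Map the number of met criteria (0-14) to the quality band given in the text."""
    if not 0 <= criteria_met <= 14:
        raise ValueError("The 14-item scale allows 0 to 14 met criteria")
    if criteria_met < 7:
        return "weak"
    if criteria_met <= 10:
        return "moderate"
    return "strong"


# Example: a study meeting 8 criteria falls in the moderate band.
print(classify_14_item_score(8))
```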
Before analysis, the authors met to clarify each item in the tools, address the similarities and differences, and discuss how to implement the quality assessments. The first and the second authors individually assessed each study and determined quality scores; the third author acted as a tie breaker when necessary. Interrater reliabilities were 99.2% for the 14-item scale and 97.2% for the 16-item checklist, initially. Additional discussions were held to come to 100% consensus.
Two of the interventions resulted in no change and/or decreases in ExPA in the majority of their participants (31,34) (see Table, Supplemental Content 2, which describes included articles, http://links.lww.com/TJACSM/A93). Quality assessment results according to the 14-item scale and the 16-item checklist are depicted in Tables 1 and 2, respectively.
The 14-Item Scale
Studies met on average 10–11 of the listed criteria (range = 8–11), with two classified as strong (30,32) and the remaining eight classified as moderate (25,31,33–37); no studies were classified as weak (Table 1). None of the included studies incorporated assessor blinding (item 6). Although Nijs et al. (34) specified that the physiotherapists who implemented the intervention were not the same as those who conducted tests, this scale item refers to the blinding of those who assess the outcome data to the timing or allocation of the intervention. Therefore, the study was not counted as meeting the criterion.
Adequate reporting of inter-/intrarater reliability regarding the measurement of the primary dependent variable (item 5) was another commonly missed criterion. Only two studies (30,37) used interrater reliability. Fitterling et al. (30) had researchers or participants’ significant others observe participants’ exercise sessions and confirm self-reported exercise data. However, although the authors indicated participants’ significant others were trained on validating bouts, no details were provided on this training. Wysocki et al. (37) required group exercise of their cohort and conducted reliability checks via direct observation. Of these observations, 70% consisted of two participants observing the exercise of a third, whereas the remaining 30% consisted of one researcher and one participant observing another participant (37).
Only four studies (31–34) incorporated additional “appropriate statistical tests” (23) beyond visual analysis of the data (items 13 and 14). An SCD by Gorczynski et al. (31) applied paired-sample t-tests on accelerometry data and mediating psychological variables. Irons et al. (32) also used this method to test body composition outcomes but did not assess ExPA changes statistically. Nijs et al. (34) applied the Wilcoxon signed rank test on pre- and posttreatment accelerometer data and calculated effect size as Cohen’s d. Finally, McFadden et al. (33) used simulation modeling analysis (i.e., a time series analysis program appropriate when there are fewer than 30 data points per phase for participants) to determine changes in outcome variables between study phases. A one-way ANOVA was also conducted on the means of each outcome variable, and effect size was calculated using Cohen’s d by averaging data within and across participants for each phase (33). The analyses of all remaining studies included visual and descriptive statistical analyses only.
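To illustrate the kind of effect size mentioned above, the following is a minimal sketch in Python of Cohen’s d computed between a baseline phase and an intervention phase using a pooled standard deviation. The function name and the daily step counts are hypothetical and are not drawn from any included study; the pooled-variance form is one common definition and may differ from the calculations used by the cited authors. Repeated measurements from one participant are often autocorrelated, so a value computed this way should be read alongside visual analysis rather than in place of it.

```python
import math
from statistics import mean, variance


def cohens_d(baseline, intervention):
    """Cohen's d (intervention minus baseline) using the pooled sample variance."""
    n1, n2 = len(baseline), len(intervention)
    if n1 < 2 or n2 < 2:
        raise ValueError("Each phase needs at least two measurements")
    pooled_var = ((n1 - 1) * variance(baseline)
                  + (n2 - 1) * variance(intervention)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("Effect size is undefined when both phases have zero variance")
    return (mean(intervention) - mean(baseline)) / math.sqrt(pooled_var)


# Hypothetical daily step counts for one participant.
baseline_steps = [4000, 4200, 3800, 4000]
intervention_steps = [5000, 5200, 4800, 5000]
print(round(cohens_d(baseline_steps, intervention_steps), 2))
```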
The 16-Item Checklist
Studies met 13–14 of listed criteria (range = 9–15), on average (Table 2). The most commonly missed criterion was the measurement of treatment fidelity (item 7). Authors of eight studies did not mention fidelity measures, whereas the remaining two studies indicated conducting fidelity measures with minimal detail provided as to the processes involved or outcome data. Gorczynski et al. (31) stated that exercise counseling sessions were audio-recorded, assessed fidelity using a trained independent reviewer, and reported that all counseling session objectives were fulfilled and “all necessary [treatment] components were addressed.” The standards by which the objectives and components were compared are unclear. McFadden et al. (33) indicated that each physical activity counseling session was videotaped to ensure counseling delivery followed evidence-based techniques but also did not describe how these videos were assessed. Further, results from assessments were not reported, with the authors alluding they would be found in a forthcoming article. The results, found in an abstract repository, suggest that the intervention was implemented as intended but did not specifically state the results of measured fidelity (38).
The purpose of this review was to summarize the quality of SCD implementing interventions to increase volitional ExPA behavior in adults. Of the 10 studies included, 80% were classified as moderate, and 20% were classified as strong. All or most met a majority of criteria based on the assessment tools. Commonly unmet criteria included fidelity reporting, use of assessor blinding, application of rater reliability, and implementation of appropriate statistical analyses, which were unfulfilled by 100%, 100%, 80%, and 60% of studies, respectively. These unmet items represent rigorous reporting standards for intervention science. Therefore, it is reasonable to consider that addressing these specific criteria can allow for improved SCD implementation as well as optimal preparation before large-scale RCT.
Because the items on both quality tools are concise and described with minimal detail, it may be necessary for researchers conducting SCD to reference additional resources to apply rigorous protocols. The National Institutes of Health Behavior Change Consortium (BCC) provides example strategies for conducting fidelity measurements in five areas of intervention development and implementation: study design, training providers, treatment delivery, treatment receipt, and enactment of treatment skills (39). Although Gorczynski et al. (31) and McFadden et al. (33) used methods supported by the BCC (audio and video recording ExPA sessions, respectively), detail was minimal regarding how measurements were implemented. As an example of more detailed reporting, in the study by Wing et al. (40,41) fidelity was assessed by audio-recording all weight loss intervention sessions and by randomly assessing 20% of these recordings to compare content to condition-specific fidelity checklists (40), with the authors reporting that 100% accuracy was achieved (41). Although this example was taken from an RCT, adding such relevant details can further enhance perceived rigor of SCD regarding this criterion. Regarding assessor blinding, the Consolidated Standards of Reporting Trials (CONSORT) 2010 statement’s guidelines for blinding members of an intervention could be applied to SCD (42). The authors of CONSORT state that researchers should explain why they chose to blind the groups they did (e.g., data collectors and outcome assessors) in an effort to promote the validity of their results (42), an aspect that can be replicated in SCD for similar purposes. Further, the authors also recognize that in certain studies, blinding may be unnecessary or impossible to conduct (42). This issue may pertain to SCD conducted by small research teams, and similarly, researchers of these studies should explicitly explain why the strategy was not implemented to validate their methods (42).
Given the lack of rater reliability implemented in the included studies, researchers conducting SCD would benefit from choosing an appropriate technique from the numerous methods available to determine “the consistency with which the same information is rated by different raters” (43). Although Logan et al. (23) do not specify how agreement should be quantifiably measured, two recommendations can be found in the foundational SCD quality checklist by Horner et al. (29). Perhaps the simpler of these methods is calculating the proportion of measures agreed upon by observers (i.e., interobserver agreement), wherein an initial interobserver agreement should reach at least 80%. The second method is computing a Kappa coefficient (44) to assess consistency by quantifying “how much the observed agreement between raters exceeds agreement due to chance alone” (45); Horner et al. (29) specify that a κ of 60% indicates “good” agreement and is desirable for those implementing SCD. Because multiple measurements of the dependent variable are required within phases of an SCD, for continuous data, the intraclass correlation coefficient (ICC) can be calculated to determine the degree of correlation and agreement between measurements (46); specifically, an ICC exceeding 0.75 indicates acceptable reliability (47). A benefit of using ICC is that if the reliability of a single measurement falls below 0.75, the number of additional measurements needed to meet or exceed this threshold can be calculated from the initial ICC to improve reliability in the future (48).
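The agreement statistics described above can be sketched briefly in Python. The example below is a minimal illustration, not the procedure of any cited author: the function names and the ratings are hypothetical, the Kappa calculation follows the standard two-rater Cohen’s kappa, and the estimate of required measurements uses the Spearman-Brown prophecy formula, which is assumed here to be the kind of calculation referred to in (48). Note that the Spearman-Brown result is the total number of measurements to average, not the number of additional ones.

```python
import math
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same nonempty set of items")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_chance == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


def measurements_needed(single_icc, target=0.75):
    """Spearman-Brown estimate of how many measurements must be averaged
    for the reliability of their mean to reach the target."""
    if not (0 < single_icc < 1 and 0 < target < 1):
        raise ValueError("ICC and target must lie strictly between 0 and 1")
    k = target * (1 - single_icc) / (single_icc * (1 - target))
    return max(1, math.ceil(k))


# Hypothetical ratings (1 = criterion met, 0 = not met) from two raters.
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
print(percent_agreement(rater_a, rater_b))
print(round(cohens_kappa(rater_a, rater_b), 2))
print(measurements_needed(0.5))
```

In this hypothetical case, the raters reach the 80% interobserver agreement threshold, yet the Kappa coefficient falls below 60% because much of that agreement would be expected by chance, which illustrates why the two statistics can lead to different conclusions.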
As the implementation of health-related RCT and management of chronic disease is enhanced through the use of multidisciplinary teams (49), using specialists from other appropriate areas may also be beneficial in preefficacy SCD to streamline intervention development. Regarding quantitative data, trained statisticians may be necessary to test for both statistical and clinical significance, as recommended (12,20,23). The authors of each quality tool provide examples of appropriate analyses, including descriptive (e.g., measures of central tendency, trend lines, and variability) and inferential (e.g., χ2, t-tests, and C-statistic) approaches (23) as well as regression-based approaches (e.g., autoregressive models, robust regression) to examine time series data (12). Because of the small sample sizes of SCD, more focus should be placed on measures such as effect size rather than P values. It should also be highlighted that statistical analysis is recommended to supplement visual analysis, not replace it. Visual analysis is particularly useful in SCD when determining patterns and trajectory of individuals’ data during various study phases. For example, although none of the authors of the included articles explicitly mentioned concerns regarding regression to the mean, an important pattern in physiological and behavioral data that can skew statistical interpretations (50), such patterns should be accounted for and are made apparent via visual analysis. Further, because SCD methodologies allow researchers to capture in-depth, inter-/intraindividual experiences of treatment (12), the addition of qualitative approaches stands to complement outcomes determined via quantitative methods.
Qualitative research is designed to understand phenomena through the interpretation of “human perception and understanding” (51) and, as with quantitative research, holds research quality to set paradigms via rigorous methodologies. Although statistical saturation in the small sample sizes indicative of SCD may not be reached, the inclusion of qualitative approaches may enhance overall intervention development because of gained insights from participants. On the basis of the current review, only researchers from two studies (31,34) used qualitative/mixed methods to understand social validity. Nijs et al. (34) reported results from the Canadian Occupational Performance Measure, a mixed-method scale that uses a semistructured interview to determine participants’ ability to perform activities of daily living and then quantifies their responses into performance and satisfaction subscale scores. Gorczynski et al. (31) used the step-by-step process from Braun and Clarke (52) to thematically analyze patterns within participants’ responses to a poststudy interview. Braun and Clarke’s (52) methods require researchers to familiarize themselves with participant responses, generate initial codes from key words/phrases in these responses, categorize codes into common themes, and then refine and define final themes that represent study participants’ experiences. Because of the small sample size, Gorczynski et al. (31) reported direct quotes from participants as opposed to general themes in support of their enjoyment and perceived helpfulness of the intervention. Another strategy that may add an extra layer of rigor when using qualitative interviews is the Consensual Qualitative Research method (53). This method uses several analyzers throughout the data review process to address and minimize personal biases in interpreting data. Further, this method requires researchers to solicit feedback from at least one external auditor to minimize the effects of groupthink (53).
Regardless of the methods used, trained quantitative and qualitative specialists would ensure appropriate paradigms are followed in terms of implementation and data analysis, and should be considered when developing the research team.
It is particularly interesting to note a lack of replicable precision regarding exercise prescriptions in the reviewed literature. The American College of Sports Medicine supports the FITT-VP principle, under which an exercise prescription should be individualized but should explicitly detail the following information: frequency of the exercise, intensity of each bout, time taken to complete a bout, type of activity, and more recently, volume and progression of exercise (3). This framework provides specific instructions regarding exercise dosage, indicating a degree of importance similar to that of a medical prescription by elevating “the advice to become more physically active from that of a recommendation to an ‘order’” (54). Only the study by Irons et al. (32) described each of these components with sufficient detail. In the remaining studies, walking, jogging, cycling, and other sporting activities (e.g., swimming and tennis) were recommended with no specific details given in terms of an exercise prescription. A clear FITT-VP prescription can improve replicability and fidelity by providing a framework that researchers can easily duplicate for each participant, and can ensure necessary details are provided to aid in participants’ understanding, compliance, and ultimately adherence. For example, exceeding a moderate-intensity prescription may elicit unpleasant feelings during exercise, which may negatively influence long-term behavior (55–57). Conversely, failing to meet minimum volume requirements may delay or prevent health and fitness-related adaptations. Such concerns noted during an SCD could be addressed and allow for intervention refinement before moving forward to large-scale controlled trials.
Although a criterion for the reporting of ExPA prescription is too specific for a broad quality assessment tool, a sufficient understanding of the FITT-VP principle by behavioral researchers, or inclusion of a trained exercise specialist on a research team, is likely necessary to improve the rigor of SCD targeting ExPA behavior.
Limitations, Strengths, and Future Directions
This review is not without limitations. For logistical reasons, the authors only included studies published in English; therefore, it is possible that assessing non-English research may provide additional insight into the quality of ExPA research using SCD. In addition, the potential for publication bias (i.e., studies producing null findings or of poor quality are not accepted by peer-reviewed journals) may have artificially inflated the overall degree of quality found in the current review. A notable strength of this review is that two SCD quality assessment tools were used to assess the multiple components used in SCD and provide a holistic picture of quality across various ExPA-promoting SCD. In addition, the testing of interrater reliability between quality reviewers was included to minimize the potential bias of a single author determining the quality of included studies. Finally, because this review was limited to populations capable of volitional ExPA, similar reviews of SCD in special populations may need to be conducted in the future.
The results of this review demonstrate that SCD promoting ExPA behavior change are relatively rare, but that quality ranged from moderate to strong. The commonly missed criteria addressed in this review should be applied when implementing SCD, so as to further improve the rigor of these designs. It is important to note that the authors do not endorse one quality assessment tool over the other, as such a determination is outside the scope of this review. Although the 14- and 16-item tools overlap substantially, there are unique criteria listed in each that contribute to increased rigor. It may be most useful for researchers to use both tools when designing and conducting an SCD to ensure all criteria are met. Researchers can also apply and adapt guidelines for large-scale RCT (e.g., CONSORT statement and BCC) to these smaller designs if SCD-specific standards do not exist. In addition, researchers interested in implementing an SCD to promote exercise may consider a standard of reporting regarding the exercise prescription via the FITT-VP framework.
The results of this study do not constitute endorsement from the American College of Sports Medicine. The authors report no conflict of interest or sources of external funding.
1. Lee IM, Shiroma EJ, Lobelo F, Puska P, Blair SN, Katzmarzyk PT. Effect of physical inactivity on major non-communicable diseases worldwide: an analysis of burden of disease and life expectancy. Lancet. 2012;380(9838):219–29. doi: https://doi.org/10.1016/S0140-6736
3. Garber CE, Blissmer B, Deschenes MR, et al. American College of Sports Medicine Position Stand: quantity and quality of exercise for developing and maintaining cardiorespiratory, musculoskeletal, and neuromotor fitness in apparently healthy adults: guidance for prescribing exercise. Med Sci Sports Exerc
6. Stutts WC. Physical activity determinants in adults: perceived benefits, barriers, and self efficacy. AAOHN J
7. Bijlani RL, Vempati RP, Yadav RK, et al. A brief but comprehensive lifestyle education program based on yoga reduces risk factors for cardiovascular disease and diabetes mellitus. J Altern Complement Med
8. Yap TL, Davis LS. Physical activity: the science of health promotion through tailored messages. Rehabil Nurs
9. Strohacker K, Galarraga O, Williams DM. The impact of incentives on exercise behavior: a systematic review of randomized controlled trials. Ann Behav Med
10. Strohacker K, Galárraga O, Emerson J, Fricchione SR, Lohse M, Williams DM. Impact of small monetary incentives on exercise in university students. Am J Health Behav
11. Saelens BE, Gehrman CA, Sallis JF, Calfas KJ, Sarkin JA, Caparosa S. Use of self-management strategies in a 2-year cognitive-behavioral intervention to promote physical activity. Behavior Therapy
12. Dallery J, Raiff BR. Optimizing behavioral health interventions with single-case designs: from development to dissemination. Transl Behav Med
13. Middleton KR, Anton SD, Perri MG. Long-term adherence to health behavior change. Am J Lifestyle Med
14. Oakley A, Strange V, Bonell C, Allen E, Stephenson J. Process evaluation in randomised controlled trials of complex interventions. BMJ
15. Collins LM, Baker TB, Mermelstein RJ, et al. The multiphase optimization strategy for engineering effective tobacco use interventions. Ann Behav Med
16. Biglan A, Ary D, Wagenaar AC. The value of interrupted time-series experiments for community intervention research. Prev Sci
17. Dallery J, Cassidy RN, Raiff BR. Single-case experimental designs to evaluate novel technology-based health interventions. J Med Internet Res
18. Czajkowski SM, Powell LH, Adler N, et al. From ideas to efficacy: the ORBIT model for developing behavioral treatments for chronic diseases. Health Psychol
19. Craig P, Dieppe P, Macintyre S, Michie S, Nazareth I, Petticrew M. Developing and evaluating complex interventions: the new Medical Research Council guidance. BMJ
20. Gorczynski P. The use of single-case experimental research to examine physical activity, exercise, and physical fitness interventions: a review. J Appl Sport Psychol
21. Bryson-Brockmann W, Roll D. Single-case experimental designs in medical education: an innovative research method. Acad Med
22. Dermer ML, Hoch TA. Improving descriptions of single-subject experiments in research texts written for undergraduates. Psychol Rec
23. Logan LR, Hickman RR, Harris SR, Heriza CB. Single-subject research design: recommendations for levels of evidence and quality rating. Dev Med Child Neurol
24. Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med
25. Kurti AN, Dallery J. Internet-based contingency management increases walking in sedentary adults. J Appl Behav Anal
26. O’Donnell M, Darrah J, Adams R, Butler C, Roxborough L, Damiano D. AACPDM Methodology to Develop Systematic Reviews of Treatment Interventions. Rosemont: AACPDM; 2004.
27. Van Tulder M, Furlan A, Bombardier C, et al. Updated method guidelines for systematic reviews in the Cochrane Collaboration Back Review Group. Spine
28. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials
29. Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M. The use of single-subject research to identify evidence-based practice in special education. Except Child
30. Fitterling JM, Martin JE, Gramling S, Cole P, Milan MA. Behavioral management of exercise training in vascular headache patients: an investigation of exercise adherence and headache activity. J Appl Behav Anal
31. Gorczynski P, Faulkner G, Cohn T, Remington G. Examining the efficacy and feasibility of exercise counseling in individuals with schizophrenia: a single-case experimental study. Ment Health and Phys Act
32. Irons JG, Pope DA, Pierce AE, Van Patten RA, Jarvis BP. Contingency management to induce exercise among college students. Behaviour Change
33. McFadden T, Fortier MS, Guérin E. Investigating the effects of physical activity counselling on depressive symptoms and physical activity in female undergraduate students with depression: a multiple baseline single-subject design. Ment Health and Phys Act
34. Nijs J, Van Eupen I, Vandecauter J, et al. Can pacing self-management alter physical behaviour and symptom severity in chronic fatigue syndrome?: a case series. J Rehabil Res Dev
35. Normand MP. Increasing physical activity through self-monitoring, goal setting, and feedback. Behav Interventions
36. Thyer BA, Irvine S, Santa CA. Contingency management of exercise by chronic schizophrenics. Percept Mot Skills
37. Wysocki T, Hall G, Iwata B, Riordan M. Behavioral management of exercise: contracting for aerobic points. J Appl Behav Anal
38. Gagnon JC, Fortier MS, McFadden T, Plante Y. Investigating the motivational interviewing techniques and behaviour change techniques in physical activity counselling sessions: preliminary results. J Exerc Move Sport
39. Bellg AJ, Borrelli B, Resnick B, et al. Enhancing treatment fidelity in health behavior change studies: best practices and recommendations from the NIH Behavior Change Consortium. Health Psychol
40. Wing RR, Tate D, Espeland M, et al. Weight gain prevention in young adults: design of the study of novel approaches to weight gain prevention (SNAP) randomized controlled trial. BMC Public Health
41. Wing RR, Tate DF, Espeland MA, et al. Innovative self-regulation strategies reduce weight gain in young adults: the study of novel approaches to weight gain prevention (SNAP) randomized controlled trial. JAMA Intern Med
42. Schulz KF, Altman DG, Moher D. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMC Med. 2010;8(1):18. doi: 10.1186/1741-7015-8-18.
43. Mulsant BH, Kastango KB, Rosen J, Stone RA, Mazumdar S, Pollock BG. Interrater reliability in clinical trials of depressive disorders. Am J Psychiatry
44. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med
45. Carpentier M, Combescure C, Merlini L, Perneger TV. Kappa statistic to measure agreement beyond chance in free-response assessments. BMC Med Res Methodol
46. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med
47. Streiner DL, Norman GR, Cairney MJ. Health Measurement Scales: A Practical Guide to Their Development and Use. Oxford University Press; 2015.
48. Stanley J, Thorndike R. Educational Measurement. Washington (DC): American Council on Education; 1971.
49. Hogg W, Lemelin J, Dahrouge S, et al. Randomized controlled trial of anticipatory and preventive multidisciplinary team care: for complex patients in a community-based primary care setting. Can Fam Physician
50. Berntson GG, Uchino BN, Cacioppo JT. Origins of baseline variance and the law of initial values. Psychophysiology
51. Stake RE. Qualitative Research: Studying How Things Work. Guilford Press; 2010.
52. Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol
53. Hill CE, Knox S, Thompson BJ, Williams EN, Hess SA, Ladany N. Consensual qualitative research: an update. J Couns Psychol
54. Phillips EM, Kennedy MA. The exercise prescription: a tool to improve physical activity. PM&R
55. Williams DM. Exercise, affect, and adherence: an integrated model and a case for self-paced exercise. J Sport Exerc Psychol
56. Ekkekakis P, Hall EE, Petruzzello SJ. The relationship between exercise intensity and affective responses demystified: to crack the 40-year-old nut, replace the 40-year-old nutcracker! Ann Behav Med
57. Ekkekakis P, Parfitt G, Petruzzello SJ. The pleasure and displeasure people feel when they exercise at different intensities. Sports Med
© 2019 American College of Sports Medicine