Mothers completed the Kiddie Disruptive Behavior Disorders Schedule (Keenan et al., 2007), a semistructured interview for preschoolers that includes probes to assess the frequency, severity, and pervasiveness of Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2000, 4th ed.) disruptive behavior symptoms across school, home, and public. A developmentally enhanced approach was used in which multiple features of the child's behavior (e.g., severity and frequency) were taken into account to improve discrimination between normative misbehavior and clinical symptoms of disruptive behaviors (Wakschlag et al., 2008b). One-week test–retest reliability with the current sample was high for disruptive behavior disorder diagnoses (κ = 0.81, p < .001) and total number of DBD symptoms (intraclass correlation coefficient = 0.82, p < .001; Wakschlag et al., 2008b).
Teachers also completed the Early Childhood Inventory (ECI; Gadow, Sprafkin, & Nolan, 2001), a DSM-IV rating scale sensitive to a wide range of developmental problems in 3- to 5-year-olds. Teachers reported on 35 ECI items mapping onto DSM-IV disruptive behaviors (response options range from 0 = Never to 3 = Very Often). The ECI generates categorical and continuous symptom scores, demonstrates satisfactory test–retest reliability (r = .56 for oppositional behaviors and r = .41 for conduct problems), and differentiates clinic-referred from nonreferred children. For this study, we employed the categorical scoring approach in which behaviors were considered at the symptom level if they were endorsed as occurring “often” or “very often.” Internal consistency in the present sample was excellent (α = .94).
The Disruptive Behavior Diagnostic Observation Schedule (DB-DOS; Wakschlag et al., 2008a, 2008b) is a 50-minute, clinic-based observation system used to assess Behavior Regulation (i.e., noncompliant, provocative, inflexible behavior) and Anger Modulation (i.e., difficulty modulating anger) on a continuum from 0 (No Evidence of Behavior) to 3 (High Level of Behavior; see Wakschlag et al., 2008a, 2008b). The DB-DOS includes one Parent and two Examiner contexts; the Examiner contexts vary in how much the examiner attends to the child (Examiner Engaged vs. Examiner Busy), and all contexts create “presses” for misbehavior (e.g., compliance, frustration). DB-DOS psychometrics established with the current sample indicate acceptable interrater reliability (mean weighted κ = 0.66 for Behavior Regulation and 0.62 for Anger Modulation) and good internal consistency (mean α = .85 for Behavior Regulation, α = .92 for Anger Modulation), and the DB-DOS distinguishes normative misbehavior from disruptive behavior (F values range from 9.40 to 29.06, all p values < .01; Wakschlag et al., 2008a, 2008b).
The Social Skills Rating System–Teacher Form Preschool Level (Gresham & Elliott, 1990) measures how frequently children engage in a range of classroom behaviors (1 = Never to 4 = Very Often). The Social Skills Rating System–Preschool Level has 40 items with the Social Skills Scale of primary interest in this study (α = .94). The Social Skills Scale comprises three subscales measuring Cooperation (α = .90), Assertion (α = .90), and Self-Control (α = .91). Normative data are provided by age and sex and were standardized on a heterogeneous population of which one third was urban and 28% minority.
Developmental functioning and other developmental concerns
The Differential Abilities Scale-Preschool Core (Elliott, 1983) assesses developmental functioning in children aged 2.6–5.11 years. The General Conceptual Ability Score (M = 100; SD = 15) was used for analytic purposes. The standardization sample oversampled minority children, and reliability coefficients are high for lower (r = .90) and upper preschool versions (r = .94). Test–retest coefficients for the General Conceptual Ability Score are stable, ranging from .79 to .94. Examiners also completed behavioral ratings during the Differential Abilities Scale (i.e., aggression, affect, task engagement, and social engagement), each ranging from 1 = None to 4 = High.
The ECI (Gadow et al., 2001) Parent and Teacher Checklist assessed other developmental problems, including autism, anxiety, depression, speech and language disabilities, and other developmental delays (psychometrics described previously), with items rated on a 4-point scale (0 = Never to 3 = Very Often).
Child and family context
Three measures were used to assess life events and stressors. The first was the Conflict Tactics Scale (Straus, 1979), which has 11 items assessing the child's exposure to violence; internal consistency for the current sample was .92. The second was the Difficult Life Circumstances Questionnaire (Barnard, 1989), which includes 28 items (Yes or No) assessing the number and type of chronic stressors families have experienced in the past year, including difficulty finding affordable housing, problems with a former partner, and financial strain. The Difficult Life Circumstances Questionnaire was developed for urban families and demonstrates acceptable reliability, with a 1-year test–retest correlation for the total score of r = .70 (Barnard, 1989) and internal consistency for the current sample of .70. The third was the Beck Depression Inventory (BDI-IA; Beck & Steer, 1993), completed by mothers. The BDI is a widely used self-report measure of depression for individuals 13 years and older. Response options range from 0 to 3, reflecting symptom severity, and internal consistency for the current sample was high (α = .90).
Sociodemographic data were obtained through a structured interview with mothers, who provided information on child and maternal age, gender, ethnicity/race, maternal relationship status, partner involvement, family income, and education.
Integrative Consensus procedures
An interdisciplinary team, including one senior psychologist, one senior child psychiatrist, two experienced psychologists, one doctoral-level clinical psychologist, and one doctoral-level school psychologist (hereafter, the doctoral-level psychologists are referred to as clinicians), developed the Integrative Consensus procedures. The two clinicians who independently applied the Integrative Consensus framework to the sample (N = 295, mother–teacher dyads described previously) were initially trained on Integrative Consensus procedures through several rounds of open coding and discussion, followed by independent coding of assessment data to a criterion of 80% exact agreement and 0.90 Pearson correlations on clinician overall rating of impairment using the C-GAS (Setterberg, Bird, Gould, Shaffer, & Fisher, 1992) described next. After initial reliability was established, individual children were randomly assigned to the two clinicians who were blind to referral status (i.e., clinically referred, nonreferred with behavioral concerns, and nonreferred without behavioral concerns). In addition, ongoing reliability of Integrative Consensus was assessed via random assignment of 25% of children to clinicians for double coding. Disagreements were discussed during twice monthly interdisciplinary team meetings. Clinicians spent approximately 30–45 minutes independently reviewing assessment data for each participant, applying the following procedures, which are also illustrated in Figure.
* Clinicians reviewed parent and teacher report of behavior problems by scoring the Kiddie Disruptive Behavior Disorders parent interview (Keenan et al., 2007) and the ECI–Teacher report (Gadow et al., 2001). Clinicians summed the total number of oppositional, conduct, and ADHD symptoms for which the child met the criteria according to DSM-IV (American Psychiatric Association, 2000).
* Clinicians reviewed the Differential Abilities Scale (Elliott, 1983) scores to assess learning strengths and needs, along with behavioral ratings made during developmental testing to understand social competencies and whether disruptive behaviors interfered with testing.
* Clinicians assessed teacher-rated social competence via the Social Skills Rating System (Gresham & Elliott, 1990), including the child's ability to help his/her peers, share materials, and comply with rules and directions. Raw scores within one standard deviation of the standardization sample were considered average. Scores 1 SD below or above the mean were considered below or above average, respectively.
* Other developmental problems (i.e., autism, anxiety, depression, speech and language disabilities, developmental delays) were assessed via the ECI parent and teacher report (Gadow et al., 2001).
* Clinicians scored measures of life events and stressors (i.e., Conflict Tactics Scales, Difficult Life Circumstances Questionnaire, BDI) and reviewed sociodemographic interview data to examine how family and child contextual factors were influencing the child's behavior at home and school.
* Clinicians viewed digital recordings of the DB-DOS in brief by scanning the entire video to identify salient observations to view more fully. DB-DOS observations facilitated clinicians' observation of the child's behavior within and outside of the parent–child context and supplemented other data. Review of the DB-DOS focused on the presence of negative affect and disruptive behaviors as well as social competencies, including socially directed positive affect, social engagement, and prosocial behaviors.
* Clinicians reviewed four impairment indicators. The first included KDBDs Parent Interview questions (0 = Not Very Much, 2 = Some, and 3 = A Lot) focused on interference of disruptive behaviors with the parents' ability to take the child in public, set limits, or leave the child with a caregiver and the child's ability to get along with others or learn at school. KDBDs impairment scores were viewed on a continuum, with behaviors considered impairing if the sum score exceeded 3. The second impairment indicator included maternal ratings on the Impact on the Family Scale (Preschool Disruptive Behavior Version; Sheeber & Johnson, 1992), which has 23 items assessing the social, financial, and family burden resulting from the child's disruptive behaviors (0 = Strongly Disagree to 4 = Strongly Agree). Clinicians considered the sum score and severity (e.g., parents quit their job because of behavior problems, financial burden). Sum scores falling in the 4–7 range suggested concerning level of impairment. The third impairment indicator included history of school expulsion and the fourth included mother and teacher independent rating of global impairment on the basis of the nonclinician version of the C-GAS (Shaffer et al., 1983). The C-GAS includes behaviorally oriented descriptors and life situations validated for children as young as 2 years of age (Hill, Maskowitz, Danis, & Wakschlag, 2008; Wakschlag & Keenan, 2001). Children's Global Assessment Scale scores range from 1 to 100, with each decile reflecting the impact of the child's behavior on school, family, and peer relations, with scores of 60 or lower indicating clinical impairment.
* Clinicians generated overall impairment by integrating the four impairment indicators while focusing on the extent to which disruptive behaviors interfered with (a) the child's ability to navigate normative developmental tasks (e.g., make friends, learn at school, enjoy time with family and peers) and (b) the impact of the behavior problems on the family (e.g., difficulty securing stable caregivers or maintaining employment). Clinicians generated an overall impairment score from 1 to 100 using the clinician version of the C-GAS (Setterberg et al., 1992). The clinician C-GAS score was incorporated into the final decision regarding presence of a disruptive behavior disorder. Pearson correlations on clinician C-GAS ratings made independently indicated high interrater reliability (r = .85, range: .79–.89).
* Clinicians proposed a binary diagnosis (not disruptive or disruptive). Not disruptive reflected behaviors falling generally in the normative range rather than an absence of concern. Disruptive reflected that the child met (1) DSM-IV criteria for oppositional defiant disorder (four symptoms) and/or conduct disorder (three symptoms) or Disruptive Behavior Disorder-NOS (defined a priori as the presence of at least three disruptive symptoms) and (2) Clinician C-GAS score of 60 or less.
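The final thresholds in the last step can be written out as a short sketch. It covers only the stated numeric cutoffs, not the clinical judgment that preceded them. The class and function names are hypothetical, and treating the Disruptive Behavior Disorder-NOS count as the sum of oppositional and conduct symptoms is an assumption, since the text does not say how the "three disruptive symptoms" were tallied.

```python
from dataclasses import dataclass

ODD_MIN_SYMPTOMS = 4        # oppositional defiant disorder
CD_MIN_SYMPTOMS = 3         # conduct disorder
NOS_MIN_SYMPTOMS = 3        # Disruptive Behavior Disorder-NOS (defined a priori)
CGAS_IMPAIRMENT_CUTOFF = 60  # clinician C-GAS of 60 or less = impaired


@dataclass
class ChildSummary:
    odd_symptoms: int    # oppositional symptoms meeting DSM-IV criteria
    cd_symptoms: int     # conduct symptoms meeting DSM-IV criteria
    clinician_cgas: int  # clinician C-GAS score, 1 to 100


def is_disruptive(child: ChildSummary) -> bool:
    """Apply the two-part rule: a symptom criterion AND clinician-rated impairment."""
    meets_odd = child.odd_symptoms >= ODD_MIN_SYMPTOMS
    meets_cd = child.cd_symptoms >= CD_MIN_SYMPTOMS
    # Assumption: NOS counts oppositional and conduct symptoms together.
    # Under this assumption the ODD and CD checks are subsumed by the NOS check;
    # they are kept separate to mirror the wording of the rule.
    meets_nos = child.odd_symptoms + child.cd_symptoms >= NOS_MIN_SYMPTOMS
    symptom_criterion = meets_odd or meets_cd or meets_nos
    impaired = child.clinician_cgas <= CGAS_IMPAIRMENT_CUTOFF
    return symptom_criterion and impaired


if __name__ == "__main__":
    print(is_disruptive(ChildSummary(odd_symptoms=4, cd_symptoms=0, clinician_cgas=55)))
    print(is_disruptive(ChildSummary(odd_symptoms=5, cd_symptoms=3, clinician_cgas=61)))
```

Requiring both parts means a child with many symptoms but a C-GAS score above 60 is still coded as not disruptive.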
Although differential diagnoses for other developmental problems were not made because of lack of full diagnostic information, the presence of other clinical problems observed during DB-DOS observations and reported by mothers was weighed. Given that oppositional defiant and conduct symptoms were the focus of this study, clinicians coded inattention and hyperactivity symptoms as not disruptive in the absence of oppositional or more serious conduct problems. Only one child identified as not disruptive via Integrative Consensus met DSM-IV criteria for ADHD. Twenty-five children in the sample were comorbid for disruptive behaviors and ADHD.
Reliability of Integrative Consensus
The reliability of the Integrative Consensus procedures was assessed via random assignment of 25% of the participants to the two clinicians for double coding. Weighted κ values measured the proportion of weighted agreement corrected for chance (Cohen, 1968). Weighted κ values were interpreted as agreement beyond chance on the basis of the following guidelines: 0.75 or greater = Excellent; 0.60–0.74 = Good; 0.40–0.59 = Fair; less than 0.40 = Poor (Cicchetti & Sparrow, 1981). Results suggest that interrater reliability for Integrative Consensus was excellent for child disruptive behavior ratings (κ = 0.84; range: 0.80–0.87).
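For readers unfamiliar with the statistic, the following is a minimal sketch of Cohen's weighted κ for two raters, together with the interpretive bands listed above. Linear disagreement weights are an assumption (the text does not state which weights were used), and the function names are illustrative. With only two categories, weighted and unweighted κ coincide.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's weighted kappa with linear disagreement weights (an assumption).

    Ratings are integer category codes in range(n_categories).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must supply the same non-zero number of ratings")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    k, n = n_categories, len(rater_a)

    # Joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return 1.0 - observed_disagreement / expected_disagreement


def interpret_kappa(kappa):
    """Bands from Cicchetti and Sparrow (1981) as listed in the text."""
    if kappa >= 0.75:
        return "Excellent"
    if kappa >= 0.60:
        return "Good"
    if kappa >= 0.40:
        return "Fair"
    return "Poor"


if __name__ == "__main__":
    # Hypothetical double-coded ratings: 0 = not disruptive, 1 = disruptive.
    clinician_1 = [0, 0, 1, 1, 1, 0, 1, 0]
    clinician_2 = [0, 0, 1, 1, 0, 0, 1, 0]
    kappa = weighted_kappa(clinician_1, clinician_2, n_categories=2)
    print(round(kappa, 2), interpret_kappa(kappa))
```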
Developmental and contextual factors predicting agreement
Table 3 illustrates the percentage of the analytic sample categorized as disruptive and not disruptive on the basis of parent report using the Kiddie Disruptive Behavior Disorders interview (Keenan et al., 2007), teacher report using the ECI (Gadow et al., 2001), and Integrative Consensus ratings. Of the analytic sample of young children (N = 295), descriptive analyses reflected that Integrative Consensus consistently identified a greater percentage of children as disruptive (30%) compared with mother (14%) or teacher report (14%).
Multivariate analyses of variance examined whether child developmental factors (i.e., child's age in years, developmental functioning), life events and stressors (i.e., chronic family stress, maternal depression, relationship conflict), or mother/teacher report of disruptive behaviors or social competence taken into account during the Integrative Consensus review predicted agreement between pairs of raters and methods. For these analyses, we examined discrepancies between Integrative Consensus and the “or rule” (i.e., mother or teacher identified the child as disruptive). Only children categorized as disruptive by at least one caregiver/method were included in these analyses (n = 91). Pairs were classified as “agreeing” if they agreed on the disruptive rating (n = 65). Pairs were categorized as “disagreeing” if either the Integrative Consensus review categorized the child as disruptive but the “or rule” categorized the child as not disruptive or the Integrative Consensus review categorized the child as not disruptive but the “or rule” categorized the child as disruptive (n = 26). Few factors discriminated the “agree” from “disagree” groups, and only the multivariate analysis of variance examining mother and teacher report of child behavior was significant (Wilks' λ = 8.90; p < .001). Follow-up univariate analyses indicated that children jointly categorized as disruptive by Integrative Consensus and at least one caregiver were rated as having more teacher-reported disruptive behaviors and fewer teacher-reported social skills than children for whom Integrative Consensus and other caregivers disagreed (p values < .01).
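The grouping logic for these analyses can be illustrated with a small sketch. The function names and the use of `None` for excluded children are illustrative choices, not part of the study's procedures.

```python
def or_rule(mother_disruptive, teacher_disruptive):
    """The "or rule": disruptive if either mother or teacher identifies the child."""
    return mother_disruptive or teacher_disruptive


def classify_pair(ic_disruptive, mother_disruptive, teacher_disruptive):
    """Return "agree", "disagree", or None when the child is excluded.

    Children not categorized as disruptive by any caregiver or method
    are excluded from the agreement analyses.
    """
    informant_disruptive = or_rule(mother_disruptive, teacher_disruptive)
    if not (ic_disruptive or informant_disruptive):
        return None
    return "agree" if ic_disruptive == informant_disruptive else "disagree"


if __name__ == "__main__":
    print(classify_pair(True, True, False))    # both methods flag the child
    print(classify_pair(True, False, False))   # only Integrative Consensus flags
    print(classify_pair(False, False, True))   # only the "or rule" flags
    print(classify_pair(False, False, False))  # excluded from the analysis
```

Because excluded children are those flagged by no one, every "agree" pair in the analytic subsample is a pair in which both Integrative Consensus and the "or rule" rated the child as disruptive.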
Incremental utility of Integrative Consensus
Finally, we examined whether Integrative Consensus ratings made at baseline added incremental utility in predicting service use and child impairment 1 year later at the follow-up assessment using multivariate hierarchical logistic regression. Impairment was measured using the clinician C-GAS ratings (i.e., a score of ≤60 indicates impairment) made during the follow-up assessment. Service use (i.e., parent-sought therapy for herself or the child at 1-year follow-up) and child-prescribed medication at 1-year follow-up were measured via parent report (described previously). For each regression, mother or teacher report of disruptive behavior was entered in the first step, and the Integrative Consensus rating was entered in the second step. Table 4 illustrates that being categorized as disruptive via Integrative Consensus increased the odds that the family sought services for the child between eightfold and 16-fold. In addition, children rated disruptive by Integrative Consensus were four times more likely to be rated as impaired by their mother 1 year later. In contrast, Integrative Consensus did not provide incremental utility in predicting which children were rated as impaired by their teacher 1 year later.
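The two-step structure of these regressions can be written out as follows. The notation is introduced here for illustration and does not appear in the original analyses: Y is the binary outcome at follow-up, R is the mother or teacher report (1 = disruptive), and C is the Integrative Consensus rating (1 = disruptive).

```latex
\begin{align}
\text{Step 1:}\quad \log\frac{P(Y=1)}{1-P(Y=1)} &= \beta_0 + \beta_1 R \\
\text{Step 2:}\quad \log\frac{P(Y=1)}{1-P(Y=1)} &= \beta_0 + \beta_1 R + \beta_2 C \\
\mathrm{OR}_{C} &= e^{\beta_2}
\end{align}
```

Incremental utility corresponds to a nonzero coefficient on C in Step 2, that is, an odds ratio different from 1 after the informant report is already in the model. On this scale, an eightfold increase in odds corresponds to a coefficient of ln 8 ≈ 2.08 and a 16-fold increase to ln 16 ≈ 2.77.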
Disruptive behavior problems are the most common reason for referring young children for evaluations (Jones Harden et al., 2000; Webster-Stratton et al., 2001), with prevalence rates for disruptive behaviors almost three times national estimates in economically disadvantaged communities (Tolan & Henry, 1996). Evidence also suggests that some young children who struggle with aggression and noncompliance can experience longer-standing academic problems and school failure (Keenan & Wakschlag, 2000; Webster-Stratton, Reid, & Hammond, 2004). Current policy calls for comprehensive assessments that emphasize teamwork, collaboration, and active involvement of parents at every phase of an early childhood assessment (Bruder, 2000; National Research Council, 2008; Neisworth & Bagnato, 2004). Despite the need for multisource, multimethod, multicontext assessments, more data can complicate decision making, and developmentally informed guidelines for integrating comprehensive assessments have rarely been described in the literature (Dirks et al., 2012; Westen & Weinberger, 2004). This study represents a preliminary effort to apply a priori principles for weighing different sources of information and integrating contextual data systematically.
There were three main findings from this study. First, we found that Integrative Consensus could be reliably applied to capture meaningful variation in young children's behavior and that trained clinicians were able to integrate comprehensive assessment data to arrive at binary diagnoses in a reliable manner. Reliability estimates from this study were satisfactory when compared with those from best estimate procedures used in the studies by Strober, Green, and Carlson (1981; κ = 0.63–0.82) and Maziade et al. (1992; κ = 0.76–0.88). This finding is encouraging given that Integrative Consensus reflected a more complicated approach to clinical decision making in which data were synthesized from multiple sources and methods rather than the norm of attaining agreement across one measure (Angold & Costello, 2000). In this study, Integrative Consensus helped clinicians identify “gray area” children for whom data from the comprehensive assessment did not easily converge, which could explain why Integrative Consensus identified more children than parent or teacher report. This difference in rates of identification may in part highlight that context does matter. The likelihood of impairment over time in children rated by Integrative Consensus was substantially higher as well, suggesting that this method may have promise in identifying children earlier who could benefit from intervention.
Our findings also reflect what is known about the conditions under which professionals conducting assessments can make reliable judgments, namely, when systematic procedures for quantifying inferences and observations are used (Ægisdóttir et al., 2006; Spengler et al., 2009; Westen & Weinberger, 2005), when guidelines for integrating and combining assessment data are described a priori (Klein, Ouimette, Kelly, Ferro, & Riso, 1994), and when professionals rely on sound psychometric instruments to make clinical and educational decisions rather than making global judgments of functioning (Bierman, Nix, Maples, & Murphy, 2006; Westen & Weinberger, 2005). Lord et al. (2006), in a study of the longitudinal stability of autism, argue that the use of standardized instruments to diagnose autism improved the stability of diagnoses directly via the use of clear-cut diagnostic algorithms and indirectly by organizing and contextualizing clinical judgment.
Second, results from this study highlighted that few child developmental factors or child and family contextual factors taken into account during Integrative Consensus predicted agreement between pairs of raters and methods. Previous investigations have found either inconsistent or null findings when informant discrepancies are examined in the context of child characteristics, such as age, developmental level, and gender (Achenbach, McConaughy, & Howell, 1987; De Los Reyes & Kazdin, 2005; Kolko & Kazdin, 1993). Findings on the role of maternal characteristics and family stress have also been mixed, with some studies linking elevated maternal depression and stress to parent-reported behavior problems that are not confirmed by other sources (Briggs-Gowan et al., 1996; Youngstrom, Loeber, & Stouthamer-Loeber, 2000). In this study, the finding that children rated as disruptive via Integrative Consensus and by mother and/or teacher were discriminated by teacher-reported social skills and disruptive behavior was not entirely unexpected. Further investigation, however, is needed to clarify the meaning of those patterns and to elucidate whether discrepancies that emerged in the context of teacher report reflected rater bias (i.e., teachers as source) and method bias (i.e., rating scale vs. interview vs. observation) versus situational specificity (i.e., more disruptive behaviors evidenced at school than other contexts).
Third, these results highlight that children rated as disruptive through Integrative Consensus (vs. parent interview or teacher rating scale) were more likely to be clinically impaired at 1-year follow-up, including being prescribed medications, receiving mental health services, and having elevated maternal C-GAS ratings. Findings from logistic regressions suggest that Integrative Consensus ratings reduced errors related to overidentification (e.g., correctly identifying children who continue experiencing clinically impairing symptoms at follow-up) and underidentification (e.g., incorrectly missing children who continue experiencing clinically impairing symptoms at follow-up). Integrative Consensus identified those children who continued to struggle clinically over time, which is important given how difficult it is to predict impairment prospectively (Hardt & Rutter, 2004). In general, incremental validity analyses suggested that Integrative Consensus may have “value added” with respect to clinical prediction. Research diagnosing autism spectrum disorders similarly found that clinical consensus procedures enhanced predictive validity over and above standardized instruments alone, with the odds of predicting diagnosis at 9 years of age nearly two times greater than the odds for either observation or interview alone (Lord et al., 2006).
Although these findings may generalize to low-income, young children with behavior problems, a group consistently underrepresented in the literature, it is unclear whether our findings generalize to other populations or developmental periods. Therefore, replication with a larger and more diverse sample is needed. In addition, although standardized diagnostic observations were incorporated into Integrative Consensus, clinicians had no direct interaction with child participants, and it is unclear whether the clinician's experience of and with the child could have enhanced clinical decision making. Furthermore, although this study provided some preliminary evidence that Integrative Consensus demonstrated incremental value in identifying disruptive behaviors in young children and predicting impairment over time, questions remain regarding the generalizability of the method when clinical decision making is not limited to binary choices regarding a single diagnostic group. Application of Integrative Consensus to only oppositional defiant disorder and conduct disorder not only narrowed the scope of our work but also may have underestimated the potential for disagreement when considering other clinical problems. Finally, although families were not provided with results from the Integrative Consensus review, for the subset of the sample (N = 119) referred with behavioral concerns, 41% (n = 49) were identified as not disruptive by Integrative Consensus compared with 59% (n = 70) identified as disruptive. If those families were expecting a diagnosis, the diagnosis itself may have become a self-fulfilling prophecy and subsequently increased the odds of a family seeking services.
Implications for practice
The Integrative Consensus framework included examining the multiple facets of the child's behavior within a developmental context and relied on clinical judgment guided by a priori principles for distinguishing the nature and boundaries of children's behavior and functional competencies. Via the Integrative Consensus process, clinicians integrated data gathered from significant adults along with contextual factors such as recent life events and stressors—a process that has application to everyday practice by making explicit what clinicians already do implicitly and by incorporating the meaningfulness and informativeness of disagreements that naturally emerge via collaborative, comprehensive assessments. No gold standard exists for determining when to worry about young children exhibiting behavioral problems that are concerning to key caregivers, and the field is divided regarding how to balance the risk of premature labeling with the cost of failing to intervene early. However, collaborative assessments that incorporate contextual factors in a systematic manner would increase reliable and valid diagnoses. Integrative Consensus also has implications for training students how to weigh different sources of information and resolve discrepancies that naturally emerge during a multidisciplinary assessment. This framework also has application to intervention research in which a diagnosis is often made on the basis of one instrument to help better bridge the realm of clinical practice and empirically validated treatments. Increased accuracy of diagnoses at young ages would also help us better target intervention and services to those children in need.
The Integrative Consensus framework was applied by clinicians with a sample of young children with behavior problems and has relevance to the wide array of professionals (e.g., speech–language therapists, occupational therapists, social workers, inclusion support specialists, early interventionists) who gather information from key adults as they conduct early childhood assessments. These professionals are faced with similar challenges in determining how to systematically consolidate and integrate complex assessment data to arrive at reliable and valid conclusions regarding those children who require services. These comprehensive assessments have implications not only for who is identified but also which problems are treated and monitored over time, particularly given the critical role that context plays in identifying functional strengths and domains for further support (De Los Reyes & Kazdin, 2005; Macy, 2012).
The Integrative Consensus framework more closely reflects real-world, clinical decision making that inherently assumes a certain level of clinical sophistication and judgment on the basis of extensive knowledge of child development, observation of a range of children with and without clinical problems, and access to multiple sources of information, including contextual data. Clinical judgment is often portrayed as an unsystematic, unreliable endeavor in which clinicians exercise “broad clinical judgment” regarding methods for gathering and weighing evidence (Piacentini et al., 1992). In this study, we found that clinical judgment that is guided by well-articulated principles for distinguishing normative from problematic behavior and that encourages examining multiple facets of the child's behavior and context within a developmental framework holds promise as a systematic method for clinicians to collectively integrate different sources of data about a child's behavior into a gestalt in which the whole is greater than the sum of its parts.
Achenbach T. M., McConaughy S. H., Howell C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213–232. doi:10.1037/0033-2909.101.2.213
Ægisdóttir S., White M. J., Spengler P. M., Maugherman A. S., Anderson L. A., Cook R. S., Rush J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34, 341–382. doi:10.1177/0011000005285875
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author. doi:10.1176/appi.books.9780890423349
Angold A., Costello J. E. (2000). The Child and Adolescent Psychiatric Assessment (CAPA). Journal of the American Academy of Child and Adolescent Psychiatry, 39, 39–48.
Atkins M. S., Hoagwood K. E., Kutash K., Seidman E. (2010). Toward the integration of education and mental health in schools. Administration and Policy in Mental Health, 37, 40–47. doi:10.1007/s10488-010-0299-7
Barnard K. E. (1989). Difficult life circumstances (DLC). Seattle, WA: NCAST Publications.
Beck A. T., Steer R. A. (1993). Manual for the Beck Depression Inventory. San Antonio, TX: Psychological Corporation.
Bierman K. L., Nix R. L., Maples J. J., Murphy S. A. (2006). Examining clinical judgment in an adaptive intervention design: The Fast Track Program. Journal of Consulting and Clinical Psychology, 74, 468–481. doi:10.1037/0022-006X.74.3.468
Bird H. R., Gould M. S., Staghezza B. (1992). Aggregating data from multiple informants in child psychiatry epidemiological research. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 78–85. doi:10.1097/00004583-199201000-00012
Briggs-Gowan M., Carter A., Schwab-Stone M. (1996). Discrepancies among mother, child, and teacher reports: Examining the contributions of maternal depression and anxiety. Journal of Abnormal Child Psychology, 24, 749–765. doi:10.1007/BF01664738
Briggs-Gowan M. J., Carter A. S., Skuban E. M., Horwitz S. M. (2001). Prevalence of social-emotional and behavioral problems in a community sample of 1- and 2-year-old children. Journal of the American Academy of Child and Adolescent Psychiatry, 40, 811–819.
Bronfenbrenner U., Morris P. A. (2006). The bioecological model of human development. In: Lerner R. (Ed.), Handbook of child psychology: Vol. 1. Theoretical models of human development (6th ed., pp. 793–828). Hoboken, NJ: Wiley.
Bruder M. B. (2000). Family-centered early intervention: Clarifying our values for the new millennium. Topics in Early Childhood Special Education, 20, 105–115.
Campbell S. B., Ewing L. J. (1990). Follow-up of hard-to-manage preschoolers: Adjustment at age 9 and predictors of continuing symptoms. Journal of Child Psychology and Psychiatry, 31, 871–889. doi:10.1111/j.1469-7610.1990.tb00831
Cicchetti D. V., Sparrow S. A. (1981). Developing criteria for establishing interrater reliability of specific items: Applications to assessment of adaptive behavior. American Journal of Mental Deficiency, 86, 127–137.
Cohen J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
De Los Reyes A., Henry D. B., Tolan P. H., Wakschlag L. S. (2009). Linking informant discrepancies to observed variations in young children's disruptive behavior. Journal of Abnormal Child Psychology, 37, 637–652. doi:10.1007/s10802-009-9307-3
De Los Reyes A., Kazdin A. E. (2005). Informant discrepancies in the assessment of childhood psychopathology: A critical review, theoretical framework, and recommendations for further study. Psychological Bulletin, 131, 483–509.
Dirks M. A., De Los Reyes A., Briggs-Gowan M., Cella D., Wakschlag L. S. (2012). Embracing not erasing contextual variability in children's behavior: Theory and utility in the selection and use of methods and informants in developmental psychopathology. Journal of Child Psychology and Psychiatry, 53, 558–574. doi:10.1111/j.1469-7610.2012.02537.x
Durston S. (2003). A review of the biological bases of ADHD: What have we learned from imaging studies? Mental Retardation and Developmental Disabilities Research Reviews, 9, 184–195. doi:10.1002/mrdd.10079
Elliott C. (1983). Differential abilities scales: Introductory and technical handbook. New York, NY: Psychological Corporation.
Foley D. L., Rutter M., Angold A., Pickles A., Maes H. M., Silberg J. L., Eaves L. J. (2005). Making sense of informant disagreement for overanxious disorder. Journal of Anxiety Disorders, 19, 193–210. doi:10.1016/j.janxdis.2004.01.006
Gadow K. D., Sprafkin J., Nolan E. E. (2001). DSM-IV symptoms in community and clinic preschool children. Journal of the American Academy of Child and Adolescent Psychiatry, 40, 1383–1392. doi:10.1097/00004583-200112000-00008
Gray S. A. O., Carter A. S., Briggs-Gowan M. J., Hill C., Danis B., Keenan K., Wakschlag L. S. (2012). Preschool children's observed disruptive behavior: Variations across sex, interactional context, and disruptive psychopathology. Journal of Clinical Child and Adolescent Psychology, 41, 499–507. doi:10.1080/15374416.2012.675570
Gresham F. M., Elliott S. N. (1990). Social skills rating system manual. Circle Pines, MN: American Guidance Service.
Guralnick M. J. (2011). Why early intervention works: A systems perspective. Infants and Young Children, 24, 6–28. doi:10.1097/IYC.0b013e3182002cfe
Hanson M. J., Miller A. D., Diamond K., Odom S., Lieber J., Butera G., Fleming K. (2011). Neighborhood community risk influences on preschool children's development and school readiness. Infants and Young Children, 24, 87–100. doi:10.1097/IYC.0b013e3182008dd0
Hardt J., Rutter M. (2004). Validity of adult retrospective reports of adverse childhood experiences: Review of the evidence. Journal of Child Psychology and Psychiatry and Allied Disciplines, 45, 260–273. doi:10.1111/j.1469-7610.2004.00218.x
Hill C., Maskowitz K., Danis B., Wakschlag L. S. (2008). Validation of a clinically sensitive, observational coding system for parenting behaviors: The Parenting Clinical Observation Schedule. Parenting: Science and Practice, 8, 153–185. doi:10.1080/15295190802045469
Jones Harden B., Winslow M., Kendziora K., Shahinfar A., Rubin K., Fox N., Zahn-Waxler C. (2000). Externalizing problems in Head Start children: An ecological exploration. Early Education and Development, 11, 357–385.
Keenan K., Wakschlag L. (2000). More than the terrible twos: The nature and severity of behavior problems in clinic-referred preschool children. Journal of Abnormal Child Psychology, 28, 33–46.
Keenan K., Wakschlag L. S., Danis B., Hill C., Humphries M., Duax J., Donald R. (2007). Further evidence of the reliability and validity of DSM-IV ODD and CD in preschool children. Journal of the American Academy of Child and Adolescent Psychiatry, 46, 457–468. doi:10.1097/CHI.0b013e31803062d3
Kettler R. J., Feeney-Kettler K. A. (2011). Screening systems and decision-making at the preschool level: Application of a comprehensive validity framework. Psychology in the Schools, 48, 430–441.
Klein D. N., Ouimette P. C., Kelly H. S., Ferro T., Riso L. P. (1994). Test-retest reliability of team consensus best-estimate diagnoses of axis I and II disorders in a family study. American Journal of Psychiatry, 151, 1043–1047.
Kolko D. J., Kazdin A. E. (1993). Emotional/behavioral problems in clinic and nonclinic children: Correspondence among child, parent, and teacher reports. Journal of Child Psychology and Psychiatry, 34, 991–1006. doi:10.1111/j.1469-7610.1993.tb01103.x
Kraemer H. C., Measelle J. R., Ablow J. C., Essex M. J., Boyce W. T., Kupfer D. J. (2003). A new approach to integrating data from multiple informants in psychiatric assessment and research: Mixing and matching contexts and perspectives. American Journal of Psychiatry, 160, 1566–1577. doi:10.1176/appi.ajp.160.9.1566
Lord C., Risi S., DiLavore P. S., Shulman C., Thurm A., Pickles A. (2006). Autism from 2 to 9 years of age. Archives of General Psychiatry, 63, 694–701. doi:10.1001/archpsyc.63.6.694
Macy M. (2012). The evidence behind developmental screening instruments. Infants and Young Children, 25, 19–61. doi:10.1097/IYC.0b013e31823d37dd
Maziade M., Roy M. A., Fournier J. P., Cliche D., Merette C., Caron C. (1992). Reliability of best-estimate diagnosis in genetic linkage studies of major psychoses: Results from the Quebec pedigree studies. American Journal of Psychiatry, 149, 1674–1686.
McClellan J., Speltz M. (2003). Psychiatric diagnosis in preschool children. Journal of the American Academy of Child and Adolescent Psychiatry, 42, 127–128. doi:10.1097/00004583-200302000-00002
Neisworth J. T., Bagnato S. J. (2004). The mismeasure of young children: The authentic assessment alternative. Infants and Young Children, 17, 198–212.
Oliver R. M., Reschly D. J. (2010). Special education teacher preparation in classroom management: Implications for students with emotional and behavioral disorders. Behavioral Disorders, 35, 188–199.
Piacentini J. C., Cohen P., Cohen C. (1992). Combining discrepant diagnostic information from multiple sources: Are complex algorithms better than simple ones? Journal of Abnormal Child Psychology, 20, 51–63.
Richardson M., Henry J., Black-Pond C., Sloane M. (2008). Multiple types of maltreatment: Behavioral and developmental impact on children in the child welfare system. Journal of Child & Adolescent Trauma, 1, 1–14.
Setterberg S., Bird H., Gould M., Shaffer D., Fisher P. (1992). Parent and interviewer versions of the Children's Global Assessment Scale. New York, NY: Columbia University.
Shaffer D., Gould M., Brasic J., Ambrosini P., Fisher P., Bird H., Aluwahlia S. (1983). A children's global assessment scale (C-GAS). Archives of General Psychiatry, 40, 1228–1231.
Sheeber L. B., Johnson J. H. (1992). Applicability of the impact on family scale for assessing families with behaviorally difficult children. Psychological Reports, 71, 155–159. doi:10.2466/pr0.1992.71.1.155
Shernoff E. S., Kratochwill T. R. (2007). Transporting an evidence-based classroom management program for preschoolers with disruptive behavior problems to a school: An analysis of implementation, outcomes, and contextual variables. School Psychology Quarterly, 22, 449–472. doi:10.1037/1045-3830.22.4.449
Shernoff E. S., Mehta T., Atkins M., Torf R., Spencer J. (2011). A qualitative study of the sources and impact of stress among urban teachers. New York, NY: Springer. doi:10.1007/s12310-011-9051-z
Shernoff E. S., Marinez-Lora A., Frazier S. L., Jakobsons L. J., Atkins M. S., Bonner D. (2011). Teachers supporting teachers in urban schools: What iterative research designs can teach us. School Psychology Review, 40, 465–485.
Speltz M., McClellan J., DeKlyen M., Jones K. (1999). Preschool boys with oppositional defiant disorder: Clinical presentation and diagnostic change. Journal of the American Academy of Child and Adolescent Psychiatry, 38, 838–845. doi:10.1097/00004583-199907000-00013
Spengler P. M., White M. J., Ægisdóttir S., Maugherman A. S., Anderson L. A., Cook R., Rush J. D. (2009). The meta-analysis of clinical judgment project: Effects of experience on judgment accuracy. The Counseling Psychologist, 37, 350–399.
Spinazzola J., Ford J. D., Zucker M., van der Kolk B. A., Silva S., Smith S. F., Blaustein M. (2005). Survey evaluates complex trauma exposure, outcome, and intervention among children and adolescents. Psychiatric Annals, 35(5), 433–439.
Straus M. A. (1979). Measuring intrafamily conflict and violence: The Conflict Tactics (CT) Scales. Journal of Marriage & Family, 41, 75–88. doi:10.2307/351733
Strober M., Green J., Carlson G. (1981). Reliability of psychiatric diagnosis in hospitalized adolescents: Interrater agreement using DSM-III. Archives of General Psychiatry, 38, 141–145. doi:10.1001/archpsyc.1981.01780270027002
Tolan P. H., Henry D. (1996). Patterns of psychopathology among urban poor children: Comorbidity and aggression effects. Journal of Consulting and Clinical Psychology, 64, 1094–1099.
Voigt R. G., Llorente A. M., Jensen C. L., Fraley J. K., Barbaresi W. J., Heird W. C. (2007). Comparison of the validity of direct pediatric developmental evaluation versus developmental screening by parent report. Clinical Pediatrics, 46, 523–529. doi:10.1177/0009922806299100
Wakschlag L., Danis B. (2004). Assessment of disruptive behavior in young children: A clinical-developmental framework. In: DelCarmen-Wiggins R., Carter A. S. (Eds.), Handbook of infant, toddler and preschool mental health assessment (pp. 421–440). New York, NY: Oxford University Press.
Wakschlag L., Shernoff E. S., Danis B., Hill C., Stein J., Leventhal B. (2005). Integrative Consensus Procedures Manual of the Observing Young Children and Families (DB-DOS) Study. Unpublished manuscript, Institute for Juvenile Research, University of Illinois at Chicago.
Wakschlag L., Briggs-Gowan M., Carter A., Hill C., Danis B., Keenan K., Leventhal B. (2007). A developmental framework for distinguishing disruptive behavior from normative misbehavior in preschool children. Journal of Child Psychology and Psychiatry and Allied Disciplines, 48, 976–987. doi:10.1111/j.1469-7610.2007.01786.x
Wakschlag L. S., Briggs-Gowan M. J., Hill C., Danis B., Leventhal B. L., Keenan K., Carter A. S. (2008a). Observational assessment of preschool disruptive behavior, Part II: Validity of the Disruptive Behavior Diagnostic Observation Schedule (DB-DOS). Journal of the American Academy of Child and Adolescent Psychiatry, 47(6), 632–641. doi:10.1097/CHI.0b013e31816c5c10
Wakschlag L. S., Hill C., Carter A., Danis B., Egger H., Keenan K., Briggs-Gowan M. (2008b). Observational assessment of preschool disruptive behavior, Part I: Reliability of the Disruptive Behavior Diagnostic Observation Schedule (DB-DOS). Journal of the American Academy of Child and Adolescent Psychiatry, 47(6), 622–631. doi:10.1097/CHI.0b013e31816c5bdb
Wakschlag L., Danis B. (2009). Characterizing early childhood disruptive behavior: Enhancing developmental sensitivity. In: Zeanah C. H. (Ed.), Handbook of infant mental health (3rd ed., pp. 392–408). New York, NY: Guilford.
Wakschlag L. S., Keenan K. (2001). Clinical significance and correlates of disruptive behavior symptoms in environmentally at-risk preschoolers. Journal of Clinical Child Psychology, 30, 262–275. doi:10.1207/S15374424JCCP3002_13
Webster-Stratton C., Reid J., Hammond M. (2001). Preventing conduct problems, promoting social competence: A parent and teacher training partnership in Head Start. Journal of Clinical Child Psychology, 30, 283–302.
Webster-Stratton C., Reid M. J., Hammond M. (2004). Treating children with early-onset conduct problems: Intervention outcomes for parent, child, and teacher training. Journal of Clinical Child and Adolescent Psychology, 33, 105–124.
Westen D., Weinberger J. (2004). When clinical description becomes statistical prediction. American Psychologist, 59, 595–614. doi:10.1037/0003-066X.59.7.595
Westen D., Weinberger J. (2005). In praise of clinical judgment: Meehl's forgotten legacy. Journal of Clinical Psychology, 61, 1257–1276. doi:10.1002/jclp.20181
Yoshikawa H., Zigler E. (2000). Mental health and Head Start: New directions for the twenty-first century. Early Education and Development, 11, 247–264.
Youngstrom E., Loeber R., Stouthamer-Loeber M. (2000). Patterns and correlates of agreement between parent, teacher, and male adolescent ratings of externalizing and internalizing problems. Journal of Consulting and Clinical Psychology, 68, 1038–1050.
behavior problems; clinical judgment; comprehensive assessments; multicontext assessments
© 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins.