Integrative Consensus: A Systematic Approach to Integrating Comprehensive Assessment Data for Young Children With Behavior Problems

Shernoff, Elisa S. PhD; Hill, Carri PhD; Danis, Barbara PhD; Leventhal, Bennett L. MD; Wakschlag, Lauren S. PhD

Infants & Young Children:
doi: 10.1097/IYC.0000000000000008
Original Research/Study
ISEI Article

Comprehensive assessments that include parents and teachers are essential when assessing young children vulnerable to emotional and behavioral problems given the multiple systems and contexts that influence and support optimal development (U. Bronfenbrenner & P. A. Morris, 2006; M. J. Guralnick, 2011). However, more data complicate clinical and educational decision making given the challenge of integrating comprehensive data. We report on initial efforts to develop and apply Integrative Consensus procedures designed to synthesize comprehensive assessment data using developmentally informed guidelines. Mother–teacher dyads (N = 295) reported on disruptive behavior in a sample of low-income 3- to 5-year-olds; one-third referred for disruptive behaviors, one-third nonreferred with behavioral concerns, and one-third nonreferred. Two clinicians trained in Integrative Consensus procedures independently applied the framework. Identification as disruptive by Integrative Consensus ratings plus mother or teacher ratings significantly predicted behavior problems and impaired social skills. Children identified as disruptive via Integrative Consensus were 4 times more likely to be rated as impaired by their mother at follow-up than children identified by mother or teacher report. Reliability estimates were high (κ = 0.84), suggesting that the method has promise for identifying young children with behavior problems while systematically integrating comprehensive data.

Author Information

Graduate School of Applied and Professional Psychology, Rutgers University, Piscataway, New Jersey (Dr Shernoff); Psychological Services, Jewish Child and Family Service, Northbrook, Illinois (Dr Hill); Family Institute at Northwestern University, Chicago, Illinois (Dr Danis); Nathan Kline Institute, Orangeburg, New York (Dr Leventhal); and Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, Illinois (Dr Wakschlag).

Correspondence: Elisa S. Shernoff, PhD, Graduate School of Applied and Professional Psychology, Rutgers, State University of New Jersey, 152 Frelinghuysen Rd, Piscataway, NJ, 08854 (

The authors gratefully acknowledge contributions of their collaborators on the DB-DOS study, Margaret Briggs-Gowan, Alice Carter, Kate Keenan, Helen Egger, Jennifer Stein, and Andres De Los Reyes. They also gratefully acknowledge Drs. David Henry's and Michael Schoeny's generous guidance with analyses. Portions of this article were presented at the Annual Meeting of the American Psychological Association, August 2006.

This project was supported by NIMH Grants RO1 MH68455 and MH62437 and support from the Shaw and Children's Brain Research Foundations.

The authors declare no conflict of interest.

COMPREHENSIVE ASSESSMENTS are essential when evaluating young children vulnerable to developmental problems given the multiple systems and contexts that influence and support the developing child (Bronfenbrenner & Morris, 2006; Guralnick, 2011; Hanson et al., 2011). Comprehensive assessments are particularly critical when evaluating young children at risk for disruptive behaviors given subtle differences between normative and problematic behavior at this age and because young children typically cannot serve as informants of their own behavior (Wakschlag et al., 2007). Although recent advances in early childhood screening and assessment call for comprehensive, ecological approaches with a strong emphasis on parent involvement (Bruder, 2000; Macy, 2012; National Research Council, 2008), more data can complicate decision making given the challenge of integrating diverse perspectives. Thus, one challenge facing professionals from multiple disciplines (i.e., education, health, social service) conducting early childhood assessments includes determining how to systematically integrate complex assessment data to validly and reliably identify children's strengths and supports needed.

Disruptive behavior problems, including difficulty following rules, tantrums, and aggression, are the most common reason for referring young children for assessments (Jones Harden et al., 2000; Wakschlag & Danis, 2009; Webster-Stratton, Reid, & Hammond, 2001), with infants and young children below the poverty level three times as likely to exhibit behavior problems in the subclinical and clinical range (Briggs-Gowan, Carter, Skuban, & Horwitz, 2001). Among Head Start, early childhood, and primary school teachers, challenging behaviors are identified as the most stressful, complex, and pressing issue they face and their top priority for professional development (Shernoff et al., 2011; Shernoff, Mehta, Atkins, Torf, & Spencer, 2011; Yoshikawa & Zigler, 2000). Prevention and management of chronic disruptive behaviors in inclusive classrooms is critically important in the context of the No Child Left Behind Act (2001) and the Individuals With Disabilities Education Improvement Act (2003), with a high priority placed on improving educational outcomes for students with historically low achievement including economically disadvantaged learners, students with identified behavioral disabilities, and students at risk for special education referrals (Atkins, Hoagwood, Kutash, & Seidman, 2010; Oliver & Reschly, 2010).

Comprehensive assessment of young children vulnerable to poor developmental outcomes due to economic disadvantage is challenging given the high prevalence of exposure to trauma and co-occurrence of trauma-related externalizing behaviors as opposed to primary problems with disruptive behavior (Hanson et al., 2011; Richardson, Henry, Black-Pond, & Sloane, 2008; Spinazzola et al., 2005). In addition, research documents normative increases in some of the same behaviors that characterize disruptive behavior disorders in this age group (see Wakschlag et al., 2007; Wakschlag & Danis, 2004, 2009). For many parents and teachers, this developmental period can be challenging given that some children display normative increases in behavior problems that reflect burgeoning independence and significant advances in language development. Nonetheless, those behaviors remain challenging for caregivers to navigate (Shernoff & Kratochwill, 2007; Wakschlag & Danis, 2004, 2009).

Professionals from multiple disciplines working collaboratively with families are faced with the complex task of distinguishing children experiencing transient behavioral challenges from those at risk for longer-term difficulties, and the field is divided regarding how to balance the risk of premature labeling with the cost of not intervening early with children who could benefit from services. Given that evidence suggests that 50%–75% of young children with serious disruptive behaviors continue to exhibit these behaviors into school-age years and the documented effectiveness of early intervention, comprehensive assessments that are developmentally appropriate, fully include parents, and directly link to evidence-based interventions are vital (Campbell & Ewing, 1990; Guralnick, 2011; Neisworth & Bagnato, 2004; Speltz, McMellan, DeKlyen, & Jones, 1999).

The development of frameworks and methods for synthesizing and integrating complex assessment data is not new to the field but remains challenging given information provided by key caregivers can be discrepant, particularly when reporting on behavior across different contexts (e.g., home vs. school vs. public, see De Los Reyes & Kazdin, 2005; Dirks, De Los Reyes, Briggs-Gowan, Cella, & Wakschlag, 2012; Neisworth & Bagnato, 2004; Voight et al., 2007). These inconsistencies may reflect situational differences in a child's behavior (e.g., difficulty following directions only at preschool) or the subjective experiences of key adults (e.g., low threshold for behavior problems among teachers experiencing significant occupational stress). Predictable discrepancies that emerge during assessment necessitate teamwork and collaboration rather than concluding that one informant is biased or that the lack of convergence reflects measurement error (Kraemer et al., 2003; Neisworth & Bagnato, 2004). Key adults also have unique perspectives and experiences that can explain differences that emerge during an early childhood evaluation. Although parents have historically been marginalized in this process, they are vital team members who possess extensive knowledge of their child's history and development in addition to expertise regarding their child's behavior in a daily context (Bruder, 2000; Neisworth & Bagnato, 2004). Teachers contribute to the assessment process through their observation of students in a structured context that also involves same-aged peers (see Briggs-Gowan, Carter, & Schwab-Stone, 1996; De Los Reyes, Henry, Tolan, & Wakschlag, 2009). Professionals in health, education, psychology, and social service also have an important perspective on the basis of their experience with a range of children and training in assessment and intervention.

Two common approaches for combining assessment data have been applied in research contexts on the basis of conceptual decisions regarding the contributions of specific informants and methods, differing perspectives regarding which behaviors require intervention, and perceived costs associated with making errors (De Los Reyes & Kazdin, 2005; Kraemer et al., 2003; McClellan & Speltz, 2003). Simple aggregation methods, such as the “or rule,” assume that significant caregivers (e.g., parents and teachers, mothers and fathers) have unique but equally valid perspectives (Kraemer et al., 2003). Thus, in applying the “or rule,” positive identification of strengths or areas of concern is made if either caregiver (e.g., parent or teacher) identifies those behaviors as present (Kraemer et al., 2003; Piacentini, Cohen, & Cohen, 1992). This unique perspective can be related to different attributions made regarding the cause of a behavior, lack of convergence regarding which behaviors require intervention, or the extent to which a behavior is contextually specific (De Los Reyes & Kazdin, 2005; De Los Reyes et al., 2009). In the case of a diagnostic assessment, the “or rule” generates the most heterogeneous group, which leads to higher sensitivity (i.e., correct identification of those individuals truly in need of services) but lower specificity (i.e., correct nonidentification of individuals not in need of services) when the sample and the criterion measure are held constant (see McClellan & Speltz, 2003, for a detailed discussion of sensitivity and specificity trade-offs). The “and rule,” in contrast, requires both caregivers (e.g., parent and teacher) to endorse a behavior as present. The “and rule” tends to sacrifice sensitivity while enhancing specificity (see Kettler & Feeney-Kettler, 2011; McClellan & Speltz, 2003). This trade-off between sensitivity and specificity is only one of many factors (e.g., quality of the criterion variable, implications of identification) that must be considered when selecting a method for integrating data and resolving discrepancies.
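The trade-off between the two aggregation rules can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than study data: the function names, the eight hypothetical children, and the made-up criterion values are invented solely to show how the “or rule” and the “and rule” shift sensitivity and specificity when the sample and criterion are held constant.

```python
from typing import List, Sequence, Tuple


def or_rule(parent: Sequence[bool], teacher: Sequence[bool]) -> List[bool]:
    """Flag a child if EITHER informant endorses the behavior as present."""
    return [p or t for p, t in zip(parent, teacher)]


def and_rule(parent: Sequence[bool], teacher: Sequence[bool]) -> List[bool]:
    """Flag a child only if BOTH informants endorse the behavior as present."""
    return [p and t for p, t in zip(parent, teacher)]


def sensitivity_specificity(
    flagged: Sequence[bool], criterion: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of the flags against a criterion."""
    tp = sum(f and c for f, c in zip(flagged, criterion))
    fn = sum((not f) and c for f, c in zip(flagged, criterion))
    tn = sum((not f) and (not c) for f, c in zip(flagged, criterion))
    fp = sum(f and (not c) for f, c in zip(flagged, criterion))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("criterion must contain both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ratings for eight children (True = behavior endorsed as present).
parent = [True, True, False, False, True, False, False, False]
teacher = [True, False, True, False, False, True, False, False]
# Hypothetical criterion (True = child truly in need of services).
criterion = [True, True, True, False, False, False, False, False]

or_sens, or_spec = sensitivity_specificity(or_rule(parent, teacher), criterion)
and_sens, and_spec = sensitivity_specificity(and_rule(parent, teacher), criterion)

print(f"or rule:  sensitivity={or_sens:.2f}, specificity={or_spec:.2f}")
print(f"and rule: sensitivity={and_sens:.2f}, specificity={and_spec:.2f}")
```

With these invented values, the “or rule” identifies every criterion-positive child at the cost of two false positives, whereas the “and rule” produces no false positives but misses two of the three criterion-positive children.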

These two simple aggregation methods (i.e., “or rule,” “and rule”) have been applied in research and may possess some inherent appeal given that these rules standardize the assessment process and reduce idiosyncratic decisions made by professionals who may weigh data differently (Bird, Gould, & Staghezza, 1992; Foley et al., 2005; Kraemer et al., 2003; Piacentini et al., 1992; Wakschlag et al., 2005; Wakschlag & Danis, 2004, 2009). The process of integrating complex assessment data is often challenging in the real world when professionals must incorporate standardized assessment data with contextual information (e.g., recent life events, child and family strengths), all of which are vital to establishing how a child is currently functioning through the lens of significant caregivers. However, these simple aggregation methods are limited methodologically and conceptually by simply focusing on ways to combine information while ignoring the role that development and context play in assessment. “Gray areas” often emerge and contextual factors are critical to identifying functional competencies that inform interventions. Contextual factors are also relevant when assessing children with behavioral problems given their strong environmental and interactional components in contrast to primary difficulties with inattention (Durston, 2003; Gray et al., 2012). These methods also provide no guidance regarding how to weigh contextual factors and other idiographic information that professionals have access to and ignore the role that clinical judgment plays in the integration of complex assessment data.

To address these issues, we developed the Integrative Consensus framework (Wakschlag et al., 2004, 2009), which is a manualized clinical decision-making process designed to bridge the gap between research and practice by combining the strength of standardized methods and decision rules used in research to enhance reliability without sacrificing clinical judgment. Integrative Consensus incorporates the qualitative process of integrating developmentally appropriate assessment methods (i.e., developed for and validated on preschoolers rather than simple downward extensions of standardized assessment tools for adults) with child developmental functioning and recent life events into a gestalt and weighing their salience within a developmental context (i.e., clinical judgment; Wakschlag & Danis, 2004, 2009). Integrative Consensus moves beyond counting discrete behaviors to incorporating quantitative (e.g., often struggles to follow directions at school), qualitative (e.g., strong social skills), and contextual information (e.g., recent divorce) into clinical decision making. Here, it was applied with a sample of low-income youth referred for a continuum of behavior problems.

The Integrative Consensus framework was guided by the following five principles (in italics) with case examples used illustratively.

1. Clinical concern was determined by (a) the developmental inappropriateness, frequency, severity, and pervasiveness of behavior(s), (b) the degree to which behaviors impaired the child's functioning and ability to negotiate critical developmental tasks, and (c) the paucity of developmentally expected competencies to compensate for behavioral challenges (Wakschlag et al., 2005).

During the parent interview, a mother reports that her 3-year-old daughter exhibits several disruptive behaviors meeting oppositional defiant symptom criteria. However, the parent indicates that her daughter's disruptive behaviors are not impairing to the child and that family members can manage her behavior. The mother shares that her daughter has a number of social strengths, including that she is kind and empathic, shares well, and offers comfort when others are hurt. The teacher reports no problematic behaviors at daycare and reports good academic skills. The child demonstrates age-appropriate and adaptive coping behavior and positive social engagement during the clinic visit. She is also bright and demonstrates well-regulated behavior during the developmental assessment. During Integrative Consensus review, after reviewing and scoring the assessment data, despite oppositional behaviors reported during the parent interview, clinicians would code not disruptive because of the lack of evidence of impairment across multiple contexts and the presence of multiple compensatory strengths.

2. Parent report was considered the primary source of information, whereas teacher report and clinical observations were used to refine maternal report and calibrate the “fit” between different sources of data.

A mother reports that her 4-year-old son exhibits very few disruptive behaviors at home whereas the teacher reports that the child tantrums on a regular basis and becomes easily upset in the classroom. The child is well regulated and positively engaged during portions of the assessment that include his mother. He transitions well between tasks, is compliant, and is a good play partner. He becomes very distressed during the transition to developmental testing when he is expected to work with the examiner alone. Given the discrepancy between mother and teacher report, the child's competencies when observed with his mother and the child's distress at separation, the clinician would review additional clinical data (e.g., presence of anxiety symptoms or learning concerns) before concluding primary problems with disruptive behaviors.

3. Behaviors reported in familiar, natural settings (e.g., home and school) and/or by converging evidence across reporters and contexts were weighed more heavily than behaviors reported in unfamiliar settings (e.g., clinic) or in isolation.

A 3-year-old boy exhibits disruptive behavior during developmental testing and the structured observation but his mother and teacher do not report elevated behavioral symptoms at home or at school. Both teacher and parent report positive and age-appropriate functioning across domains of behavioral control, social engagement, and academic skills. Without collateral concerns in familiar, typical settings, the child's behavior in the clinic would be interpreted as a function of being in an unfamiliar setting and coded as not disruptive by clinicians.

4. When a child's behavior is ambiguous, contextual factors are used to refine thinking and inform decisions.

A mother reports numerous disruptive symptoms on the parent interview and a great deal of impairment in addition to high levels of depression on the Beck Depression Inventory and numerous life stressors related to an impending divorce and a recent move. Using the aforementioned guidelines, the clinician would weigh teacher report and how the child was adapting to the school environment in addition to the child's behavior in the clinic along with parent report given significant life stressors facing the parent and the family.

5. When discrepancies emerge and data fail to “add up,” clinicians must look for patterns across situations or frequent patterns within one context that make the behavior concerning and/or difficult to ignore.

A mother reports that her 3-year-old son exhibits a great deal of impulsivity and hyperactivity at home and at school along with age-appropriate compliance and good emotion regulation skills. The teacher reports elevated impulsivity at school and positive social skills and age-appropriate coping skills. He is kind to peers, takes direction well, and asserts himself appropriately. During the parent interview, the mother describes a recent event in which her son pushed his uncle down a flight of stairs in anger and that the uncle was seriously hurt. In this case, although the qualitative nature of the aggressive episode is concerning, aggressive outbursts are rare, and the child's behavior would be interpreted as reflecting primary problems with impulsivity, with clinicians coding not disruptive per Integrative Consensus guidelines.

Our goal was to report on our initial efforts to develop a replicable strategy for assessing behavior problems in young children, with a specific focus on oppositional and conduct problems given their strong contextual basis (see Gray et al., 2012). Our goal also included articulating principles for weighing different sources of information and resolving discrepancies systematically. With these issues in mind, the aims of this study included examining (a) reliability evidence for Integrative Consensus, (b) whether contextual and developmental factors taken into account during Integrative Consensus predicted discrepancies between key caregivers and methods, and (c) the incremental utility of incorporating Integrative Consensus into the assessment process.

Participants in this study (N = 295, mother–teacher dyads) were drawn from a larger, federally funded longitudinal study (N = 336, low-income families) examining early emerging disruptive behavior among children at risk for poor developmental outcomes (Wakschlag et al., 2008a; Wakschlag et al., 2008b). Mother–child dyads were recruited from clinics affiliated with two Midwestern universities serving urban, low-income populations. Children were sampled along the full behavioral continuum (i.e., clinically referred children and nonreferred children with and without behavioral concerns) to ensure high levels of behavioral variability (Wakschlag et al., 2008a; Wakschlag et al., 2008b). Institutional review board approval was obtained before initiating the study. Forty percent of the larger sample of children (n = 134) was referred to an outpatient specialty clinic for preschool behavior problems because of aggression and defiance. Thirty percent of the sample (n = 102) was nonreferred with behavioral concerns; that is, these children were recruited through a general pediatric clinic using a brief behavioral screening to determine that the family was not seeking an evaluation but the parent or the teacher had concerns about the child's behavior. The remaining 30% (n = 100) were nonreferred without behavioral concerns and recruited through a general pediatric clinic with a brief phone screening ruling out disruptive behaviors and confirming that the caregiver was not seeking an evaluation (Wakschlag et al., 2008a; Wakschlag et al., 2008b). Inclusion criteria for the larger study included (a) age 3 to 5 years; (b) residence with birth mother; (c) daycare or school attendance; (d) family income within 250% of the US poverty level by family size; and (e) absence of developmental disabilities (e.g., autism, seizure disorders). Referred children not attending school because of disruptive behaviors (i.e., school expulsion or caregiver avoiding enrollment because of behavior problems) were nonetheless included in the study (n = 3).

This study included the 295 mother–teacher dyads from the larger study for whom mother and teacher data were both available for the Integrative Consensus review. Forty percent (n = 119) of the Integrative Consensus sample was referred for behavior problems, 28% (n = 81) were nonreferred with behavioral concerns, and 32% (n = 95) were nonreferred without behavioral concerns. Table 1 illustrates the demographic characteristics of the mothers and children comprising the Integrative Consensus sample. In addition, 61% of mothers reported being single parents, 89% completed high school, and the mean annual family income was $21,977 (SD = $16,618). As per institutional review board procedures, mothers gave permission for teachers to provide child-level data; thus, teachers did not contribute personal demographic data.

Assessment data reviewed via Integrative Consensus

The following section describes baseline measures collected in the larger, longitudinal study and subsequently reviewed and synthesized via Integrative Consensus. Table 2 summarizes those measures in brief, whereas Table 3 illustrates the percentage of the analytic sample categorized as disruptive on the basis of the informant and assessment method. In addition, 90% (n = 265) of the mother–child dyads assessed at baseline returned 1 year later (M = 392 days; SD = 53) for a follow-up evaluation, with teacher data successfully obtained from 86% of the follow-up sample (Wakschlag et al., 2008a; Wakschlag et al., 2008b). During the 1-year follow-up assessment, all baseline measures were readministered and mothers indicated (Yes or No) whether, within the past year, (1) the child had been prescribed any medications to treat disruptive behavior symptoms, (2) the mother had sought out mental health services for herself, and (3) the mother had sought out mental health services for the child. Mothers and current teachers also independently completed the nonclinician version of the Children's Global Assessment Scale (C-GAS; Shaffer et al., 1983) at 1-year follow-up. Integrative Consensus review was conducted after both baseline and follow-up data had been collected, well after the larger study had concluded.

Behavioral problems

Mothers completed the Kiddie Disruptive Behavior Disorders Schedule (Keenan et al., 2007), a semistructured interview for preschoolers that includes probes to assess the frequency, severity, and pervasiveness of Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2000, 4th ed.) disruptive behavior symptoms across school, home, and public. A developmentally enhanced approach was used in which multiple features of the child's behavior (e.g., severity and frequency) were taken into account to improve discrimination between normative misbehavior and clinical symptoms of disruptive behaviors (Wakschlag et al., 2008b). One-week test–retest reliability with the current sample was high for disruptive behavior disorder diagnoses (κ = 0.81, p < .001) and total number of DBD symptoms (intraclass correlation coefficient = 0.82, p < .001; Wakschlag et al., 2008b).

Teachers also completed the Early Childhood Inventory (ECI; Gadow, Sprafkin, & Nolan, 2001), a DSM-IV rating scale sensitive to a wide range of developmental problems in 3- to 5-year-olds. Teachers reported on 35 ECI items mapping onto DSM-IV disruptive behaviors (response options range from 0 = Never to 3 = Very Often). The ECI generates categorical and continuous symptom scores, demonstrates satisfactory test–retest reliability (r = .56 for oppositional behaviors and r = .41 for conduct problems), and differentiates clinic-referred from nonreferred children. For this study, we employed the categorical scoring approach in which behaviors were considered at the symptom level if they were endorsed as occurring “often” or “very often.” Internal consistency in the present sample was excellent (α = .94).
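The categorical scoring approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric code for “often” is assumed to be 2 (the text gives only the endpoints 0 = Never and 3 = Very Often), the function names are hypothetical, and the example ratings are invented. No diagnostic cutoffs are implied.

```python
from typing import Sequence

# Assumption: on the 0-3 response scale, "often" is coded 2 and "very often" 3.
OFTEN = 2
MAX_RATING = 3


def is_symptom(rating: int) -> bool:
    """Return True when an item rating counts at the symptom level."""
    if not 0 <= rating <= MAX_RATING:
        raise ValueError(f"rating must be between 0 and {MAX_RATING}, got {rating}")
    return rating >= OFTEN


def symptom_count(ratings: Sequence[int]) -> int:
    """Count the items endorsed as occurring 'often' or 'very often'."""
    return sum(is_symptom(r) for r in ratings)


# Invented teacher ratings for six items (not real ECI data).
example_ratings = [0, 1, 2, 3, 1, 2]
example_count = symptom_count(example_ratings)
print(example_count)
```

The continuous scoring approach mentioned in the text would instead sum the raw 0 to 3 ratings; only the categorical variant is shown here because it is the one the study reports using.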

The Disruptive Behavior Diagnostic Observation Schedule (DB-DOS; Wakschlag et al., 2008a; Wakschlag et al., 2008b) is a 50-minute, clinic-based observation system used to assess Behavior Regulation (i.e., noncompliant, provocative, inflexible behavior) and Anger Modulation (i.e., difficulty modulating anger) on a continuum from 0 (No Evidence of Behavior) to 3 (High Level of Behavior; see Wakschlag et al., 2008a, 2008b). The DB-DOS includes one Parent and two Examiner contexts; the Examiner contexts vary the degree to which the examiner attends to the child (Examiner Engaged versus Examiner Busy), and all contexts create “presses” for misbehavior (e.g., compliance, frustration). DB-DOS psychometrics established with the current sample indicate acceptable interrater reliability (mean weighted κ = 0.66 for Behavior Regulation and 0.62 for Anger Modulation) and good internal consistency (mean α = .85 for Behavior Regulation, α = .92 for Anger Modulation), and the DB-DOS distinguishes normative misbehavior from disruptive behavior (F values range from 9.40 to 29.06, all p values < .01; Wakschlag et al., 2008a, 2008b).

Social competence

The Social Skills Rating System–Teacher Form Preschool Level (Gresham & Elliott, 1990) measures how frequently children engage in a range of classroom behaviors (1 = Never to 4 = Very Often). The Social Skills Rating System–Preschool Level has 40 items with the Social Skills Scale of primary interest in this study (α = .94). The Social Skills Scale comprises three subscales measuring Cooperation (α = .90), Assertion (α = .90), and Self-Control (α = .91). Normative data are provided by age and sex and were standardized on a heterogeneous population of which one third was urban and 28% minority.

Developmental functioning and other developmental concerns

The Differential Abilities Scale-Preschool Core (Elliott, 1983) assesses developmental functioning in children aged 2 years 6 months to 5 years 11 months. The General Conceptual Ability Score (M = 100; SD = 15) was used for analytic purposes. The standardization sample oversampled minority children, and reliability coefficients are high for lower (r = .90) and upper preschool versions (r = .94). Test–retest scores are stable for the General Conceptual Ability Score, ranging from .79 to .94. Examiners also completed behavioral ratings during the Differential Abilities Scale (i.e., aggression, affect, task engagement, and social engagement), rated from 1 = None to 4 = High.

The ECI (Gadow et al., 2001) Parent and Teacher Checklist assessed other developmental problems, including autism, anxiety, depression, speech and language disabilities, and other developmental delays (psychometrics described previously), with items rated on a 4-point scale (0 = Never to 3 = Very Often).

Child and family context

Three measures were used to assess life events and stressors. The first was the Conflict Tactics Scale (Straus, 1979), which has 11 items assessing the child's exposure to violence. Internal consistency for the current sample was 0.92. The second was the Difficult Life Circumstances Questionnaire (Barnard, 1989), which includes 28 items (Yes or No) assessing the number and type of chronic stressors families have experienced in the past year, including difficulty finding affordable housing, problems with a former partner, and financial strain. The Difficult Life Circumstances Questionnaire was developed for urban families and demonstrates acceptable reliability, with 1-year test–retest reliability for the total score of r = .70 (Barnard, 1989) and internal consistency of .70 for the current sample. The third was the Beck Depression Inventory (BDI-IA; Beck & Steer, 1993), completed by mothers. The BDI is a widely used self-report measure of depression for individuals 13 years and older, with response options ranging from 0 to 3 reflecting symptom severity, and it showed high internal consistency for the current sample (α = .90).

Sociodemographic data were obtained through a structured interview with mothers, who provided information on child and maternal age, gender, ethnicity/race, maternal relationship status, partner involvement, family income, and education.

Integrative Consensus procedures

An interdisciplinary team, including one senior psychologist, one senior child psychiatrist, two experienced psychologists, one doctoral-level clinical psychologist, and one doctoral-level school psychologist (hereafter, the doctoral-level psychologists are referred to as clinicians), developed the Integrative Consensus procedures. The two clinicians who independently applied the Integrative Consensus framework to the sample (N = 295, mother–teacher dyads described previously) were initially trained on Integrative Consensus procedures through several rounds of open coding and discussion, followed by independent coding of assessment data to a criterion of 80% exact agreement and 0.90 Pearson correlations on clinician overall rating of impairment using the C-GAS (Setterberg, Bird, Gould, Shaffer, & Fisher, 1992) described next. After initial reliability was established, individual children were randomly assigned to the two clinicians, who were blind to referral status (i.e., clinically referred, nonreferred with behavioral concerns, and nonreferred without behavioral concerns). In addition, ongoing reliability of Integrative Consensus was assessed via random assignment of 25% of children to both clinicians for double coding. Disagreements were discussed during twice-monthly interdisciplinary team meetings. Clinicians spent approximately 30–45 minutes independently reviewing assessment data for each participant, applying the following procedures, which are also illustrated in Figure.

* Clinicians reviewed parent and teacher report of behavior problems by scoring the Kiddie Disruptive Behavior Disorders parent interview (Keenan et al., 2007) and the ECI–Teacher report (Gadow et al., 2001). Clinicians summed the total number of oppositional, conduct, and ADHD symptoms for which the child met the criteria according to DSM-IV (American Psychiatric Association, 2000).

* Clinicians reviewed the Differential Abilities Scale (Elliott, 1983) scores to assess learning strengths and needs, along with behavioral ratings made during developmental testing to understand social competencies and whether disruptive behaviors interfered with testing.

* Clinicians assessed teacher-rated social competence via the Social Skills Rating System (Gresham & Elliott, 1990), including the child's ability to help his/her peers, share materials, and comply with rules and directions. Raw scores within one standard deviation of the standardization sample mean were considered average. Scores more than 1 SD below or above the mean were considered below or above average, respectively.

* Other developmental problems (i.e., autism, anxiety, depression, speech and language disabilities, developmental delays) were assessed via the ECI parent and teacher report (Gadow et al., 2001).

* Clinicians scored measures of life events and stressors (i.e., Conflict Tactics Scales, Difficult Life Circumstances Questionnaire, BDI) and reviewed sociodemographic interview data to examine how family and child contextual factors were influencing the child's behavior at home and school.

* Clinicians viewed digital recordings of the DB-DOS in brief by scanning the entire video to identify salient observations to view more fully. DB-DOS observations facilitated clinicians' observation of the child's behavior within and outside of the parent–child context and supplemented other data. Review of the DB-DOS focused on the presence of negative affect and disruptive behaviors as well as social competencies, including socially directed positive affect, social engagement, and prosocial behaviors.

* Clinicians reviewed four impairment indicators. The first included KDBDs Parent Interview questions (0 = Not Very Much, 2 = Some, and 3 = A Lot) focused on interference of disruptive behaviors with the parents' ability to take the child in public, set limits, or leave the child with a caregiver and the child's ability to get along with others or learn at school. KDBDs impairment scores were viewed on a continuum, with behaviors considered impairing if the sum score exceeded 3. The second impairment indicator included maternal ratings on the Impact on the Family Scale (Preschool Disruptive Behavior Version; Sheeber & Johnson, 1992), which has 23 items assessing the social, financial, and family burden resulting from the child's disruptive behaviors (0 = Strongly Disagree to 4 = Strongly Agree). Clinicians considered the sum score and severity (e.g., parents quit their job because of behavior problems, financial burden). Sum scores falling in the 4–7 range suggested a concerning level of impairment. The third impairment indicator was history of school expulsion, and the fourth was the mother's and teacher's independent ratings of global impairment on the basis of the nonclinician version of the C-GAS (Shaffer et al., 1983). The C-GAS includes behaviorally oriented descriptors and life situations validated for children as young as 2 years of age (Hill, Maskowitz, Danis, & Wakschlag, 2008; Wakschlag & Keenan, 2001). Children's Global Assessment Scale scores range from 1 to 100, with each decile reflecting the impact of the child's behavior on school, family, and peer relations, and with scores of 60 or lower indicating clinical impairment.

* Clinicians generated overall impairment by integrating the four impairment indicators while focusing on the extent to which disruptive behaviors interfered with (a) the child's ability to navigate normative developmental tasks (e.g., make friends, learn at school, enjoy time with family and peers) and (b) the impact of the behavior problems on the family (e.g., difficulty securing stable caregivers or maintaining employment). Clinicians generated an overall impairment score from 1 to 100 using the clinician version of the C-GAS (Setterberg et al., 1992). The clinician C-GAS score was incorporated into the final decision regarding presence of a disruptive behavior disorder. Pearson correlations on clinician C-GAS ratings made independently indicated high interrater reliability (r = .85, range: .79–.89).

* Clinicians proposed a binary diagnosis (not disruptive or disruptive). Not disruptive reflected behaviors falling generally in the normative range rather than an absence of concern. Disruptive reflected that the child met (1) DSM-IV criteria for oppositional defiant disorder (four symptoms) and/or conduct disorder (three symptoms) or Disruptive Behavior Disorder-NOS (defined a priori as the presence of at least three disruptive symptoms) and (2) Clinician C-GAS score of 60 or less.
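The explicit thresholds in the final step can be sketched in code. The following is a minimal illustration only, not the clinicians' actual procedure, which also weighed observational and contextual data. The class and function names are hypothetical, and the sketch assumes that the Disruptive Behavior Disorder-NOS count is the sum of oppositional and conduct symptoms, which the text does not specify.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
ODD_THRESHOLD = 4   # oppositional defiant disorder: four symptoms
CD_THRESHOLD = 3    # conduct disorder: three symptoms
NOS_THRESHOLD = 3   # Disruptive Behavior Disorder-NOS: at least three disruptive symptoms
CGAS_CUTOFF = 60    # clinician C-GAS score of 60 or less


@dataclass
class ChildSummary:
    """Hypothetical container for the counts a clinician has already derived."""
    odd_symptoms: int
    cd_symptoms: int
    clinician_cgas: int


def is_disruptive(child: ChildSummary) -> bool:
    """Apply the two-part rule: symptom criteria AND clinician-rated impairment."""
    # Assumption: NOS counts oppositional and conduct symptoms together.
    total_disruptive = child.odd_symptoms + child.cd_symptoms
    meets_symptom_criteria = (
        child.odd_symptoms >= ODD_THRESHOLD
        or child.cd_symptoms >= CD_THRESHOLD
        or total_disruptive >= NOS_THRESHOLD
    )
    impaired = child.clinician_cgas <= CGAS_CUTOFF
    return meets_symptom_criteria and impaired


if __name__ == "__main__":
    print(is_disruptive(ChildSummary(odd_symptoms=4, cd_symptoms=0, clinician_cgas=55)))
    print(is_disruptive(ChildSummary(odd_symptoms=4, cd_symptoms=0, clinician_cgas=70)))
```

Under this sketch, a child who meets symptom criteria but has a clinician C-GAS score above 60 is coded as not disruptive, because both conditions are required.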

Although differential diagnoses for other developmental problems were not made because of lack of full diagnostic information, the presence of other clinical problems observed during DB-DOS observations and reported by mothers were weighed. Given oppositional defiant and conduct symptoms were the focus of this study, clinicians coded inattention and hyperactivity symptoms as not disruptive in the absence of oppositional or more serious conduct problems. Only one child identified as not disruptive via Integrative Consensus met DSM-IV criteria for ADHD. Twenty-five children in the sample were comorbid for disruptive behaviors and ADHD.


Reliability of Integrative Consensus

The reliability of the Integrative Consensus procedures was assessed via random assignment of 25% of the participants to the two clinicians for double coding. Weighted κ values measured the proportion of weighted agreement corrected for chance (Cohen, 1968). Weighted κ values reflect agreement beyond chance on the basis of the following guidelines: 0.75 or greater = Excellent; 0.60–0.74 = Good; 0.40–0.59 = Fair; less than 0.40 = Poor (Cicchetti & Sparrow, 1981). Results suggest that interrater reliability for Integrative Consensus was excellent for child disruptive behavior ratings (κ = 0.84; range: 0.80–0.87).
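As an illustration of the statistic, the sketch below computes Cohen's weighted κ from two raters' codes and maps the result onto the Cicchetti and Sparrow bands quoted above. It assumes linear disagreement weights and categories coded 0 to k − 1; the weighting scheme used in the study is not stated, and the function names and example ratings are hypothetical. For a binary rating such as disruptive versus not disruptive, weighted and unweighted κ coincide.

```python
from collections import Counter
from typing import Sequence


def weighted_kappa(rater1: Sequence[int], rater2: Sequence[int], n_categories: int) -> float:
    """Cohen's weighted kappa with linear disagreement weights |i - j| (an assumption)."""
    if len(rater1) != len(rater2) or len(rater1) == 0:
        raise ValueError("Both raters must code the same non-empty set of children.")
    n = len(rater1)
    joint = Counter(zip(rater1, rater2))   # observed cell counts
    marg1 = Counter(rater1)                # rater 1 marginal counts
    marg2 = Counter(rater2)                # rater 2 marginal counts

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j)
            observed_disagreement += weight * joint[(i, j)] / n
            expected_disagreement += weight * (marg1[i] / n) * (marg2[j] / n)

    if expected_disagreement == 0:
        raise ValueError("Kappa is undefined when neither rater shows any variation.")
    return 1.0 - observed_disagreement / expected_disagreement


def cicchetti_sparrow_band(kappa: float) -> str:
    """Interpretive bands from Cicchetti and Sparrow (1981), as quoted in the text."""
    if kappa >= 0.75:
        return "Excellent"
    if kappa >= 0.60:
        return "Good"
    if kappa >= 0.40:
        return "Fair"
    return "Poor"


if __name__ == "__main__":
    # Hypothetical double-coded ratings: 1 = disruptive, 0 = not disruptive.
    clinician_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    clinician_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    k = weighted_kappa(clinician_a, clinician_b, n_categories=2)
    print(round(k, 3), cicchetti_sparrow_band(k))
```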

Developmental and contextual factors predicting agreement

Table 3 illustrates the percentage of the analytic sample categorized as disruptive and not disruptive on the basis of parent report using the Kiddie Disruptive Behavior Disorders interview (Keenan et al., 2007), teacher report using the ECI (Gadow et al., 2001), and Integrative Consensus ratings. Of the analytic sample of young children (N = 295), descriptive analyses reflected that Integrative Consensus consistently identified a greater percentage of children as disruptive (30%) compared with mother (14%) or teacher report (14%).

Multivariate analyses of variance examined whether child developmental factors (i.e., child's age in years, developmental functioning), life events and stressors (i.e., chronic family stress, maternal depression, relationship conflict), or mother/teacher report of disruptive behaviors or social competence taken into account during the Integrative Consensus review predicted agreement between pairs of raters and methods. For these analyses, we examined discrepancies between Integrative Consensus and the “or rule” (i.e., mother or teacher identified the child as disruptive). Only children categorized as disruptive by at least one caregiver/method were included in these analyses (n = 91). Pairs were classified as “agreeing” if they agreed on the disruptive rating (n = 65). Pairs were categorized as “disagreeing” if either the Integrative Consensus review categorized the child as disruptive but the “or rule” categorized the child as not disruptive or the Integrative Consensus review categorized the child as not disruptive but the “or rule” categorized the child as disruptive (n = 26). Few factors discriminated the “agree” from “disagree” groups, and only the multivariate analysis of variance examining mother and teacher report of child behavior was significant (Wilks' λ = 8.90; p < .001). Follow-up univariate analyses indicated that children jointly categorized as disruptive by Integrative Consensus and at least one caregiver were rated as having more teacher-reported disruptive behaviors and fewer teacher-reported social skills than children for whom Integrative Consensus and other caregivers disagreed (p values < .01).
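The grouping described above can be expressed as a short sketch. This is an illustration under stated assumptions rather than the authors' analysis code: the `Ratings` type and function names are hypothetical, and each rating is treated as an already-derived binary flag.

```python
from typing import NamedTuple, Optional


class Ratings(NamedTuple):
    """Hypothetical binary flags for one child (True = categorized as disruptive)."""
    mother: bool
    teacher: bool
    integrative_consensus: bool


def or_rule(r: Ratings) -> bool:
    """The "or rule": disruptive if mother OR teacher identified the child as disruptive."""
    return r.mother or r.teacher


def classify_pair(r: Ratings) -> Optional[str]:
    """Return "agree", "disagree", or None when the child is excluded from the analysis."""
    informant = or_rule(r)
    # Only children categorized as disruptive by at least one caregiver/method are included.
    if not (informant or r.integrative_consensus):
        return None
    return "agree" if informant == r.integrative_consensus else "disagree"


if __name__ == "__main__":
    examples = [
        Ratings(mother=True, teacher=False, integrative_consensus=True),
        Ratings(mother=False, teacher=False, integrative_consensus=True),
        Ratings(mother=False, teacher=True, integrative_consensus=False),
        Ratings(mother=False, teacher=False, integrative_consensus=False),
    ]
    for r in examples:
        print(r, classify_pair(r))
```

Because children rated as not disruptive by every source are excluded, every "agree" pair in this sketch is one in which both Integrative Consensus and the "or rule" categorized the child as disruptive.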

Incremental utility of Integrative Consensus

Finally, we examined whether Integrative Consensus ratings made at baseline added incremental utility in predicting service use and child impairment 1 year later at the follow-up assessment using multivariate hierarchical logistic regression. Impairment was measured using the clinician C-GAS ratings (i.e., a score of ≤60 indicates impairment) made during the follow-up assessment. Service use (i.e., parent-sought therapy for herself or the child at 1-year follow-up) and child-prescribed medication at 1-year follow-up were measured via parent report (described previously). For each regression, mother or teacher report of disruptive behavior was entered in the first step, and the Integrative Consensus rating was entered in the second step. Table 4 illustrates that being categorized as disruptive via Integrative Consensus increased the odds that the family sought services for the child between eightfold and 16-fold. In addition, children rated disruptive by Integrative Consensus were four times more likely to be rated as impaired by their mother 1 year later. In contrast, Integrative Consensus did not provide incremental utility in predicting which children were rated as impaired by their teacher 1 year later.
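The two-step structure can be written out as follows. This is a generic sketch of a hierarchical logistic regression, not a reproduction of the fitted models: the symbols $Y$ (binary follow-up outcome), $X_{\mathrm{MT}}$ (mother or teacher report of disruptive behavior), and $X_{\mathrm{IC}}$ (Integrative Consensus rating) are notation introduced here, and the likelihood-ratio comparison is one common way to test the second step, assumed rather than taken from the text.

```latex
% Step 1: mother or teacher report only
\operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 X_{\mathrm{MT}}

% Step 2: Integrative Consensus rating added
\operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 X_{\mathrm{MT}} + \beta_2 X_{\mathrm{IC}}

% Odds ratio for Integrative Consensus, adjusted for informant report
\mathrm{OR}_{\mathrm{IC}} = e^{\beta_2}

% One common test of incremental utility (assumed here): likelihood-ratio test, 1 df
\chi^2 = -2\left(\ell_{\text{step 1}} - \ell_{\text{step 2}}\right)
```

Read this way, an adjusted odds ratio of about 4 corresponds to $\beta_2 = \ln 4 \approx 1.39$, and an eightfold to 16-fold increase in odds corresponds to $\beta_2$ between $\ln 8 \approx 2.08$ and $\ln 16 \approx 2.77$.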


Disruptive behavior problems are the most common reason for referring young children for evaluations (Jones Harden et al., 2000; Webster-Stratton et al., 2001), with prevalence rates for disruptive behaviors almost three times national estimates in economically disadvantaged communities (Tolan & Henry, 1996). Evidence also suggests that some young children who struggle with aggression and noncompliance can experience longer-standing academic problems and school failure (Keenan & Wakschlag, 2000; Webster-Stratton, Reid, & Hammond, 2004). Current policy calls for comprehensive assessments that emphasize teamwork, collaboration, and active involvement of parents at every phase of an early childhood assessment (Bruder, 2000; National Research Council, 2008; Neisworth & Bagnato, 2004). Despite the need for multisource, multimethod, multicontext assessments, more data can complicate decision making, and developmentally informed guidelines for integrating comprehensive assessments have rarely been described in the literature (Dirks et al., 2012; Westen & Weinberger, 2004). This study represents a preliminary effort to apply a priori principles for weighing different sources of information and integrating contextual data systematically.

Key findings

There were three main findings from this study. First, we found that Integrative Consensus could be reliably applied to capture meaningful variation in young children's behavior and that trained clinicians were able to integrate comprehensive assessment data to arrive at binary diagnoses in a reliable manner. Reliability estimates from this study were satisfactory when compared with those from best estimate procedures used in the studies by Strober, Green, and Carlson (1981; κ = 0.63–0.82) and Maziade et al. (1992; κ = 0.76–0.88). This finding is encouraging given that Integrative Consensus reflected a more complicated approach to clinical decision making in which data were synthesized from multiple sources and methods rather than the norm of attaining agreement across one measure (Angold & Costello, 2000). In this study, Integrative Consensus helped clinicians identify “gray area” children for whom data from the comprehensive assessment did not easily converge, which could explain why Integrative Consensus identified more children as disruptive than parent or teacher report. This difference in rates of identification may in part highlight that context does matter. The likelihood of impairment over time in children rated as disruptive by Integrative Consensus was substantially higher as well, suggesting that this method may have promise in identifying children earlier who could benefit from intervention.

Our findings also reflect what is known about the conditions under which professionals conducting assessments can make reliable judgments, namely, when systematic procedures for quantifying inferences and observations are used (Ægisdóttir et al., 2006; Spengler et al., 2009; Westen & Weinberger, 2005), when guidelines for integrating and combining assessment data are described a priori (Klein, Ouimette, Kelly, Ferro, & Riso, 1994), and when professionals rely on sound psychometric instruments to make clinical and educational decisions rather than making global judgments of functioning (Bierman, Nix, Maples, & Murphy, 2006; Westen & Weinberger, 2005). Lord et al. (2006), in a study of the longitudinal stability of autism, argue that the use of standardized instruments to diagnose autism improved the stability of diagnoses directly via the use of clear-cut diagnostic algorithms and indirectly by organizing and contextualizing clinical judgment.

Second, results from this study highlighted that few child developmental factors or child and family contextual factors taken into account during Integrative Consensus predicted agreement between pairs of raters and methods. Previous investigations have found either inconsistent or null findings when informant discrepancies are examined in the context of child characteristics, such as age, developmental level, and gender (Achenbach, McConaughy, & Howell, 1987; De Los Reyes & Kazdin, 2005; Kolko & Kazdin, 1993). Findings on the relationship of maternal characteristics and family stress to informant discrepancies have also been mixed, with some studies linking elevated maternal depression and stress to parent-reported behavior problems that are not confirmed by other sources (Briggs-Gowan et al., 1996; Youngstrom, Loeber, & Stouthamer-Loeber, 2000). In this study, the finding that children rated as disruptive via Integrative Consensus and mother and/or teacher were discriminated by teacher-reported social skills and disruptive behavior was not entirely unexpected. Further investigation, however, is needed to clarify the meaning of those patterns and to elucidate whether discrepancies that emerged in the context of teacher report reflected rater bias (i.e., teachers as source) and method bias (i.e., rating scale vs. interview vs. observation) versus situational specificity (i.e., more disruptive behaviors evidenced at school than other contexts).

Third, these results highlight that children rated as disruptive through Integrative Consensus (vs. parent interview or teacher rating scale) were more likely to be clinically impaired at 1-year follow-up, including being prescribed medications, receiving mental health services, and having elevated maternal C-GAS ratings. Findings from logistic regressions suggest that Integrative Consensus ratings reduced errors related to overidentification (e.g., correctly identifying children who continue experiencing clinically impairing symptoms at follow-up) and underidentification (e.g., incorrectly missing children who continue experiencing clinically impairing symptoms at follow-up). Integrative Consensus identified those children who continued to struggle clinically over time, which is important given how difficult it is to predict impairment prospectively (Hardt & Rutter, 2004). In general, incremental validity analyses suggested that Integrative Consensus may have “value added” with respect to clinical prediction. Research diagnosing autism spectrum disorders similarly found that clinical consensus procedures enhanced predictive validity over and above standardized instruments alone, with the odds of predicting diagnosis at 9 years of age nearly two times greater than the odds for either observation or interview alone (Lord et al., 2006).

Although these findings may generalize to low-income, young children with behavior problems, a group consistently underrepresented in the literature, it is unclear whether our findings generalize to other populations or developmental periods. Therefore, replication with a larger and more diverse sample is needed. In addition, although standardized diagnostic observations were incorporated into Integrative Consensus, clinicians had no direct interaction with child participants, and it is unclear whether the clinician's experience of and with the child could have enhanced clinical decision making. Furthermore, although this study provided some preliminary evidence that Integrative Consensus demonstrated incremental value in identifying disruptive behaviors in young children and predicting impairment over time, questions remain regarding the generalizability of the method when clinical decision making is not limited to binary choices regarding a single diagnostic group. Application of Integrative Consensus to only oppositional defiant disorder and conduct disorder not only narrowed the scope of our work but also may have underestimated the potential for disagreement when considering other clinical problems. Finally, although families were not provided with results from the Integrative Consensus review, for the subset of the sample (N = 119) referred with behavioral concerns, 41% (n = 49) were identified as not disruptive by Integrative Consensus compared with 59% (n = 70) identified as disruptive. If those families were expecting a diagnosis, the diagnosis itself may have become a self-fulfilling prophecy and subsequently increased the odds of a family seeking services.

Implications for practice

The Integrative Consensus framework included examining the multiple facets of the child's behavior within a developmental context and relied on clinical judgment guided by a priori principles for distinguishing the nature and boundaries of children's behavior and functional competencies. Via the Integrative Consensus process, clinicians integrated data gathered from significant adults along with contextual factors such as recent life events and stressors—a process that has application to everyday practice by making explicit what clinicians already do implicitly and by incorporating the meaningfulness and informativeness of disagreements that naturally emerge via collaborative, comprehensive assessments. No gold standard exists for determining when to worry about young children exhibiting behavioral problems that are concerning to key caregivers, and the field is divided regarding how to balance the risk of premature labeling with the cost of failing to intervene early. However, collaborative assessments that incorporate contextual factors in a systematic manner would increase reliable and valid diagnoses. Integrative Consensus also has implications for training students how to weigh different sources of information and resolve discrepancies that naturally emerge during a multidisciplinary assessment. This framework also has application to intervention research in which a diagnosis is often made on the basis of one instrument to help better bridge the realm of clinical practice and empirically validated treatments. Increased accuracy of diagnoses at young ages would also help us better target intervention and services to those children in need.


The Integrative Consensus framework was applied by clinicians with a sample of young children with behavior problems and has relevance to the wide array of professionals (e.g., speech–language therapists, occupational therapists, social workers, inclusion support specialists, early interventionists) who gather information from key adults as they conduct early childhood assessments. These professionals are faced with similar challenges in determining how to systematically consolidate and integrate complex assessment data to arrive at reliable and valid conclusions regarding those children who require services. These comprehensive assessments have implications not only for who is identified but also which problems are treated and monitored over time, particularly given the critical role that context plays in identifying functional strengths and domains for further support (De Los Reyes & Kazdin, 2005; Macy, 2012).

The Integrative Consensus framework more closely reflects real-world, clinical decision making that inherently assumes a certain level of clinical sophistication and judgment on the basis of extensive knowledge of child development, observation of a range of children with and without clinical problems, and access to multiple sources of information, including contextual data. Clinical judgment is often portrayed as an unsystematic, unreliable endeavor in which clinicians exercise “broad clinical judgment” regarding methods for gathering and weighing evidence (Piacentini et al., 1992). In this study, we found that clinical judgment that is guided by well-articulated principles for distinguishing normative from problematic behavior and that encourages examining multiple facets of the child's behavior and context within a developmental framework holds promise as a systematic method for clinicians to collectively integrate different sources of data about a child's behavior into a gestalt in which the whole is greater than the sum of its parts.


Achenbach T. M., McConaughy S. H., Howell C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213–232. doi:10.1037/0033-2909.101.2.213
Ægisdóttir S., White M. J., Spengler P. M., Maugherman A. S., Anderson L. A., Cook R. S., Rush J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34, 341–382. doi:10.1177/0011000005285875
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author. doi:10.1176/appi.books.9780890423349
Angold A., Costello J. E. (2000). The Child and Adolescent Psychiatric Assessment (CAPA). Journal of the American Academy of Child and Adolescent Psychiatry, 39, 39–48.
Atkins M. S., Hoagwood K. E., Kutash K., Seidman E. (2010). Toward the integration of education and mental health in schools. Administration and Policy in Mental Health, 37, 40–47. doi:10.1007/s10488-010-0299-7
Barnard K. E. (1989). Difficult life circumstances (DLC). Seattle, WA: NCAST Publications.
Beck A. T., Steer R. A. (1993). Manual for the Beck Depression Inventory. San Antonio, TX: Psychological Corporation.
Bierman K. L., Nix R. L., Maples J. J., Murphy S. A. (2006). Examining clinical judgment in an adaptive intervention design: The Fast Track Program. Journal of Consulting and Clinical Psychology, 74, 468–481. doi:10.1037/0022-006X.74.3.468
Bird H. R., Gould M. S., Staghezza B. (1992). Aggregating data from multiple informants in child psychiatry epidemiological research. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 78–85. doi:10.1097/00004583-199201000-00012
Briggs-Gowan M., Carter A., Schwab-Stone M. (1996). Discrepancies among mother, child, and teacher reports: Examining the contributions of maternal depression and anxiety. Journal of Abnormal Child Psychology, 24, 749–765. doi:10.1007/BF01664738
Briggs-Gowan M. J., Carter A. S., Skuban E. M., Horwitz S. M. (2001). Prevalence of social-emotional and behavioral problems in a community sample of 1- and 2-year-old children. Journal of the American Academy of Child and Adolescent Psychiatry, 40, 811–819.
Bronfenbrenner U., Morris P. A. (2006). The bioecological model of human development. In: Lerner R. (Ed.), Handbook of child psychology: Vol. 1. Theoretical models of human development (6th ed., pp. 793–828). Hoboken, NJ: Wiley.
Bruder M. B. (2000). Family-centered early intervention: Clarifying our values for the new millennium. Topics in Early Childhood Special Education, 20, 105–115.
Campbell S. B., Ewing L. J. (1990). Follow-up of hard-to-manage preschoolers: Adjustment at age 9 and predictors of continuing symptoms. Journal of Child Psychology and Psychiatry, 31, 871–889. doi:10.1111/j.1469-7610.1990.tb00831
Cicchetti D. V., Sparrow S. A. (1981). Developing criteria for establishing interrater reliability of specific items: Applications to assessment of adaptive behavior. American Journal of Mental Deficiency, 86, 127–137.
Cohen J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
De Los Reyes A., Henry D. B., Tolan P. H., Wakschlag L. S. (2009). Linking informant discrepancies to observed variations in young children's disruptive behavior. Journal of Abnormal Child Psychology, 37, 637–652. doi:10.1007/s10802-009-9307-3
De Los Reyes A., Kazdin A. E. (2005). Informant discrepancies in the assessment of childhood psychopathology: A critical review, theoretical framework, and recommendations for further study. Psychological Bulletin, 131, 483–509.
Dirks M. A., De Los Reyes A., Briggs-Gowan M., Cella D., Wakschlag L. S. (2012). Embracing not erasing contextual variability in children's behavior: Theory and utility in the selection and use of methods and informants in developmental psychopathology. Journal of Child Psychology and Psychiatry, 53, 558–574. doi:10.1111/j.1469-7610.2012.02537.x
Durston S. (2003). A review of the biological bases of ADHD: What have we learned from imaging studies? Mental Retardation and Developmental Disabilities Research Reviews, 9, 184–195. doi:10.1002/mrdd.10079
Elliott C. (1983). Differential abilities scales: Introductory and technical handbook. New York, NY: Psychological Corporation.
Foley D. L., Rutter M., Angold A., Pickles A., Maes H. M., Silberg J. L., Eaves L. J. (2005). Making sense of informant disagreement for overanxious disorder. Journal of Anxiety Disorders, 19, 193–210. Retrieved from
Gadow K. D., Sprafkin J., Nolan E. E. (2001). DSM-IV symptoms in community and clinic preschool children. Journal of the American Academy of Child and Adolescent Psychiatry, 40, 1383–1392. doi:10.1097/00004583-200112000-00008
Gray S. A. O., Carter A. S., Briggs-Gowan M. J., Hill C., Danis B., Keenan K., Wakschlag L. S. (2012). Preschool children's observed disruptive behavior: Variations across sex, interactional context, and disruptive psychopathology. Journal of Clinical Child and Adolescent Psychology, 41, 499–507. doi:10.1080/15374416.2012.675570
Gresham F. M., Elliott S. N. (1990). Social skills rating system manual. Circle Pines, MN: American Guidance Service.
Guralnick M. J. (2011). Why early intervention works: A systems perspective. Infants and Young Children, 24, 6–28. doi:10.1097/IYC.0b013e3182002cfe
Hanson M. J., Miller A. D., Diamond K., Odom S., Lieber J., Butera G., Fleming K. (2011). Neighborhood community risk influences on preschool children's development and school readiness. Infants and Young Children, 24, 87–100. doi:10.1097/IYC.0b013e3182008dd0
Hardt J., Rutter M. (2004). Validity of adult retrospective reports of adverse childhood experiences: Review of the evidence. Journal of Child Psychology and Psychiatry and Allied Disciplines, 45, 260–273. doi:10.1111/j.1469-7610.2004.00218.x
Hill C., Maskowitz K., Danis B., Wakschlag L. S. (2008). Validation of a clinically sensitive, observational coding system for parenting behaviors: The Parenting Clinical Observation Schedule. Parenting: Science and Practice, 8, 153–185. doi:10.1080/15295190802045469
Individuals With Disabilities Education Improvement Act of 2004. (2003). In: Retrieved January 20, 2013, from
Jones Harden B., Winslow M., Kendziora K., Shahinfar A., Rubin K., Fox N., Zahn-Waxler C. (2000). Externalizing problems in Head Start children: An ecological exploration. Early Education and Development, 11, 357–385.
Keenan K., Wakschlag L. (2000). More than the terrible twos: The nature and severity of behavior problems in clinic-referred preschool children. Journal of Abnormal Child Psychology, 28, 33–46.
Keenan K., Wakschlag L. S., Danis B., Hill C., Humphries M., Duax J., Donald R. (2007). Further evidence of the reliability and validity of DSM-IV ODD and CD in preschool children. Journal of the American Academy of Child and Adolescent Psychiatry, 46, 457–468. doi:10.1097/CHI.0b013e31803062d3
Kettler R. J., Feeney-Kettler K. A. (2011). Screening systems and decision-making at the preschool level: Application of a comprehensive validity framework. Psychology in the Schools, 48, 430–441.
Klein D. N., Ouimette P. C., Kelly H. S., Ferro T., Riso L. P. (1994). Test-retest reliability of team consensus best-estimate diagnoses of axis I and II disorders in a family study. American Journal of Psychiatry, 151, 1043–1047.
Kolko D. J., Kazdin A. E. (1993). Emotional/behavioral problems in clinic and nonclinic children: Correspondence among child, parent, and teacher reports. Journal of Child Psychology and Psychiatry, 34, 991–1006. doi:10.1111/j.1469-7610.1993.tb01103.x
Kraemer H. C., Measelle J. R., Ablow J. C., Essex M. J., Boyce W. T., Kupfer D. J. (2003). A new approach to integrating data from multiple informants in psychiatric assessment and research: Mixing and matching contexts and perspectives. American Journal of Psychiatry, 160, 1566–1577. doi:10.1176/appi.ajp.160.9.1566
Lord C., Risi S., DiLavore P. S., Shulman C., Thurm A., Pickles A. (2006). Autism from 2 to 9 years of age. Archives of General Psychiatry, 63, 694–701. doi:10.1001/archpsyc.63.6.694
Macy M. (2012). The evidence behind developmental screening instruments. Infants and Young Children, 25, 19–61. doi:10.1097/IYC.0b013e31823d37dd
Maziade M., Roy M. A., Fournier J. P., Cliche D., Merette C., Caron C. (1992). Reliability of best-estimate diagnosis in genetic linkage studies of major psychoses: Results from the Quebec pedigree studies. American Journal of Psychiatry, 149, 1674–1686.
McClellan J., Speltz M. (2003). Psychiatric diagnosis in preschool children. Journal of the American Academy of Child and Adolescent Psychiatry, 42, 127–128. doi:10.1097/00004583-200302000-00002
National Research Council. (2008). Early childhood assessment: Why, what, and how. Committee on Developmental Outcomes and Assessments for Young Children, Board on Children, Youth, and Families, Board on Testing and Assessment, Division of Behavioral and Social Sciences and Education. Washington, DC: National Academies Press. Retrieved from
Neisworth J. T., Bagnato S. J. (2004). The mismeasure of young children: The authentic assessment alternative. Infants and Young Children, 17, 198–212.
No Child Left Behind Act of 2001. (2001). In: Retrieved January 20, 2013, from
Oliver R. M., Reschly D. J. (2010). Special education teacher preparation in classroom management: Implications for students with emotional and behavioral disorders. Behavioral Disorders, 35, 188–199.
Piacentini J. C., Cohen P., Cohen C. (1992). Combining discrepant diagnostic information from multiple sources: Are complex algorithms better than simple ones? Journal of Abnormal Child Psychology, 20, 51–63.
Richardson M., Henry J., Black-Pond C., Sloane M. (2008). Multiple types of maltreatment: Behavioral and developmental impact on children in the child welfare system. Journal of Child & Adolescent Trauma, 1, 1–14.
Setterberg S., Bird H., Gould M., Shaffer D., Fisher P. (1992). Parent and interviewer versions of the Children's Global Assessment Scale. New York, NY: Columbia University.
Shaffer D., Gould M., Brasic J., Ambrosini P., Fisher P., Bird H., Aluwahlia S. (1983). A children's global assessment scale (C-GAS). Archives of General Psychiatry, 40, 1228–1231.
Sheeber L. B., Johnson J. H. (1992). Applicability of the impact on family scale for assessing families with behaviorally difficult children. Psychological Reports, 71, 155–159. doi:10.2466/pr0.1992.71.1.155
Shernoff E. S., Kratochwill T. R. (2007). Transporting an evidence-based classroom management program for preschoolers with disruptive behavior problems to a school: An analysis of implementation, outcomes, and contextual variables. School Psychology Quarterly, 22, 449–472.
Shernoff E. S., Mehta T., Atkins M., Torf R., Spencer J. (2011). A qualitative study of the sources and impact of stress among urban teachers. New York, NY: Springer. doi:10.1007/s12310-011-9051-z
Shernoff E. S., Marinez-Lora A., Frazier S. L., Jakobsons L. J., Atkins M. S., Bonner D. (2011). Teachers supporting teachers in urban schools: What iterative research designs can teach us. School Psychology Review, 40, 465–485.
Speltz M., McClellan J., DeKlyen M., Jones K. (1999). Preschool boys with oppositional defiant disorder: Clinical presentation and diagnostic change. Journal of the American Academy of Child and Adolescent Psychiatry, 38, 838–845. doi:10.1097/00004583-199907000-00013
Spengler P. M., White M. J., Ægisdóttir S., Maugherman A. S., Anderson L. A., Cook R., Rush J. D. (2009). The meta-analysis of clinical judgment project: Effects of experience on judgment accuracy. The Counseling Psychologist, 37, 350–399.
Spinazzola J., Ford J. D., Zucker M., van der Kolk B. A., Silva S., Smith S. F., Blaustein M. (2005). Survey evaluates complex trauma exposure, outcome, and intervention among children and adolescents. Psychiatric Annals, 35(5), 433–439.
Straus M. A. (1979). Measuring intrafamily conflict and violence: The Conflict Tactics (CT) Scales. Journal of Marriage & Family, 41, 75–88. doi:10.2307/351733
Strober M., Green J., Carlson G. (1981). Reliability of psychiatric diagnosis in hospitalized adolescents: Interrater agreement using DSM-III. Archives of General Psychiatry, 38, 141–145. doi:10.1001/archpsyc.1981.01780270027002
Tolan P. H., Henry D. (1996). Patterns of psychopathology among urban poor children: Comorbidity and aggression effects. Journal of Consulting and Clinical Psychology, 64, 1094–1099.
Voigt R. G., Llorente A. M., Jensen C. L., Fraley J. K., Barbaresi W. J., Heird W. C. (2007). Comparison of the validity of direct pediatric developmental evaluation versus developmental screening by parent report. Clinical Pediatrics, 46, 523–529. doi:10.1177/0009922806299100
Wakschlag L., Danis B. (2004). Assessment of disruptive behavior in young children: A clinical-developmental framework. In: DelCarmen-Wiggins R., Carter A. S. (Eds.), Handbook of infant, toddler and preschool mental health assessment (pp. 421–440). New York, NY: Oxford University Press.
Wakschlag L., Shernoff E. S., Danis B., Hill C., Stein J., Leventhal B. (2005). Integrative Consensus Procedures Manual of the Observing Young Children and Families (DB-DOS) Study. Unpublished manuscript, Institute for Juvenile Research, University of Illinois at Chicago.
Wakschlag L., Briggs-Gowan M., Carter A., Hill C., Danis B., Keenan K., Leventhal B. (2007). A developmental framework for distinguishing disruptive behavior from normative misbehavior in preschool children. Journal of Child Psychology and Psychiatry and Allied Disciplines, 48, 976–987. doi:10.1111/j.1469-7610.2007.01786.x
Wakschlag L. S., Briggs-Gowan M. J., Hill C., Danis B., Leventhal B. L., Keenan K., Carter A. S. (2008a). Observational assessment of preschool disruptive behavior, Part II: Validity of the Disruptive Behavior Diagnostic Observation Schedule (DB-DOS). Journal of the American Academy of Child and Adolescent Psychiatry, 47(6), 632–641.
Wakschlag L. S., Hill C., Carter A., Danis B., Egger H., Keenan K., Briggs-Gowan M. (2008b). Observational assessment of preschool disruptive behavior, Part I: Reliability of the Disruptive Behavior Diagnostic Observation Schedule (DB-DOS). Journal of the American Academy of Child and Adolescent Psychiatry, 47(6), 622–631.
Wakschlag L., Danis B. (2009). Characterizing early childhood disruptive behavior: Enhancing developmental sensitivity. In: Zeanah C. H. (Ed.), Handbook of infant mental health (3rd ed., pp. 392–408). New York, NY: Guilford.
Wakschlag L. S., Keenan K. (2001). Clinical significance and correlates of disruptive behavior symptoms in environmentally at-risk preschoolers. Journal of Clinical Child Psychology, 30, 262–275. doi:10.1207/S15374424JCCP3002_13
Webster-Stratton C., Reid J., Hammond M. (2001). Preventing conduct problems, promoting social competence: A parent and teacher training partnership in Head Start. Journal of Clinical Child Psychology, 30, 283–302.
Webster-Stratton C., Reid M. J., Hammond M. (2004). Treating children with early-onset conduct problems: Intervention outcomes for parent, child, and teacher training. Journal of Clinical Child and Adolescent Psychology, 33, 105–124.
Westen D., Weinberger J. (2004). When clinical description becomes statistical prediction. American Psychologist, 59, 595–614. doi:10.1037/0003-066X.59.7.595
Westen D., Weinberger J. (2005). In praise of clinical judgment: Meehl's forgotten legacy. Journal of Clinical Psychology, 61, 1257–1276. doi:10.1002/jclp.20181
Yoshikawa H., Zigler E. (2000). Mental health and Head Start: New directions for the twenty-first century. Early Education and Development, 11, 247–264.
Youngstrom E., Loeber R., Stouthamer-Loeber M. (2000). Patterns and correlates of agreement between parent, teacher, and male adolescent ratings of externalizing and internalizing problems. Journal of Consulting and Clinical Psychology, 68, 1038–1050.

behavior problems; clinical judgment; comprehensive assessments; multicontext assessments

© 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins.