Assessing Residents’ Written Learning Goals and Goal Writing Skill

Validity Evidence for the Learning Goal Scoring Rubric

Lockspeiser, Tai M., MD; Schmitter, Patricia A., MEd; Lane, J. Lindsey, BM BCh; Hanson, Janice L., PhD, EdS; Rosenberg, Adam A., MD; Park, Yoon Soo, PhD

doi: 10.1097/ACM.0b013e3182a352e6


Self-directed lifelong learning (SDLL) is a key aspect of medical professionalism1 and an important skill set for physicians to develop in order to maintain proficiency. It is now included in the Accreditation Council for Graduate Medical Education’s requirements for pediatric residency training programs2 in the form of individualized learning plans (ILPs), as well as in many specialty boards’ requirements for maintenance of certification.3 Despite recognition of the importance of SDLL, physicians do not always effectively incorporate learning plans and goals into their daily work. Research has shown that both a lack of time and a lack of understanding of the needed skills act as barriers to SDLL and use of ILPs.4–7 Residents, in particular, seem to struggle with identifying specific learning goals and formulating effective plans to achieve them.5

The forethought phase of self-regulated learning theory provides a conceptual framework for the use of learning goals.8 Within this theory, task analysis, which includes goal setting and strategic planning, is central to effective learning.8 Dedicating time and effort to goal setting and planning has been found to have a positive impact on learning across a broad range of activities outside medical education.8–10

Yet, valid and reliable tools to assess SDLL skills have not been reported in the medical education literature. Instead, SDLL skills are generally assessed by either self-report or attitudinal surveys. The most commonly used attitudinal assessment is the Jefferson Scale of Physician Lifelong Learning (JeffSPLL),11 a 14-item instrument designed to assess propensity for lifelong learning in physicians. It has been used with practicing full-time clinicians,12 academic clinicians,13 residents,14 and medical students.15

To address the need for valid, reliable tools to assess residents’ actual SDLL skills, we decided to develop a rubric for scoring written learning goals, an integral part of self-regulated learning in medical education. Our goal was to create a rubric that could be used both for summative evaluation of this aspect of residents’ SDLL skills and to provide feedback to residents as they learn the process of writing and using learning goals.

In this article, we report the results of a two-part study designed to gather validity evidence16,17 for the Learning Goal Scoring Rubric. Our goal was to determine whether this rubric could be a valid and reliable means of assessing the quality of written learning goals and a learner’s ability to write learning goals.


Prior to conducting this study, we developed the Learning Goal Scoring Rubric; the development process has been reported previously.18 Briefly, we based the rubric on the ISMART mnemonic, which is commonly used in management, education, and medical education to help in the forethought phase of learning.5,19,20 This mnemonic identifies the six aspects of a learning goal necessary for success: addresses important topics, is specific, includes measurable or clearly describable outcomes, has a mechanism of accountability, is realistic, and has a timeline for accomplishment.19 Although our rubric initially included criteria for each letter of the mnemonic, it was iteratively revised through discussion and application to example goals. The final Learning Goal Scoring Rubric has four criteria (see Chart 1). We previously reported some limited preliminary evidence for the validity of the response process.18

Chart 1 Learning Goal Scoring Rubric*

Data sources

Data for this two-part study were obtained from two sources: (1) a questionnaire completed by third-year pediatric residents and (2) learning goals created by third-year pediatric residents to guide their individualized learning experiences (ILEs).21 This study was approved by the Colorado Multiple Institutional Review Board.

Study 1: Questionnaire with goal writing prompt.

Over the course of two academic years (2010–2011 and 2011–2012), all pediatric residents ending their third year of residency at the University of Colorado were asked to complete a questionnaire consisting of the JeffSPLL and a learning goal writing exercise, which used the following prompt:

You have been required to complete an ILP in every year of residency. After reflecting on your current learning needs, please complete the four sections of the learning goal. Although you may have multiple goals that you hope to work on in the near future, please pick only one for this exercise. 1. Goal: Please give a brief description of one learning goal that you currently have. 2. How I know I need to do this: Describe what evidence you have that suggests that this is an area you need to improve. 3. Plan for achieving this goal: Describe the strategies/activities by which you will achieve your goal. 4. Outcome: Describe how you will know that you have met this goal.

All learning goals were deidentified by one of the investigators (T.M.L.) and independently scored by all five investigators using the Learning Goal Scoring Rubric. All investigators were involved in the creation of the scoring rubric and shared an understanding of how to apply it that came from repeated scoring and discussion of example learning goals prior to the scoring of any goals for this study. Approximately half of the residents who completed the questionnaire had also elected to participate in an ILE that included a multipart training on writing and using learning goals during their third year.21 These residents completed the workshop described below.

Study 2: Written goals from ILEs.

Residents who elected to participate in an ILE were required to attend an interactive workshop on learning goals. During this session, residents wrote at least one goal and received feedback from their peers on how to improve it. They were given a worksheet that framed the different parts of goal writing; they used this worksheet to develop the two goals they were required to write each month to shape their learning in clinical settings for the four months of their ILEs. Over the course of their ILEs, residents received oral and written feedback about their goals from one of the investigators (T.M.L.) based on the scoring rubric. For this study, we randomly selected four learning goals written by each resident who participated in an ILE during 2011–2012. These were deidentified by one of the investigators (T.M.L.). Using the Learning Goal Scoring Rubric, all five investigators independently scored a subset of 24 of these goals; the remaining 24 were scored independently by two of the investigators (T.M.L., P.A.S.).

Validity evidence

We conceptualized validity as a hypothesis to be tested on the basis of evidence accumulated from five different sources: content, response process, internal structure, relationship to other variables, and consequences of testing.16,17


Content.

Prior to creating the rubric,18 we conducted a detailed literature search. The primary evidence for content validity came from all five investigators’ comparison of the rubric with the results of our literature review and with the materials used to teach residents about learning goals.

Response process.

We used the intraclass correlation coefficient (ICC) to measure rater reliability22 using the data from study 1 and the subset of the data from study 2 that were scored by all investigators (72 total goals: 48 from study 1 and 24 from study 2). In addition, we assessed the percentage of variance accounted for by the raters using a generalizability study (G study).
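As an illustration of how an ICC summarizes rater agreement, the following is a minimal sketch in Python. It assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), from Shrout and Fleiss22; the article does not state which ICC form was used, so this choice, the function name, and the example scores are assumptions and not the study's analysis or data.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per goal, one column per rater.
    Assumes a complete matrix (every rater scored every goal).
    """
    n = len(scores)          # number of goals (targets)
    k = len(scores[0])       # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for the two-way ANOVA without replication.
    ss_goals = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_goals - ss_raters

    bms = ss_goals / (n - 1)                 # between-goals mean square
    jms = ss_raters / (k - 1)                # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))     # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Illustrative scores only (6 goals x 4 raters); not data from this study.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # about 0.29: raters differ systematically
```

In this form, systematic differences between raters (one rater scoring consistently lower than another) lower the coefficient, which is the behavior one would want when scores are interpreted against fixed rubric criteria.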

Internal structure.

We used generalizability theory to analyze the reliability of scores on the rubric.23,24 We conducted two separate fully crossed G studies using the questionnaire data from study 1 and the ILE data from study 2. Both analyses had the following design variables: goal (G), rater (R), and scoring rubric criterion (C; i.e., specific goal, important goal, realistic multisource plan, or measurable outcome) as well as interaction terms. We added another facet, person (P), in the analysis of the goals from the ILE. All facets were assumed to be random samples from a population. The object of measurement for study 1 was goal (G), whereas it was person (P) in study 2. By selecting different objects of measurement for the two studies, we could determine the rubric’s reliability in assessing the quality of goals as well as a person’s (i.e., a resident’s) ability to write goals. As described above, five raters scored all goals for study 1, whereas two raters scored the goals for study 2. The phi coefficient was chosen over the G coefficient for all G studies given that the rubric represents a criterion-based measure. We also conducted decision studies (D studies) to project improvements in reliability for varying the number of raters or scoring rubric criteria and to determine the number of goals that would need to be scored to assess a resident’s goal writing skill.
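To make the D study logic concrete, the sketch below computes a phi coefficient for a fully crossed goal × rater × criterion design, using the standard generalizability theory expression for absolute error variance. The variance components are hypothetical placeholders chosen for illustration (they are not the values in Tables 3 and 4), and the function and dictionary key names are assumptions.

```python
def phi_coefficient(var, n_raters, n_criteria):
    """Phi (dependability) coefficient with goal (g) as object of measurement.

    var: variance components for a fully crossed g x r x c random design.
    Absolute error includes every component except the goal variance itself,
    each divided by the number of conditions it is averaged over.
    """
    abs_error = (
        var["r"] / n_raters
        + var["c"] / n_criteria
        + var["gr"] / n_raters
        + var["gc"] / n_criteria
        + var["rc"] / (n_raters * n_criteria)
        + var["grc_e"] / (n_raters * n_criteria)  # residual
    )
    return var["g"] / (var["g"] + abs_error)


# Hypothetical variance components, for illustration only.
hypothetical = {
    "g": 0.50, "r": 0.05, "c": 0.02,
    "gr": 0.05, "gc": 0.18, "rc": 0.01, "grc_e": 0.19,
}

# D study: project reliability for different numbers of raters,
# holding the number of rubric criteria at four.
for n_r in range(1, 6):
    print(n_r, round(phi_coefficient(hypothetical, n_r, 4), 3))
```

Because rater-related components are divided by the number of raters, adding raters shrinks the error term with diminishing returns, which is why a D study can identify the smallest rater panel that still exceeds a chosen reliability threshold.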

Relationship to other variables.

We assessed the relationship to other variables by correlating residents’ scores on the scoring rubric with their scores on the JeffSPLL using only the data from study 1. The JeffSPLL includes 14 questions rated on a four-point Likert-type scale; possible scores range from 14 to 56. Spearman rho was used for this analysis. We hypothesized that scores on the JeffSPLL would not correlate with scores on the Learning Goal Scoring Rubric because the two tools assess related but independent concepts—the JeffSPLL measures attitudes, whereas our scoring rubric assesses skills. In addition, we assessed the relationship to training in the use of learning goals by comparing learning goal scores from the group of residents in study 1 who had received training on writing effective learning goals with the scores of those who had not received training. We hypothesized that residents with training would score higher than residents without training. This comparison was conducted with an independent-samples Mann–Whitney U test. Nonparametric methods were used in the analysis to avoid any distributional assumptions about the data.
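The two nonparametric statistics named above can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' analysis code: the function names and all scores are hypothetical, and it computes only the test statistics (Spearman rho and the Mann–Whitney U values), not the P values that a statistics package would also report.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) computed from the rank sum of group_a."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[: len(group_a)])
    u_a = rank_sum_a - len(group_a) * (len(group_a) + 1) / 2
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical scores, for illustration only.
jeffspll_totals = [41, 48, 52, 39, 45, 50]   # possible range 14 to 56
rubric_totals = [6, 5, 8, 7, 4, 6]
print(spearman_rho(jeffspll_totals, rubric_totals))

trained = [8, 7, 9, 6]      # rubric totals, residents with training
untrained = [5, 4, 6, 3]    # rubric totals, residents without training
print(mann_whitney_u(trained, untrained))
```

Both statistics depend only on the rank order of the scores, which is why they make no distributional assumptions about the rubric or JeffSPLL totals.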

Consequences of testing.

We did not seek validity evidence from this source because this study was exploratory in nature and there were no clear consequences to the assessment.


In study 1, we analyzed 48 third-year residents’ questionnaires, each of which included a written learning goal in response to the prompt. In study 2, we analyzed 48 written learning goals completed by 12 different residents as part of their ILEs. Table 1 provides the mean scores on the criteria of the Learning Goal Scoring Rubric for each study.

Table 1: Mean Scores for Residents’ Written Learning Goals Assessed Using the Learning Goal Scoring Rubric, by Study


Content

Following our review of the literature used in the creation of the scoring rubric and the materials used to teach residents about goals, we concluded that the rubric’s four criteria match the prompts on the worksheet given to residents for creation of their ILE learning goals and also match the four key questions identified by Challis25 for successful personal learning planning. This alignment of the assessment with both the literature and educational methods provides evidence of content validity.

Response process

ICCs for all four criteria as well as the overall rubric indicated good agreement among the raters (Table 2). Raters accounted for only a small proportion of the variance in G studies (Tables 3 and 4).

Table 2: Intraclass Correlation Coefficients (ICCs) for Total Score and the Four Criteria of the Learning Goal Scoring Rubric*
Table 3: Written Learning Goal as Object of Measurement: Study 1 Generalizability Study Results*
Table 4: Person (i.e., Resident’s Goal Writing Skill) as Object of Measurement: Study 2 Generalizability Study Results*

Internal structure

Study 1: Goals as the object of measurement.

The overall phi coefficient from the G study based on the questionnaire data was 0.867 with five raters independently rating each goal on four different criteria (Table 3). The results of the G study revealed that the largest variance component was the goal. Raters accounted for only about 5% of the total variance. The second largest source of variance came from the interaction of goal and criterion; that is, certain goals were scored higher on one criterion as compared with another.

The D study revealed that sufficient reliability (phi coefficient > 0.8) could be achieved using the scoring rubric’s four criteria with only two raters. The number of criteria could be reduced as well and still provide reliable results.

Study 2: Person (resident) as the object of measurement.

In this study, two raters scored 48 goals to assess the goal writing skill of 12 individual residents rather than the quality of the goals themselves. The results of the G study are shown in Table 4. The overall phi coefficient was 0.751, suggesting that the rubric is fairly reliable in distinguishing between people with different goal writing skill levels. The variance due to person (resident) was 23.35%, which accounts for a large portion of the variance. There was a very low variance component from both raters and criteria. A large proportion of the residual error variance came from interactions between facets.

The D study revealed that an assessment of a resident’s goal writing skill could be achieved with sufficient reliability (phi coefficient > 0.8) if five goals were scored by three different raters or 10 goals were scored by two raters.

Relationship to other variables

The correlation between residents’ scores on the JeffSPLL (Cronbach alpha = 0.841) and their scores on the Learning Goal Scoring Rubric was not significant (r = −0.051, P = .733). In addition, there was no significant correlation between any of the subscales of the JeffSPLL and the scoring rubric. Using an independent-samples Mann–Whitney U test, we found a relationship to training: Residents who had received training in writing learning goals had significantly higher total scores on the rubric than did residents without training (7.54 versus 4.98, P < .001). Within these two subgroups, scores on the JeffSPLL were still not correlated with those on the rubric (no training: r = 0.029, P = .891; training: r = −0.393, P = .065).


This study provides preliminary evidence of validity for the Learning Goal Scoring Rubric from four sources: content, response process, internal structure, and relationship to other variables. The content matched the literature and the materials used to teach residents about goals.18 The ICCs for all four criteria and the entire rubric were quite high, suggesting good rater agreement. In study 1, the results of the G study suggest that scores given to goals using the scoring rubric are reliable. In study 2, the G study provides evidence that the scoring rubric can be used to assess a resident’s skill at writing learning goals.

The G study in study 1 focused on the reliability of scores on individual goals, and it had a high overall phi coefficient. In addition, the high goal variance (47.05%) provides evidence that the rubric is able to discriminate among goals of different quality even after taking into account variance from raters, criteria, and other interactions. The relatively low rater variance (5.15%) and the high ICCs suggest that systematic rater error did not have a significant effect on the scores. Finally, the high variance (17.25%) from the interaction of goal and criterion suggests, as would be expected, that certain residents are better able to write particular aspects of goals. This fits with what we have seen empirically: One resident may report that the most difficult aspect of writing a goal is making it specific, whereas another may report that it is thinking of an outcome measure or an appropriate plan. This interaction provides further evidence that this instrument’s four assessment criteria are relatively distinct.

Study 1’s D study, which focused on the reliability of scores on individual goals, suggests that the rubric can yield reliable results even when used by fewer raters. To keep the reliability above 0.8, as few as two raters are needed. This makes the tool feasible to use. However, the importance of rater training needs to be considered: All five raters who participated in this study had significant past experience with the rubric. They had participated in its development and had previously applied it to multiple goals to gain a shared understanding of each criterion. Before other raters use the rubric in other settings, they should be trained using sample learning goals and discuss their ratings. We have created a user’s guide with training materials, including example goals for practice scoring, to aid in this process.18

In study 2, the G study and D study provide initial evidence that the scoring rubric can be used as a means of assessing a resident’s skill in writing learning goals, one component of SDLL. The overall phi coefficient for this G study was lower than that for study 1, yet still fairly high. There was a significant proportion of the variance accounted for by the person/resident (23.35%), which was the goal of the study. However, there were larger variance components from the interaction terms, suggesting that there may be other factors not fully accounted for by this model that affect assessment of a resident’s goal writing skill (e.g., the topic of the goal or when the resident wrote the goal). Additional, larger studies with more heterogeneous samples may help to clarify these additional factors. Despite these limitations, the D study suggests that a reliable assessment of a resident’s skill at goal writing can be made by two raters scoring 10 goals or by three raters scoring 5 goals.

There are several limitations to this study. First, we used data from only third-year residents at a single residency program, so all residents had similar expectations for writing learning goals. Second, the sample size used for both G studies was small. Third, all raters had participated in the development of the rubric and had a strong shared understanding of how to apply it. Additional validity evidence is required to ensure that this tool can be used across a wider spectrum of educational contexts. Further validation of the rubric will entail obtaining validity evidence from goals written by a larger and more heterogeneous sample of learners in multiple contexts and using raters with less intimate knowledge of the rubric who have trained only with the materials available in the rubric user’s guide.18

On the basis of the results of this study, the Learning Goal Scoring Rubric is a useful tool for assessing a learner’s skill in writing goals, one key aspect of SDLL. It is reliable when used by trained raters and has appropriate preliminary validity evidence from four sources. Using the scoring rubric as both an assessment tool and a feedback trigger can help facilitate the forethought phase of self-regulated learning in which goal setting and strategic planning are central. Next steps will involve assessing the impact of the quality of a written goal on learning outcomes.


1. Fallat ME, Glover J. Professionalism in pediatrics: Statement of principles. Pediatrics. 2007;120:895–897
2. Accreditation Council for Graduate Medical Education. ACGME Program Requirements for Graduate Medical Education in Pediatrics. (Effective July 1, 2007.) Accessed June 22, 2013
3. Miller SH. American Board of Medical Specialties and repositioning for excellence in lifelong learning: Maintenance of certification. J Contin Educ Health Prof. 2005;25:151–156
4. Li ST, Favreau MA, West DC. Pediatric resident and faculty attitudes toward self-assessment and self-directed learning: A cross-sectional study. BMC Med Educ. 2009;9:16
5. Li ST, Burke AE. Individualized learning plans: Basics and beyond. Acad Pediatr. 2010;10:289–292
6. Stuart E, Sectish TC, Huffman LC. Are residents ready for self-directed learning? A pilot program of individualized learning plans in continuity clinic. Ambul Pediatr. 2005;5:298–301
7. Nothnagle M, Anandarajah G, Goldman RE, Reis S. Struggling to be self-directed: Residents’ paradoxical beliefs about learning. Acad Med. 2011;86:1539–1544
8. Zimmerman BJ. Becoming a self-regulated learner: An overview. Theory Pract. Spring 2002;41:64–72
9. Cleary TJ, Zimmerman BJ. Self-regulation differences during athletic practice by experts, non-experts, and novices. J Appl Sport Psychol. 2001;13:185–206
10. Bandura A, Schunk DH. Cultivating competence, self-efficacy, and intrinsic interest through proximal self-motivation. J Pers Soc Psychol. 1981;41:586–598
11. Hojat M, Nasca TJ, Erdmann JB, Frisby AJ, Veloski JJ, Gonnella JS. An operational measure of physician lifelong learning: Its development, components and preliminary psychometric data. Med Teach. 2003;25:433–437
12. Hojat M, Veloski J, Nasca TJ, Erdmann JB, Gonnella JS. Assessing physicians’ orientation toward lifelong learning. J Gen Intern Med. 2006;21:931–936
13. Hojat M, Veloski JJ, Gonnella JS. Measurement and correlates of physicians’ lifelong learning. Acad Med. 2009;84:1066–1074
14. Li ST, Tancredi DJ, Co JP, West DC. Factors associated with successful self-directed learning using individualized learning plans during pediatric residency. Acad Pediatr. 2010;10:124–130
15. Wetzel AP, Mazmanian PE, Hojat M, et al. Measuring medical students’ orientation toward lifelong learning: A psychometric evaluation. Acad Med. 2010;85(10 suppl):S41–S44
16. Downing SM. Validity: On meaningful interpretation of assessment data. Med Educ. 2003;37:830–837
17. Downing SM, Yudkowsky R. Assessment in Health Professions Education. New York, NY: Routledge; 2009
18. Lockspeiser T, Schmitter P, Lane J, Hanson J, Rosenberg A. A validated rubric for scoring learning goals. MedEdPORTAL. March 15, 2013. Accessed June 17, 2013
19. Doran GT. There’s a SMART way to write management’s goals and objectives. Manage Rev. 1981;70:35–36
20. Li ST, Paterniti DA, Co JP, West DC. Successful self-directed lifelong learning in medicine: A conceptual model derived from qualitative analysis of a national survey of pediatric residents. Acad Med. 2010;85:1229–1236
21. Rosenberg AA, Jones MD Jr. A structured career-immersion experience in the third year of residency training. Pediatrics. 2011;127:1–3
22. Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull. 1979;86:420–428
23. Shavelson RJ, Webb NM. Generalizability Theory: A Primer. Newbury Park, Calif: SAGE Publications; 1991
24. Bloch R, Norman G. Generalizability theory for the perplexed: A practical introduction and guide: AMEE guide no. 68. Med Teach. 2012;34:960–992
25. Challis M. AMEE medical education guide no. 19: Personal learning plans. Med Teach. 2000;22:225–236
© 2013 by the Association of American Medical Colleges