Program Evaluation

Enhancing Evaluation in an Undergraduate Medical Education Program

Gibson, Kathryn A. BMBCh, PhD; Boyle, Patrick MEd; Black, Deborah A. PhD; Cunningham, Margaret MSW; Grimm, Michael C. MBBS, PhD; McNeil, H Patrick MBBS, PhD

doi: 10.1097/ACM.0b013e31817eb8ab


The implementation of a new undergraduate medical education program at the University of New South Wales (UNSW) in Sydney, Australia, brought many challenges, not least of which was determining whether the new program was effective. Our literature review found relatively little practical guidance on establishing systems to define and maintain the quality of undergraduate medical education programs across multiple aspects of quality. In parallel with the introduction of an innovative, nontraditional medical education program at UNSW,1 we therefore set out to develop systems and processes to evaluate the program’s effectiveness and to monitor its implementation to enable continuous improvement. In this article, we describe the development and initial implementation of this comprehensive whole-program evaluation and improvement strategy. We discuss the background, context, and drivers that led us to develop a whole-program evaluation framework; describe the process of developing a multicomponent model that defines the quality of the educational program; outline how we are addressing evaluation of each quality component; and, finally, present initial examples of how this strategy is leading to improvement in the quality of the program. We believe our approach and experience could be useful for other schools undertaking similar development projects.

Background: Development of a Whole-Program Evaluation Framework

A new Medicine program

In March 2004, the Faculty of Medicine at UNSW implemented an innovative six-year, three-phase undergraduate Medicine program.1 Compared with the previous content-based, Flexnerian-style curriculum, the new program is explicitly outcome based: students must demonstrate specified levels of performance in a range of medicine-specific capabilities (biomedical science, social aspects of health, clinical performance, and ethics) and generic capabilities (critical evaluation, reflection, communication, and teamwork) as they progress through the program. All courses are interdisciplinary and highly integrated both horizontally and vertically. Important features include early clinical experience, small-group teaching, flexibility in courses and assessments, and a high degree of alignment between graduate outcomes, learning activities, and assessments. Each two-year phase uses a distinct learning process that aims to develop learner autonomy progressively over the six years. The approaches emphasize important adult education themes: student autonomy, learning from experience, collaborative learning, and adult teacher–learner relationships.

The program’s assessment system is particularly important and incorporates many novel features, including criterion referencing of results, interdisciplinary examinations, a balance between continuous and barrier assessments, peer feedback, and performance assessments of clinical competence. To examine the generic capabilities that are not easily measured by traditional assessments, and to ensure overall alignment between assessments, learning, and outcomes, a data-driven portfolio examination occurs in each phase.2 The portfolio assessment is supported by an information technology system, eMed,3 which records student assessment grades, examiner feedback on assessment tasks, and peer comments on teamwork skills. At the end of each two-year phase, students submit evidence of their achievement in each capability, drawing on reflections on and references to their performance as recorded in eMed, and relate this evidence to the level of achievement expected for that phase of the program. For example, to be deemed to have achieved the expected level in the teamwork capability at the end of phase 1, a student should demonstrate effective participation in peer groups (e.g., as evidenced by comments given to and received from peers and tutors in the eMed:teamwork system) and show an understanding of the operation of health care teams (e.g., as evidenced by satisfactory grades in an assignment or group project that focused on such teams). Portfolio examiners look at the patterns of grades, teamwork comments, and other evidence cited by a student to interpret what lies behind the student data, and then make a judgment based on all the evidence. The portfolio assessment system is discussed extensively elsewhere.2,3

In contrast to previous arrangements in which individual disciplines and departments managed content-specific components of the old curriculum, responsibility for the planning and implementation of the new program was given to a single unit—the Office of Medical Education under the direction of the faculty’s associate dean of education. However, as part of a planned change management process to maintain faculty ownership and encourage disciplinary integration, much of the work was devolved to interdisciplinary course design and implementation groups, each responsible for the delivery of one or more modular, integrated, eight-week courses, or a limited number of vertical strands, such as clinical and communication skills and ethics. The interdisciplinary groups were assisted by instructional designers and/or included faculty members with pedagogical training or experience.

The need for effective evaluation

Such major pedagogical and organizational change required the development of systems and processes to (1) evaluate the effectiveness of the change, (2) monitor its implementation to enable continual improvement, and (3) use evaluation to recognize and report on excellence in teaching. An important driver was the accreditation authority, the Australian Medical Council (AMC), which sets highly specific standards for the ongoing monitoring, evaluation, feedback, and reporting of schools’ operations,4 similar to those specified by the Liaison Committee on Medical Education for North American medical schools.5 The AMC lists nine relevant standards that include the need for “ongoing monitoring … [of] curriculum content, quality of teaching, assessment and student progress,” the need for “teacher and student feedback,” and “using the results [of evaluation] for course development.” Schools are required to analyze “the performance of student cohorts … in relation to the curriculum and the outcomes of the medical course” and “evaluate the outcomes of the course in terms of postgraduate performance, career choice and career satisfaction.” Finally, schools are required to report “outcome evaluation … to academic staff, students … [and] … the full range of groups with an interest in graduate outcomes.”4

Meeting these standards for an integrated program containing many innovative features required the development of new strategies and approaches to evaluation and improvement. Initially, we reviewed previous evaluation processes at UNSW and found them to be largely content driven and not integrated into an overall program evaluation framework. We then reviewed the medical and health education literature, which acknowledges the difficulty of conducting methodologically rigorous research or evaluation studies in education.6 We found many examples of evaluations of individual aspects of particular programs but few descriptions of evaluations of multiple aspects of a program, and very little on the optimal strategy for systematically evaluating and then improving a medical program as a whole. Of the programs reviewed, two stand out for the breadth of their evaluation approaches: the McGill dental curriculum7 and the medical program at Dundee University.8 For the McGill dental curriculum, a range of evaluation tools was used, focusing on clinical productivity, instructor effectiveness surveys, graduating student surveys, patient feedback surveys, and board examinations. Although this approach had the advantage of combining external assessments (board examinations and patient surveys) with internal assessments (clinical productivity, instructor surveys, and student surveys), it is unclear whether it appropriately encompassed all aspects of the program.

The University of Dundee group, in describing its approach to evaluating the medicine curriculum introduced in 1995,8 used evidence from a number of sources, which included internal reviews, external reviews, and student examination data. The internal review data included results of internal university monitoring of academic standards and quality awards, staff and student evaluations, student examination results, peer review of teaching, and review of some of the more novel elements, such as portfolio assessment and student diaries. External reviews included formal review by various accreditation authorities and informal evaluative reviews collected from visitors to the medical school. Although these data seem comprehensive and have clearly guided changes in the curriculum at times, it is not clear how consistently the data are collected, what weight is placed on individual components, or how the group ensures that the evaluation encompasses all key elements of its program. Overall, we found little practical guidance on developing frameworks for systematic evaluation of undergraduate educational programs; the limited literature that does exist is more applicable to graduate medical education (GME).9–11 In the wider evaluation literature, a topic of current interest is the importance of program evaluation leading to explicit actions, particularly improvement for students and other key stakeholders,12–14 an issue commonly referred to as “closing the loop.”15

Developing a Multicomponent Model to Evaluate Program Quality

To develop a comprehensive evaluation process, a Program Evaluation and Improvement Group (PEIG) was established by the dean of medicine with administrative support from the Office of Medical Education and the associate dean of education. Members of the group included senior campus-based academics, clinical academics based in teaching hospitals, the director of the university’s Quality System Development Group, who augmented the group’s expertise in educational measurement and evaluation, and a full-time senior project officer whose salary was funded by the dean’s office. Membership has never exceeded six people. Terms of membership are not fixed, and two members who resigned from the PEIG were replaced by personal invitation from the dean. On the basis of our experience, we recommend that a high-level strategic group such as the PEIG should include the associate dean of education, key staff responsible for or influential in each phase of the program, personnel with evaluation or measurement expertise, and a project officer with experience in quantitative or qualitative evaluation. At least one member of the group should be represented on the school’s key curriculum committee.

A primary task of the PEIG was to establish a framework or model that would encompass all facets of the curriculum at UNSW. With the AMC standards4 as an important driver, we formulated six strategic principles to guide development of a program evaluation and improvement strategy (List 1). We adopted a holistic view of evaluation, based on the principles that both student and staff experiences are important and provide valuable information, that student and graduate outcomes need to be measured, and that an emphasis on action after evaluation is critical. A view of the academic program (and, by extension, program quality) as having multiple aspects was also considered important. Finally, we agreed that the evaluation process should also seek to develop explicit and systematic means of fostering and recognizing the commitment of faculty and professional staff to the continuing improvement of the program.

List 1 The Six Strategic Principles of the Program Evaluation and Improvement Group at the Faculty of Medicine of the University of New South Wales, Sydney, Australia

With these issues and ideas in mind, we developed a progressive model for program evaluation and improvement. We used workshops to identify the critical elements or quality aspects that, together, we viewed as defining quality in the Medicine program at UNSW. We considered articulation of these program quality aspects (PQAs) a necessary condition for developing a model for continuing comprehensive evaluation and improvement of the program. The four PQAs adopted were curriculum and resources, staff and teaching, student experience, and student and graduate outcomes (Table 1). For each PQA, we formulated three to four components to describe its principal building blocks. For example, we decided that learning and teaching, administration and support, sense of community, and the admission/transition process were the important defining components of the student experience. We next identified one or more key quality indicators for each component, defining in more practical terms the particular criteria and characteristics of that component that are amenable to program evaluation and improvement processes. For example, we considered that the learning and teaching component of student experience encompasses four important and potentially evaluable indicators: student satisfaction with learning and teaching, student perception of the quality of learning and teaching materials, student perception of the quality of the physical environment, and student perception of the quality of the learning culture.
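The three-level structure described above (PQA, then components, then key quality indicators) can be represented as a simple nested data structure. The sketch below is illustrative only: the class names `QualityAspect` and `Component` are hypothetical, and only the student experience entries are taken from the text; the other three PQAs are named but left without components because the contents of Table 1 are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A building block of a program quality aspect (PQA)."""
    name: str
    indicators: list[str] = field(default_factory=list)


@dataclass
class QualityAspect:
    """One of the four PQAs, made up of three to four components."""
    name: str
    components: list[Component] = field(default_factory=list)

    def indicator_count(self) -> int:
        # Total number of key quality indicators under this PQA.
        return sum(len(c.indicators) for c in self.components)


# Student experience entries come from the text; indicators for the
# other three components are not listed there, so they are left empty.
student_experience = QualityAspect(
    "student experience",
    [
        Component("learning and teaching", [
            "student satisfaction with learning and teaching",
            "perceived quality of learning and teaching materials",
            "perceived quality of the physical environment",
            "perceived quality of the learning culture",
        ]),
        Component("administration and support"),
        Component("sense of community"),
        Component("admission/transition process"),
    ],
)

# Hypothetical placeholders: components for these PQAs are in Table 1.
model = [
    QualityAspect("curriculum and resources"),
    QualityAspect("staff and teaching"),
    student_experience,
    QualityAspect("student and graduate outcomes"),
]

total = sum(a.indicator_count() for a in model)
print(total)  # 4 in this partial sketch; the full model has 23
```

Keeping indicators attached to named components makes it straightforward to see which parts of the model currently have evaluable indicators and which still need instrument development.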

Table 1 Program Quality Aspects (PQAs) and Key Quality Indicators of the Undergraduate Medical Curriculum of the Faculty of Medicine of the University of New South Wales, Sydney, Australia

In deriving the key quality indicators, we considered the following questions: (1) Would the indicator facilitate action for improvement? (2) Is valid information available for the indicator? (3) Will the indicator convey clear meaning to stakeholders? (4) Would the information content of the indicator be considered trustworthy and valuable? and (5) Does the indicator represent a balance of perspectives for multiple stakeholders? In addition, we were mindful of the general principles of content validity and relative importance. A final list of 23 indicators across the four PQAs was agreed on, representing a compromise between achieving a broad assessment of the quality of the program and limiting the indicators to a manageable number in light of the finite resources available for evaluation (Table 1). The process of identifying the key quality indicators reinforced the notion that there is inevitable overlap among the four PQAs and that careful design of evaluation and improvement processes would allow information to be gathered across several PQAs simultaneously.

We recognized that any such model can be effective only if key stakeholders take ownership of it and actively participate in the processes developed from it. A key practice principle we adopted to implement the model is that, wherever possible, evaluation activities should be viewed as an integral part of day-to-day practice rather than as externally imposed additional tasks. Thus, an important philosophy is that teachers, course and phase convenors, and relevant administrators should undertake evaluation and improvement activities as an inherent part of effective and scholarly teaching. For example, course convenors are instructed to include a page at the front of student course outlines headed “Changes made to this course as a result of student feedback.” This ensures that convenors reflect on the informal and formal student feedback received and implement improvements, and it communicates to students that their feedback is valued and acted on. The PEIG sees its principal role as supporting and facilitating this process, rather than as an external body undertaking such tasks. Importantly, evaluation and improvement processes, together with other mechanisms, such as recognition programs, should lead to evident improvements, changes, and/or rewards in the program and, particularly, to enhancement of student and teacher outcomes and experiences.

Addressing Evaluation and Improvement of Each Quality Aspect

During the last three years, we created three working parties to establish the practical processes required to enable evaluation of, improvement of, and reporting on the quality of the Medicine program. These working parties are aligned with three of the four PQAs: staff and teaching, student experience, and student and graduate outcomes. No working party was established for the fourth PQA, curriculum and resources, because considerable recent effort had been focused on this area during the development of the Medicine program, and we felt that each of the other working parties would evaluate some aspects of this PQA during their work. We intend to form a specific group to focus on this PQA in future years. The creation of these three working parties enabled additional staff and students to participate in the program evaluation and improvement processes. Before establishing specific projects, each working party reviewed its respective indicators (Table 1) to identify relevant available data, areas where instruments needed to be developed, and the implications of indicator development for reporting, communication, and implementation. To illustrate the evaluation model in practice, two areas of work to date are elaborated on below.

Evaluation and improvement of the medical student experience

In 2005, the student experience working party established a baseline project to research and evaluate the medical student experience at UNSW and to establish a sustainable process for its evaluation and improvement. Four stages were involved: (1) identifying the students’ understanding of the construct “student experience,” (2) developing a trial instrument and process for collecting evaluative data from students, (3) undertaking appropriate analyses and communication of findings to stakeholders, and (4) establishing an agreed and sustainable process within the faculty for evaluation and improvement of the medical student experience. Before developing any instrumentation for tapping student perceptions of their experience as medical students at UNSW, we decided to investigate what students understood by the term student experience. In response to invitations to participate, a sample of approximately 80 students was consulted using semistructured focus groups and interviews. The ideas generated by students were analyzed and combined with factors identified in the literature as important to the student experience. Five main facets emerged and were adopted for the trial medical student experience questionnaire (MEDSEQ): learning, teaching and assessment; organization and student understanding of the program; community interaction and value; student support; and resources. These encompassed most of the key quality indicators for the student experience aspect shown in Table 1.

The trial MEDSEQ used 32 fixed-response items arranged under the five facets referred to above, plus the option of open-ended comments. We used a five-point scale (ranging from only rarely to almost always) for students to rate how often they had experienced the circumstances described by each item (Chart 1). The survey was administered using a purpose-designed online process in August 2006. A total of 505 students responded, a response rate of about 46% of the approximately 1,100 students in the undergraduate medical student population, with roughly equal representation of students enrolled in the old and new Medicine programs. The data have been analyzed and the findings communicated to faculty leaders, academics, and students. The quality of learning resources, learning in clinical settings, and collaborative learning between students were all identified as specific areas of strength in the program. An area identified as needing improvement was the provision of clearer information, both to help students with transitions within the program and to give better feedback on their learning. Although there are some caveats, the experience of students in the new Medicine program seems improved in certain areas compared with that of students in the old Medicine program (e.g., 50.4% of students in the new program compared with 23.1% of students in the old program responded positively to the item concerning communication between students and teachers).
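The two headline figures in this paragraph, the response rate and the percentage of positive responses to an item, are simple proportions. The sketch below shows the arithmetic under one stated assumption: that a "positive" response means one of the top two points of the five-point frequency scale (coded 1 to 5). The text does not define "positive," and the item ratings used here are invented for illustration.

```python
def response_rate(respondents: int, population: int) -> float:
    """Percentage of the eligible population that responded."""
    return 100.0 * respondents / population


def percent_positive(ratings: list[int], positive_from: int = 4) -> float:
    """Percentage of valid 1-5 ratings at or above `positive_from`.

    Treating the top two scale points as 'positive' is an assumption;
    the exact MEDSEQ definition is not given in the text.
    """
    valid = [r for r in ratings if 1 <= r <= 5]
    if not valid:
        raise ValueError("no valid ratings")
    return 100.0 * sum(r >= positive_from for r in valid) / len(valid)


# Figures from the text: 505 respondents out of roughly 1,100 students.
print(f"{response_rate(505, 1100):.1f}%")  # 45.9%

# Hypothetical ratings for a single item, for illustration only.
item_ratings = [5, 4, 3, 2, 4, 5, 1, 3]
print(f"{percent_positive(item_ratings):.1f}%")  # 50.0%
```

Comparing old- and new-program cohorts, as in the communication item above, amounts to running `percent_positive` separately on each cohort's ratings for the same item.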

Chart 1 Example Items from the Medical Student Experience Questionnaire (MEDSEQ) of the Faculty of Medicine of the University of New South Wales, Sydney, Australia

In early May 2007, after review of the major MEDSEQ report, the Faculty of Medicine’s principal curriculum committee endorsed embedding the MEDSEQ evaluation and improvement process in the Faculty of Medicine’s continuing operations. Data gathering, analysis, and communication of findings will occur every two years, and a range of responses to the findings in the inaugural report will be initiated to ensure provision of more effective feedback on learning and increased availability of appropriate mentoring.

Improvement of teaching and the management of teaching workload

The working party on staff and teaching has been addressing two of the components of this program aspect: support for teaching and improving the quality of teaching (Table 1). With respect to support for teaching, we developed a policy document and implementation framework entitled “Managing Teaching Workloads in the Faculty of Medicine,” which was endorsed by the faculty’s major management committee and is being used by department heads to help manage their teaching and research missions. Teaching activities have been assigned weightings (e.g., lecturing or tutoring is valued at two units per hour, marking or examining at one unit per hour, and undertaking clinical activities with students present at 0.5 units per hour). A benchmark of approximately 200 units per academic session (14 weeks) has been suggested for full-time academic staff employed by the university. We believe this framework will help individual academics negotiate workloads more effectively, clarify new staff members’ expectations about teaching workload, facilitate career development planning, assist department heads in managing staff and teaching workloads, and enable promotion committees to make better, data-driven judgments when evaluating teaching contributions. Although some may not see this work as evaluation, we contend that it is a good example of the critical enablers that contribute to and define a high-quality program.
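Applying the weightings is a weighted sum of hours per activity, compared against the session benchmark. The sketch below uses the three example weightings and the approximate 200-unit benchmark quoted in the policy; the activity keys and the sample hours for one academic are hypothetical, and the real framework may weight other activities not listed in the text.

```python
# Example weightings from the policy (units per hour of activity).
UNITS_PER_HOUR = {
    "lecturing_or_tutoring": 2.0,
    "marking_or_examining": 1.0,
    "clinical_with_students": 0.5,
}

# Approximate benchmark for full-time staff per 14-week session.
BENCHMARK_UNITS_PER_SESSION = 200


def workload_units(hours: dict[str, float]) -> float:
    """Sum weighted teaching units for one academic session."""
    unknown = set(hours) - set(UNITS_PER_HOUR)
    if unknown:
        raise KeyError(f"no weighting defined for: {sorted(unknown)}")
    return sum(UNITS_PER_HOUR[activity] * h for activity, h in hours.items())


# Hypothetical hours for one full-time academic in one session.
session = {
    "lecturing_or_tutoring": 40,
    "marking_or_examining": 60,
    "clinical_with_students": 100,
}
units = workload_units(session)
print(units)  # 190.0
print(f"{100 * units / BENCHMARK_UNITS_PER_SESSION:.0f}% of benchmark")  # 95% of benchmark
```

Raising an error for unrecognized activities, rather than silently weighting them at zero, keeps a department head from under-counting work that the policy has not yet assigned a weighting.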

With respect to improving the quality of teaching, and guided by the principle of “closing the loop,” we are working to increase the extent to which student evaluative data are acted on to implement improvement, stimulate reflection, and foster more scholarly activity in relation to learning and teaching. Improving teaching on the basis of student feedback has proved challenging at UNSW in the past because feedback on individuals’ teaching is reported only to the teacher concerned and is not available to curriculum leaders or coordinators. A project was instituted targeting small-group facilitation in phase 1 of the new Medicine program, whereby the university’s data-analysis unit provided the PEIG with deidentified data (student ratings) from the standard UNSW teaching evaluation instrument for all staff engaged in this style of teaching for a single calendar year. The data for each of the 10 student-rated items in the instrument were analyzed, and pentile (quintile) values were determined. A simple “star rating” (one to five stars) was then assigned to each pentile. A practical database program was built that teachers can use to compare their feedback with that of all other staff teaching in similar contexts in the Faculty of Medicine. Individual teachers can enter their own ratings for each item into the program, which then reports a one- to five-star rating for each item, an interpretation of the rating, and, where necessary, links to suggested resources for improvement. The deidentified data are also being used to plan tutor-training activities.
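The star-rating step can be sketched as finding the four cut points that divide the reference ratings for one item into five equal groups, then placing a teacher's own rating among them. This is a minimal sketch under several assumptions: each teacher's rating for an item is a single mean score, Python's `statistics.quantiles` (default "exclusive" method) stands in for however the data-analysis unit computed the pentiles, and the reference scores and interpretation wording are invented for illustration.

```python
import bisect
import statistics


def pentile_cuts(scores: list[float]) -> list[float]:
    """Four cut points splitting reference scores into five equal groups."""
    return statistics.quantiles(scores, n=5)


def star_rating(score: float, cuts: list[float]) -> int:
    """Map a teacher's score for one item to 1-5 stars by pentile."""
    # bisect_right counts how many cut points the score reaches (0-4).
    return bisect.bisect_right(cuts, score) + 1


# Hypothetical wording; the tool's actual interpretations are not given.
INTERPRETATION = {
    1: "lowest fifth of comparable teachers; see improvement resources",
    2: "below the middle of comparable teachers",
    3: "around the middle of comparable teachers",
    4: "above the middle of comparable teachers",
    5: "top fifth of comparable teachers",
}

# Hypothetical deidentified mean ratings for one item across all
# small-group facilitators in one year (invented for illustration).
reference = [3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.8]
cuts = pentile_cuts(reference)

my_score = 4.3
stars = star_rating(my_score, cuts)
print(stars, INTERPRETATION[stars])  # 4 above the middle of comparable teachers
```

In the tool described above, this lookup would be repeated for each of the 10 items, each with its own cut points computed from that item's reference data.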

Expected Outcomes and Initial Evidence of Effectiveness

The key outcomes we expect from the processes and strategies we have implemented can be distilled to the following: multiple aspects of the quality of the UNSW Medicine program can be evaluated and reported (e.g., student and staff experiences, student outcomes, content and resources); the evaluation process is continual; action to improve the program follows measurement; and the processes we describe are an inherent part of the responsibility of teachers and course coordinators. In this article, we have described the overall framework and approach we have adopted, which remains a work in progress. We believe the examples described above show initial evidence that we are achieving a number of these outcomes. One final example illustrates our principles that evaluation and improvement should be continual rather than episodic and that action should follow data collection.

Phase 1 (medical school years 1 and 2) of the program consists of four integrated eight-week courses per year, the majority of which have been formally evaluated using a standard UNSW course evaluation instrument. As shown in Table 2, initial evaluations in 2004 showed strong evidence that the new program encouraged active student participation and collaborative learning (92% and 97% agreement, respectively). However, students expressed significant dissatisfaction with the provision of feedback and with assessment tasks. The phase 1 curriculum committee adopted a number of initiatives aimed at improving communication with students (publication of newsletters and a comprehensive annual report providing feedback on assessments, staff contributions, student evaluations, and proposed improvements). The PEIG organized a workshop for representatives from course design groups to reflect on and improve the instructional design of small-group tutorial sessions. Course outlines for students were continually updated to improve the information provided to students on assessments. Formative assessments were provided to students in each course to help them prepare for end-of-course examinations.

Table 2 Percentage of Phase 1 (First- and Second-Year) Students Who Agree or Strongly Agree With Each Item From a Course Evaluation Instrument, Faculty of Medicine of the University of New South Wales, Sydney, Australia

Over subsequent years, there has been clear evidence of an improved student experience in phase 1, with progressively greater levels of agreement that the courses provided adequate information about assessments and that the assessments were appropriate (both up to 81% agreement) (Table 2). Students continued to express high levels of agreement that the phase 1 courses encouraged active student participation (92%) and collaborative learning (96%), and overall satisfaction with course quality has improved from 79% in 2004 to 90% in 2007. Despite this early evidence that the phase 1 courses have been progressively improved, students continue to express low levels of agreement that they receive adequate feedback (38% in 2007 compared with 31% in 2004), clearly an area where further improvement is needed. A major episodic evaluation of phase 1, including external review, is planned for 2008. Nevertheless, the data presented in Table 2 affirm continuous evaluation followed by action as a successful approach to improvement.

Summing up

We have developed an innovative approach to evaluation and improvement for a complete undergraduate medical education program. Our approach evaluates the quality of the program in terms of four main, related aspects: curriculum and resources, staff and teaching, student experience, and student and graduate outcomes. Twenty-three key quality indicators aligned with these four aspects provide a broad but manageable framework to guide the development and implementation of sustainable program evaluation and improvement processes (Table 1). This framework allows the PEIG to make coherent, explicit, and strategic decisions about which quality aspects to allocate resources to at any given time. Although the strategy is still at an early stage, the examples of it in action that we describe provide initial validation of the value of this approach.


1 McNeil HP, Hughes CS, Toohey SM, Dowton SB. An innovative outcomes-based medical education program built on adult learning principles. Med Teach. 2006;28:527–534.
2 Toohey S, Kumar RK. A new program of assessment for a new medical program. Focus Health Prof Educ. 2003;5:23–33.
3 Watson EGS, Moloney PJ, Toohey SM, et al. Development of eMed: A comprehensive, modular curriculum-management system. Acad Med. 2007;82:351–360.
4 Australian Medical Council. Assessment and Accreditation of Medical Schools: Standards and Procedures. Available at: ( Accessed April 16, 2008.
5 Liaison Committee on Medical Education. Functions and Structure of a Medical School. Available at: ( Accessed April 16, 2008.
6 Dauphinee WD, Wood-Dauphinee S. The need for evidence in medical education: The development of best evidence medical education as an opportunity to inform, guide, and sustain medical education research. Acad Med. 2004;79:925–930.
7 Dagenais ME, Hawley D, Lund JP. Assessing the effectiveness of a new curriculum: Part I. J Dent Educ. 2003;67:47–54.
8 Davis MH, Harden RM. Planning and implementing an undergraduate medical curriculum: The lessons learned. Med Teach. 2003;25:596–608.
9 Durning SJ, Hemmer P, Pangaro LN. The structure of program evaluation: An approach for evaluating a course, clerkship, or components of a residency or fellowship training program. Teach Learn Med. 2007;19:308–318.
10 Musick DW. A conceptual model for program evaluation in graduate medical education. Acad Med. 2006;81:759–765.
11 Holmboe ES, Rodak W, Mills G, McFarlane MJ, Schultz HJ. Outcomes-based evaluation in resident education: Creating systems and structured portfolios. Am J Med. 2006;119:708–714.
12 Harvey L. Student feedback. Qual High Educ. 2003;9:3–20.
13 Centers for Disease Control and Prevention. Framework for Program Evaluation in Public Health. Atlanta, Ga: Centers for Disease Control and Prevention; 1999.
14 Boyle P, Bowden J. Educational quality assurance in universities: An enhanced model. Assess Eval High Educ. 1997;22:111–121.
15 Watson S. Closing the feedback loop: Ensuring effective action from student feedback. Tertiary Educ Manage. 2003;9:145–157.
© 2008 Association of American Medical Colleges