McLachlan, John C. PhD; Finn, Gabrielle; Macnaughton, Jane MB, ChB, PhD
There is evidence that negative student behavior during undergraduate programs is related to the likelihood of subsequent negative behavior in later careers or postqualification practice.1–5 This, in addition to recent concerns about public perceptions of the medical profession,6 is the main reason why the issue of professionalism (how to teach, assess, and evaluate it) has become such a hot topic in medical education. This journal recently devoted a special issue to the topic.7 Research has focused on attempts to define professionalism,8 for without a definition or clear understanding of what professional behavior consists of (it is argued), it cannot be taught and certainly cannot be measured. Professionalism is a slippery concept, and definitions are difficult. Most comprise a list of attributes, variably including altruism, honor, integrity, excellence, accountability, and respect for others,9 or compassion, continuous improvement, excellence, and working in partnership.10 Each of these attributes in turn requires explanation, and agreement about what it means in the context of undergraduate education, before it can usefully guide either teaching content or measurement.11 These complexities may lead to differing interpretations of professionalism on occasion.12
A Working Understanding of Professionalism
Although there is not one single definition that is universally accepted in all circumstances, staff and students in our medical school share a working understanding of professionalism based on the General Medical Council (GMC) publication Good Medical Practice.13 (The GMC is the body that registers doctors to practice in the United Kingdom.) This document is given to all students on induction, and course outcomes relating to professionalism are drawn from it, explicitly and often verbatim. Students are advised that the principles in Good Medical Practice apply to them, and breaches could represent grounds for disciplinary proceedings.
Because we have used this document as the source of our working understanding of professionalism for the purposes of this study, an overview of the document is in order. It is best described as guidance rather than as a statutory code, although sentences containing the words “you must” are to be regarded as “an overriding duty or principle.” For example, “In providing care, you must work within the limits of your competence.” Six key principles are identified as the essential summary:
* Make the care of the patient the doctor’s first concern.
* Protect and promote the health of patients and the public.
* Ensure a good standard of practice and care, which includes keeping knowledge and skills up to date, working within personal limitations, and working cooperatively.
* Treat patients as individuals and respect their dignity, which includes treating them considerately and respecting their confidentiality.
* Work in partnership with patients, listen to their concerns, give them information appropriately, respect their decisions, and support their self-care.
* Practice medicine with honesty, openness, and integrity. This category includes nondiscrimination, meriting trust, and acting without delay if the doctor or a colleague is putting patients at risk.
Altruism is not specifically named as an overriding duty, although “make the care of the patient the doctor’s first concern” does carry that implication. Further detailed guidance is given on providing good clinical care; maintaining good medical practice; teaching, training, appraising, and assessing; relationships with patients; working with colleagues; probity; and personal health and self-care.
As the above summary indicates, the document includes well-accepted criteria for professionalism, although it does not define the term per se. The entire Good Medical Practice is available at (http://www.gmc-uk.org/guidance/good_medical_practice/index.asp).
A Tool to Explore Professionalism
Because professionalism is a qualitative entity, measurement in a scalar fashion that requires assessment of one person’s degree of professionalism against another’s is problematic.14 As a result, attempts to construct meaningful scales for the measurement of professionalism might seem doomed to failure.15 In addition, if professionalism is thought of as a competence,16 like examining the cardiovascular system or like some aspects of the teaching of communication skills, it then becomes the subject of a training approach in which students might be encouraged merely to practice honest behavior or good partnership. Assessment, accordingly, becomes focused on observing what has been practiced, and medical students are good at regurgitating required behavior as well as facts. Thus, we may not be sure that what is measured as a result of a competency-based approach is what the student might actually do in practice.17
In light of these challenges, we constructed a scale for an aspect of professional behavior that has two advantages in this problematic context: first, it is amenable to meaningful quantitative analysis; and second, it records spontaneous behavior. We call this scale the Conscientiousness Index (CI) of student performance and constructed it during a full academic year (2006-2007) for medical students in year 1 and year 2. (We used the word “conscientiousness” because conscientious students will do those things that are required to score well, whereas less conscientious students will not.) As will be clear from our methodology, no qualitative judgments were required for the collection of CI data, so each individual student was legitimately given a score for conscientiousness that could also be compared directly with the scores of other students.
Students were advised orally and through our school’s virtual learning environment (VLE) at the beginning of the year that their teachers would be scoring them using the CI and that the CI had no summative or formative role. (The VLE is a Blackboard Web-based learning system supporting instruction, communication, and assessment, on which all medical students are registered.) They also were informed that their anonymity would be preserved in all analyses of the data and that no academic harm would come to them from this process. However, this approach was not designed as an interventional research study; all the data involved in the CI were already being collected routinely, and the only difference from our standard practice was to gather it together in one place.
It should be noted that at our school, educational studies that do not require unusual approaches and that do not compromise the anonymity of the students have been considered as not requiring prior ethical approval from internal review boards.18 Nonetheless, the applicability of that general principle to this particular study was confirmed in writing by the chair of the school’s ethics committee.
In the year under consideration (2006-2007), there were 116 first-year students and 108 second-year students. Faculty members or administrative staff in each course taken by students recorded the data that led to the award or deduction of specified numbers of conscientiousness points for each student for the reasons summarized below.
Awarding or deducting conscientiousness points
Attendance at compulsory teaching sessions during year 1 and year 2 of our medical program (the School of Medicine and Health, Durham University) was recorded by teaching faculty or administrative staff through the use of registers. At the start of the school year, each student was awarded 50 points for each of the three terms that year. One point was then deducted from this total for each unauthorized absence. (The school authorizes absences in advance or in retrospect for good cause, such as hospital appointments or illnesses.)
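The attendance rule is simple arithmetic, but stating it as code makes the bookkeeping explicit. The following is a minimal sketch, not the school's actual system; the function name and the per-term input format are hypothetical.

```python
# Minimal sketch of the attendance component of the CI described above:
# 50 points are credited for each of the three terms, and one point is
# deducted for each unauthorized absence. Authorized absences are not counted.
# The function name and input format are hypothetical.

POINTS_PER_TERM = 50
TERMS = 3


def attendance_points(unauthorized_absences_per_term: list[int]) -> int:
    """Return attendance points from per-term counts of unauthorized absences."""
    if len(unauthorized_absences_per_term) != TERMS:
        raise ValueError(f"expected one absence count per term ({TERMS} terms)")
    return POINTS_PER_TERM * TERMS - sum(unauthorized_absences_per_term)


# A student with no unauthorized absences in term 1, two in term 2, one in term 3:
print(attendance_points([0, 2, 1]))  # 147
```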
Submission of data.
Students were advised in writing on two separate occasions of the necessity to provide, by a deadline, specified information on their Criminal Records Bureau status (i.e., previous criminal convictions) and immune status (in the United Kingdom, students are required to declare on entry if they are positive for a defined list of transmissible diseases, including HIV/AIDS and hepatitis B). Students were awarded two points for submitting the required information before the deadline, one point for submitting partial documentation before the deadline, and no points for failing to submit any of the required information before the deadline.
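The three-level award for submitting required information can be sketched as below. This assumes the criminal records and immune status declarations are scored together as a single request; the text does not say whether they were scored separately, and the item names are hypothetical placeholders.

```python
# Minimal sketch of the data-submission rule described above: 2 points if all
# required information arrives before the deadline, 1 point if only part of it
# does, and 0 if none does. Item names are hypothetical placeholders.


def submission_points(required: set[str], submitted_before_deadline: set[str]) -> int:
    provided = required & submitted_before_deadline
    if provided == required:
        return 2  # complete documentation on time
    if provided:
        return 1  # partial documentation on time
    return 0      # nothing required was submitted on time


REQUIRED = {"criminal_records_status", "immune_status"}
print(submission_points(REQUIRED, {"criminal_records_status", "immune_status"}))  # 2
print(submission_points(REQUIRED, {"immune_status"}))                             # 1
print(submission_points(REQUIRED, set()))                                         # 0
```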
Student feedback was gathered using the university’s VLE, described earlier. Students keep track of timetables, access teaching materials, complete formative and summative assignments, and submit course evaluation feedback through the system. The VLE has the facility to track those students who have submitted course evaluation feedback without identifying the content of their feedback. Students were advised that failure to submit feedback would be taken into consideration in the monitoring of student conscientiousness. Students gained one point for each evaluation they submitted, but they did not lose points if they failed to submit.
Students were trained in the use of the Speedwell Learning Systems Optical Mark Reader (OMR) format employed in the medical school. The OMR enables students to mark responses to multiple-choice and extended-matching-item-format assessments (given in connection with their classes) on preprinted sheets, using a pencil. An introductory training session with practice was provided, followed by detailed individual feedback on incorrect usage. The training session information was made available through the VLE. A full formative examination using the OMR was conducted, again with individualized feedback. Students were then awarded one conscientiousness point for (1) each correct use of anonymous examination numbers during summative exams and (2) each correct completion of OMR responses during summative exams. All assessments that involved the OMR and that contributed to progression decisions were included. There were two such assessments in January and two in May, which is the usual pattern for such assessments.
Subsequent to a training session on how to submit assignments, the posting of assignment submission guidance on the VLE, and a practice formative opportunity, students received one conscientiousness point for each assignment correctly submitted on time. Those who failed to submit an assignment on time, without prospective or retrospective authorization for good cause, received no points. All assignments that contributed to progression decisions were included. There are three of these in each academic year.
Those students who participated in voluntary activities relating to the medical school were awarded one point for each separate activity. Such activities included sessions devoted to widening the range of candidates who apply to medical school. Such sessions included working with school students on visits to the medical program, assisting at extended “master classes” in which school students from deprived areas attend events over several days and are given an introduction to the kinds of activities found in undergraduate medical courses, and visits to local schools to encourage interest and answer questions.
In addition to the above categories, students could gain or lose points for individual, uncategorized events, as advised by the program manager (a senior faculty member who is responsible for the day-to-day administration of the program). These events were generally unique to a student. Positive events included responding professionally to a genuine medical emergency, advising staff of a possible examination impropriety, and advising staff of a possible breach of patient confidentiality. Negative events included reading but failing to respond to repeated e-mails from staff, and attending a teaching session in an unfit state. These negative events would not be significant enough in themselves to trigger a critical incident report (see below) or a fitness-to-practice procedure.
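Because every component is a count of Yes/No events, a student's CI is simply the sum of the category tallies described above. The sketch below illustrates this; the field names are hypothetical, and the size of each program-manager adjustment is left to the caller because the text speaks only of "specified numbers" of points (the ±1 values in the example are an assumption).

```python
from dataclasses import dataclass, field


@dataclass
class StudentRecord:
    """Per-student tallies for one academic year (hypothetical field names)."""
    attendance: int             # 50 per term minus unauthorized absences
    data_submission: int        # 0, 1, or 2 for the information request
    evaluations_submitted: int  # +1 each; no deduction for non-submission
    correct_exam_numbers: int   # +1 per correct anonymous examination number
    correct_omr_sheets: int     # +1 per correctly completed OMR sheet
    assignments_on_time: int    # +1 per assignment correctly submitted on time
    voluntary_activities: int   # +1 per separate voluntary activity
    adhoc_adjustments: list[int] = field(default_factory=list)  # +/- events


def conscientiousness_index(r: StudentRecord) -> int:
    """Sum all components; ad hoc events may be positive or negative."""
    return (
        r.attendance
        + r.data_submission
        + r.evaluations_submitted
        + r.correct_exam_numbers
        + r.correct_omr_sheets
        + r.assignments_on_time
        + r.voluntary_activities
        + sum(r.adhoc_adjustments)
    )


# Hypothetical student with one positive and one negative ad hoc event:
record = StudentRecord(147, 2, 5, 4, 4, 3, 1, adhoc_adjustments=[1, -1])
print(conscientiousness_index(record))  # 166
```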
Establishing the validity of our approach
The GMC has listed a number of the above categories in its document Medical Students: Professional Behaviour and Fitness to Practice,19 supporting the face validity of our use of these categories as indicators of undergraduate professional behavior. However, to establish the concurrent validity of the approach (whether the CI correlated with other views or estimations of professionalism), we looked to see whether there was any relationship between the CI and staff views on individual students’ professional behavior and with data on critical incident reporting.
Validity 1: Correspondence with staff judgments on professionalism.
We classified the top 10, middle 10, and bottom 10 students in each year on the basis of their CI scores. The RAND function in Microsoft Excel was used to randomize the order of their names. A group of nine experienced staff members were asked to express an expert judgment on the professionalism of these students in each of the two years, using the three options listed in the next paragraph. These staff were isolated from knowledge of the CI scores of students throughout the year. Because our cohort of students is relatively small (of the order of 100-120 per year) and their two years with us are spent on a residential campus that is also the base for our staff, there is close and frequent contact between staff and students. The inclusion criteria for staff were (1) experience in working with medical students in general, (2) familiarity with the GMC definition of professionalism, and (3) close and repeated contact with students in teaching and support capacities throughout the relevant time period. The nine staff members included eight members of the academic staff at the lecturer level or above. The ninth was a senior technician who also plays a teaching role. The academic staff included teachers in classes entitled Personal and Professional Development, Medicine in the Community, Anatomy, Physiology, and Clinical Skills. They received no additional compensation for the task and received written guidance as to the general purpose of the exercise and a summary of the outcomes after the completion of the draft manuscript.
The staff were asked to choose one of the following options:
* I am happy with the professionalism shown by this student.
* I have some concerns with the professionalism shown by this student.
* I do not know this student well enough to comment.
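The selection and blinding step described earlier in this section (top, middle, and bottom 10 students by CI, presented to raters in random order) might be sketched as follows. The authors used the RAND function in Microsoft Excel; this sketch substitutes Python's `random.shuffle`. Locating the "middle 10" as the block centered on the ranking, and ignoring ties, are our assumptions.

```python
import random


def select_and_shuffle(ci_scores, k=10, seed=None):
    """ci_scores: dict mapping student name -> CI score.
    Returns (names in random order, dict name -> 'top'/'middle'/'bottom')."""
    ranked = sorted(ci_scores, key=ci_scores.get, reverse=True)  # highest CI first
    n = len(ranked)
    if n < 3 * k:
        raise ValueError("need at least 3*k students so the groups do not overlap")
    start = (n - k) // 2  # first index of the k students centered on the ranking
    groups = {}
    for name in ranked[:k]:
        groups[name] = "top"
    for name in ranked[start:start + k]:
        groups[name] = "middle"
    for name in ranked[-k:]:
        groups[name] = "bottom"
    names = list(groups)
    random.Random(seed).shuffle(names)  # raters see names in random order
    return names, groups
```

Only the shuffled list of names would be given to the raters; the group labels stay with the analyst until the ratings are returned.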
The staff choices for each student were then compared. To further explore the continuous nature of the relationship, a Professionalism Index (PI) was calculated for each student, consisting of the “happy” score minus the “concerns” score. For instance, a student receiving five positive evaluations, three negative evaluations, and one “don’t know” would have a PI of +2. The maximum score is 9, because nine evaluators were involved. No adjustment was made on the basis of the absolute number of evaluations made; in other words, a student receiving five positive and four negative evaluations was given a score of +1, as was a student receiving two positive evaluations, one negative evaluation, and six “don’t knows.” The correlation coefficient between the CI and PI scores was then calculated.
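The PI rule can be stated directly as code. This sketch uses hypothetical rating labels and reproduces the worked examples given above.

```python
from collections import Counter


def professionalism_index(ratings):
    """ratings: one of 'happy', 'concerns', or 'dont_know' per rater.
    PI = number of 'happy' ratings minus number of 'concerns' ratings;
    'dont_know' ratings are counted but do not affect the score."""
    counts = Counter(ratings)
    unknown = set(counts) - {"happy", "concerns", "dont_know"}
    if unknown:
        raise ValueError(f"unexpected ratings: {unknown}")
    return counts["happy"] - counts["concerns"]


print(professionalism_index(["happy"] * 5 + ["concerns"] * 3 + ["dont_know"]))      # 2
print(professionalism_index(["happy"] * 5 + ["concerns"] * 4))                      # 1
print(professionalism_index(["happy"] * 2 + ["concerns"] + ["dont_know"] * 6))      # 1
```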
Validity 2: Correspondence with critical incident reports.
A critical incident reporting system is in place in the undergraduate medical program, modeled on the UK National Learning and Reporting System, which was created by the National Patient Safety Agency. A critical incident report is completed by staff members or students when they observe and choose to report a critical incident. Students are then invited to respond and reflect, using forms adapted from the National Health Service reporting forms. The occurrence of such recorded critical incidents for each student was also compared with his or her CI score.
To explore the reliability of the results, the academic year was split into two, and performance on the first half-year was compared with performance on the second half-year, for each of the two years under study. The Spearman rank correlation coefficient was calculated for each group.
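The split-half comparison can be sketched as follows. This assumes each student's CI is kept as a log of dated point awards (a hypothetical format) and that the year is split at a single chosen date; the midpoint used here is illustrative only. Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles tied scores; `scipy.stats.spearmanr` would give the same value.

```python
from datetime import date
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either list is entirely tied


def split_half_totals(events, midpoint):
    """events: student -> list of (date, points). Split the year at `midpoint`."""
    first = {s: sum(p for d, p in log if d < midpoint) for s, log in events.items()}
    second = {s: sum(p for d, p in log if d >= midpoint) for s, log in events.items()}
    return first, second


# Hypothetical logs for two students, split at an assumed midpoint:
events = {"A": [(date(2006, 10, 2), 2), (date(2007, 3, 5), 1)],
          "B": [(date(2007, 4, 16), 3)]}
first, second = split_half_totals(events, midpoint=date(2007, 2, 1))

# Half-year totals for four hypothetical students:
print(round(spearman([5, 3, 4, 1], [4, 2, 5, 1]), 2))  # 0.8
```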
Distribution of the CI scores
Conscientiousness scores ranged from 153 to 205 for year 1 students and from 116 to 195 for year 2 students. These scores were converted to percentages of the maximum possible scores and are displayed as histograms in Figures 1 and 2. Even casual comparison of these two figures reveals that the two distributions are quite similar. The descriptive statistics are shown in Table 1.
The results (separately and combined) therefore represent a negatively skewed distribution (most scores lie toward the upper end of the range, with a longer tail toward lower scores) that is also leptokurtic (values clustered close to the mean), with a major peak at 97%. The similar shapes of the year 1 and year 2 graphs strongly suggest that a similar property underlies these distributions in each of the two years, despite the differences in the components that make up the CI in each year. (For instance, immune status and criminal records declarations are made only at entry into the program in year 1.)
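The descriptive terms above can be checked numerically. This is a minimal sketch assuming moment-based (population) estimators; statistics packages often apply small-sample corrections, so their reported values may differ slightly. A negative skewness corresponds to a tail toward low scores, and a positive excess kurtosis to a leptokurtic shape. The maximum possible score is an input here because it is not stated in this section.

```python
def percent_of_max(scores, max_possible):
    """Express raw CI scores as percentages of the maximum possible score."""
    return [100.0 * s / max_possible for s in scores]


def skewness_and_excess_kurtosis(xs):
    """Moment-based skewness g1 and excess kurtosis g2 (normal distribution: 0, 0)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


print(percent_of_max([150, 200], 200))  # [75.0, 100.0]
# Toy data with most values high and one low outlier:
# negative skew and positive excess kurtosis
print(skewness_and_excess_kurtosis([0, 10, 10, 10, 10]))
```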
Validity 1. Relationship between CI scores and staff views on the trait of professionalism
The numbers of responses in each category were summed, and the summary outcomes are shown in Table 2. Comparing the “happy” and “concerns” categories by chi-square analysis shows statistically significant differences (P < .001 for both year groups). Combining both year groups, 67 out of 79 expressions of concern were found in the lowest-scoring group, 9 in the middle group, and 3 in the top group. The values obtained for the PI (described in the Methods section) for each year and each group of students are shown in Table 3.
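A chi-square test of independence on a 2 × 3 table (rating category by CI group) can be sketched without a statistics library. With (2 - 1)(3 - 1) = 2 degrees of freedom, the chi-square survival function has the closed form exp(-x/2); `scipy.stats.chi2_contingency` would give the same result. The text states only that "chi-square analysis" was used, so this configuration is our assumption, and the counts in the test are toy values, not those of Table 2.

```python
import math


def chi_square_2x3(happy, concerns):
    """happy, concerns: counts for the (top, middle, bottom) CI groups.
    Returns (chi-square statistic, p-value) for the test of independence."""
    table = [list(happy), list(concerns)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence; every row and column total
            # must be non-zero for this to be defined
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    # df = (2 - 1) * (3 - 1) = 2, for which P(X >= stat) = exp(-stat / 2)
    return stat, math.exp(-stat / 2)
```

To reproduce the analysis, the "happy" and "concerns" counts for each group from Table 2 would be passed in for each year group separately.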
Correlation coefficients between scores on the CI and the PI
These results relate to averages from the three samples from each cohort. As explained earlier, to further explore the continuous nature of the relationship, a PI was calculated for each student as the “happy” score minus the “concerns” score, and the Spearman rank correlation coefficient was calculated between the CI and PI. There was a statistically significant positive correlation between the CI and PI in each year (0.603 for year 1 and 0.587 for year 2).
It was notable that students in the bottom 10 in each year, who received the great majority of the expressions of concern, also received a substantial number of “happy” scores. This indicates that staff views of these students are mixed, with significant numbers of staff having no concerns about them. Staff views of students in the middle and top groups were much more consistent and favorable. Individual results cannot be presented, in view of risks to anonymity, but they are consistent with this description. A number of outlier results relate to students with declared disabilities. Such students were frequently perceived as showing good professionalism, even though their CI scores were low.
Validity 2. Relationship between CI scores and completion of a critical incident report
For year 2, there were 10 completed critical incident reports. Of these, 7 (70%) related to the 10 students ranked lowest in terms of CI score (a group representing 9% of the cohort). All instances in which more than one critical incident report was associated with a particular student lay in this group, but further details cannot be given because of the risk of identifying individual students.
For year 1, there were four critical incident reports recorded, three of which occurred in the 10 students with the lowest CI scores. One of these students received two critical incident reports, the only instance of multiple forms being completed in the year group. We cannot comment further on multiple instances because of the risk of breaching anonymity.
When the academic year was split into two and performance in the first half-year was compared with performance in the second half-year, the Spearman rank correlation coefficients were statistically significant and positive: 0.60 for year 2 and 0.59 for year 1. This indicates reasonable consistency for a measure of this kind.
Discussion and Conclusions
It can be seen that there is a relationship between the scalar, objective trait of conscientiousness as we have defined it and the property, or construct, of professionalism as perceived by the nine staff members’ independent judgments of the students. A total of 67 out of 79 expressions of concern were found in the lowest-scoring CI group, 9 in the middle CI group, and 3 in the top CI group for the two years together. Calculation of the correlation coefficient between the CI and a summary PI statistic derived by subtracting the “concerns” score from the “happy” score shows that there is a positive correlation, but it is less striking than the comparison with the CI and “happy” scores. This is because the relationship is plainly nonlinear—as can be seen in Table 2, there is relatively little difference between the top 10 and the middle 10 in terms of expressions of concern, but there is a marked difference between the bottom group and the middle and top groups. There also seems to be a relationship, not yet quantifiable, with the frequency of critical incident reports, particularly in the year 2 student group. Together, these findings suggest there is concurrent validity for the measure of professionalism through the measures of conscientiousness we have used. This is in line with previous studies1–5 that suggest that behaviors similar to those measured here have either concurrent or predictive validity for future practice. We argue that the CI has the benefit of including a wider range of measures of conscientiousness than have been used before, thus increasing its validity and probable reliability.
Students about whom some staff expressed concerns also frequently received indications that other staff were happy with their approaches to professionalism. This shows that student behavior is not consistently negative across encounters in different contexts and with different staff members, in line with previous findings. It also may show the subjective nature of staff members’ evaluations—what concerns one staff member may not concern another.
The presence of outlier scores of students with declared disabilities (an outlier meaning that the student has a low CI score but very good staff ratings for professionalism) suggests that the impact of their disabilities was not adequately captured by the CI calculation, and students with declared disabilities should be treated separately in further studies or summative use of the CI.
It is noteworthy that the “don’t knows” are highest in the middle group in each year. Staff in this survey frequently spontaneously expressed the view that they would know the really good students and the really poor students in each year, but they might not know the “average” students so well. This was confirmed by the observations. Staff also expressed the view that they knew the year 2 students better than the year 1 students, through longer exposure, and this view was also confirmed. These results also suggest that the data have validity.
That measurement by the CI is reliable is suggested by two observations. First, the distribution of scores for year 1 and year 2 is virtually identical in terms of shape and statistical parameters, suggesting that the same trait is being observed in each year. Second, there is a good correlation between the two halves of the year, even though the number of CI points differs between the half-years because the number of recordable occasions changes. Higher total scores were recorded by year 1 students because there were more scorable events in year 1 (associated with the process of induction, such as bringing photographs to induction sessions).
This study has a number of limitations. It was conducted in a single institution, and there may be context-specific factors that affect the conclusions that can be drawn from it. The numbers are too small to explore some of the relationships statistically, notably that between the critical incident reports in year 1 and the CI. Finally, although we used a shared understanding of the meaning of professionalism, we did not attempt to define that concept or establish criteria that the nine staff members could have used in making their rankings. Nevertheless, we have demonstrated that in circumstances such as those at our school, the CI clearly correlates with individuals’ subjective views of what constitutes professional behavior.
We will continue our research on the CI, and our findings will become stronger with longitudinal sampling, which will take a number of years (perhaps even decades) to complete. The staff ratings of professionalism are subject to all the difficulties that faculty assessment is associated with. However, we felt the results were sufficiently striking that it would be valuable to bring forward the findings and the methodology at this stage.
These observations suggest that the CI measures a scalar objective trait and that this corresponds to, or is a valid surrogate for, the construct of professionalism, however defined, as judged by experienced staff familiar with the students. Because the individual decisions making up the CI are objective, the measure also has a high degree of interrater reliability. It also has the advantage that the data are relatively easy and uncontroversial for administrative staff to collect, because only a record of Yes/No decisions is required (such as whether an individual student has or has not attended compulsory sessions or submitted work), and it does not depend on value judgments. The CI could therefore be used as a surrogate measure of professionalism in summative contexts. Such a step raises uncertainties, for instance about the sensitivity and specificity of the measure or about setting a cut score for failure. These are resolvable in further studies using the same methodology; most currently employed measures of professionalism do not have this facility. Sensitivity and specificity could be established by comparison with an entire year group over an extended period, when adverse behaviors in clinical settings begin to be reported, and cut scores could be determined by a contrasting-groups exercise. In the meantime, we would suggest that this measure might be used initially to detect students whose behaviors require investigation and challenge at an early stage in their professional development, and it may subsequently also be of value in measuring the effect of strategies to help students improve. Our findings suggest that in encouraging desirable professional behavior, targeting students’ conscientiousness might be a good place to start.
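A contrasting-groups cut score, and the sensitivity and specificity it would yield, might be sketched as follows. This is an illustration under stated assumptions, not a procedure used in this study. The reference "concern" group would have to come from later evidence (for example, adverse behaviors reported in clinical settings). Students below the cut are flagged. The cut is chosen to minimize total misclassification, which is one common way of operationalizing the contrasting-groups method. All data shown are hypothetical.

```python
# Hypothetical sketch: choosing a CI cut score by a contrasting-groups approach
# and reporting its sensitivity and specificity. A student is "flagged" when
# their CI is below the cut; concern[i] is True if student i belongs to the
# reference group whose professional behavior is of concern.


def sensitivity_specificity(scores, concern, cut):
    tp = sum(1 for s, c in zip(scores, concern) if c and s < cut)
    fn = sum(1 for s, c in zip(scores, concern) if c and s >= cut)
    tn = sum(1 for s, c in zip(scores, concern) if not c and s >= cut)
    fp = sum(1 for s, c in zip(scores, concern) if not c and s < cut)
    # Both groups must be non-empty, otherwise one ratio is undefined
    return tp / (tp + fn), tn / (tn + fp)


def contrasting_groups_cut(scores, concern):
    """Return the candidate cut score that minimizes total misclassification."""
    candidates = sorted(set(scores)) + [max(scores) + 1]  # last one flags everyone
    best_cut, best_errors = None, None
    for cut in candidates:
        errors = sum(1 for s, c in zip(scores, concern) if (s < cut) != c)
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


scores = [60, 65, 70, 90, 92, 95]  # hypothetical CI percentages
concern = [True, True, True, False, False, False]
cut = contrasting_groups_cut(scores, concern)
print(cut)                                            # 90
print(sensitivity_specificity(scores, concern, cut))  # (1.0, 1.0)
```

In real data the two groups would overlap, so the chosen cut would trade sensitivity against specificity, and the misclassification costs might reasonably be weighted unequally.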