Selecting an assessment method for measuring students’ performances remains a daunting task for many medical institutions in Africa where attempts to revamp existing assessment methods have been hampered by serious financial and personnel shortcomings and the lack of suitable alternatives.1 The difficulty of this task is further compounded by the fact that changing an assessment method may call for changing medical curriculum.2
Therefore, in Africa, measuring students’ performances is not the sole determinant for choosing an assessment method. Other factors such as cost, suitability, and safety have profound influences on the selection of an assessment method and, most probably, are the major cause of inter-institutional variations not only in their selection of assessment methods but also in their success rates.3 Even at institutions where several assessment methods have been used, their integration has been greatly influenced by the aforementioned factors. We feel strongly that a comprehensive evaluation of cost, suitability, and safety should be a prerequisite to selecting an assessment method.4–6
Unfortunately, the assessment methods used by most medical institutions in Africa have been selected empirically and, hence, they are highly variable and inconsistent both intrainstitutionally and interinstitutionally. Because this situation may have arisen from difficulties in evaluating the potential influencing factors, it highlights the need for guidelines on the selection of assessment methods. Such guidelines are needed especially in South Africa and other countries in Africa where medical schools are transforming their learning curricula and, therefore, assessment methods. A lack of guidance on the selection of assessment methods may lead institutions to adopt models that cannot be sustained locally. There is no literature on assessment methods in an African context, which underscores the grimness of the situation. Providing a context for the selection of assessment methods that is specific for African medical institutions would give due regard to the extremely limited resources in Africa, which are compounded by adverse socioeconomic and political factors. In this article, using our experience as both trainees and trainers in Africa, we propose a standard approach for selecting an assessment method for testing students’ performance in an African institution. Although this proposal was stimulated by the need to guide African institutions on the selection of assessment methods, we believe the selection criteria can be used by any institution and in any situation.
Our Formula for Evaluating Assessment Methods
We compared six assessment methods: essay examination, short-answer questions (SAQ), multiple-choice questions (MCQ), patient clinical examination (PCE), problem-based oral examination (POE), and objective structured clinical examination (OSCE) for their ability to test students’ performance and their ease of adoption with regard to cost, suitability, and safety. Each of these factors is described below, as are the rating scales we used to evaluate them. We based our evaluation on a standardized class size of 150 students. The six methods were selected because they are broadly used at most African medical institutions.
Description of Assessment Methods
To avoid complexities that might arise from minor interinstitutional variations in administering the assessment methods, we standardized the description of each method and examination time for our evaluation. The essay examination is the traditional examination where students respond in writing to one or a set of questions,7,8 and the SAQ involves writing short answers to short questions sampled from a large part of the curriculum.9 Both the essay and SAQ examinations are normally two hours long and are scored with 100 marks each. Preparing these examinations involves setting the questions and a memorandum of answers, which must be reviewed by a moderator. Preparation of the MCQ examination has similar requirements, but the length of its administration differs (duration, 1.5 hours).
The PCE is a two-hour examination involving one long case and two short cases using nonstandardized patients. The long case tests students’ skills in patient clerking (e.g., history and patient examination) and takes 1.5 hours (45 minutes for clerking, 15 minutes for presentation, and 30 minutes for evaluation). The two short cases test students’ skills in examination techniques and diagnosis, and each lasts for 15 minutes. All examination patients are picked at random, have not been previously seen by the student, and may be different for different students.
The POE is a two-hour examination where students first write for an hour on one long case, which tests their clinical problem-solving skills. After the long case is marked, the students are examined in a one-hour oral (viva) examination (30 minutes covers the long case and 30 minutes covers two short cases). The short cases are picked at random, are not previously known to the student, and may be different for different students.
The OSCE is a two-hour examination during which students move through several stations (10 minutes per station) where they are examined on different aspects of the station’s subject. The effective examination time is 100 minutes, with 20 minutes allowed for moving between stations.
The PCE, POE, and OSCE each require several examination stations or rooms with at least three examiners in each room.
Our Protocol for Evaluating the Assessment Methods
We scored each assessment method for performance in terms of the method’s ability to test for clinical problem solving, knowledge and recall of facts, communication, and practical skills. The maximum performance score for each assessment method was 10 points. Because problem-solving skills are considered a fundamental outcome of a medical curriculum, we weighted them more heavily (a possible four points) than the other three skills (a maximum of two points each). If an assessment method reliably tests for clinical problem-solving skills, we scored it as four; one with a limited scope for testing this skill was scored two. A method with unsatisfactory testing for problem solving was scored one, and one not involving problem-solving skills was scored zero. A method with the ability to test for knowledge (or to sample a wide range of knowledge) was scored two, whereas one with a limited ability to test for this skill (or limited sampling range) was scored one, and one with a lack of application was scored zero. Each method’s ability to test for communication and practical skills was scored along the same scale as the test for knowledge.
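As a rough sketch, the performance rubric above amounts to summing per-skill ratings against fixed caps. The skill labels and example ratings below are illustrative assumptions for demonstration, not values taken from our tables:

```python
# Maximum points per skill under the performance rubric
# (problem solving is weighted double the other three skills).
MAX_POINTS = {"problem_solving": 4, "knowledge": 2, "communication": 2, "practical": 2}

def performance_score(ratings):
    """Sum per-skill ratings, capping each at its rubric maximum (total <= 10)."""
    return sum(min(rating, MAX_POINTS[skill]) for skill, rating in ratings.items())

# Hypothetical method that reliably tests problem solving but samples
# knowledge only narrowly:
example = {"problem_solving": 4, "knowledge": 1, "communication": 2, "practical": 2}
```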
We evaluated the cost of implementing an assessment method based on the cost of materials for examination purposes, examiners (paid according to hours of examination and marking), and patients or patient simulators. Again, we used a 10-point scale for this evaluation. The cost of examiners was weighted the most (six points), and the extra material and patient costs were each scored to a maximum of two points.
The cost of examiners was evaluated in terms of the number of manpower hours (MPH); that is, the time required to complete the whole examination process, including examination supervision and marking. MPH excludes preparation of examinations, which we considered under suitability. The scale for scoring MPH was as follows: 0–10 MPH = 6; 11–100 MPH = 5; 101–200 MPH = 4; 201–300 MPH = 3; 301–400 MPH = 2; 401–500 MPH = 1; and >500 MPH = 0.
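The MPH bands above (and the identical cubicle-hour bands used later under suitability) form a simple descending step function, which can be sketched as follows; the function name and band encoding are our illustrative assumptions, not part of the published protocol:

```python
# Descending bands shared by the MPH (cost) and CH (suitability) scales:
# 0-10 -> 6, 11-100 -> 5, 101-200 -> 4, 201-300 -> 3,
# 301-400 -> 2, 401-500 -> 1, >500 -> 0.
BANDS = [(10, 6), (100, 5), (200, 4), (300, 3), (400, 2), (500, 1)]

def band_score(hours):
    """Return the score for a given number of manpower or cubicle hours."""
    for upper_bound, score in BANDS:
        if hours <= upper_bound:
            return score
    return 0  # anything beyond 500 hours scores zero
```

For example, an examination totaling 9.5 MPH falls in the top band (score six), whereas one totaling 675 MPH exceeds 500 and scores zero.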
We defined extra materials to be stationery (pens and paper), textbooks, computers, videos, x-ray films, specimens, laboratory test results, etc. If an assessment method required only stationery, we scored it as two. If it required one extra item in addition to the stationery, we scored it as one; if it required two or more extra items other than stationery, we scored it as zero.
We defined patient costs as expenditures, direct and indirect, on patients or simulators kept in hospital for examination purposes. The need for patients was scored separately to enable application of the selection criteria to both clinical and nonclinical subjects. Also, separating this scoring emphasizes the importance and cost implications of using patients or patient simulators. More important, the need for patients distinguishes the POE from the PCE or OSCE. We scored an assessment method that does not involve use of patients or simulators as two. If the institution incurs only the indirect costs of the patients’ stay for the assessment, we scored it as one, and we scored methods that use patients or patient simulators who are paid as zero.
We scored an assessment method’s suitability to the institution based on the time required to prepare an examination and use an examination venue (cubicle hours), and the need for special examination requirements at the venue, termed here as resources. The maximum possible score was 10. We considered examination preparation under suitability because, in addition to setting questions, it involves preparing the venue and other special examination arrangements.
The venue was considered as the most important suitability factor (maximum score of six), and resources and examination preparation were each scored to a maximum of two points. Venue was calculated as cubicle hours (CH) required for examining all the 150 students as follows: 0–10 CH = 6; 11–100 CH = 5; 101–200 CH = 4; 201–300 CH = 3; 301–400 CH = 2; 401–500 CH = 1; and >500 CH = 0.
Resources beyond those normally used in training, such as a table and chair for every student, were thought to include beds, shield curtains, major modification of rooms or environment, etc. If an assessment did not call for special resources, we scored it as two; an addition of one extra resource was scored one, and an addition of two or more resources was scored zero.
The time of examination preparation was expressed as effective work-hours. If the examination preparation took less than 12 effective work-hours, we scored it as two; 12 to 24 hours was scored one, and greater than 24 hours was scored zero.
Safety, or an assessment method’s capacity to resist leakage and cheating of any nature, was also scored. A score of six was given to an assessment method where communication between students is impossible. If communication is possible but does not improve a student’s performance, the method was scored as four. When communication could indirectly influence the student’s performance in the examination, we scored it as two. Examinations such as “the project execution test and portfolio,” where communication between students is allowed and improves their performance, were scored as zero. This specific type of examination, however, was not part of this evaluation.
Similarly, examination leakage cannot be tolerated. For leakage, we scored assessment methods as follows: those with any probability of leakage were scored as zero, those with no probability of leakage but in which examiner bias may influence a candidate’s performance were scored as two, and those with no probability of leakage were scored as four.
Table 1 shows the evaluation scores for the four parameters of students’ performance. Overall, the best assessment method for students’ performance is the OSCE; the MCQ fared the worst. The PCE and POE assessment methods, which have limited application in testing knowledge and recall of facts, together with the OSCE, received high scores for problem solving, communication, and practical skills.
Table 2 shows the assessment methods’ scores for cost of examiners, examination materials, and patients. Overall, the MCQ cost the least due to computerized marking. Its total examination time, 9.5 MPH, was scored a six. One extra examination resource (computer) was required, and patients were not applicable.
The essay and SAQ papers differed only in estimated marking time (1.5 hours per student versus 1.3 hours per student, respectively). Both required five supervisors (two hours each) and no additional materials (students only needed standard stationery).
Total examination time for the PCE was 675 MPH, giving a score of zero; more than one extra examination material would be required (i.e., x-rays and laboratory investigation results), and nonpaid patients would be used.
Total examination time for the POE was 530 MPH, giving a score of zero. The costs of examination materials and patients were not applicable, as students only needed standard stationery.
The OSCE was the most costly examination. With several stations and three examiners each, 150 students would take 900 MPH. More than one extra examination resource would be required (i.e., computers, photographs, video recorders, x-rays, laboratory investigation results, and specimens), and paid patient simulators would be used.
Table 3 shows the suitability scores of assessment methods. In general, the essay, SAQ, and MCQ examinations were the most appropriate for suitability. The PCE and OSCE required a lot of time for occupation of venues, examination preparation, and many additional resources.
In the case of the essay, SAQ, and MCQ examinations, all 150 students can sit in one room at one time. They require less than ten CH, for a score of six each, and nothing beyond chairs and tables. The PCE requires two CH per student (300 CH) and more than one extra resource (i.e., shield curtains and beds); its examination preparation takes more than 24 work-hours. The POE requires one CH per student (151 CH, including one hour for the long-case write-up), but no special resources beyond chairs and tables, and its preparation takes less than 12 hours. The OSCE requires two CH per student (300 CH), and more than one extra resource would be required (i.e., beds, shield curtains, and modification of rooms to make examination stations). Its examination preparation would take more than 24 work-hours.
Table 4 contains examination safety scores for the assessment methods. Communication between students is not possible during essay, SAQ, and MCQ examinations. Whereas it is true that some students may discuss their PCE, POE, and OSCE examination experiences, there is no proof that this communication improves students’ performance. The probability of examination leakage is higher for examinations where questions are predetermined, such as the essay, SAQ, MCQ, and the written part of the POE.
Models for Selecting Assessment Methods
Based on our findings, we have developed the following models for selecting assessment methods at African medical institutions.
The Ideal Model
The ideal model applies to any institution where finances are not so critically lacking as to impair the development of effective learning and assessment programs. This is primarily the case at North American, UK, and Australian medical schools. In this model, overall scores for the four major factors (i.e., performance, cost, suitability, and safety) are weighted according to their importance at a particular institution and a total score is used to determine the most appropriate assessment method.
Table 5 illustrates the application of the ideal model for selecting an assessment method. For this hypothetical institution, the ability to test for performance is the most important determinant of the choice for an assessment method. Accordingly, performance is weighted by a factor of seven. Cost is weighted by a factor of two because it is a problem for any medical institution. Suitability and examination safety issues are not concerns, except for the need for continuous policing, so each is weighted by a factor of only 0.5.
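The model’s arithmetic can be sketched as a weighted sum of the four factor scores. The weights below follow the hypothetical institution described above; the per-method raw scores are illustrative placeholders, not values from our tables:

```python
# Ideal-model weights for the hypothetical institution described above.
IDEAL_WEIGHTS = {"performance": 7, "cost": 2, "suitability": 0.5, "safety": 0.5}

def weighted_total(factor_scores, weights):
    """Multiply each factor's raw score by its institutional weight and sum."""
    return sum(weights[factor] * score for factor, score in factor_scores.items())

# Placeholder raw scores for a hypothetical method (each factor out of 10):
method_scores = {"performance": 10, "cost": 2, "suitability": 4, "safety": 8}
```

Under any chosen weighting, the method with the highest weighted total is selected; changing the weights to reflect local priorities changes which method wins.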
In this ideal situation, the OSCE would be the most appropriate assessment method for testing students’ performance. And, in fact, OSCEs are used as the best method for students’ assessment in North America, the United Kingdom, and Australia.10–12 Note that even when an institution elects to use more than one assessment method, this model would be a useful guide to the selection process. The ideal model shows that the PCE and POE would be preferable to the essay and SAQ examinations as alternatives to the OSCE. MCQs are no longer recommended.
Africa’s Current Model
Table 6 illustrates the current situation at many medical institutions in Africa, and probably in other developing countries elsewhere. In Africa, the poor socioeconomic status makes cost a very important determinant of the choice for an assessment method. In this model, we weighted cost by a factor of three. Suitability is also a cause for concern because many African medical institutions are experiencing admission of students beyond the capacity of the training facilities and personnel. As such, we weighted suitability by a factor of two. Examination safety problems, due to inadequate supervision and insufficient materials or facilities for examination administration, may also arise. Therefore, we have weighted safety by a factor of one.
In this scenario, the essay and SAQ examinations appear to be the most appropriate assessment methods, which reaffirms the current practice of student assessment in Africa, particularly in the basic medical sciences. The essay and SAQ examinations are usually supplemented by MCQ and/or oral examinations (viva), whereas during the clinical years the PCE is used as a supplement. Because patients constitute part of the clinical training program, their use in the PCE would not constitute an extra examination resource, except when they are paid.
A New Model for Africa
Because essay, SAQ, and MCQ examinations are not optimal ways of assessing students’ performance, institutions using these assessment methods cannot confidently claim to have achieved the objectives of the medical curriculum (i.e., to enable a student to solve patient problems).7,8 The daunting task for Africa’s medical institutions, therefore, is to search for assessment methods that suit the local conditions without compromising educational standards. Table 7 illustrates a new model for selecting assessment methods at African medical institutions.
This new model calls for substantive improvisation and restructuring of institutional needs to minimize the influence of cost, suitability, and safety. Sharing or borrowing examination facilities, as well as using open wards or halls with shield curtains for examinations, would obviate many of the issues under suitability. Rechanneling finances to where they are most needed, such as recruiting more examiners and/or purchasing essential examination materials, would reduce examination time and improve safety. Some essential examination materials, such as computers with the required software or Internet capabilities, may considerably reduce the time of examination preparation. With such measures, the influence of suitability and safety would be greatly minimized (perhaps to a weighting factor of 0.5 each). Unfortunately, because the cost-cutting measures we describe are temporary, cost remains a factor in the implementation of assessment methods and, therefore, its weighting factor remains at three.
In the new model, the POE would be the most appropriate assessment method for an African medical institution. This is good news because the POE aligns well with the objective of the medical curriculum. However, the implications of changing the assessment method in Africa’s institutions must be considered before implementation of the POE. This is because assessment methods influence students’ learning behaviors, such that a change in the assessment method would call for a change in the medical curriculum.12 We know that if the assessment method requires recall of facts, students tend to work on memorizing, whereas if the examination tests problem-solving skills, they will focus on acquiring problem-solving skills.13 Because the POE assessment would suit students instructed under a PBL program, using it on students instructed under the traditional program would be unfair and may lead to failure. Therefore, adoption of the POE assessment by Africa’s medical institutions would require them to change their medical curriculum to PBL.
Here, we have presented guidelines by which to select an assessment method through systematic evaluation of several factors. The model is easy to understand and can be a tool for promoting change in the method of student assessment and medical curriculum.
The authors thank the organizers of the third African course on pharmacotherapy training at Cape Town, (i.e., The World Health Organization and the Department of Pharmacology, University of Cape Town), for making this fruitful interaction of Africa’s professionals happen, and the South Africa Drug Action Program, for sponsoring the participants.