The last two decades have seen a groundswell in the recognition of the importance of medical professionalism, especially in academic settings.1–4 A physician charter5,6 was published in 2002 and is now endorsed by many professional associations and societies around the world,7 reflecting the growing importance of medical professionalism. There are several possible explanations for this movement, including greater societal interest in medical professionalism, higher expectations for medical professionals, the growing impact of commercialism on health care systems, growing malpractice litigation, and the conflict between altruism and work–life balance among younger generations of medical professionals. Whatever the reasons, society both needs and expects high standards of professional behavior from its physicians.5,6,8 There are several specific interventions that can foster medical professionalism among residents and medical students: setting expectations, providing experiential learning (e.g., formal/informal/hidden curriculum, ethics courses, role models), and evaluating outcomes using valid and reliable tools and methods.9–11
It is widely accepted that the assessment of professionalism itself conveys a strong message about what the evaluators consider to be important to the practice of medicine.9–11 Moreover, it is acknowledged that appropriate evaluation of medical professionalism can help prevent professional lapses.12 Although a variety of evaluation methods for medical professionalism have been developed and are currently available, the absence of a standard tool that is valid and reliable has constituted a significant challenge.8,10,13,14
Wilkinson and colleagues15 conducted a systematic review between 2007 and 2008 to analyze and categorize the definitions and evaluation tools of medical professionalism available in the literature. They identified more than 30 evaluation tools that could be employed in a wide variety of methods and contexts and found that no universally accepted tool existed. They categorized the evaluation tools into nine groups based on their respective characteristics. Among the nine categories, they placed the highest emphasis on “assessment of an observed clinical encounter,” based on the perspective that, with regard to clinical competence, “doing” ranks higher than simply “knowing.”15,16 In keeping with this perspective, they concluded that of the tools they analyzed, the Professionalism Mini-Evaluation Exercise (P-MEX)17 could serve as a core component of an assessment program, combined with other supplemental evaluation methods (multisource feedback, patients' opinions, paper-based tests, etc.).15
The P-MEX was originally developed based on the mini-CEX (clinical evaluation exercise), an assessment tool for the clinical skills of residents in which the examiner rates residents on interviewing, physical examination, professionalism, clinical judgment, counseling, organization, and efficiency, using a nine-point scale.18 One hundred forty-two behaviors relevant to medical professionalism were identified by 92 faculty members at McGill University; from these, 24 items were retained because they had the potential to evaluate the largest number of attributes of professionalism. The selected behaviors were inserted into the mini-CEX format and scored on a four-point scale from “exceeded expectations” to “unacceptable.” The reliability and validity of the instrument were found to be good.17 Exploratory factor analysis of the instrument in that study showed clustering into four domains of skills representative of medical professionalism, labeled doctor–patient relationship skills, reflective skills, time management, and interprofessional relationship skills. As part of a study of the P-MEX's applicability in a Japanese cultural setting, the original P-MEX was translated into Japanese, and the back translation was proofread by two authors (R.C. and S.C., two of the developers of the original P-MEX). The methods used to translate the P-MEX into Japanese have been described elsewhere.19
The P-MEX has been considered one of the most promising tools for evaluating medical professionalism8,15 for several reasons: It evaluates sets of objectively observable behaviors, and it demonstrated high validity and reliability in the Canadian context.17 Although the applicability of the P-MEX to the Japanese cultural context has been studied in a relatively small, single-center study,19 its generalizability in the Japanese cultural context had remained undetermined. Because current concepts of medical professionalism arose in Western Anglo-Saxon countries, it is recommended that an assessment tool be revalidated, with attention to cultural relevance, when it is to be used in a new cultural context.8 In this study, therefore, we examined the validity, reliability, and generalizability of the P-MEX in Japan in a multicenter, cross-sectional study conducted at seven hospitals with varying characteristics from diverse areas of Japan.
Study settings and participants
Seven hospitals from different areas of Japan participated in this study: St. Luke's International Hospital (Tokyo), National Hospital Organization Tokyo Medical Center (Tokyo), Mito Kyodo General Hospital (Ibaraki), Kyoto University Hospital (Kyoto), Yokohama City University Hospital (Kanagawa), National Hospital Organization Nagasaki Medical Center (Nagasaki), and Teine Keijinkai Hospital (Hokkaido). Their locations were diverse, ranging from Hokkaido (northeastern Japan) to Kyushu (southwestern Japan). Five were community hospitals, and two were university hospitals, with markedly different educational settings and areas of faculty specialization. The residents and fellows evaluated for this study were assessed using P-MEX forms between November 2009 and March 2010. Evaluations were conducted in 360-degree fashion; evaluators were attending physicians, nurses, peers, and junior residents who worked closely in a clinical setting with those evaluated. Evaluators in the same postgraduate year as the resident/fellow being rated were defined as peers, whereas a junior resident was defined as a physician whose postgraduate year of training was at least one year junior to that of the resident/fellow being evaluated. Attending physicians, nurses, peers, and junior residents who had worked with the resident/fellow in the same inpatient ward or outpatient clinic for more than one month were eligible to be evaluators. The on-site authors selected the residents/fellows to be assessed and the evaluators, based on these eligibility criteria. Prior to their evaluations, all evaluators were given detailed written instructions on appropriate completion of the P-MEX forms. Confidentiality and anonymity of the evaluations were guaranteed for both the evaluators and the evaluated residents/fellows to prevent potential bias. Ethical approval was obtained from the institutional review boards of all participating hospitals.
Four different skill categories were assessed by the P-MEX: doctor–patient relationship skills, reflective skills, time management, and interprofessional skills. Each category contained between 3 and 9 items, for a total of 24 items. The score for each item was calculated based on ratings from the four-point rating scale. The validity of the P-MEX was assessed in the following three ways: First, the content of the P-MEX was examined in a working group, held prior to the study evaluations, that consisted of attending doctors and nurses at St. Luke's International Hospital. After the completion of the P-MEX evaluations, the authors performed a paper-based survey of evaluators (attending doctors, nurses, fellows, and junior residents) assessing the appropriateness of the P-MEX items. The survey asked, “Are the contents of the P-MEX appropriate and valid for evaluating residents' medical professionalism?” Evaluators were asked to answer this question using a four-point scale, where 4 = strongly agree, 3 = agree, 2 = disagree, and 1 = strongly disagree.
Second, the criterion-related validity was evaluated based on the correlation of the average P-MEX score and the external criterion. After assembling the P-MEX assessments, the evaluators were asked to rate the resident's overall performance using the following statement: “The resident/fellow demonstrated high standards of professional behavior.” The rating of the external criterion was performed using a four-point scale: 4 = exceeded expectations, 3 = met expectations, 2 = below expectations, and 1 = unacceptable.
Third, construct validity was assessed by confirmatory factor analysis of all P-MEX items through structural equation modeling: the model developed in the original P-MEX study in Canada17 was fitted to our data, and its goodness of fit was tested. Structural equation modeling is a statistical technique for testing and estimating relations between variables using a combination of statistical data and qualitative causal assumptions.20,21 The goodness of fit of the model was determined by examining the comparative fit index (CFI), which indicates the percentage improvement in fit over the null model, and the root mean square error of approximation (RMSEA), which examines residual error. CFI values > 0.90 indicate a good fit to the data.22 An RMSEA < 0.05 indicates a close approximate fit, whereas an RMSEA < 0.10 indicates acceptable fit.23
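The two fit indices can be stated compactly in terms of the model and null-model chi-square statistics. The following is a minimal illustrative sketch, not output from our analysis; the chi-square and degree-of-freedom values used in the example call are hypothetical:

```python
import math

def cfi(chi2_model, df_model, chi2_null, df_null):
    # CFI: proportional improvement in noncentrality (chi-square minus df,
    # floored at zero) of the fitted model over the null model
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, 0.0)
    return 1.0 - d_model / d_null

def rmsea(chi2_model, df_model, n):
    # RMSEA: residual misfit per degree of freedom, adjusted for sample size n
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))

# Hypothetical values for illustration only (n = number of forms)
print(round(cfi(600.0, 246, 4500.0, 276), 3))  # values > 0.90 suggest good fit
print(round(rmsea(600.0, 246, 837), 3))        # values < 0.10 suggest acceptable fit
```

A dedicated modeling package (such as AMOS, as used here) reports these indices directly; the functions above only make the definitions concrete.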
The reliability of the P-MEX in the Japanese cultural context was assessed using a generalizability analysis and decision study.24 Dependability coefficients were calculated for each evaluator subgroup to determine the number of forms required to obtain a dependable estimate of the calculated average score for each subgroup. A dependability coefficient of 0.80 indicates good reproducibility based on the preliminary study of the mini-CEX.18 AMOS 17.0 software (Amos Development Corporation, Crawfordville, Florida) was used for the structural equation modeling. All other statistical analyses were performed using STATA/IC 11.0 (StataCorp LP, College Station, Texas).
A total of 837 P-MEX forms were completed and collected for 165 residents and fellows. Four hundred fifty-three (54.1%) of the P-MEX forms were completed by attending physicians, 165 (19.7%) by nurses, 96 (11.5%) by peers, and 123 (14.7%) by junior residents. A total of 378 evaluators enrolled in this study and completed the P-MEX assessment. An average of 5 evaluators per resident/fellow conducted the evaluation.
The mean score for the 24 P-MEX items was calculated for each evaluation form, and scores were then aggregated for all trainees being evaluated. Mean item scores with standard deviations of each dimension are presented in Table 1. The mean average score on all 837 forms was 3.25, identical to that of the original study conducted at McGill University.17 The mean average score by evaluator subgroup was 3.30 for attending physicians, 3.04 for nurses, 3.36 for peers, and 3.29 for junior residents.
The validity of the P-MEX
The contents of the P-MEX achieved good acceptance in the working group held before the study. The postevaluation questionnaires examining the content validity of P-MEX were completed by 318 (84.1%) of the evaluators. Of these, 302 (79.9%) agreed that the contents of the P-MEX were appropriate and valid for evaluating residents' medical professionalism, indicating good content validity of the P-MEX in Japan. The Pearson correlation coefficient between the average P-MEX score and the external criterion was 0.78 (P < .001), suggesting good criterion-related validity of the P-MEX.
Construct validity was analyzed using structural equation modeling, as described above. Standardized coefficients of the model ranged from 0.60 to 0.99, except for one item (Figure 1). The standardized coefficient of “doctor–patient relationship” to “Maintained appropriate boundaries with patients/colleagues (P12)” turned out to be negative (standardized path coefficient = −0.23), which was identical to the result that was obtained in the previous single-center study performed in Japan.19 The CFI was 0.911 and the RMSEA was 0.079, indicating acceptable goodness of fit for this model.
The reliability of the P-MEX
The reproducibility of the average score was estimated using generalizability theory, with the evaluators nested within residents/fellows. Dependability coefficients and standard errors of measurement were computed for between 1 and 28 completed evaluations (Table 2). Although 10 to 12 forms were sufficient to obtain a dependability coefficient of 0.80 in Canada,17 as few as 6 to 8 evaluations by attending physicians and 4 to 6 assessments by nurses were required to obtain an equivalent dependability coefficient in Japan. In contrast, 26 forms were necessary to obtain a comparable dependability coefficient if the evaluators were peers or junior residents. A total of 18 forms were required to guarantee a dependability coefficient of 0.80 in 360-degree assessment using the P-MEX. From the standard errors of measurement, corresponding 95% confidence intervals of average P-MEX scores for each evaluator category were computed over incremental numbers of forms (Table 2).
There has been increasing global recognition of the importance of medical professionalism and, in turn, an increasing need for a valid and reliable tool for its assessment.8,10 However, a significant challenge to assessing medical professionalism has been the absence of tools that can be used in different cultural and educational settings.8,10,13–15 Our study has demonstrated evidence of adequate validity, reliability, and generalizability of the assessment of medical professionalism using the P-MEX to evaluate Japanese residents and fellows. To the best of our knowledge, the P-MEX is the only evaluation tool verified in both Western Anglo-Saxon (Canadian) and East Asian (Japanese) cultural contexts.
The hospitals participating in this study varied to a great extent. Their locations were diverse; five were community hospitals, and two were university hospitals. This institutional diversity enabled us to verify the generalizability of the P-MEX in a Japanese context.
The results of our confirmatory factor analysis through structural equation modeling were equivalent to those of the original study in Canada, except for one item, “Maintained appropriate boundaries with patients/colleagues.” This item was categorized into two factors, “doctor–patient relationship skills” and “interprofessional relationship skills,”17 which was identical to the result we found in our previous preliminary study.19 A reasonable explanation for this overlap is that this item embraces two objectives: appropriate interactions with “patients” and with “colleagues.” Regarding appropriate boundaries, there may be a stronger emphasis on professional conduct in interprofessional relationships than in doctor–patient relationships in the Japanese cultural context, probably due to the presence of significant issues of gender inequality and sexual harassment in the Japanese medical education system.25 According to a multi-institute survey of Japanese residents, doctors were reported as abusers more often than patients were.26
A dependability coefficient of 0.80 indicates good reproducibility based on the preliminary study of the mini-CEX.18 Between 10 and 12 forms were necessary to obtain a dependability coefficient of 0.80 in the Canadian study of the P-MEX.17 In contrast, 6 to 8 evaluations for attending physicians, 4 to 6 assessments for nurses, and 26 forms for peers and junior doctors were required to obtain the equivalent dependability coefficient in our current study. However, using the 95% confidence intervals of average P-MEX scores shown in Table 2, educators may feel comfortable implementing the P-MEX in their institutions with fewer evaluators. For example, if P-MEX forms are completed for one resident by four attending doctors (as evaluators), the 95% confidence intervals of the average P-MEX score would be between 2.90 and 3.70. The educators might consider rewarding the resident whose score was higher than 3.70. A resident whose average P-MEX score was less than 2.90 could be considered for investigation and remediation.
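The width of such intervals follows from the standard error of measurement of a mean of n ratings. As a sketch of the arithmetic behind the worked example above, assuming a hypothetical residual variance chosen to reproduce an interval of roughly ±0.40 with four raters (not a figure taken from Table 2):

```python
import math

def ci95(mean_score, var_residual, n_forms):
    # 95% CI for an average P-MEX score: mean +/- 1.96 times the
    # standard error of measurement of a mean of n_forms ratings
    sem = math.sqrt(var_residual / n_forms)
    return (mean_score - 1.96 * sem, mean_score + 1.96 * sem)

# Hypothetical residual variance for illustration only
lo, hi = ci95(3.30, 0.167, 4)
print(round(lo, 2), round(hi, 2))
```

As the number of forms grows, the interval narrows in proportion to the square root of n, which is why even a handful of attending-physician evaluations can flag clearly exceptional or clearly concerning residents.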
Our findings show that the reproducibility of the P-MEX is relatively low when the evaluators are peers and junior residents. There are several possible explanations for this phenomenon. First, unfortunately, the majority of residents and medical students in Japan are taught little about the fundamental nature of medical professionalism and are often not exposed to organized and consistent standards of professionalism during their training years. Second, evaluating peers or seniors may be uncomfortable for younger residents, and this might have biased the outcome of the assessments. Third, personal feelings (likes and dislikes) and personal relationships, rather than objective assessments as professional colleagues, may have caused the large variance in ratings. The relatively high P-MEX scores among peer evaluators support this hypothesis. Finally, the P-MEX might be an inherently unsuitable evaluation tool when evaluations are completed by peers and junior residents; a different tool may be more reliable in this specific subgroup. Further research is necessary to clarify the reasons for the low reliability of the P-MEX among peers and junior doctors and to determine whether alternative methods of assessing professionalism exist.
In summary, our findings from this multicenter, cross-sectional study demonstrate that the P-MEX is a useful tool with overall good reliability, validity, and generalizability for measuring medical professionalism as it is practiced in Japanese settings. The applicability of the P-MEX in Japan implies that although the cultural, social, and historical roots of medical professionalism vary between Western and East Asian cultural contexts, the core of medical professionalism in both cultures shares fundamentally identical concepts. Despite its limitation among peers and junior residents, the P-MEX may serve as one of the most promising evaluation tools for medical professionalism because it is the only tool for which reliability and validity have been verified in two different cultural contexts. Further research is necessary to clarify the reasons for the relatively low reliability of the P-MEX among peers and junior doctors and to determine whether alternative methods exist. Revalidation research on the P-MEX in other cultural contexts is also needed to demonstrate its universal generalizability.
The authors acknowledge the contributions of Kei Mukohara (National Hospital Organization Nagasaki Medical Center, Nagasaki, Japan), Gautam A. Deshpande (St. Luke's Life Science Institute, Tokyo, Japan), and Kayo Ichikawa (St. Luke's Life Science Institute, Tokyo, Japan). The authors are also grateful for the contributions of Noriaki Hayashida, Sadamu Okada, Naoki Nishimura, Shuzo Nishihara (St. Luke's International Hospital, Tokyo, Japan), Eiji Goto (Yokohama City University Hospital, Kanagawa, Japan), Hajime Kojima, and Nobuyuki Ura (Teine Keijinkai Hospital, Hokkaido, Japan).
St. Luke's Life Science Institute (Tokyo, Japan) funded this study. The funding source had no role in the design and conduct of the study, in the reporting and review of the manuscript, or in the decision to submit the manuscript for publication.
This study was approved by St. Luke's International Hospital, The Research Ethics Committee (St. Luke's International Hospital, Tokyo); Tokyo Medical Center Ethics Committee (National Hospital Organization Tokyo Medical Center, Tokyo); Ethics Committee of Tsukuba University Mito Medical Center (Mito Kyodo General Hospital, Ibaraki); Kyoto University Graduate School and Faculty of Medicine, Ethics Committee (Kyoto University Hospital, Kyoto); Yokohama City University Institutional Review Board (Yokohama City University Hospital, Kanagawa); Nagasaki Medical Center Ethics Review Board (National Hospital Organization Nagasaki Medical Center, Nagasaki); and Teine Keijinkai Hospital Clinical Residency Committee (Teine Keijinkai Hospital, Hokkaido).
Presentations about this study were made at the 25th International Ottawa Conference (Florida, 2010) and the 42nd Annual Meeting of the Japan Society for Medical Education (Tokyo, Japan, 2010).
1 Reynolds PP. Reaffirming professionalism through the education community. Ann Intern Med. 1994;120:609–614.
2 Blumenthal D. The vital role of professionalism in health care reform. Health Aff (Millwood). 1994;13:252–256.
3 Cruess SR, Cruess RL. Professionalism must be taught. BMJ. 1997;315:1674–1677.
4 Wynia MK, Latham SR, Kao AC, Berg JW, Emanuel LL. Medical professionalism in society. N Engl J Med. 1999;341:1612–1616.
5 Medical professionalism in the new millennium: A physician charter. Ann Intern Med. 2002;136:243–246.
6 Medical professionalism in the new millennium: A physicians' charter. Lancet. 2002;359:520–522.
7 Blank L, Kimball H, McDonald W, Merino J. Medical professionalism in the new millennium: A physician charter 15 months later. Ann Intern Med. 2003;138:839–841.
8 Hodges B, Ginsburg S, Cruess RL, et al. Assessment of professionalism: Recommendations from the Ottawa 2010 conference. Med Teach. 2011;33:354–363.
9 Stern DT, Papadakis M. The developing physician—Becoming a professional. N Engl J Med. 2006;355:1794–1799.
10 Epstein RM. Assessment in medical education. N Engl J Med. 2007;356:387–396.
11 Cruess RL, Cruess SR, Steinert Y. Teaching Medical Professionalism. New York: Cambridge University Press; 2009.
12 Sullivan C, Arnold L. Assessment and remediation in programs of teaching professionalism. In: Cruess RL, Cruess SR, Steinert Y, eds. Teaching Medical Professionalism. New York: Cambridge University Press; 2009:124–150.
13 Stern DT, ed. Measuring Medical Professionalism. New York, NY: Oxford University Press; 2006.
18 Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): A preliminary investigation. Ann Intern Med. 1995;123:795–799.
19 Tsugawa Y, Tokuda Y, Ohbu S, et al. Professionalism Mini-Evaluation Exercise for medical residents in Japan: A pilot study. Med Educ. 2009;43:968–978.
20 Kaplan DW. Structural Equation Modeling: Foundations and Extensions. Thousand Oaks, Calif: Sage Publications; 2008.
21 Hoyle RH, Smith GT. Formulating clinical research hypotheses as structural equation models: A conceptual overview. J Consult Clin Psychol. 1994;62:429–440.
22 Bentler PM. Comparative fit indexes in structural models. Psychol Bull. 1990;107:238–246.
23 Steiger JH. When constraints interact: A caution about reference variables, identification constraints, and scale dependencies in structural equation modeling. Psychol Methods. 2002;7:210–227.
24 Shavelson RJ, Webb NM. Generalizability Theory: A Primer. Thousand Oaks, Calif: Sage Publications; 1991.
25 Nagata-Kobayashi S, Sekimoto M, Koyama H, et al. Medical student abuse during clinical clerkships in Japan. J Gen Intern Med. 2006;21:212–218.
26 Nagata-Kobayashi S, Maeno T, Yoshizu M, et al. Universal problems during residency: Abuse and harassment. Med Educ. 2009;43:628–636.