Letters to the Editor
In Reply to Prakash:
Professor Prakash makes some very good points regarding guidelines for writing test items. Item stems, in the form of a question, that call for most, least, best, worst, and other superlatives are very effective in measuring more complex aspects of clinical problem solving. All options may be correct in some sense, but one clearly stands out as the most, least, best, or worst.
Two important criteria must be met before a guideline can be considered valid for item writing and universally endorsed for testing programs.
First, a group of subject-matter experts should convene to review each item and agree on the correct answer. They should also agree that the distractors for each item are plausible but would clearly be judged as wrong by high-performing test takers.
Second, when each item is field tested, it should perform as intended. Those of us who create tests prefer moderately difficult items that discriminate well between high and low performers. When item-writing guidelines are proposed, we generally like to know whether research supports the validity of the guideline.
With respect to Professor Prakash’s proposal, the community of researchers who study and write about item-writing would like to see some accumulated research that better informs us about which guidelines are most likely to produce the most valid items. Ironically, not following the guidelines may produce some highly effective items, but in the long run, as research by Downing¹,² and others³ shows, violating these guidelines results in a higher percentage of poorly performing items.
Tom Haladyna, PhD
Professor emeritus, Arizona State University, Phoenix, Arizona; email@example.com.
1. Downing SM. Construct-irrelevant variance and flawed test questions: Do multiple-choice item-writing principles make any difference? Acad Med. 2002;77(10 suppl):S103–S104.
2. Downing SM. The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract. 2005;10:133–143.
3. Haladyna TM. Developing and Validating Multiple-Choice Test Items. 3rd ed. Mahwah, NJ: Lawrence Erlbaum Associates; 2004.