PRO-Bookmarking is an alternative to traditional methods for deriving cut scores and applying qualitative modifiers to score ranges.
In PRO-Bookmarking, a working group of stakeholders identifies ranges of scores they judge to credibly define different levels of a patient-reported outcome (PRO). Subsets of items and responses are woven into narratives, called “clinical vignettes,” that represent different levels of the PRO. Working individually, stakeholders place bookmarks between clinical vignettes, ordered by PRO level, to mark the thresholds between categories (eg, no problems, mild problems). A moderator then leads a discussion of the individual bookmark placements, with the goal of reaching consensus on bookmark locations.
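The step from a bookmark placement to a score threshold can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the text above: the vignette levels and placements are invented values, the threshold is taken as the midpoint between the two adjacent vignettes, and the helper name `bookmark_to_threshold` is hypothetical. The median and spread are shown only as a summary a moderator might bring to the discussion. They do not replace the consensus step.

```python
from statistics import median

# Hypothetical vignette levels on a score metric, ordered from low to high.
vignette_levels = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]


def bookmark_to_threshold(levels, lower_index):
    """Return the threshold implied by a bookmark placed just above levels[lower_index].

    Assumption: the threshold is the midpoint of the two adjacent vignettes.
    """
    if not 0 <= lower_index < len(levels) - 1:
        raise ValueError("bookmark must fall between two adjacent vignettes")
    return (levels[lower_index] + levels[lower_index + 1]) / 2.0


# Hypothetical individual placements for one category boundary:
# each value is the index of the vignette just below that participant's bookmark.
placements = [2, 2, 3, 2, 1]

individual = [bookmark_to_threshold(vignette_levels, i) for i in placements]
summary_median = median(individual)
spread = max(individual) - min(individual)

print("individual thresholds:", individual)
print("median:", summary_median, "spread:", spread)
```

A wide spread would flag a boundary on which participants disagree and which likely needs more discussion time.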
The value of PRO measures depends on the extent to which different stakeholders are able to interpret scores. The PRO-Bookmarking method provides credible evidence on the common-language meaning of different ranges of scores. This evidence supplements other interpretative methods such as normative comparisons and comparisons with an external standard. PRO-Bookmarking is particularly valuable when, as is often the case with PRO measures, there is no clear external standard or even a useful external reference with which to compare PRO scores.
The PRO-Bookmarking procedure is a qualitative method that engages key stakeholders in in-depth consideration of the semantic meaning of ranges of PRO scores. Measures based on item banks calibrated using item response theory are ideal for PRO-Bookmarking for two reasons. First, response probabilities conditioned on different levels of the PRO are derived directly from the item response theory model. Second, item banks contain more items than traditional measures, and a large number of items provides flexibility in the choice and variety of items that can be used to compose the clinical vignettes. There is much to learn about Bookmarking in the PRO context and, more generally, about all methods for establishing PRO score thresholds. Issues for further study include the role of context of use for classifications, selection of semantic labels for levels of a PRO, and the extent to which findings generalize to clinical utility.
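The point above about response probabilities conditioned on PRO level can be sketched in code. The example assumes a graded response model, a common item response theory model for ordered response categories, although the text does not name a specific model. The item parameters, response labels, and function names are hypothetical and chosen only for illustration.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta: latent PRO level; a: item discrimination;
    thresholds: ordered category threshold parameters.
    Returns len(thresholds) + 1 probabilities that sum to 1.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def most_likely_response(theta, a, thresholds, labels):
    """Return the most probable response label at theta and its probability."""
    probs = grm_category_probs(theta, a, thresholds)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical item parameters and response labels, for illustration only.
labels = ["Never", "Rarely", "Sometimes", "Often", "Always"]
a, thresholds = 2.0, [-1.0, 0.0, 1.0, 2.0]

for theta in (-1.5, 0.5, 2.5):
    label, prob = most_likely_response(theta, a, thresholds, labels)
    print(f"theta={theta:+.1f}: most likely response '{label}' (p={prob:.2f})")
```

Under these assumptions, pairing each selected item with its most likely response at a given level is one plausible way to assemble the item-and-response subsets that a vignette describes.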