Medical Care: May 2007 - Volume 45 - Issue 5
doi: 10.1097/01.mlr.0000250483.85507.04
Original Article

Psychometric Evaluation and Calibration of Health-Related Quality of Life Item Banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS)

Reeve, Bryce B.*; Hays, Ron D.†; Bjorner, Jakob B.‡; Cook, Karon F.§; Crane, Paul K.∥; Teresi, Jeanne A.¶; Thissen, David∥; Revicki, Dennis A.**; Weiss, David J.††; Hambleton, Ronald K.‡‡; Liu, Honghu††; Gershon, Richard§§; Reise, Steven P.††; Lai, Jin-shei§§; Cella, David§§; on behalf of the PROMIS Cooperative Group


Author Information

From the *National Cancer Institute, NIH, Bethesda, Maryland; †UCLA Division of General Internal Medicine & Health Services Research, Los Angeles, California; ‡QualityMetric Inc., Lincoln, Rhode Island, and the Health Assessment Lab, Waltham, Massachusetts; §University of Washington, Seattle; ¶Columbia University Stroud Center and Faculty of Medicine; New York State Psychiatric Institute, and Research Division, Hebrew Home for the Aged in Riverdale, New York, New York; ∥Psychology Department, University of North Carolina at Chapel Hill; **Center for Health Outcomes Research, United BioSource Corporation, Bethesda, Maryland; ††Psychology Department, University of Minnesota, Minneapolis; ‡‡Center for Educational Assessment, University of Massachusetts at Amherst; and §§Northwestern University Feinberg School of Medicine and Evanston Northwestern Healthcare, Evanston, Illinois.

Preparation of this work by non-NIH employees was supported by the National Institutes of Health through the NIH Roadmap for Medical Research Grant (AG015815), PROMIS Project.

Reprints: Bryce B. Reeve, PhD, Outcomes Research Branch, National Cancer Institute, NIH, EPN 4005, 6130 Executive Blvd., MSC 7344, Bethesda, MD 20892-7344. E-mail: reeveb@mail.nih.gov.


Abstract

Background: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project.

Objectives: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks.

Analyses: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking.

Recommendations: Key analytic issues are summarized, and recommendations are offered for future evaluations of item banks in HRQOL assessment.

The Patient-Reported Outcomes Measurement Information System (PROMIS) project provides a unique opportunity to use advanced psychometric methods to construct, analyze, and refine item banks from which improved patient-reported outcome (PRO) instruments can be developed.1,2 PRO measures include instruments that measure domains such as health-related quality of life (HRQOL) and satisfaction with medical care. This report presents the methodological considerations for analyzing both existing data from a number of sources and new data to be collected by the PROMIS network. These methods and approaches were adopted by the PROMIS network. The PROMIS project will produce item banks to be used for both computerized adaptive testing (CAT)3 and nonadaptive (ie, fixed-length) assessment of HRQOL domains, with pain, fatigue, emotional distress, physical functioning, and social-role participation as the initial focus.

At the outset, PROMIS investigators identified available datasets containing more than 50,000 respondents (n >1000 per dataset) with multi-item PRO responses in cancer, heart disease, HIV disease, diabetes, gastrointestinal disorders, hepatitis C, mental health, and other chronic health conditions. Results from analyses of these datasets were used to refine the proposed methods and to assemble candidate item banks before the development of the PROMIS item banks. In particular, secondary data analyses allowed PROMIS investigators to examine the dimensionality of domains; identify candidate items that represent the domains of interest; and evaluate the optimal number of response categories to field in the PROMIS data collection phase. The secondary analyses also allowed PROMIS researchers to anticipate psychometric challenges in developing the PROMIS item banks. For example, analyses suggested substantial floor and/or ceiling effects for many domains, which underscored the importance of identifying items that discriminate well at very low and at very high levels of the traits being measured. The same psychometric considerations apply to analysis of newly collected PROMIS data: confirm assumptions about the dimensionality of the data; examine item properties; test for differential item functioning (DIF)4 across sociodemographic or clinical groups; and calibrate the items for CAT and short forms.

Because researchers recognize the many challenges in analyzing HRQOL data, the plan provides flexibility with respect to the methods used to explore psychometric properties. Some methods were identified as primary and others as exploratory. Results obtained using exploratory methods will be evaluated based on whether they add substantively to the results obtained using the primary methods. Examples of applying the methods discussed in this analysis plan can be found in articles included in this supplement.5,6 Further, the field of PRO psychometrics is evolving, both in terms of methods development (eg, recent advances in the bifactor model and in full-information factor analysis for polytomous response items, discussed later in this article) and application (eg, advances in electronic PRO assessment). As the state of the measurement field changes, the PROMIS network will adapt its analytic plans.


PROMIS DATA COLLECTION AND SAMPLING PLAN

From July 2006 to March 2007, the PROMIS research sites collected data from the US general population (∼n = 7523) and multiple disease populations including those with cancer (∼n = 1000), heart disease (∼n = 500), rheumatoid arthritis (∼n = 500), osteoarthritis (∼n = 500), psychiatric conditions (∼n = 500), spinal cord injury (∼n = 500), and chronic obstructive pulmonary disease (∼n = 500). The general population sample will be constructed to ensure adequate representation with respect to key demographic characteristics such as gender (50% each), age (20% of each age group in years: 18–29, 30–44, 45–59, 60–74, 75+), ethnicity (12% black, 12% Hispanic), and education (25% with high school education or less). A health condition checklist will also be included in the assessment. Beyond demographic and clinical characteristics, PRO data in the areas of pain, fatigue, emotional distress, physical functioning, and social-role participation will be collected for inclusion in item banks developed within the PROMIS network. All candidate items for the PROMIS item banks have been thoroughly examined using qualitative methods such as cognitive testing and expert item review panels.7 The first wave of data are being collected via a computer or laptop linked to a web-based questionnaire.

A detailed sampling plan was developed for collecting initial responses to the candidate items from the targeted PROMIS domains. This plan was designed to best accommodate a number of purposes: (1) create item calibrations for all of the items in each of the subdomains; (2) estimate profile scores for various disease populations; (3) create linking metrics to legacy questionnaires (eg, SF-36); (4) confirm the factor structure of the primary domains and subdomains; and (5) conduct item and bank analyses. However, because of the large total number of items (>1000), it is not possible for participants to respond to the entire set of items in each pool. Based on an estimate of 4 questions per minute, the length of the PROMIS questionnaires in the first wave of testing is limited to approximately 150 items, which are expected to take about 40 minutes to answer. Two data collection designs (“full bank” and “block administration”) will be implemented during wave 1 to address the 5 purposes listed previously.

“Full bank” testing will be conducted using the general population sample (n = 3507). Each respondent will answer all of the items in 2 of the primary item banks, for example, depression and anxiety or fatigue impact and fatigue experience. Data collected from full bank testing will be analyzed to confirm the factor structure of the PROMIS domains, test for DIF, and to perform CAT simulations.

For “blocked administration” in both the general population sample and the samples of individuals with chronic diseases, a balanced incomplete blocked design will be used in which a subset of items from each item pool is administered to every person.8,9 Item blocks will be designed to allow simultaneous item response theory (IRT)-based estimation of item parameters, and of population mean differences and standard deviation ratios.


ANALYTIC METHODS

Advanced psychometric methods will be used throughout the instrument development process to inform our understanding of the latent constructs, particularly with respect to the populations studied, and to develop adaptive and nonadaptive instruments with appropriate psychometric properties for implementation in a range of research applications.

This process, outlined in Table 1, will include the analysis of item and scale properties using both traditional (ie, classic) and modern (ie, IRT) psychometric methods. Factor analysis will be used to examine the underlying structures of the measured constructs and to evaluate the assumptions of the IRT model. DIF testing will evaluate whether items perform differently across key demographic or disease groups when controlling for the underlying level of the trait assessed by the scale. Finally, items will be calibrated to an IRT model and used in CAT. The plan builds on previous PRO item bank development work by different research groups10–12; however, the PROMIS project entails a more extensive testing strategy than any performed previously. The steps noted in Table 1 are presented sequentially, but many steps can often be carried out in parallel, and results from later steps may suggest returning to earlier steps to re-evaluate findings based on different interpretations or methods. Herein, we describe each of these methods, review available analytic options, and, when evidence supports it, suggest preferred methods and criteria. Decisions about model selection, fit, and/or satisfaction of assumptions will not be based solely on statistical criteria, but will incorporate expert judgment from both psychometric and content experts, who will review the evidence to make interpretations and to determine next steps.

[Table 1, which outlines the analytic process, is not reproduced here.]

DESCRIPTIVE STATISTICS

A variety of descriptive statistics will be used, including measures of central tendency (mean, median), spread (standard deviation, range), skewness and kurtosis, and response category frequencies. Patterns and frequency of missing data will be examined to determine whether missingness appears systematic or random. For example, if missing data were more prevalent later in the sequence of administered items, this would suggest that the cause may be response burden or lack of time for completing the questionnaire. The content of items that draw substantial missing responses will be examined by content experts to evaluate whether missing responses may be due to sensitive item content.

Several basic classic test theory statistics will be estimated to provide descriptive information about the performance of the item set. These include inter-item correlations, item-scale correlations, and internal consistency reliability. Cronbach’s coefficient alpha13 will be used to examine internal consistency, with 0.70 to 0.80 as an accepted minimum for group-level measurement and 0.90 to 0.95 as an accepted minimum for individual-level measurement. Internal consistency estimates assume that the item set is homogeneous; because high internal consistency can be achieved with multidimensional data, this statistic does not provide sufficient evidence of unidimensionality.
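To make the computation concrete, the following is a minimal sketch of coefficient alpha in Python; the data are simulated stand-ins for PROMIS item responses, not actual study data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an n_persons x n_items matrix of item scores."""
    k = items.shape[1]                            # number of items
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Simulated 5-item scale for 500 respondents (one common factor plus noise)
rng = np.random.default_rng(0)
theta = rng.normal(size=(500, 1))
items = np.clip(np.round(2 + theta + rng.normal(scale=0.8, size=(500, 5))), 0, 4)
print(f"alpha = {cronbach_alpha(items):.2f}")  # judge against the 0.70-0.95 benchmarks
```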


EVALUATE ASSUMPTIONS OF THE IRT MODEL

Before applying IRT models, it is important to evaluate the core assumptions of the model: unidimensionality, local independence, and monotonicity. Methods for testing these assumptions are described below; the order in which the assumptions are tested can vary.

Unidimensionality

One critical assumption of IRT models is that a person’s response to an item that measures a construct is accounted for by his/her level (amount) on that trait, and not by other factors. For example, a highly depressed person is more likely to endorse “true” for the statement “I don’t care what happens to me” than a person with low depression. The assumption is that a person’s depression level is the main factor that gives rise to his/her response to the item. No item set will ever perfectly meet strictly defined unidimensionality assumptions.14 Thus, one wants to assess whether scales are “essentially” or “sufficiently” unidimensional15 to permit unbiased scaling of individuals on a common latent trait. One important criterion is the robustness of item parameter estimates, which can be examined by removing items that may represent a significant dimension. If the item parameters (in particular the item discrimination parameters or factor loadings) significantly change, then this may indicate insufficient unidimensionality.16,17 A number of researchers have recommended methods and considerations for evaluating essential unidimensionality as reviewed below.14,15,18–20

Factor Analytic Methods to Assess Unidimensionality

Confirmatory factor analysis (CFA) will be performed to evaluate the extent to which the item pool measures a dominant trait that is consistent with the content experts’ definition of the domain. CFA was selected over an exploratory analysis as the first step because each potential pool of items was carefully selected by experts to represent a dominant PRO construct, based on an exhaustive literature review and feedback from patients through focus groups and cognitive testing.7 Because of the ordinal nature of the PRO data, appropriate software (eg, Mplus21 or LISREL22) is required to factor-analyze polychoric correlations using a suitable estimator, such as weighted least squares with adjustments for the mean and variance (WLSMV23 in Mplus21) or diagonally weighted least squares (DWLS in LISREL22).

CFA model fit will be assessed by examining multiple indices. Noting that statistical criteria like the χ2 statistic are sensitive to sample size, a range of practical fit indices will be examined such as the comparative fit index (CFI >0.95 for good fit), root mean square error of approximation (RMSEA <0.06 for good fit), Tucker-Lewis Index (TLI >0.95 for good fit), standardized root mean residuals (SRMR <0.08 for good fit), and average absolute residual correlations (<0.10 for good fit).15,24–28 If the CFA shows poor fit, then we will conduct an exploratory factor analysis and examine the magnitude of eigenvalues for the larger factors (at least 20% of the variability on the first factor is especially desirable), differences in the magnitude of eigenvalues between factors (a ratio in excess of 4 is supportive of the unidimensionality assumption), scree test, parallel analysis, correlations among factors, and factor loadings to determine the underlying structural patterns.
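The eigenvalue-based checks above can be sketched briefly; this illustration substitutes Pearson correlations for the polychoric correlations the plan specifies (a simplification) and uses simulated data:

```python
import numpy as np

def eigenvalue_checks(data: np.ndarray, n_sims: int = 200, seed: int = 1):
    """First-factor variance share, first/second eigenvalue ratio, and the
    number of factors retained by Horn's parallel analysis."""
    rng = np.random.default_rng(seed)
    n, k = data.shape
    eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    # Parallel analysis: compare against 95th percentile eigenvalues of
    # correlation matrices computed from random normal data of the same shape
    sims = np.array([np.sort(np.linalg.eigvalsh(
        np.corrcoef(rng.normal(size=(n, k)), rowvar=False)))[::-1]
        for _ in range(n_sims)])
    threshold = np.percentile(sims, 95, axis=0)
    return eigs[0] / eigs.sum(), eigs[0] / eigs[1], int((eigs > threshold).sum())

rng = np.random.default_rng(0)
data = np.round(2 + rng.normal(size=(500, 1)) + rng.normal(scale=0.8, size=(500, 10)))
share, ratio, n_factors = eigenvalue_checks(data)
print(f"first factor: {share:.0%} of variance (want >= 20%); "
      f"eigenvalue ratio: {ratio:.1f} (>4 supports unidimensionality); "
      f"parallel analysis retains {n_factors} factor(s)")
```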

An alternate method to determine whether the items are “sufficiently” unidimensional is McDonald’s bifactor model15 (see also Gibbons et al29,30). McDonald’s approach to assessing unidimensionality (which he terms “homogeneity”) is to assign each item to a specific subdomain based on theoretical considerations. A model is then fit with each item loading on a common factor and on a specific subdomain (group factor). The common factor is defined by all the items, whereas each subdomain is defined by a subset of items in the pool. The factors are constrained to be mutually uncorrelated so that all covariance is partitioned either into loadings on the common factor or onto the subdomain factors. If the standardized loadings on the common factor are all salient (defined as >0.30) and substantially larger than the loadings on the group factors, the item pool is considered “sufficiently homogeneous.”15 Furthermore, one can compare individual scores under a bifactor and a unidimensional model; if the scores are highly correlated (eg, r >0.90), this is further evidence that the effects of multidimensionality are ignorable.31

To illustrate the active evolution of psychometric procedures applicable to the analysis of PROs, during the writing of this article an implementation of full-information (exploratory) factor analysis for polytomous item responses became available in version 8.8 of the computer software LISREL.22,32 In addition, Edwards33 has illustrated the use of a Markov chain Monte Carlo (MCMC) algorithm for CFA of polytomous item responses such as those obtained in the measurement of PROs. It is likely that these procedures, and others that may become available soon, will also be useful in examining the dimensionality of the PROMIS scales.

Local Independence

Local independence assumes that, once the dominant factor influencing a person’s response to an item is controlled, there should be no significant association among item responses.34–36 The existence of local dependencies that influence IRT parameter estimates poses a problem for scale construction and CAT implementation. Further, scoring respondents based on misspecified models will result in inaccurate estimates of their level on the underlying trait. In other words, uncontrolled local dependence (LD) among items in a CAT assessment could result in a score that reflects something other than the HRQOL construct being measured.

Identification of LD among polytomous response items includes examining the residual correlation matrix produced by the single-factor CFA. High residual correlations (greater than 0.2) will be flagged as possible LD. In addition, IRT-based tests of LD will be used, among them Yen’s Q3 statistic37 and Chen and Thissen’s LD indices.38 These statistics are based on a process that involves fitting a unidimensional IRT model to the data and then examining the residual covariation between pairs of items, which should be zero if the unidimensional model fits. For example, Steinberg and Thissen34 described the use of Chen and Thissen’s G2 LD index to identify locally dependent items among 16 dichotomous items on a scale measuring history of violent activity.
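A minimal sketch of the Q3 computation, assuming the model-implied expected item scores have already been produced by the IRT calibration software (the `expected` matrix below is an input, not something this snippet estimates):

```python
import numpy as np

def q3_matrix(observed: np.ndarray, expected: np.ndarray) -> np.ndarray:
    """Yen's Q3: correlations among item residuals (observed minus
    model-implied item scores); both inputs are n_persons x n_items.
    Residual correlations should be near zero if the unidimensional
    IRT model fits."""
    residuals = observed - expected
    q3 = np.corrcoef(residuals, rowvar=False)
    np.fill_diagonal(q3, np.nan)  # the diagonal carries no LD information
    return q3

# Screen item pairs for possible local dependence, mirroring the 0.2
# threshold used for the CFA residual correlations:
# pairs = np.argwhere(np.triu(np.abs(q3_matrix(obs, exp)) > 0.2))
```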

The modification indices (MIs) of structural equation modeling (SEM) software may also serve as statistics to detect LD. When inter-item polychoric correlations are fitted with a one-factor model, the result is a limited-information parameter estimation scheme for the graded normal ogive model. The MIs for such a model are 1-degree-of-freedom χ2-scaled statistics that suggest unmodeled excess covariation between items, which in the context of item factor analysis is indicative of LD. Hill et al6 describe the use of MIs to detect LD in the PedsQL™ Social Functioning Scale and other examples.

Items that are flagged as LD will be examined to evaluate their effect on IRT parameter estimates. One test is to remove one of the items with LD, and to examine changes in IRT model parameter estimates and in factor loadings for all other items.

One solution to control the influence of LD on item and person parameter estimates is to omit one of the items with LD. If this is not feasible because both items provide a substantial amount of information, then LD items can be marked as “enemies,” preventing both from being administered in a single assessment to any individual. In that case, the LD must be controlled in the calibration step to remove the influence of the highly correlated items. In all cases, LD items should be evaluated to understand the source of the dependency. LD may exist for nonsubstantive reasons, such as structural similarity in wording or content: when the wording of 2 or more item stems is so similar that respondents cannot differentiate what the questions are asking, they will mark the same response to both items.

Monotonicity

The assumption of monotonicity means that the probability of endorsing or selecting an item response indicative of better health status should increase as the underlying level of health increases. This is a basic requirement of IRT models for items with ordered response categories. Approaches to studying monotonicity include examining graphs of item mean scores conditional on “rest scores” (ie, total raw scale score minus the item score), or fitting a nonparametric IRT model39 that yields initial IRT probability curve estimates, using programs such as the Mokken scale analysis for polytomous items (MSP40) software. A nonparametric IRT model fits trace lines for each response to an item without any a priori specification of the order of the responses. The data analyst may then examine those fitted trace lines to determine which response alternatives are (empirically) associated with lower levels of the domain and which are associated with higher levels. The shapes of the trace lines may also reveal other departures from monotonicity, such as bimodality, if they exist. Although nonparametric IRT may not be the most (statistically) efficient way to produce the final item analysis and scores for a scale, it can be very informative about the tenability of the assumptions of parametric IRT.
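The rest-score check reduces to a few lines; a sketch with simulated responses, where the number of groups is an arbitrary choice:

```python
import numpy as np

def rest_score_item_means(responses: np.ndarray, item: int, n_groups: int = 5):
    """Mean score on one item within quantile groups of the rest score
    (total score minus the studied item). Under monotonicity the group
    means should be non-decreasing."""
    rest = responses.sum(axis=1) - responses[:, item]
    edges = np.quantile(rest, np.linspace(0, 1, n_groups + 1))[1:-1]
    groups = np.digitize(rest, edges)
    return np.array([responses[groups == g, item].mean()
                     if np.any(groups == g) else np.nan
                     for g in range(n_groups)])

rng = np.random.default_rng(0)
resp = np.clip(np.round(2 + rng.normal(size=(500, 1))
                        + rng.normal(scale=0.8, size=(500, 6))), 0, 4)
print(rest_score_item_means(resp, item=0))  # expect a non-decreasing sequence
```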


FIT ITEM RESPONSE THEORY MODEL TO DATA

Once the assumptions have been confirmed, IRT models are fit to the data both for item and scale analysis and for item calibration to set the stage for CAT. IRT refers to a family of models that describe, in probabilistic terms, the relationship between a person’s response to a survey question and his or her standing (level) on the PRO latent construct (eg, pain) that the scale measures.41,42 For every item in a scale, a set of properties (item parameters) is estimated. The item slope or discrimination parameter describes how well the item performs in the scale in terms of the strength of the relationship between the item and the scale. The item difficulty or threshold parameter(s) identifies the location along the construct’s latent continuum where the item best discriminates among individuals. This information can be used to evaluate the properties of the items in the scale, or used by the CAT algorithm to select items that are appropriately matched to the respondent’s estimated level on the measured trait, based on his or her responses to previously administered items.

Although there are well over 100 varieties of IRT models41–43 to handle various data characteristics, such as dichotomous and polytomous response data, ordinal and nominal data, and unidimensional and multidimensional data, only a handful have been used in item analysis and scoring. In initial analyses of existing data sets, the PROMIS network evaluated both a general IRT model, Samejima’s Graded Response Model (GRM),44,45 and 2 models based on the Rasch framework, the Partial Credit Model46 and the Rating Scale Model.47,48 On the basis of these analyses, the PROMIS network decided to focus on the GRM in future item bank development work.

The GRM is a very flexible model from the parametric, unidimensional, polytomous-response IRT family of models. Because it allows discrimination to vary item by item, it typically fits response data better than a one-parameter (ie, Rasch) model.43,49 Further, compared with alternative 2-parameter models such as the generalized partial credit model, the model is relatively easy to understand and illustrate to “consumers” and retains its functional form when response categories are merged. Thus, the GRM offers a flexible framework for modeling the participant responses to examine item and scale properties, to calibrate the items of the item bank, and to score individual response patterns in the PRO assessment. However, the PROMIS network will examine further the fit and added value of alternate IRT models using PROMIS data.

The unidimensional GRM is a generalization of the 2-parameter logistic IRT model for dichotomous response data. The GRM is based on the logistic function that describes, given the level of the trait being measured, the probability that an item response will be observed in category k or higher. For ordered responses X = k, k = 1, 2, 3, …, m_i, where response m_i reflects the highest θ value, this probability is defined44,45,50 as:

$$P(X_i \ge k \mid \theta) = \frac{1}{1 + \exp\left[-a_i\left(\theta - b_{ik}\right)\right]}$$

This function models the probability of observing each category as a function of the underlying construct. The subscript i on m indicates that the number of response categories need not be equal across items. The discrimination (slope) parameter a_i varies by item i in a scale. The threshold parameters b_ik vary within an item subject to the constraint b_{i,k−1} < b_ik < b_{i,k+1}, and each represents the point on the θ axis at which the probability that the response is in category k or higher passes 50%.

Figure 1 presents the category response curves (CRCs) for a 4-response category item with IRT GRM parameters: a = 2.26, b1 = −1.00, b2 = 0.00, and b3 = 1.50. Each curve (one for each response category) represents the probability of a respondent selecting category k, given his/her level (θ) on the underlying construct. If a person’s estimated θ is less than −1.00, then he/she is more likely to endorse the first response category. Likewise, if a person’s estimated θ is between −1.00 and 0.00, then he/she is more likely to endorse the second category. A person with estimated θ above 1.50 will have the greatest likelihood of endorsing the fourth response category. In a calibration of the GRM to the item responses, category response curves such as those shown in Figure 1 are estimated for every item.

[Figure 1: category response curves for a 4-category GRM item with a = 2.26, b1 = −1.00, b2 = 0.00, b3 = 1.50; image not reproduced.]
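A minimal sketch of how curves like those in Figure 1 are computed under the GRM, using the item parameters quoted above:

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category response curves for one GRM item: cumulative curves
    P(X >= k) are differenced to get the category probabilities P(X = k).
    Rows of the result are categories 1..m."""
    theta = np.atleast_1d(theta)
    cum = np.vstack([np.ones_like(theta)]                               # P(X >= 1) = 1
                    + [1 / (1 + np.exp(-a * (theta - bk))) for bk in b]
                    + [np.zeros_like(theta)])                           # P(X >= m+1) = 0
    return cum[:-1] - cum[1:]

# The Figure 1 item: a = 2.26, b1 = -1.00, b2 = 0.00, b3 = 1.50
theta = np.linspace(-3, 3, 121)
crc = grm_category_probs(theta, a=2.26, b=[-1.00, 0.00, 1.50])
idx = np.argmin(np.abs(theta - 0.5))
print(crc[:, idx].round(3))  # at theta = 0.5, category 3 is the most probable
```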

Once these response curves are estimated on a group of respondents from the first wave of PROMIS data collection, the curves are then used to estimate the θ levels of new respondents to the PROMIS questionnaires. For example, if a person selects response 3 for the item in Figure 1, it is likely their θ level is between 0.0 and 1.5. Using this kind of information for additional items, a person’s θ level is estimated by identifying which response they chose for each administered item. Thus, a person’s level on the trait (θ) and an associated standard error are estimated, using maximum likelihood or Bayesian estimation methods, based on the complete pattern of responses given by each person in conjunction with the probability functions associated with each item response.
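To make the scoring step concrete, the following sketches expected a posteriori (EAP) estimation of θ from a response pattern under a standard normal prior; this is one of the Bayesian options mentioned above, shown with the hypothetical Figure 1 item:

```python
import numpy as np

def eap_theta(pattern, items, grid=np.linspace(-4, 4, 161)):
    """EAP estimate and posterior SD of theta for one response pattern.
    `pattern` holds 0-based category choices; `items` is a list of
    (a, [b1, b2, ...]) GRM parameter tuples, one per administered item."""
    log_like = np.zeros_like(grid)
    for resp, (a, b) in zip(pattern, items):
        cum = np.vstack([np.ones_like(grid)]
                        + [1 / (1 + np.exp(-a * (grid - bk))) for bk in b]
                        + [np.zeros_like(grid)])
        probs = cum[:-1] - cum[1:]
        log_like += np.log(probs[resp] + 1e-300)
    post = np.exp(log_like - log_like.max()) * np.exp(-grid ** 2 / 2)  # N(0,1) prior
    post /= post.sum()
    theta_hat = (grid * post).sum()
    se = np.sqrt(((grid - theta_hat) ** 2 * post).sum())
    return theta_hat, se

# Response "3" (0-based category 2) to the Figure 1 item pulls the estimate
# toward the 0.0-1.5 region, as described in the text.
print(eap_theta([2], [(2.26, [-1.00, 0.00, 1.50])]))
```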

IRT model fit will be assessed using a number of indices, recognizing that universally accepted fit statistics do not exist. Note also that if the model assumptions are supported by the data, strict adherence to model fit statistics is not vital, given the limitations of available fit indices. Residuals between observed and expected response frequencies by item response category will be compared based on analyses of the size of the differences (residuals). Common fit statistics such as Q1, Bock’s χ2, and others43,51 will be examined; also considered will be generalizations of Orlando and Thissen’s S-X2 to polytomous data.52,53 The ultimate issue is the degree to which misfit affects model performance in terms of the valid scaling of individual differences.54

Once analysts are satisfied with the fit of the IRT model to the response data, attention shifts to analyzing the item and scale properties of the PROMIS domains. The psychometric properties of the items will be examined by reviewing their item parameter estimates, CRCs, and item information curves.55,56 Information curves indicate the range of θ where an item is best at discriminating among individuals; higher information denotes more precision for measuring a person’s trait level. The height of an information curve (denoting more information) is a function of the item’s discrimination power (a parameter), and its location is determined by the item’s threshold (b) parameter(s). Information curves thus indicate which items are most useful for measuring different levels of the measured construct, which is critical for the item selection process in CAT and in the development of short forms.
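Item information under the GRM follows directly from the category response curves; a sketch, again using the Figure 1 parameters:

```python
import numpy as np

def grm_item_information(theta, a, b):
    """Fisher information I(theta) = sum_k (dP_k/dtheta)^2 / P_k for one
    GRM item, using dP(X >= k)/dtheta = a * P * (1 - P) for each
    cumulative curve."""
    theta = np.atleast_1d(theta)
    cum = np.vstack([np.ones_like(theta)]
                    + [1 / (1 + np.exp(-a * (theta - bk))) for bk in b]
                    + [np.zeros_like(theta)])
    probs = cum[:-1] - cum[1:]     # category probabilities
    dcum = a * cum * (1 - cum)     # derivatives of the cumulative curves
    dprobs = dcum[:-1] - dcum[1:]  # derivatives of the category probabilities
    return (dprobs ** 2 / np.clip(probs, 1e-12, None)).sum(axis=0)

theta = np.linspace(-3, 3, 121)
info = grm_item_information(theta, a=2.26, b=[-1.00, 0.00, 1.50])
print(f"peak information {info.max():.2f} near theta = {theta[info.argmax()]:.2f}")
```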

Poorly performing items will be reviewed by content experts before the item bank is established. Misfitting items may be retained or revised when they are identified as clinically relevant and no better-fitting alternative is available. Items with low discrimination that function in the tails of the theta distribution (at very low or very high levels of the trait being measured) may also be retained or revised to add information at extreme scores, even though they would not have been retained in better-populated regions of the continuum. It is at the extremes of the trait continuum that CAT is most effective, but only if items exist that provide good measurement along these portions of the continuum.

Future research by PROMIS will examine the added value of more complex models including multidimensional IRT models.57–63 The attraction of these methods is reduced respondent burden and a more realistic model for the underlying measurement model. Multidimensional models take advantage of the correlations among subdomains to inform the measurement of the target constructs, thus precise theta estimates are obtained with fewer items. However, it should be noted that multidimensional IRT has all the rotation problems and complexity that factor analysis does; it greatly complicates DIF analyses, and the meaning of scores is often unclear when subscales are highly correlated. In addition, essentially unidimensional constructs are more often desirable from a theoretical and clinical perspective.


EVALUATION OF DIFFERENTIAL ITEM FUNCTIONING

According to the IRT model, an item displays differential item functioning (DIF) if the probabilities of responding in different categories vary across studied groups, given equivalent levels of the underlying attribute.4,41,43,64 In other words, DIF exists when, for example, women at moderate levels of emotional distress are more likely to report crying than are men at the same moderate level of distress. One reason that instruments containing items with DIF may have reduced validity for between-group comparisons is that their scores indicate attributes other than the one the scale is intended to measure.64 The impact of DIF on CAT may be greater than on fixed-length assessments because only a small set of items is administered.

In the context of PROMIS, DIF may occur across groups defined by race, gender, age, or disease condition. Whether DIF should be tested with respect to a specific disease category is a question for content experts. Roussos and Stout18 recommended a first step in DIF analyses that includes substantive (qualitative) reviews in which DIF hypotheses are generated and it is decided whether unintended “adverse” DIF is present as a secondary factor. Because this process is largely based on judgment, there may be some error at this step. Substantive reviewers may use 4 sources to inform the DIF hypotheses: (1) previously published DIF analyses; (2) substantive content considerations and judgment regarding the current items; (3) review of archival data, including the contexts present in other similar data; and (4) use of archival or pretest data to test bundles of items grouped according to some organizing principle. The stage 2 statistical analyses consist of confirmatory tests of the DIF hypotheses. This procedure can be extended to HRQOL measures through the qualitative methods proposed in the PROMIS effort, including expert review, focus groups, cognitive interviews, and the generation of hypotheses regarding subgroups for which DIF might be observed.

IRT provides a useful framework for identifying items with DIF. The category response curves of an item calibrated based on the responses of 2 different groups can be displayed simultaneously. If the model fits, IRT item parameters (ie, threshold and discrimination parameters) are assumed to be linearly invariant with respect to group membership. Therefore, differences between the CRCs, after linking the θ metric between each group, indicate that respondents at the same level of the underlying trait, but from different groups, have different probabilities of endorsing the item. DIF can occur in the threshold or discrimination parameter. Uniform DIF refers to DIF in the threshold parameter of the model, which indicates that the focal and reference groups have uniformly different response probabilities for the tested item. Nonuniform DIF appears in the discrimination parameter and suggests interaction between the underlying measured variable and group membership; that is, the degree to which an item relates to the underlying construct depends on the group being measured.41,64,65

Determination of DIF is optimized when the samples are as representative as possible of the populations from which they are drawn. Most DIF procedures rely on the identification of a core set of anchor items that are thought to be free of DIF and are used to link the 2 groups on a common scale. DIF detection methods use scores based on these items to control for underlying differences between the comparison groups while testing for DIF in the item under scrutiny. There are numerous approaches to assessing DIF.66 Herein, we describe the DIF methods being considered by the PROMIS analytic team. It is prudent to evaluate DIF using multiple methods and flag those items identified consistently.

Two IRT-based methods will be used to identify DIF: the log-likelihood IRT approach, accompanied by byproducts of the differential functioning of items and tests (DFIT) framework to examine DIF magnitude, and the IRT/ordinal logistic regression (OLR) approach, with built-in tests of magnitude. In the approach recommended and used by PROMIS investigators, significant DIF is first identified using either likelihood ratio-based significance tests (IRT-LR) or significance tests and changes in beta coefficients (IRT/OLR). The IRT-LR approach also incorporates a correction for multiple comparisons. Finally, both approaches examine the magnitude of DIF in determining the final items that are flagged. If either method flags an item as having DIF according to these rules, the item will be considered to have DIF. Details regarding the steps in the analyses can be found elsewhere.67–69

The IRT-LR test64 will be used to identify both uniform and nonuniform DIF. The procedure compares hierarchically nested IRT models: one model fully constrains the IRT parameters to be equal between the 2 comparison groups, and other models allow the item parameters to be freely estimated between groups. One key difference between the IRT-LR method and many other DIF methods is how differences between comparison groups are estimated from the anchor items. Other DIF methods use the simple summed score of the anchor set, but the IRT-LR procedure estimates a person’s theta score based on his/her responses to the anchor set. This approach is similar to that used in CAT; thus, IRT-LR procedures transition easily to the detection of DIF for data collected in a CAT environment.70
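Once the constrained and free calibrations are in hand (eg, from software such as IRTLRDIF70), the model comparison itself is simple; the log-likelihood values below are hypothetical placeholders:

```python
from scipy.stats import chi2

def irt_lr_dif_pvalue(llf_constrained, llf_free, df):
    """Likelihood-ratio DIF test. The constrained model equates the studied
    item's parameters across groups; the free model estimates them
    separately. df is the number of freed parameters (eg, 4 when a and
    three b's are freed)."""
    g2 = 2.0 * (llf_free - llf_constrained)  # asymptotically chi-square under no DIF
    return chi2.sf(g2, df)

# Hypothetical log-likelihoods for one studied item
print(f"p = {irt_lr_dif_pvalue(-10452.3, -10446.1, df=4):.4f}")
```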

Raju’s signed and unsigned area tests, combined with the DFIT framework,71 will be used in conjunction with IRT-LR. This framework includes a noncompensatory DIF (NCDIF) index, which reflects the average squared difference between the item-level scores for the focal and reference groups. Several magnitude measures are available in the context of area statistics and the DFIT methodology developed by Raju and colleagues.71–73 For binary items, the exact area methods compare the areas between the item response functions estimated in 2 different groups; Cohen et al74 extended these area statistics to the graded response model.

The second DIF method is ordinal logistic regression (OLR),75 in which a series of 3 logistic models predicting the probability of item response are compared. The independent variables in Model 1 are the trait estimate (eg, raw scale score or theta estimate), group, and the interaction between group and trait. Model 2 includes only the main effects of trait and group, and Model 3 includes only the trait estimate. Nonuniform DIF is detected if there is a statistically significant difference between the likelihood values for Models 1 and 2. Uniform DIF is evident if there is a significant difference between the likelihood values for Models 2 and 3. Crane et al76 suggested that, in addition to statistical significance, the relative change in beta coefficients between Models 2 and 3 should be considered. On the basis of simulations by Maldonado and Greenland,77 a 10% change in beta has been recommended as a criterion for uniform DIF.
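A sketch of the 3-model comparison using the proportional-odds model in statsmodels (the OrderedModel class; its availability in the installed version is an assumption), with the 10% beta-change rule appended; the data are simulated:

```python
import numpy as np
from scipy.stats import chi2
from statsmodels.miscmodels.ordinal_model import OrderedModel

def olr_dif(item, trait, group):
    """Three nested ordinal logistic models for one item. Model 1 adds a
    trait x group interaction (tests nonuniform DIF), Model 2 adds group
    (tests uniform DIF), Model 3 uses the trait alone. Returns the two
    LR-test p-values and the relative change in the trait coefficient."""
    designs = [np.column_stack([trait, group, trait * group]),
               np.column_stack([trait, group]),
               trait.reshape(-1, 1)]
    fits = [OrderedModel(item, X, distr="logit").fit(method="bfgs", disp=False)
            for X in designs]
    p_nonuniform = chi2.sf(2 * (fits[0].llf - fits[1].llf), df=1)
    p_uniform = chi2.sf(2 * (fits[1].llf - fits[2].llf), df=1)
    beta_change = abs(fits[1].params[0] - fits[2].params[0]) / abs(fits[2].params[0])
    return p_nonuniform, p_uniform, beta_change  # flag uniform DIF if change > 0.10

# Simulated example: the built-in group effect should surface as uniform DIF
rng = np.random.default_rng(0)
trait = rng.normal(size=400)
group = rng.integers(0, 2, size=400).astype(float)
item = np.clip(np.round(1.5 + trait + 0.8 * group + rng.normal(0, 0.7, 400)), 0, 3)
print(olr_dif(item.astype(int), trait, group))
```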

PROMIS also will evaluate items for DIF using a hybrid approach that combines the strengths of OLR and IRT.68,78 This iterative approach uses IRT theta estimates in OLR models to determine whether items have uniform or nonuniform DIF. To account for spurious DIF (false-positive or false-negative DIF found due to DIF in other items), demographic-specific item parameters are estimated for items found on the initial run to have DIF; items free of DIF serve as anchor items. DIF detection is repeated using these updated IRT estimates, and these procedures (DIF detection and IRT estimation) are repeated until the same items are identified on successive runs.

Advantages of the techniques reviewed above include the rapid empirical identification of anchor items and the determination of the presence and magnitude of DIF. Another advantage is the possibility of using demographic-specific item parameters in a CAT context if that is considered a viable option.

Multiple-indicator, multiple-cause (MIMIC) models offer an attractive framework for examining DIF in the context of evaluating its impact.79 Based on a modification of structural equation modeling, the single-group MIMIC model permits examination of the direct effect of background variables on items while controlling for the level of the attribute studied.80 The MIMIC model also allows background variables such as demographic characteristics to be used as covariates to account for differences among the comparison populations when examining DIF. Although the MIMIC model does not permit tests of nonuniform DIF, an advantage is that impact can be examined by comparing the estimated group effects in models with, and without, adjustment for DIF.81

There are several options for treating items with DIF. One extreme option is to eliminate the item from the bank. If the analyses suggest that there are large numbers of items without consequential DIF, this option will be considered. On the other hand, if many items have DIF, especially in key areas of the trait continuum that are sparsely populated by items, or if content experts determine that the items with DIF are central to the meaning of the construct, other options are to ignore DIF if it is small, to revise items to be free of DIF, to tag items that should not be administered to specific groups, or to control for DIF by using demographic-specific item parameters.


ITEM CALIBRATION FOR BANKING

After a comprehensive review of the item properties, including evaluation of DIF across key demographic and clinically different groups, the final selected item set will be calibrated using the GRM, and CAT algorithms will be developed. One set of IRT item parameters will be established for all items unless DIF evidence suggests that some items should have different calibrations for key groups to be measured by the PROMIS system. The item pools for each unidimensional PROMIS domain will be large, with most pools containing more than 50 items.

To identify the metric for the PROMIS item parameter estimates, the scale for person parameters must be fixed in some manner—typically by specifying that the mean in the reference population is 0 and the standard deviation is 1. The PROMIS network has selected the reference population to be the US general population. This will allow interpretation of difficulty (threshold) parameter(s) relative to the general US population mean and the discrimination parameters relative to the population standard deviation. Calibrated in this manner, in the dichotomous response case, an item with a difficulty parameter estimate of b = 1.5 suggests that a person who is 1.5 standard deviations above the mean will have a 50% probability of endorsing the item. Population mean differences and standard deviation ratios will be computed for each disease population tested within PROMIS to allow benchmarking. Thus, a person can compare his/her symptom severity or functioning to people with similar disease or to the general US population.

This standardized metric will facilitate the conversion of the IRT z-score metric to the T-score distribution adopted by the PROMIS Steering Committee. The IRT scale score estimates will be treated as raw scores for the purposes of computing the proportion of the norming/calibration sample scoring below each theta level and identifying the z-score corresponding to that percentage from a normal distribution. These pseudo-normalized z-scores will be converted to T-scores with a mean of 50 and a standard deviation of 10. For PRO domains where the normal distribution is not appropriate, theta estimates will be converted to T-scores by a linear conversion.
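A sketch of the two conversions just described, applied to simulated calibration-sample theta estimates:

```python
import numpy as np
from scipy.stats import norm, rankdata

def to_t_scores(theta, pseudo_normalize=True):
    """Convert theta estimates to T-scores (mean 50, SD 10). With
    pseudo_normalize, each theta is first replaced by the normal z-score
    of its percentile rank in the calibration sample; otherwise the
    linear mapping is applied directly."""
    theta = np.asarray(theta, dtype=float)
    if pseudo_normalize:
        pct = rankdata(theta) / (theta.size + 1)  # percentile rank in (0, 1)
        z = norm.ppf(pct)
    else:
        z = theta  # theta already on a z-score metric in the reference population
    return 50.0 + 10.0 * z

thetas = np.random.default_rng(2).normal(size=1000)  # simulated calibration sample
print(to_t_scores(thetas)[:5].round(1))
```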

Each of the PROMIS item banks calibrated from the wave 1 data will be examined for its ability to provide precise measurement across the construct continuum, as assessed by scale information and standard error of measurement curves. Further, CAT simulations will examine the discriminative ability of each item bank at any level of the construct continuum.82,83 The ideal is to have high precision and discriminative ability across the continuum of symptom severity or functional ability. There will likely be less precision in the extremes of the distributions (eg, high physical functioning or absence of depression); however, the PROMIS content experts are taking great care to write items that may help reduce floor and ceiling effects. The PROMIS network will review the findings from these analyses and will follow up with additional work to: (1) write new items to fill gaps in the construct continuum; (2) examine alternate psychometric methods that may improve precision or efficiency; (3) evaluate the items and scales for clinical application; and (4) review the bank items to ensure their relevance in disease and demographic populations not covered or poorly covered in the calibration data.


CONCLUSIONS

This report has presented an overview of the psychometric methods that will be used in the PROMIS project, both to examine the properties of the items and domains and to calibrate items so that the CAT procedure can select the most informative set of items to estimate a person’s level of health. The PROMIS project faces an enormous challenge: to create psychometrically sound and valid banks in a short amount of time. Multiple item banks will be developed, and the initial calibration sample collected in wave 1 will represent at least 7 disease populations and a general US population that vary across a range of key demographic characteristics. The scope of the project requires the PROMIS psychometric team to be flexible in the methods used. The design presented herein was developed to be robust to violations of the assumptions required to reach project goals. A large-scale evaluation phase is also expected to follow the initial wave of testing, to examine alternative methods that may yield more interpretable and efficient results.


REFERENCES

1. Ader DN. Developing the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45(Suppl 1):S1–S2.

2. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap Cooperative Group during its first two years. Med Care. 2007;45(Suppl 1):S3–S11.

3. Cook KF, O’Malley KJ, Roddey TS. Dynamic assessment of health outcomes: time to let the CAT out of the bag? Health Serv Res. 2005;40(Pt 2):1694–1711.

4. Teresi JA. Statistical methods of examination of differential item functioning with applications to cross-cultural measurement of functional, physical and mental health. J Mental Health Aging. 2001;7:31–40.

5. Hays RD, Liu H, Spritzer K, et al. Item response theory analyses of physical functioning items in the Medical Outcomes Study. Med Care. 2007;45(Suppl 1):S32–S38.

6. Hill CD, Edwards MC, Thissen D, et al. Practical issues in the application of item response theory: a demonstration using items from the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales. Med Care. 2007;45(Suppl 1):S39–S47.

7. DeWalt DA, Rothrock N, Yount S, et al. Evaluation of item candidates: the PROMIS qualitative item review. Med Care. 2007;45(Suppl 1):S12–S21.

8. Kutner MH, Nachtsheim CJ, Neter J, et al. Applied Linear Statistical Model. 5th ed. New York, NY: McGraw-Hill/Irwin; 2005:664–665, 1173–1183.

9. Armitage P, Berry G, Matthews JNS. Statistical Methods in Medical Research. 4th ed. Malden, MA: Blackwell Science; 2002:261–264.

10. Ware JE Jr, Bjorner JB, Kosinski M. Practical implications of item response theory and computerized adaptive testing: a brief summary of ongoing studies of widely used headache impact scales. Med Care. 2000;38:II73–II82.

11. Bjorner JB, Kosinski M, Ware JE Jr. Calibration of an item pool for assessing the burden of headaches: an application of item response theory to the Headache Impact Test (HIT). Qual Life Res. 2003;12:913–933.

12. Lai JS, Cella D, Chang CH, et al. Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale. Qual Life Res. 2003;12:485–501.

13. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.

14. McDonald RP. The dimensionality of tests and items. Br J Math Stat Psychol. 1981;34:100–117.

15. McDonald RP. Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum; 1999.

16. Drasgow F, Parsons CK. Application of unidimensional item response theory models to multidimensional data. Appl Psychol Measure. 1983;7:189–199.

17. Harrison DA. Robustness of IRT parameter estimation to violations of the unidimensionality assumption. J Educational Stat. 1986;11:91–115.

18. Roussos L, Stout W. A multidimensionality-based DIF analysis paradigm. Appl Psychol Measure. 1996;20:355–371.

19. Stout W. A nonparametric approach for assessing latent trait unidimensionality. Psychometrika. 1987;52:589–617.

20. Lai J-S, Crane PK, Cella D. Factor analysis techniques for assessing sufficient unidimensionality of cancer related fatigue. Qual Life Res. 2006;15:1179–1190.

21. Muthén LK, Muthén BO. Mplus User’s Guide. Los Angeles, CA: Muthen & Muthen; 1998.

22. Jöreskog KG, Sörbom D, Du Toit S, et al. LISREL 8: New Statistical Features. Third printing with revisions. Lincolnwood: Scientific Software International; 2003.

23. Muthén B, du Toit SHC, Spisic D. Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Psychometrika. 1997.

24. Kline RB. Principles and Practice of Structural Equation Modeling. New York, NY: Guilford Press; 1998.

25. Bentler P. Comparative fit indices in structural models. Psychol Bull. 1990;107:238–246.

26. West SG, Finch JF, Curran PJ. SEM with nonnormal variables. In: Hoyle RH, ed. Structural Equation Modeling: Concepts, Issues and Applications. Thousand Oaks, CA: Sage Publications; 1995:56–75.

27. Hu LT, Bentler P. Cutoff criteria for fit indices in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling. 1999;6:1–55.

28. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JS, eds. Testing Structural Equation Models. Newbury Park, CA: Sage Publications; 1993.

29. Gibbons RD, Hedeker DR, Bock RD. Full-information item bi-factor analysis. Psychometrika. 1992;57:423–436.

30. Gibbons RD, Bock RD, Hedeker D, et al. Full-information item bi-factor analysis of graded response data. Appl Psychol Measure. In press.

31. Reise SP, Haviland MG. Item response theory and the measurement of clinical change. J Personality Assess. 2005;84:228–238.

32. Jöreskog KG, Moustaki I. Factor analysis of ordinal variables with full information maximum likelihood. Unpublished manuscript downloaded from http://www.ssicentral.com. Accessed January 6, 2007.

33. Edwards MC. A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis. [dissertation]. Chapel Hill, NC: University of North Carolina; 2005.

34. Steinberg L, Thissen D. Uses of item response theory and the testlet concept in the measurement of psychopathology. Psychol Methods. 1996;1:81–97.

35. Wainer H, Thissen D. How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measure. 1996;15:22–29.

36. Yen WM. Scaling performance assessments: strategies for managing local item dependence. J Educational Measure. 1993;30:187–213.

37. Yen WM. Effect of local item dependence on the fit and equating performance of the three-parameter logistic model. Appl Psychol Measure. 1984;8:125–145.

38. Chen W-H, Thissen D. Local dependence indexes for item pairs using item response theory. J Educ Behav Stat. 1997;22:265–289.

39. Ramsay JO. A functional approach to modeling test data. In: van der Linden WJ, Hambleton RK, eds. Handbook of Modern Item Response Theory. New York, NY: Springer; 1997:381–394.

40. Molenaar IW, Sijtsma K. Users Manual MSP5 for Windows: A Program for Mokken Scale Analysis for Polytomous Items [software manual]. Groningen, the Netherlands: iec ProGAMMA; 2000.

41. Hambleton RK, Swaminathan H, Rogers H. Fundamentals of Item Response Theory. Newbury Park, CA: Sage; 1991.

42. Embretson SE, Reise SP. Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum; 2000.

43. van der Linden WJ, Hambleton RK, eds. Handbook of Modern Item Response Theory. New York, NY: Springer-Verlag; 1997.

44. Samejima F. Estimation of latent ability using a response pattern of graded scores. Psychometrika Monogr. 1969;No. 17.

45. Samejima F. Graded response model. In: van der Linden WJ, Hambleton RK, eds. Handbook of Modern Item Response Theory. New York, NY: Springer; 1997:85–100.

46. Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;47:149–174.

47. Andrich D. A rating formulation for ordered response categories. Psychometrika. 1978;43:561–573.

48. Wright BD, Masters GN. Rating Scale Analysis. Chicago, IL: MESA Press; 1982.

49. Thissen D, Orlando M. Item response theory for items scored in two categories. In: Thissen D, Wainer H, eds. Test Scoring. Mahwah, NJ: Lawrence Erlbaum; 2001:73–140.

50. Thissen D, Nelson L, Rosa K, et al. Item response theory for items scored in more than two categories. In: Thissen D, Wainer H, eds. Test Scoring. Mahwah, NJ: Lawrence Erlbaum; 2001:141–186.

51. Yen WM. Using simulation results to choose a latent trait model. Appl Psychol Measure. 1981;5:245–262.

52. Orlando M, Thissen D. Likelihood-based item-fit indices for dichotomous item response theory models. Appl Psychol Measure. 2000;24:50–64.

53. Orlando M, Thissen D. Further examination of the performance of S-X2, an item fit index for dichotomous item response theory models. Appl Psychol Measure. 2003;27:289–298.

54. Hambleton RK, Han N. Assessing the fit of IRT models to educational and psychological test data: a five-step plan and several graphical displays. In: Lenderking WR, Revicki D, eds. Advances in Health Outcomes Research Methods, Measurement, Statistical Analysis, and Clinical Applications. Washington, DC: International Society for Quality of Life Research; 2005:57–78.

55. Reeve BB. Item response theory modeling in health outcomes measurement. Expert Rev Pharmacoecon Outcomes Res. 2003;3:131–145.

56. Reeve BB, Fayers P. Applying item response theory modeling for evaluating questionnaire item and scale properties. In: Fayers P, Hays RD, eds. Assessing Quality of Life in Clinical Trials: Methods and Practice. 2nd ed. New York, NY: Oxford University Press; 2005:55–73.

57. Briggs DC, Wilson M. An introduction to multidimensional measurements using Rasch models. J Appl Measure. 2003;4:87–100.

58. te Marvelde JM, Glas CAW, Landeghem GV, et al. Application of multidimensional item response theory models to longitudinal data. Educational Psychol Measure. 2006;66:5–34.

59. Segall DO. Multidimensional adaptive testing. Psychometrika. 1996;61:331–354.

60. van der Linden WJ. Multidimensional adaptive testing with a minimum error-variance criterion. J Educ Behav Stat. 1999;24:398–412.

61. Gardner W, Kelleher KJ, Pajer KA. Multidimensional adaptive testing for mental health problems in primary care. Med Care. 2002;40:812–823.

62. Ackerman TA, Gierl MJ, Walker CM. Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measure. 2003;22:37–51.

63. Petersen MA, Groenvold M, Aaronson N, et al. Multidimensional computerized adaptive testing of the EORTC QLQ-C30: basic developments and evaluation. Qual Life Res. 2006;15:315–329.

64. Thissen D, Steinberg L, Wainer H. Detection of differential item functioning using the parameters of item response models. In: Holland PW, Wainer H, eds. Differential Item Functioning. Hillsdale, NJ: Lawrence Erlbaum Associates; 1993:67–113.

65. Teresi JA, Kleinman MK, Ocepek-Welikson K. Modern psychometric methods for detection of differential item functioning: application to cognitive assessment measures. Stat Med. 2000;19:1651–1683.

66. Millsap RE, Everson HT. Methodology review: statistical approaches for assessing measurement bias. Appl Psychol Measure. 1993;17:297–334.

67. Orlando M, Thissen D, Teresi J, et al. Identification of differential item functioning using item response theory and the likelihood-based model comparison approach: application to the Mini-Mental State Examination. Med Care. 2006;44(Suppl 3):S134–S142.

68. Crane PK, Gibbons LE, Jolley L, et al. Differential item functioning analysis with ordinal logistic regression techniques: DIFdetect and difwithpar. Med Care. 2006;44(Suppl 3):S115–S123.

69. Teresi JA. Different approaches to differential item functioning in health applications: advantages, disadvantages, and some neglected topics. Med Care. 2006;44(Suppl 3):S152–S170.

70. Thissen D. IRTLRDIF v2.0b: Software for the computation of the statistics involved in item response theory likelihood-ratio tests for differential item functioning; 2001.

71. Raju NS, van der Linden WJ, Fleer PF. IRT-based internal measures of differential functioning of items and tests. Appl Psychol Measure. 1995;19:353–368.

72. Flowers CP, Oshima TC, Raju NS. A description and demonstration of the polytomous DFIT framework. Appl Psychol Measure. 1999;23:309–326.

73. Raju NS. DFITP5: A Fortran Program for Calculating Dichotomous DIF/DTF [computer program]. Chicago, IL: Illinois Institute of Technology; 1999.

74. Cohen AS, Kim SH, Baker FB. Detection of differential item functioning in the graded response model. Appl Psychol Measure. 1993;17:335–350.

75. Zumbo BD. A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-type (Ordinal) Item Scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defence; 1999.

76. Crane PK, van Belle G, Larson EB. Test bias in a cognitive test: differential item functioning in the CASI. Stat Med. 2004;23:241–256.

77. Maldonado G, Greenland S. Simulation study of confounder-selection strategies. Am J Epidemiol. 1993;138:923–936.

78. Crane PK, Hart DL, Gibbons LE, et al. A 37-item shoulder functional status item pool had negligible differential item functioning. J Clin Epidemiol. 2006;59:478–484.

79. Muthén BO. A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika. 1984;49:115–132.

80. Fleishman JA, Lawrence WF. Demographic variation in SF-12 scores: true differences or differential item functioning? Med Care. 2003;41:III-75–III-86.

81. Jones RN, Gallo JJ. Education and sex differences in the Mini-Mental State Examination: effects of differential item functioning. J Gerontol. 2000;55B:273–282.

82. Fliege H, Becker J, Walter OB, et al. Development of a computer-adaptive test for depression. Quality Life Res. 2005;14:2277–2291.

83. Hart DL, Cook KF, Mioduski JE, et al. Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function. J Clin Epidemiol. 2006;59:290–298.

Keywords:

item response theory; unidimensionality; model fit; differential item functioning; computerized adaptive testing

© 2007 Lippincott Williams & Wilkins, Inc.
