Cullen, Mark R. MD; Vegso, Sally MS; Cantley, Linda MS; Galusha, Deron MS; Rabinowitz, Peter MD, MPH; Taiwo, Oyebode MD, MPH; Fiellin, Martha MPH; Wennberg, David MD; Iennaco, Joanne MS, RN; Slade, Martin D. MS; Sircar, Kanta PhD
One of the unanticipated consequences of the managed care movement, begun in the 1980s to control spiraling healthcare costs, was the creation of large, cohesive insurance claims files on various privately and publicly insured populations, fostered by the need to characterize services and their attendant costs accurately and efficiently. The availability of these data sets ushered in a new era for health services research, demonstrating geographic differences in physician procedure use and referrals,1,2 racial and gender differences in services,3–5 differential diffusion of new technologies and drugs,1 and wide gaps in quality of care relative to published practice guidelines.6–10
These studies, based on strictly administrative data—tests, examinations, and procedures performed but not their results—combined with International Classification of Diseases (ICD) codes used for reimbursement, spawned, in turn, ancillary research on the validity of health inferences compared with inferences drawn from self-report data, medical record reviews, or structured diagnostic examinations.11–13 Although this work continues and technical debate lingers,14,15 the consensus is that the comprehensive and longitudinal nature of these files offers sufficient confidence, at least for their use in health services research.6,16,17
In the special circumstance in which risk factors of etiologic interest can be assessed from the insurance claims files themselves, such as exposure to a drug or other treatment, an altogether different set of possibilities emerged and was rapidly exploited. Claims on commonly insured populations offer an efficient way to explore treatment effects. (Workers' compensation case incident files, long used for other research purposes, are, like “event” registries or other “numerator-only” data sets, not discussed further in this article.18) In less than a decade, claims files have become a staple of pharmacoepidemiology.18–20 For occupational health research, however, the most interesting risk factors for epidemiologic study, such as industrial hygiene data, and the covariates that might modify or confound them, such as health behaviors, are not available in claims files, ostensibly rendering these files of limited use. Combined with other perceived limitations, such as reliance on physician diagnostic and therapeutic choices to infer patient condition, loss to follow-up when subjects leave employment or retire, and limited access to the files for privacy reasons, this has meant that occupational health studies using claims files as health end points have been scant.
In the context of a research-service agreement with a large multisite U.S. manufacturing company to assess and reduce sources of disease and injury in its workforce, we have demonstrated that highly relevant databases can be linked to the medical insurance claims data, rendering these files of great value for occupational health research. Although the mechanics of linking certain data sets have proved more daunting than anticipated, we have proceeded far enough to appreciate both multiple issues about how best to use these claims files and the trove of possibilities they create. Specifically:
* Compared with mortality records, claims files offer access to disease incidence at a far earlier stage in life;
* Compared with registries (such as tumor registries), they offer health ascertainment for multiple simultaneous outcomes; and
* Compared with longitudinal surveys, they offer the possibility of objective, continuous follow-up without repeated waves of new data collection.
In theory, any “linkable” factor is amenable to study. On the downside, compared with traditional health outcome measures, claims files introduce a host of novel issues and considerations. In the sections that follow, we use claims files to demonstrate three simple observations, each previously published or widely established:
* The relationship between sociodemographic status and incidence of chronic disease;
* The contributions of tobacco, obesity, and cholesterol to the risk of chronic disease; and
* The role of airborne fluoride as a cause of asthma in aluminum smelting.
We then discuss the problems raised by the use of claims files to study these relationships and provide some recommendations for their future scientific application.
Materials and Methods
Hourly and salaried employees at 11 geographically diverse aluminum production facilities of a single company were included for study if they worked a minimum of 2 years between January 1, 1996, and December 31, 2003. With the exception of a small number of employees (<3%) who elected coverage from single-provider HMOs, which did not contribute data, all were covered by one of two nearly identical insurance plans with rich benefits, including pharmaceutical and mental health coverage. Over 97% used this insurance annually at least once. A single firm, contracted to manage the data, provided the claims files for this research.
Data on smoking status, education, cholesterol, and body mass index (BMI) were abstracted from the occupational clinic records of employees at each location, based on examinations performed between 2001 and 2002. Most locations already use Occupational Health Manager software for this purpose. Smoking was coded in total pack-years; BMI was calculated as weight in kilograms divided by height in meters squared. For cholesterol, high-density lipoprotein cholesterol (HDL), measured commercially, was used rather than total or low-density lipoprotein cholesterol because many subjects were taking lipid-lowering medications, which affect HDL least.
Environmental exposure data were abstracted from the company's industrial hygiene database, which has been managed electronically for 2 decades using an internally developed software system. All routine time-weighted average (TWA) samples obtained for surveillance (performed annually by regular protocol per company internal standards) for particulate and gaseous fluoride, total respirable dust, coal tar pitch volatiles, and SO2 were summarized for each similar exposure group (SEG) at each study location and linked to individuals performing jobs in that SEG using an internally developed job-SEG dictionary (see “Discussion”).
Employment grade, derived from company personnel files, was assigned based on job at the beginning of the study period (January 1, 1996) or at the date of hire, if later. These files, as well as the occupational health and claims files, were linked by an employee ID number scrambled identically in every file.
All health end points for study subjects were inferred from the insurance claims files, which included up to three diagnostic codes for every clinical encounter during the study period. Other than ICD-9 and procedure codes, no clinical data (such as test results or medical records) were used to ascertain disease status. Hypertension (ICD-9 401–405), diabetes mellitus (250), coronary heart disease (410–414), asthma (493), chronic obstructive lung disease (COLD) (490–492, 496), and depression (296–300) were inferred if the code appeared one or more times on a claim for a face-to-face encounter, ie, a physician visit or hospitalization.
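The case-finding rule above can be sketched in a few lines of Python. This is a minimal illustration, not the study's SAS code: the `Claim` record layout, the encounter-type labels in `FACE_TO_FACE`, and the reduction of each code to its 3-digit ICD-9 category are all assumptions.

```python
from dataclasses import dataclass

# ICD-9 category ranges from the case definitions in the text (inclusive).
CONDITION_RANGES = {
    "hypertension": [(401, 405)],
    "diabetes": [(250, 250)],
    "coronary_heart_disease": [(410, 414)],
    "asthma": [(493, 493)],
    "cold": [(490, 492), (496, 496)],
    "depression": [(296, 300)],
}

# Hypothetical encounter-type labels that count as face-to-face.
FACE_TO_FACE = {"office_visit", "hospitalization"}


@dataclass
class Claim:
    employee_id: str      # scrambled study ID
    service_date: str     # ISO date, e.g. "1997-03-14"
    encounter_type: str   # e.g. "office_visit", "lab", "pharmacy"
    dx_codes: tuple       # up to three ICD-9 codes, e.g. ("250.02", "401.9")


def icd9_category(code: str):
    """Return the 3-digit ICD-9 category as an int, or None for V and E codes."""
    head = code.strip().split(".")[0][:3]
    return int(head) if head.isdigit() else None


def conditions_for_claim(claim: Claim) -> set:
    """Conditions flagged by a single claim; non-face-to-face claims flag nothing."""
    if claim.encounter_type not in FACE_TO_FACE:
        return set()
    found = set()
    for code in claim.dx_codes[:3]:
        category = icd9_category(code)
        if category is None:
            continue
        for condition, ranges in CONDITION_RANGES.items():
            if any(lo <= category <= hi for lo, hi in ranges):
                found.add(condition)
    return found
```

For example, an `"office_visit"` claim coded `("250.02", "401.9")` flags both diabetes and hypertension, while the same codes on a `"lab"` claim flag nothing, mirroring the face-to-face requirement.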
Each condition was evaluated separately. An employee was deemed a prevalent case if he or she met the diagnostic criterion for that condition during the first 2 years after entry into the system, for most subjects January 1, 1996, to December 31, 1997 (see above). Employees who did not meet the criterion for that condition were considered disease-free and at risk of becoming incident cases during subsequent follow-up. The prevalence rate for each condition was calculated as the number of prevalent cases divided by the total number of persons in the population or stratum. The incidence rate for each condition was defined as the number of new cases occurring over the follow-up period (which ended at December 31, 2003, diagnosis of the condition, or termination, whichever came first) divided by the number of person-years at risk for that condition. Prevalence and incidence rates for strata within the population were directly age-standardized to the 2000 U.S. Census standard population.
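The run-in logic can be expressed as a small function. The sketch assumes one call per employee per condition, a run-in of exactly 2 calendar years from entry, and person-time counted in days divided by 365.25; `classify` and `incidence_per_1000` are hypothetical names, and labeling employees who leave before completing the run-in as "excluded" is our assumption.

```python
from datetime import date

RUN_IN_YEARS = 2
STUDY_END = date(2003, 12, 31)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def classify(entry: date, exit_date: date, dx_dates: list):
    """
    Classify one employee for one condition.

    entry     -- start of observation (Jan 1, 1996, or hire date if later)
    exit_date -- termination date (use STUDY_END if still employed)
    dx_dates  -- dates of qualifying face-to-face claims for the condition
    Returns (status, person_years_at_risk) where status is "prevalent",
    "incident", "censored", or "excluded" (left before the run-in ended).
    """
    run_in_end = add_years(entry, RUN_IN_YEARS)
    follow_end = min(exit_date, STUDY_END)
    dx = sorted(d for d in dx_dates if entry <= d <= follow_end)
    if dx and dx[0] < run_in_end:
        return "prevalent", 0.0          # coded during the run-in
    if follow_end <= run_in_end:
        return "excluded", 0.0
    if dx:
        return "incident", (dx[0] - run_in_end).days / 365.25
    return "censored", (follow_end - run_in_end).days / 365.25


def incidence_per_1000(results):
    """results -- iterable of (status, person_years) pairs from classify()."""
    results = list(results)
    cases = sum(1 for status, _ in results if status == "incident")
    person_years = sum(py for status, py in results
                       if status in ("incident", "censored"))
    return 1000.0 * cases / person_years
```

An employee entering on January 1, 1996, with a first qualifying claim on January 1, 2000, is incident after 730 days (about 2 years) at risk; the same employee with a first claim in June 1997 is prevalent and contributes no person-time.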
Age-adjusted disease rates and confidence intervals within job-grade strata were estimated using SAS. Odds ratios for associations between the individual risk factors (smoking, BMI, HDL, and education) and health outcomes were estimated in simple models adjusted only for age. Rate ratios for environmental risk factors were estimated by Poisson regression, adjusting for age, location, race, and smoking.
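Direct age standardization weights each age-specific rate by the share of a standard population in that age group. The sketch below is an illustration, not a reproduction of the SAS procedure used: it takes the 95% confidence interval from a normal approximation, with the variance of each age-specific rate taken as cases/person-years² under a Poisson assumption. The age bands and standard-population counts in the usage note are made up, not the 2000 U.S. Census figures.

```python
import math


def direct_standardized_rate(strata, standard_pop, per=1000.0):
    """
    Directly age-standardized rate with an approximate 95% CI.

    strata       -- {age_group: (cases, person_years)} for one study stratum
    standard_pop -- {age_group: count in the standard population}
    Weights are renormalized over the age groups present in `strata`.
    Returns (rate, lower, upper), each per `per` person-years.
    """
    total_std = sum(standard_pop[g] for g in strata)
    rate = 0.0
    variance = 0.0
    for group, (cases, person_years) in strata.items():
        weight = standard_pop[group] / total_std
        rate += weight * cases / person_years
        # Poisson variance of an age-specific rate: cases / person_years**2
        variance += weight ** 2 * cases / person_years ** 2
    half_width = 1.96 * math.sqrt(variance)
    return per * rate, per * max(rate - half_width, 0.0), per * (rate + half_width)
```

With hypothetical inputs `{"45-54": (10, 1000.0), "55-64": (30, 1000.0)}` and equal standard weights, the standardized rate is about 20 per 1000 person-years, the equally weighted mean of the age-specific rates of 10 and 30. Computing the same weighted rate for hourly and salaried strata removes differences in their age structure before comparison.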
The protocol was approved by the Yale University School of Medicine Human Investigation Committee.
Results
A total of 13,456 males and females met the inclusion criteria. Of these, 76% were hourly, 11% were female, and 14% were nonwhite (15% of hourly and 11% of salaried employees). Of the females, 49% were hourly and 51% salaried; of the males, 79% were hourly and 21% salaried. The mean age at the beginning of the study period was 46.3 ± 8.7 years. The population was highly stable: the average loss from the payroll between the first day of one study year and the first day of the next was 3% from 1996 through 2000, rising to 7% in 2001, 9% in 2002, and 12% in 2003 because of two staggered plant closings.
The age-adjusted prevalence in the population for each of the six diseases was as follows: hypertension, 103.2; diabetes, 31.2; coronary heart disease, 29.3; asthma, 25.6; COLD, 32.2; and depression, 30.2 per 1000 subjects. Incidence for the overall study population was: hypertension, 33.6; diabetes, 10.3; coronary heart disease, 8.8; asthma, 10.3; COLD, 15.4; and depression, 13.8 cases per 1000 person-years. Figure 1 shows the prevalence of each disease among males and females aged 55 to 64, stratified by job class (hourly or salaried), based on one or more physician or hospital claims filed during the first 2 available years between January 1, 1996, and December 31, 2003. A stepwise decline in prevalence with rising job grade is evident for most conditions. Figure 2 shows age-adjusted incidence, in cases per 1000 person-years, for each of the six diseases, comparing hourly and salaried males and females, based on one or more physician or hospital claims filed between January 1, 1998, and December 31, 2003. These incidence rates, calculated among those who were disease-free on January 1, 1998, or 2 years after hire (if hired later), illustrate the expected differences among the subgroups, with the exception of the surprisingly high rate of COLD among both hourly and salaried females.
Table 1 shows odds ratios, adjusted only for age, associated with two exposure levels of each personal risk factor for hypertension, diabetes, ischemic heart disease, COLD, and asthma. As with the relationships shown above, these are entirely predictable from the Framingham Heart Study and subsequent studies.
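The text does not specify how the age adjustment for the personal risk factor odds ratios was carried out. One common way to obtain an age-adjusted odds ratio from age-stratified 2 × 2 tables is the Mantel-Haenszel estimator, sketched below as a stand-in; logistic regression with age as a covariate is an equally plausible choice. The table layout and the counts in the usage note are hypothetical.

```python
def mantel_haenszel_or(tables):
    """
    Mantel-Haenszel summary odds ratio across strata (e.g., age bands).

    tables -- iterable of (a, b, c, d) per stratum:
              a = exposed cases,   b = exposed non-cases,
              c = unexposed cases, d = unexposed non-cases
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator
```

In a made-up example where a risk factor is far more common among older workers, strata `(5, 95, 10, 190)` (younger) and `(40, 60, 8, 12)` (older) each have an odds ratio of 1, so the Mantel-Haenszel estimate is 1.0, whereas collapsing them into the single table `(45, 155, 18, 202)` gives a crude odds ratio of about 3.26. This is why even "simple" comparisons of personal risk factors need at least age adjustment.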
Table 2, drawn from a recently published paper,21 shows the effect of current TWA exposures to pot room irritants on asthma incidence. Gaseous fluoride exposure is a potent predictor of asthma after adjustment for the other exposures and smoking.
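A Poisson model with log person-years as an offset yields a rate ratio, exp(β), for each covariate. The sketch below fits such a model by Newton-Raphson using only the Python standard library; it is illustrative, not the model from the cited paper. It assumes grouped data (one row per covariate pattern with its case count and person-years), a leading intercept column of 1.0, and at least one case overall; `poisson_rate_model` is a hypothetical name.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def poisson_rate_model(X, cases, person_years, max_iter=50, tol=1e-10):
    """
    Fit log E[cases] = log(person_years) + X.beta by Newton-Raphson.

    X -- rows of covariates, each starting with 1.0 for the intercept
    Returns beta; exp(beta[j]) is the rate ratio for a one-unit
    increase in covariate j, holding the others fixed.
    """
    p = len(X[0])
    beta = [0.0] * p
    # Start the intercept at the crude log rate (requires at least one case).
    beta[0] = math.log(sum(cases) / sum(person_years))
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for row, y, t in zip(X, cases, person_years):
            mu = t * math.exp(sum(b * x for b, x in zip(beta, row)))
            for j in range(p):
                score[j] += (y - mu) * row[j]
                for k in range(p):
                    info[j][k] += mu * row[j] * row[k]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

With a single binary exposure, the fitted exp(β) reproduces the crude rate ratio: 10 cases in 1000 person-years unexposed versus 30 in 1500 exposed gives 2.0. Adding columns for smoking, age group, or location adjusts the exposure rate ratio for them, as in the analysis described in the Methods. In practice a statistical package (for example, a generalized linear model with a log link and offset) would also supply standard errors.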
Discussion
None of the observations presented here is novel. Socioeconomic disparities in chronic disease incidence and prevalence have long been recognized,22 as have the roles of gender, obesity, cholesterol, and smoking in chronic disease. The role of fluoride as a cause of pot room asthma has also been established by previous occupational health studies in the aluminum industry using traditional end points such as lung function tests or questionnaires.23,24 Rather, these results are presented to demonstrate that claims data can duplicate studies that originally required extensive data collection, and to illustrate issues that may arise as newer hypotheses are tested in this way.
Study Design and Population Issues
Although cross-sectional analyses of our claims files yielded similar results (see Figs. 1 and 2), the most exciting opportunity presented by medical insurance claims files is the prospect of following large, well-defined cohorts. The files also make possible nested case-control or case-cohort designs when exposure data or covariates of interest are not available for the entire population and must therefore be collected separately. Although selection biases may complicate the study of diseases of long latency (see below), claims files offer a self-contained approach for distinguishing prevalent from incident cases: a “run-in” period, such as the one we used, during which the absence of any claim for a particular condition may be interpreted as evidence that the disorder is absent (see “Validity Considerations”). With appropriate attention to selection effects, incidence rates for any well-defined disease may be measured continuously, without additional health data collection, as long as subjects remain covered by the benefit. No other source of health end point data offers even remotely comparable advantages, especially for a relatively young population such as a workforce.
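For a nested case-control study within a claims-defined cohort, controls are commonly drawn by risk-set (incidence density) sampling: for each incident case, controls are sampled from cohort members still at risk at the case's diagnosis time. The sketch below is one way to do this; the tuple layout, the time scale (for example, years since January 1, 1996), and the function name are assumptions.

```python
import random


def risk_set_sample(cohort, controls_per_case=4, seed=0):
    """
    cohort -- list of (employee_id, entry_time, exit_time, is_case);
              for cases, exit_time is the time of diagnosis.
    Returns a list of (case_id, [control_ids]) matched on time at risk.

    Controls come from everyone under observation at the case's event time
    (entered on or before it, exited strictly after it), so a person who
    becomes a case later can still serve as a control earlier.
    """
    rng = random.Random(seed)
    matched = []
    for case_id, _entry, event_time, is_case in cohort:
        if not is_case:
            continue
        at_risk = [pid for pid, entry, exit_time, _ in cohort
                   if pid != case_id and entry <= event_time < exit_time]
        k = min(controls_per_case, len(at_risk))
        matched.append((case_id, rng.sample(at_risk, k)))
    return matched
```

Exposure or covariate data that are costly to collect (for example, a detailed work history or a questionnaire) then need to be gathered only for the sampled cases and controls.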
Validity Considerations
Although our results conform to expectation, there is ample reason to worry about the validity of health inferences drawn from these strings of administrative data. Validity concerns fall into several categories:
* Those arising because of incomplete health information;
* Those arising from information quality;
* Those created by the “inference” about health made from descriptions of service; and
* Those pertaining to uncommon diseases or poorly characterized ones.
First and foremost, validity will necessarily be compromised unless the file for each covered individual reflects a complete, continuous compilation of health encounters over the period of coverage. To the extent that information is missing at random, misclassification of health status will result, reducing study power. Worse, if data are missing differentially, serious bias could result. For example, if more health-conscious individuals receive more services than others, rates of diagnosed disease may appear spuriously higher in this group. Sources of missing data, and their potential for nonrandomness, may relate to the behavior of covered subjects, aspects of the benefit plan, company procedures, or prevailing regulations. For example, certain categories of care, such as workers' compensation care or care received as a consequence of a car accident, may never enter the file at all. In our study, the most obvious such limitation is imposed by age: at 65 years of age, employees, whether retired or not, begin to use Medicare as their primary insurance coverage. This renders our private insurance claims files inevitably incomplete after that age unless linkage to Medicare files is attempted, which is fraught with complexities of its own. Other benefit plans impose significant limits or constraints, such as on the range of allowed treatments or drugs; inferences based on drug claims would therefore likely reflect underdetection. Still others “carve out” some benefits, such as mental health and eye or dental care; unless these coverages are otherwise available to the investigator, the data set is incomplete for any analysis that depends on such information. In poorer plans with large copays or deductibles (or for eligible employees who elect more financial responsibility for services in return for lower premiums), some office visits or drugs may be paid for out of pocket or through the richer benefits of a spouse.
This will increasingly be problematic as many companies switch from previously rich plans to so-called consumer-driven plans with large deductibles and coinsurance, something that did not occur in this company during the years studied here.
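A short calculation makes the distinction between random and differential missingness concrete. Assume, for simplicity, that claims never record a condition that is absent (perfect specificity), that a fraction Se_g of true cases in group g generates a qualifying claim, and that λ_g is the true incidence rate in that group. Then the observed rates and rate ratio are:

```latex
\hat{\lambda}_g = \mathrm{Se}_g \, \lambda_g, \qquad g \in \{0, 1\}

\widehat{RR} = \frac{\hat{\lambda}_1}{\hat{\lambda}_0}
             = \frac{\mathrm{Se}_1}{\mathrm{Se}_0} \cdot \frac{\lambda_1}{\lambda_0}
             = \frac{\mathrm{Se}_1}{\mathrm{Se}_0} \, RR,
\qquad
\operatorname{Var}\!\left(\ln \widehat{RR}\right) \approx \frac{1}{D_1} + \frac{1}{D_0}
```

Here D_g is the number of claims-detected cases. If capture is nondifferential (Se_1 = Se_0 = Se), the rate ratio is unbiased, but each D_g shrinks by a factor of Se, so the variance of the log rate ratio grows by roughly 1/Se: this is the loss of power. If, say, more health-conscious employees in group 1 have Se_1 = 0.9 while Se_0 = 0.6, the observed rate ratio is inflated 1.5-fold even when the true RR is 1.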
Incentives and disincentives may also affect the quality of physician diagnostic coding. For example, in calculating rates of depression, we discovered highly discrepant rates of mental illness diagnoses between locations, but not of prescriptions for antidepressant medication. We interpret this as most likely due not to differential rates of mental illness or its recognition, but to historic mental health carve-outs that resulted in underdiagnosis of disorders that were not previously covered, although evidently treated. Any difference in benefit design or practice has the potential to create distortions. Although this was not a problem in our study company, different job grades or geographic locations often receive different benefit options, or options that differ in practice even when identical on paper. Such differences may affect the behavior of the covered populations differentially, for example their willingness to pay for a service outside the program or to forgo necessary care because of deductible costs.
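The diagnosis-versus-treatment comparison described above can be run as a routine data-quality check. The sketch computes, for each site, the prevalence of diagnosed depression, the prevalence of antidepressant treatment, and their ratio; the input layout and the site figures in the usage note are hypothetical, not study data.

```python
def coding_gap_by_site(site_counts):
    """
    site_counts -- {site: (enrollees, with_depression_dx, with_antidepressant_rx)}
    Returns {site: (dx per 1000, rx per 1000, dx-to-rx ratio)}.

    A site whose treatment rate resembles the others but whose diagnosis
    rate is much lower suggests under-coding rather than less illness.
    """
    out = {}
    for site, (enrollees, dx, rx) in site_counts.items():
        ratio = dx / rx if rx else float("nan")
        out[site] = (1000 * dx / enrollees, 1000 * rx / enrollees, ratio)
    return out
```

For made-up sites A `(1000, 60, 80)` and B `(1000, 15, 78)`, treatment prevalence is nearly identical (80 versus 78 per 1000), but the diagnosis-to-treatment ratio falls from 0.75 to about 0.19, the kind of pattern attributed above to historic carve-outs.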
Another set of validity issues pertains to the process of inferring diagnoses from administrative data. As with death certificates, for which the “true” cause of death is not directly knowable, investigators using medical insurance claims files can know only what providers have coded. The lengthy literature on death certificate diagnoses is therefore germane to the interpretation of claims files, including the variable knowledge of the person completing the certificate, the differential likelihood of different diagnoses being correct, regional differences and secular trends in diagnostic preference, the likelihood that certain patient premortem risk factors influence coding choice,25 and so on. One crucial difference between claims files and death certificates is that for the death certificate, at least the patient's health status (death) is certain; in any single claim, there is no certainty that the eligible subject has any condition, let alone the coded ones, and many who actually have conditions may not seek care at all.
These initially daunting concerns about drawing inferences regarding clinical status from any single claim are largely mitigated by the salient advantage these data sets have over death certificates: they are serial. Put another way, although it may be risky to classify an individual as diabetic based on a diagnosis coded in association with a blood test for hemoglobin A1C (the result itself unavailable), it becomes increasingly reasonable to assume such a diagnosis is correct if it is rendered by a physician (a so-called “face-to-face encounter”), particularly on two or more separate occasions, and if it is combined with the prescription of insulin or another drug for diabetes. Likewise, if someone goes 2 years with no hypertension or diabetes code whatsoever in their medical insurance claims file and then has a spate of such codes, it is reasonable to infer that the condition is likely “incident.” The ability to apply and test algorithms on these “strings” of data against medical records, self-report, or actual examinations has provided the basis for some confidence that, for common diagnoses at least, specificity can be made arbitrarily high and sensitivity at least acceptable.20,26,27 For the purposes of these studies, we used a single face-to-face encounter diagnosis, sacrificing specificity for sensitivity. Reassuringly, repeating all analyses while requiring two separate face-to-face claims lowered incidence rates by 10% to 30% but yielded almost identical covariation with factors of interest (data not shown). There has been less research on the appropriate algorithm for defining the absence of a chronic condition, ie, how long a claims file should be free of any code, but 2 years appears adequate from the limited literature,26 and our experience tends to support this. This potential for studying chronic conditions from their onset, early in life, has enormous and predictable repercussions for etiologic research.
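The tradeoff between a one-encounter rule and stricter rules can be made explicit by parameterizing the case definition and scoring each variant against a reference standard such as chart review. The function names, the parameters, and the four-person example below are hypothetical.

```python
def meets_definition(f2f_dx_dates, rx_dates, min_encounters=1, require_rx=False):
    """
    f2f_dx_dates   -- service dates of face-to-face claims coded for the condition
    rx_dates       -- service dates of pharmacy claims for a drug treating it
    min_encounters -- distinct service dates required (1 = the rule used here)
    require_rx     -- additionally require at least one drug claim
    """
    if len(set(f2f_dx_dates)) < min_encounters:
        return False
    if require_rx and not rx_dates:
        return False
    return True


def sensitivity_specificity(predicted, reference):
    """Compare claims-based flags with a reference standard (e.g., chart review).
    Assumes the reference contains at least one positive and one negative."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    return tp / (tp + fn), tn / (tn + fp)
```

Suppose four people are scored: one true case with a single visit, one true case with two visits and a drug claim, one non-case with a single rule-out visit, and one non-case with no claims. The one-encounter rule then gives sensitivity 1.0 and specificity 0.5; requiring two encounters gives sensitivity 0.5 and specificity 1.0.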
One final validity issue worth mentioning is the use of claims files to study the etiology of uncommon or poorly defined clinical conditions. In a previous study of pituitary adenoma using claims files, we found too few cases with any stringent algorithm, but when we relaxed the criteria to include a wider array of possible codes, specificity, tested directly by complete chart review, plummeted to 50%.28 More common but less well-defined conditions, such as regional limb pain or noncancer skin conditions, may be coded so variably that multiple codes will have to be combined, and the tradeoffs between sensitivity and specificity will be unpredictable without internal validation against medical records or other reports on the population studied.
Linkage to Exposure Data
None of these issues matter at all unless claims files can be linked to sources of information on the exposures whose effects one seeks to study and on covariates that might confound an association between the exposure and a particular disease. Many large employers maintain such files for various purposes, including personnel files (which may contain education, income, demographic profile, residential address, job history, and so on), occupational health screening files, and industrial hygiene files. Moreover, many employers survey their workforces often for various reasons, providing the opportunity to supplement existing data with self-reported information, such as on psychosocial stressors or perceived health. The rate-limiting factor, however, is linkage. For individual-level data, such as those used as independent variables in the first two studies presented, linkage is through employee ID numbers (scrambled for privacy). For more ecologic measures, such as job hazards, alternative approaches to linkage with personal records are required. In our case, this necessitated construction of a complete dictionary linking jobs in the personnel files (N > 9000) to the defined SEGs (N = 140). Predictably, this task took hundreds of hours of close collaboration between investigators and company industrial hygiene (IH) personnel.
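The job-SEG dictionary is, in effect, a lookup table keyed by location and job. The sketch below averages routine TWA samples within each location, SEG, and agent, then assigns each employee the mean for the SEG that their job maps to. The record layouts, the `job_to_seg` mapping, and the use of a simple arithmetic mean (rather than, say, a geometric mean or a full time-weighted exposure history) are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def seg_exposure_means(samples):
    """samples -- iterable of (location, seg, agent, twa) surveillance results."""
    grouped = defaultdict(list)
    for location, seg, agent, twa in samples:
        grouped[(location, seg, agent)].append(twa)
    return {key: mean(values) for key, values in grouped.items()}


def assign_exposures(job_history, job_to_seg, seg_means, agent):
    """
    job_history -- iterable of (employee_id, location, job_code), one job per
                   employee (e.g., the job held at the start of follow-up)
    job_to_seg  -- {(location, job_code): seg}, the job-SEG dictionary
    Returns {employee_id: mean TWA for `agent`}; unmapped jobs are skipped
    so they can be reviewed with IH staff rather than guessed.
    """
    exposures = {}
    for employee_id, location, job in job_history:
        seg = job_to_seg.get((location, job))
        if seg is None:
            continue
        value = seg_means.get((location, seg, agent))
        if value is not None:
            exposures[employee_id] = value
    return exposures
```

Keeping unmapped jobs out of the result, rather than defaulting them to zero exposure, makes gaps in the dictionary visible; with more than 9000 job titles, such gaps are likely on a first pass.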
Eligibility, that is, entry into a cohort or loss to follow-up, creates potentially serious selection problems. These may be caused by changes in the eligibility requirements of a contract, a plant opening, or a layoff that simultaneously affects entire classes of eligible employees. During our follow-up period, one plant closed and another changed operation, resulting in truncated follow-up and loss of power but likely little bias per se. Individual movement into or out of the system, however, most often due to changes in life circumstances or pressures, affects a highly nonrandom subset of the cohort and may substantially bias statistical inference. From the outset, covered subjects are “select” in that they have achieved coverage, usually by virtue of new employment, but sometimes through a change in family or economic circumstances, a change in age, or a change in the benefit plan itself. Dropout depends on similar factors, which may be partly health-related, as well as on factors that are directly health-related, such as becoming disabled or retiring prematurely. In almost all commercial plans, as in ours, coverage abates, at least in part, at age 65 because of Medicare. Thus, as in other cohort studies, losses to follow-up must be viewed as likely differential, with “survivors” appearing relatively “healthier” over time than the original, only partially ascertained cohort. On the positive side, the very same strings of claims over time may be useful for drawing inferences about the degree to which such selective dropout is occurring, at least for measurable health reasons. In other words, one can readily assess whether those leaving employment differ in their claims from comparable employees who stay, and adjust for the resulting bias.
These selection issues largely preclude the use of claims for etiologic analysis, however, when eligibility status fluctuates widely, as in workforces with rapid turnover or any population in which individual qualification for benefits changes often. Fortunately, this problem was minimal in this cohort; average dropout other than for the plant closings was low (see above).
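The check described above, whether those who leave differ in their claims from comparable employees who stay, can be sketched as an age-band-specific prevalence ratio: among employees covered in a given year, compare the share with a claims-flagged condition between those who leave in the following year and those who stay. The row layout and the function name are hypothetical, and a real analysis would also account for location and job grade.

```python
from collections import defaultdict


def dropout_morbidity_ratio(rows):
    """
    rows -- iterable of (age_band, left_next_year, flagged_this_year)
    Returns {age_band: prevalence among leavers / prevalence among stayers}.
    Ratios well above 1 suggest health-related (differential) dropout.
    Bands with no leavers, no stayers, or no flagged stayers are omitted.
    """
    # per band: [leaver cases, leavers, stayer cases, stayers]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for band, left, flagged in rows:
        c = counts[band]
        if left:
            c[0] += int(flagged)
            c[1] += 1
        else:
            c[2] += int(flagged)
            c[3] += 1
    ratios = {}
    for band, (leaver_cases, leavers, stayer_cases, stayers) in counts.items():
        if leavers and stayers and stayer_cases:
            ratios[band] = (leaver_cases / leavers) / (stayer_cases / stayers)
    return ratios
```

For example, if 4 of 10 leavers aged 55 to 64 but only 20 of 100 stayers in that band carried a flagged condition, the ratio is 2.0, a signal that follow-up in that band is losing sicker workers.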
Underutilization of health benefits, resulting in underascertainment of certain target conditions, creates another analytic hazard. In particular, young males, especially those in lower socioeconomic status occupations, tend to use care sparingly.29 In one sense, this is a validity issue that would appear if, for example, claims were compared against examinations. More important, however, is the selection effect, because risk factors are often distributed unequally relative to care-seeking behavior, usually inversely. Fortunately, for many large employers such as ours, the study population is under very close medical surveillance: almost every hourly worker and many salaried workers require an examination yearly or even more frequently for occupational health reasons such as respirator clearance or audiograms. Blood pressure and routine laboratory values are typically checked, and although these examination results are not themselves in the claims files, adverse findings precipitate aggressive referrals to the local medical community, whose diagnoses are in the claims files.
One immediate casualty of these selection factors, in combination with problems of data homogeneity within and between benefit plans, is the ability to compare one covered population meaningfully with any other. As noted, even internal comparisons, for example between hourly and salaried workers, may be vitiated by subtle differences in benefits or in responses to incentives between members of the compared groups, thus biasing results. External comparisons must account not only for differences in benefits,30 but also for the gamut of selection factors that may distinguish, for example, the population eligible for Medicaid or for the pooled plans of small employers from that eligible for the rich plans of a large multisite employer like our client. Even two insured groups covered by the same insurance carrier may differ enough in plan design that otherwise comparable workforces should be compared directly only with trepidation.
Finally, in the modern research environment, any discussion of research using medical insurance claims data would be incomplete without some reference to privacy. Recent federal rules require that Health Insurance Portability and Accountability Act (HIPAA) protection procedures be in place for the transmission, data management, and analysis aspects of any study conducted with claims files. One readily imagined safeguard is the deletion of identifiers, such as Social Security numbers and names, to allow less encumbered transfers of files and subfiles for analysis; indeed, we recommend nothing less. There are, however, two essential points. First, unique identifiers cannot be deleted without establishing a substitute, because the personal identifier is the variable that links the claims with each other and with other electronic files, such as personnel records and exposure files. Some form of scrambling algorithm, applied identically to all individually referenced files, would seem ideal. Second, even when these files are stripped of their unique identifiers, they still qualify as protected medical records under current regulations, because they contain enough unique information to allow identification of subjects; to avoid this, details such as birth date and residence must be either deleted or “generalized” to less specific detail, such as age or zip code. Even then, in our era of growing concern about the privacy of medical records, rules about the use of these files may become more stringent, a development that researchers should anticipate. Closely related are human subjects protection considerations. To date, most committees have construed administrative health records, with appropriate privacy protections, as exempt from the need for consent by the individuals studied, on the theory that obtaining consent would be unfeasible.
Claims records differ from other health files, however, in that, at least in theory, it would be possible to reach the subjects, however many there are, because by definition all (or almost all) subjects are alive, employed by an identified organization, and readily locatable from data within the set. Although we are unaware of any rulings that have deemed such consent necessary, it is not beyond imagination that future committees might raise the issue. This possibility is the reason some of the large research-oriented HMOs, which have the advantage over most managed care organizations of a limited set of staff providers responsible for all the data, obtain blanket consent from plan participants to cover the eventuality that their files might be included in subsequent studies.
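One way to implement the identically applied “scrambling algorithm” discussed above is a keyed hash (HMAC-SHA256): the same employee ID always maps to the same token, so claims, personnel, and exposure files still link, but no one without the key can recompute the token from a known ID. The sketch also generalizes birth date to age and drops street address while keeping zip code, as the text suggests. It is an illustrative sketch, not the study's procedure; the field names are assumptions, and whether the output satisfies current de-identification rules must be judged against those rules.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical field names removed from every analysis file.
REMOVED_FIELDS = {"employee_id", "name", "ssn", "birth_date", "address"}


def pseudonymize(employee_id: str, key: bytes) -> str:
    """Keyed hash: the same ID and key give the same token in every file."""
    return hmac.new(key, employee_id.encode("utf-8"), hashlib.sha256).hexdigest()


def age_on(birth: date, ref: date) -> int:
    """Completed years of age on the reference date."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))


def deidentify(record: dict, key: bytes, ref: date) -> dict:
    """Replace the ID with a token, birth date with age, and drop identifiers."""
    out = {k: v for k, v in record.items() if k not in REMOVED_FIELDS}
    out["pid"] = pseudonymize(record["employee_id"], key)
    out["age"] = age_on(record["birth_date"], ref)
    return out
```

The key would be held by the data manager that performs the linkage and never shipped with the analysis files; rotating it between studies also prevents files from different projects from being linked to each other.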
Impediments to the use of health claims files may limit the speed of diffusion into the occupational epidemiologist's repertoire, but these files offer the potential to study the longitudinal evolution of many chronic diseases in large populations with exposures and risks of great interest. Empiric research will ultimately determine the extent to which this potential can be mined as well as elucidate appropriate guidelines for successful applications. In the interim, we invite vigorous discussion and debate, and especially the presentation of diverse efforts exploiting claims files to duplicate old associations and entertain new ones. Such efforts will further elucidate the potential and the limitations of these newly available end points for epidemiologic research in the United States, at least until an alternative system for reimbursement of care supplants the present one.
1. Virnig BA, McBean M. Administrative data for public health surveillance and planning. Annu Rev Public Health. 2001;22:213–230.
2. Wennberg J, Gittelsohn A. Small area variations in health care delivery. Science. 1973;182:1102–1108.
3. Escarce JJ, Epstein KR, Colby DC, et al. Racial differences in the elderly's use of medical procedures and diagnostic tests. Am J Public Health. 1993;83:948–954.
4. Javitt JC, McBean AM, Nicholson GA, et al. Undertreatment of glaucoma among black Americans. N Engl J Med. 1991;325:1418–1422.
5. Vaccarino V, Chen YT, Wang Y, et al. Sex differences in the clinical care and outcomes of congestive heart failure in the elderly. Am Heart J. 1999;138:835–842.
6. Lohr KN. Use of insurance claims data in measuring quality of care. Int J Technol Assess Health Care. 1990;6:263–271.
7. Philbin EF, DiSalvo TG. Influence of race and gender on care process, resource use, and hospital-based outcomes in congestive heart failure. Am J Cardiol. 1998;82:76–81.
8. Banta D. Developing outcome standards for quality assurance activities. Quality Assurance in Health Care. 1992;4:25–32.
9. Trivedi AN, Zaslavsky AM, Schneider EC, et al. Trends in the quality of care and racial disparities in medical managed care. N Engl J Med. 2005;353:692–700.
10. Vaccarino V, Rathore SS, Wenger N, et al. Sex differences in the management of acute myocardial infarction, 1994–2002. N Engl J Med. 2005;353:671–682.
11. Quam L, Ellis LBM, Venus P, et al. Using claims data for epidemiologic research. The concordance of claims-based criteria with the medical record and patient survey for identifying a hypertensive population. Med Care. 1993;31:498–507.
12. Fowles JB, Fowler EJ, Craft C. Validation of claims diagnoses and self-reported conditions compared with medical records for selected chronic diseases. J Ambulatory Care Manage. 1998;21:24–34.
13. Fisher ES, Whaley FS, Krushat WM, et al. The accuracy of Medicare's hospital claims data: progress has been made, but problems remain. Am J Public Health. 1992;82:243–248.
14. Weiner JP, Parente ST, Garnick DW, et al. Variation in office-based quality. A claims-based profile of care provided to Medicare patients with diabetes. JAMA. 1995;273:1503.
15. Palmer RH. The challenges and prospects for quality assessment and assurance in ambulatory care. Inquiry. 1988;25:119–131.
16. Roper WL, Tolleson-Rinehart S. Health care data and health: from numbers to outcomes. Pharmacoepidemiology and Drug Safety. 2001;10:363–366.
17. Connell FA, Diehr P, Hart LG. The use of large data bases in health care studies. Annu Rev Public Health. 1987;8:51–74.
18. Rodriguez EM, Staffa JA, Graham DJ. The role of databases in drug postmarketing surveillance. Pharmacoepidemiology and Drug Safety. 2001;10:407–410.
19. Spitzer WO, Suissa S, Ernst P, et al. The use of β-agonists and the risk of death and near death from asthma. N Engl J Med. 1992;326:501–506.
20. Rawson NSB. An acute adverse drug reaction alerting system using the Saskatchewan health datafiles. Drug Investigation. 1993;6:245–256.
21. Taiwo OA, Sircar KD, Slade MD, et al. Incidence of asthma among aluminum workers. J Occup Environ Med. 2006;48:275–282.
22. Schwartz JE. Social inequality, stress and health. In: Blau JR, ed. The Blackwell Companion to Sociology. Malden, MA: Blackwell Publishers; 2001:345–360.
23. Kaltreider NL, Elder MJ, Cralley LV, et al. Health survey of aluminum workers with special reference to fluoride exposure. J Occup Med. 1972;14:531–541.
24. Fritschi L, Sim MR, Forbes A, et al. Respiratory symptoms and lung function changes with exposure to five substances in aluminum smelters. Int Arch Occup Environ Health. 2003;76:103–110.
25. Wells CK, Feinstein AR. Detection bias in the diagnostic pursuit of lung cancer. Am J Epidemiol. 1988;128:1016–1026.
26. Rector TS, Wickstrom SL, Shah M, et al. Specificity and sensitivity of claims-based algorithms for identifying members of Medicare+Choice health plans that have chronic medical conditions. Health Serv Res. 2004;39:1839–1857.
27. Hux JE, Ivis F, Flintoft V, et al. Diabetes in Ontario. Diabetes Care. 2002;25:512–516.
28. Cullen MR, Checkoway H, Alexander BA. Investigation of a cluster of pituitary adenomas among aluminum industry workers. Occup Environ Med. 1996;53:782–786.
29. Hibbard JH, Pope CR. Gender roles, illness orientation and use of medical services. Soc Sci Med. 1983;17:129–137.
30. Dreyer NA. Accessing third-party data for research: trust me? Trust me not? Pharmacoepidemiology and Drug Safety. 2001;10:385–388.