Crohn’s disease (CD) and ulcerative colitis (UC) are chronic inflammatory diseases of the gastrointestinal (GI) tract. CD can occur anywhere in the GI tract, whereas UC is confined to the colon. Collectively, these diseases are known as inflammatory bowel diseases (IBD). The etiology of IBD is unknown, although it is thought to arise from a combination of factors, including genetic influences, alterations in the gut microbiota, alterations in the innate and adaptive immune systems, and environmental exposures. Without a better understanding of the etiology of IBD, neither prevention nor cure is possible.
IBD can cause severe, persistent gastrointestinal symptoms, such as diarrhea, bleeding, and abdominal pain, which can dramatically affect quality of life. Disease can be refractory to medical treatment, and surgery is often needed. IBD is, therefore, a costly and morbid condition for which there is currently no cure. In 2008, direct treatment costs alone for patients with IBD were estimated to exceed $6.8 billion.1 Indirect costs, such as work-related opportunity loss, add an estimated $5.5 billion (in 2009 U.S. dollars) to this figure.2 Because of this high burden of disease, legislation has been enacted within the past decade to improve research funding for these diseases and to advance understanding of IBD epidemiology and pathophysiology.
This review will describe current estimates of the incidence and prevalence of IBD in the United States, discuss potentially undercounted populations, and describe the history of government funding for surveillance programs in IBD. Lessons learned from other countries on IBD surveillance will be summarized, as will potential resources that may be used to optimize IBD surveillance in the United States. Finally, a consensus recommendation on the best means of optimizing public health surveillance in IBD will be offered.
EPIDEMIOLOGY OF IBD
In the United States, it is currently estimated that more than 1.4 million people have IBD. Estimates of disease prevalence among adults in the United States are 201 cases per 100,000 persons for CD and 238 cases per 100,000 persons for UC.3 The incidence rates in the United States are approximately 8.8 cases per 100,000 person-years for CD and 7.9 cases per 100,000 person-years for UC, as estimated in the Olmsted County, MN, population.4 Internationally, CD incidence is highest in North America (20.2/100,000 person-years), whereas UC incidence is highest in Europe (24.3/100,000 person-years). Europe also has the greatest prevalence of both UC and CD (505/100,000 and 322/100,000, respectively).5 Other areas of the world have significantly lower rates of IBD5; however, these rates appear to be increasing in parts of Asia and northern Africa.6 IBD incidence is also increasing in other areas such as Australia7 and New Zealand.8 In these emerging areas, rising rates of UC appear before those of CD.9 Data on IBD incidence and prevalence in developing countries are limited, and more accurate means of surveillance in these areas are needed.
Certain populations may also be undercounted in the surveillance of IBD in the United States (Table 1). A better understanding of disease rates in subgroups of interest, such as minorities, immigrants, the elderly, and children, is warranted. CD and UC incidence and prevalence are difficult to determine by race and ethnicity. Studies that have investigated race and ethnicity in the epidemiology of IBD have compared rates of hospitalization for disease by race, rather than incidence or prevalence as identified from inpatient and outpatient sources.10–15 Estimates range from little difference in the rates of CD and UC between whites and African Americans10,13 to lower rates of CD and UC among African Americans, Hispanics, Asians, and Native Americans/Pacific Islanders compared with whites in the same population.11,12,14,15 Rates of IBD among migrants to the United States have not been reported, although evidence from Southeast Asian migrants to the United Kingdom suggests that migrant populations rapidly assume the incidence rate of the underlying population, or an even higher rate.16 Evidence on racial variation in IBD incidence among U.S. children is even scarcer,17 but it shows a similar trend, with proportionally fewer IBD cases among black and Hispanic children than among white children.
There are 3 studies reporting the incidence of pediatric IBD in the United States, each using a different data source but with similar overall estimates.4,18,19 In general, data come from a single geographic region and must be extrapolated to other areas of the United States. Data for African Americans are sparse, and none of the reports is from the southern United States. In the Olmsted County population, Loftus et al4 estimated the incidence of pediatric CD (4.8/100,000 person-years) and UC (3.2/100,000 person-years) from 1990 to 2000. Adamiak et al18 used data from Wisconsin, where patients with IBD generally receive their care within the state, to calculate the incidence of IBD from 2000 to 2007 and found estimates similar to those from Olmsted County (6.6/100,000 person-years for CD and 2.4/100,000 person-years for UC). The populations of both Olmsted County and Wisconsin are predominantly white; therefore, it is unclear whether these estimates apply to other races and ethnicities. In the Kaiser Permanente integrated health plan, estimates were somewhat different (2.7/100,000 person-years for CD and 3.2/100,000 person-years for UC), perhaps because of a different racial and ethnic mix than in other parts of the United States.19 The prevalence of pediatric IBD in the United States has been studied in these populations and in nationwide health claims data. Prevalence estimates of pediatric IBD are relatively consistent, at approximately 40/100,000 for CD and 28/100,000 for UC.3,19,20 Again, data on racial minorities and on certain geographic regions of the United States are limited. These limitations demonstrate the need to include data from all geographic regions and ethnicities in the United States.
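For reference, the two measures quoted throughout this section follow the standard epidemiologic definitions below (a general formulation, not one taken from the cited studies):

```latex
\text{Incidence rate} = \frac{\text{new cases during the period}}{\text{person-years at risk}} \times 100{,}000
\qquad
\text{Prevalence} = \frac{\text{existing cases at a point in time}}{\text{population at that time}} \times 100{,}000
```

As a purely hypothetical illustration, 88 new cases of CD observed over 1,000,000 person-years at risk would give 88/1,000,000 × 100,000 = 8.8 cases per 100,000 person-years.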
HISTORY OF IBD SURVEILLANCE
The Crohn’s and Colitis Foundation of America (CCFA) began a collaboration with the Centers for Disease Control and Prevention (CDC) in 2002 to increase the awareness of and funding for IBD epidemiologic research in the United States. In 2003, the Inflammatory Bowel Disease Act was introduced. This act called on the National Institutes of Health (NIH) to expand research in the arena of IBD. Within this act was a provision for the CDC to establish a national program of prevention and epidemiology to determine the prevalence and incidence of IBD in the United States and to explore how practice variations among community physicians affected patient outcomes. Specific funding of 5 million dollars per cycle was provided to accomplish these goals. The components of this act were combined with other legislation in the Research Review Act of 2004. The bill was unanimously passed by the Senate on November 16. President Bush signed the Research Review Act of 2004 into law on November 30.
In the first cycle of IBD surveillance funding, CDC epidemiologists worked with CCFA and Kaiser Permanente to better understand the incidence, prevalence, and outcomes of IBD.19–23 Researchers also worked to identify factors that predict the course of disease. In the second cycle, funding went to a unique state-based inception cohort of IBD, the Ocean State Crohn’s and Colitis Area Registry (OSCCAR).24 Determining how best to study IBD prevalence going forward is of the utmost importance.
CONFERENCE ON IBD SURVEILLANCE
There is currently no unified surveillance system for IBD in the United States. Thus far, estimates of incidence and prevalence in the United States have come from regional cohorts and administrative data. To determine the best use of future funding for IBD surveillance, the CDC and CCFA convened a meeting of international experts in IBD epidemiology in February 2013, in Philadelphia, PA. The group was challenged to develop and pilot the prototype for a national surveillance program for IBD. The charge was to gather national prevalence and incidence data, perhaps by using an existing surveillance system.
Before developing a surveillance system for IBD, the group determined that the implications and goals of surveillance in IBD must be considered. Public health surveillance involves the ongoing and systematic collection, analysis and interpretation of health data, and the timely dissemination of these data to those responsible for preventing and controlling the condition.9 The specific purposes of a public health surveillance system are to (1) formulate public health policies, (2) identify high-risk populations, (3) target interventions, (4) evaluate progress in disease prevention, and (5) identify environmental risk factors contributing to disease.
In a disease group such as IBD, not all of these components of surveillance apply. For example, because the specific etiology of IBD remains unknown, targeted interventions and prevention are not yet possible. However, an accurate estimate of the prevalence of IBD remains important to inform public policy, particularly in populations for whom few data are available (children, racial and ethnic minorities, and immigrants). Surveillance of IBD, as of other diseases, can also help track future healthcare resource needs. Clinical data beyond estimates of incidence and prevalence are needed to understand the economic, population-level, and individual-level burdens of IBD.
CHALLENGES OF IBD SURVEILLANCE IN THE UNITED STATES
There are numerous challenges within the United States that affect the creation of an IBD surveillance program. Unlike many other countries, the United States has no unique identifier with which to follow individuals through its fragmented healthcare system. This fragmentation, with multiple separate systems of commercial and government health insurance, means that certain groups may be underrepresented in the available data and that an individual’s disease course cannot be tracked longitudinally across systems. In particular, minorities and children may not be adequately represented. Problems more specific to IBD include the lack of hard outcomes such as mortality, the fact that IBD is not a “reportable” condition, and, importantly, the fact that the diagnosis of IBD is not straightforward and has no gold standard. Laboratory testing is not considered reliable or accurate for diagnosing IBD. Because IBD comprises relatively rare entities with a wide range of disease severity, mild disease may remain unrecognized below the diagnostic threshold.
There are several potential solutions to these challenges. Importantly, existing data sources could be used to build a surveillance infrastructure for IBD. These potential data sources include established population-based surveys or cohort studies, provider surveys, facility-based surveys, state-based registries, administrative healthcare data, electronic medical records and reporting systems, and Internet-based surveys. During the CCFA/CDC meeting, experts considered each of these sources as a possible means of creating a national IBD surveillance program in the United States, with emphasis on feasibility, cost, added value, and data quality. Experts also reviewed international surveillance systems already in place, which could provide important lessons for developing such a system in the United States.
POTENTIAL METHODS OF SURVEILLANCE
Lessons From Elsewhere
There are a number of established registries that have been used effectively to provide insight into the epidemiology of IBD. The paragraphs that follow describe these important resources and how they have been used to advance our understanding of IBD.
Scandinavian Health System
The Swedish National Board of Health and Welfare has collected individual-level data since 1964. The registry includes data on personal identification number, date of birth, dates of hospital admission, hospital department, and discharge diagnoses. The registry was expanded in 1997 to include data on ambulatory surgery and again in 2001 to include data on hospital-based outpatient clinic visits. In 2005, the Swedish Prescribed Drug Register was established. This register contains data on dispensed prescriptions, including the date of redemption, the amount of drug, and the dosage. Identifying data are missing for less than 0.3% of all items. Other notable Scandinavian registries include the Medical Birth Register, the Cause of Death Registry, and the Swedish Cancer Registry. The Medical Birth Register contains detailed information on prenatal care, delivery, and neonatal care. The Swedish Cancer Registry maintains data on sex, age, cancer site, histology, tumor stage, and date of death from cancer.
These data have been used in numerous studies investigating aspects of IBD beyond incidence and prevalence. In particular, several studies of malignancy risk over time in patients with IBD have come from these data sources.25–27 The primary advantages of these registries include the large sample size, the long-term longitudinal follow-up of individuals, and the availability of prescription, cancer, and cause-of-death information. These data are best suited to natural history studies and outcomes research in IBD. Their primary disadvantage is the lack of detailed and updated information on many lifestyle characteristics.
Scandinavian registries are ideal for evaluating secular trends in the incidence and prevalence of IBD. Because every individual living in Sweden has a unique identification number, the various registries can be merged and used for future research. Unfortunately, such a system cannot be implemented in the United States because of the lack of unique personal identifiers within our fragmented healthcare system.
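To illustrate why a shared identifier matters, the sketch below joins two toy registers on a personal identifier. The record types, field names, and ID values are hypothetical and do not reflect the actual Swedish register schemas; the point is only that, with one common key, linkage reduces to a simple exact join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRegisterEntry:
    person_id: str   # hypothetical personal identification number
    icd10_code: str  # discharge diagnosis, e.g. "K50" (CD) or "K51" (UC)


@dataclass(frozen=True)
class CancerRegisterEntry:
    person_id: str
    cancer_site: str


def link_registers(patients, cancers):
    """Left-join patient-register entries to cancer-register entries on person_id.

    Returns a dict mapping every person_id in the patient register to the
    list of cancer sites recorded for that person (an empty list if none).
    """
    sites_by_person = {}
    for entry in cancers:
        sites_by_person.setdefault(entry.person_id, []).append(entry.cancer_site)
    return {p.person_id: sites_by_person.get(p.person_id, []) for p in patients}


patients = [PatientRegisterEntry("P001", "K50"), PatientRegisterEntry("P002", "K51")]
cancers = [CancerRegisterEntry("P002", "colon")]
linked = link_registers(patients, cancers)
print(linked)  # {'P001': [], 'P002': ['colon']}
```

Without such a key, linkage has to rely on combinations of quasi-identifiers (such as sex and date of birth), which is less reliable.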
Calgary Health Zone
The Calgary Health Zone is a population-based health authority within a public, single-payer system that provides all levels of medical and surgical care to residents of the city of Calgary and more than 20 nearby smaller cities, towns, villages, and hamlets. The estimated population of the Calgary Health Zone is over 2 million people. Administrative data are collected within the province. Because these data are identifiable, they can be linked to other databases or to the electronic medical record to validate disease exposures and outcomes. The Data Integration, Measurement, and Reporting (DIMR) hospital discharge abstract administrative database captures all hospitalizations in the Calgary Health Zone of Alberta Health Services, Canada. The DIMR database contains 42 diagnostic and 25 procedural coding fields. The International Classification of Diseases, Ninth Revision (ICD-9) was used up to 2001, and the ICD-10-CA (Canadian adaptation of ICD-10) and the Canadian Classification of Health Interventions were used after 2001. Within the province, prevalence and inception cohorts were developed with age- and sex-matched controls, and validation studies (through chart review) were performed to confirm cases and exposures. Such validation studies are essential for evaluating bias in an administrative database. For example, in a recent validation study from within this cohort, administrative data identified the same risk factors as chart review but overestimated the magnitude of risk.28
This data source has been used in studies of IBD outcomes, risk factor analysis, surveillance of comorbidities and complications, tracking of infections, health services utilization, and health economics and costs.29–32 For example, the incidence of IBD in this region has been reported to be 25 per 100,000 person-years from 2003 to 2011 (equating to 800 new cases per year). Data from this cohort have also shown that risk factors for postoperative mortality in patients with IBD include age and disease severity.33 The advantages of using administrative data as an initial data source include the large sample size, the ability to capture completely any billed aspect of disease management (such as surgery), and detailed information on costs. The disadvantages include the need to validate individual-level data through chart review and the cost and time involved in doing so. Detailed clinical information, such as phenotype, disease activity, and biological samples, is not available within administrative data unless they are linked to other repositories. Unfortunately, because identifiable data within pooled administrative databases in the United States cannot be accessed, this type of prospective inception/prevalence cohort is not possible. The median time that an individual remains on a given commercial health plan in the United States is only a few years, and because individuals cannot be tracked across plans, long-term outcomes data would not be available.
University of Manitoba IBD Database and Registry
The University of Manitoba IBD Epidemiology Database (UMIBDED) and the University of Manitoba IBD Research Registry (UMIBDRR) were established in 1995 in the central Canadian province of Manitoba (population of 1.27 million in 2012). Manitoba Health is the single health insurance provider in the region. Each resident of Manitoba has a personal health identification number (PHIN), dating back to 1984, through which every health system encounter is logged with a diagnostic code (ICD-9 for all outpatient encounters and for inpatient encounters until 2003, and ICD-10 for inpatient encounters starting in 2004), a tariff code identifying the service provided, and a provider ID number.
A population-based database of all Manitobans with IBD was established by applying an administrative definition of IBD, validated by self-report and chart review, to the Manitoba Health database. A person is identified as an IBD case if there are at least 5 distinct encounters with a code for IBD or, if the person has been in the database for less than 2 years, at least 3 distinct encounters. This database is updated regularly to provide contemporary data on all Manitobans with IBD. A matched cohort drawn from the Manitoba Health database was also established during the same timeframe, matched 10:1 on sex, age, and geographic location (by postal code). In 1995, the Drug Program Information Network was established; it records all outpatient prescriptions for all residents and can be linked to the UMIBDED through the PHIN. The UMIBDED is anonymized and scrubbed of any subject identifiers. Factors that can be identified include gender, age, disease duration, disease diagnosis (CD versus UC), comorbidities, healthcare utilization, hospitalizations, surgeries, endoscopies, imaging procedures, and area of residence. Factors that cannot be identified include disease phenotype and disease activity.
This database has been used to study epidemiologic outcomes, disease outcomes, comorbidities, healthcare utilization, and direct healthcare costs. When the database was developed, the UMIBDRR was also formed by enrolling persons who responded to an initial mailing from their doctors’ offices. The Research Registry has been updated intermittently since 1995 through mailings to recently diagnosed subjects, identified as such by applying the administrative definition to the Manitoba Health database. The Registry has been used to enroll patients in several studies, including the Manitoba IBD Risk Factor Study and the Manitoba IBD Cohort Study. The UMIBDRR is a population-based cohort that can be accessed for studies in which personal data and disease characteristics can be explored more fully, and interventional studies can also be undertaken in these subjects.
Numerous publications have arisen from this population-based cohort, including studies of risk factors for disease, malignancy, disease complications, and disease outcomes.34–43 Because the UMIBDED is an administrative database, subject access or intervention is not possible. The UMIBDED is ideal for surveillance of incidence and prevalence, burden of comorbidities, healthcare utilization, and pharmacoepidemiology studies. Its main limitation is the lack of patient-level data such as phenotype, disease activity, and response to medications. This type of cohort is made possible by the PHIN, which links individuals across the healthcare system and other registries; no equivalent identifier is currently available in the United States. In addition, the region has a single insurer, allowing greater confidence in complete capture of incident cases. This is not the case in the United States, where there are many insurance providers and many individuals have no insurance coverage.
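The encounter-count case definition described above can be sketched as follows. This is an illustrative reconstruction only: the `(encounter_id, diagnosis_code)` record format, the function names, and prefix matching on ICD codes are assumptions, not the actual Manitoba implementation.

```python
# ICD-9 555 (regional enteritis, i.e., CD) and 556 (UC); ICD-10 K50 (CD) and K51 (UC).
IBD_CODE_PREFIXES = ("555", "556", "K50", "K51")


def count_distinct_ibd_encounters(encounters):
    """Count distinct encounters carrying at least one IBD diagnosis code.

    `encounters` is an iterable of (encounter_id, diagnosis_code) pairs; an
    encounter with several IBD codes is counted once.
    """
    return len({enc_id for enc_id, code in encounters if code.startswith(IBD_CODE_PREFIXES)})


def is_ibd_case(distinct_ibd_encounters, years_in_database):
    """Apply the thresholds described in the text: at least 5 distinct IBD
    encounters, or at least 3 if the person has been in the database for
    less than 2 years."""
    required = 3 if years_in_database < 2 else 5
    return distinct_ibd_encounters >= required


# Hypothetical records; "401.9" is a non-IBD code and is ignored.
records = [("e1", "555.9"), ("e1", "555.9"), ("e2", "556.0"), ("e3", "401.9"), ("e4", "K50.1")]
n = count_distinct_ibd_encounters(records)    # 3 distinct IBD encounters: e1, e2, e4
print(is_ibd_case(n, years_in_database=1.5))  # True  (threshold of 3 applies)
print(is_ibd_case(n, years_in_database=10))   # False (threshold of 5 applies)
```

The lower threshold for short-tenure residents reflects that they have had less time to accumulate encounters, not that their diagnosis is more certain.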
EPIMAD Registry
Unlike some other European countries, France has no national disease registration system. In 1988, the Epidémiologie des maladies inflammatoires chroniques de l'Intestin en France (EPIMAD) registry was established to track incident cases of IBD, risk factors, and outcomes. The registry covers an area of Northern France with over 6 million inhabitants (9.3% of the French population). This cohort study uses interviewer practitioners in the offices of 262 adult and pediatric gastroenterologists who work in private practice (n = 168), general practice (n = 78), or university hospitals (n = 16) within the catchment area. Only patients who are residents of the defined study area at the time of diagnosis are included. Each gastroenterologist reports all patients seen at a first consultation for symptoms compatible with IBD. For each new case, the interviewer practitioners collect all available information from the chart in a standardized format, including age, gender, year of diagnosis, time between symptom onset and diagnosis, and clinical, radiological, endoscopic, and histological findings at the time of diagnosis. The final diagnosis of IBD is then made by 2 expert gastroenterologists and recorded as definite, probable, or possible, according to validated criteria.
This database has been used in many studies of incidence, risk factors, disease complications, disease phenotype and progression, natural history, and genetics.44–51 Important findings have arisen from this cohort, such as the description of a generally mild disease course in elderly-onset IBD.45 The strengths of this cohort include its population-based nature, the prospective data collection, the sample size (now more than 15,900 confirmed cases of IBD), and the detailed clinical information (such as phenotype and medication use). The cohort can also be used for recruitment into interventional studies. The disadvantages of this registry include the limited timeframe for assessment (the registry was initiated in 1988), the absence of prospective data collection after diagnosis, the dependence of incident case detection on good collaboration with private gastroenterologists, the labor-intensive processes of case detection and diagnosis confirmation, and the high costs (approximately 300,000 euros annually). A similar incident-disease cohort, OSCCAR, has been established in Rhode Island over the past 5 years. The limitations of using this type of cohort for U.S. epidemiologic surveillance include the small sample size, the limited geographic coverage (which limits inclusion of underrepresented racial and ethnic minorities), and the costs of developing and maintaining the cohort.
Administrative Data: Commercial Health Plans
Administrative claims data from within the United States can be leveraged for robust epidemiological research. These data consist of the computerized encounter data submitted by providers (physicians, hospitals, nursing homes, home health agencies, and pharmacies) to payers. Data users can then use these de-identified claims to perform research. Numerous commercial data sources are available within the United States, including data from a single payer (e.g., United Healthcare [Ingenix]) and pooled health plan data (e.g., IMS Health Lifelink, MarketScan, or Healthcore). Data are also available from government insurers, namely Medicare and Medicaid.
There are several important advantages to these data. First, because the plans are extremely large, the data offer great statistical power and precision. The data are longitudinal, allowing individuals to be followed over time as long as they remain insured by the same entity. The data are relatively complete and encompass care provided to individuals across all settings, provided that claims were submitted for payment. These data, therefore, represent “real-world” settings.
There are also many potential pitfalls to using these data. Most importantly, the data are collected for billing rather than for research or clinical care. Administrative data therefore lack clinical details not captured by procedure or diagnosis codes. In the case of IBD, administrative data do not capture phenotype, extent, symptoms, or disease severity. Other important clinical characteristics, such as body mass index, race/ethnicity, and smoking status, are also missing. Validation of exposures and outcomes is not always possible. Although individuals can be followed longitudinally, the median period of health plan enrollment in the United States is relatively short. For example, the median duration of enrollment in IMS Health data is 2.5 years per individual, which can limit studies of long-term disease and treatment outcomes (such as malignancy). Within these data, it is also often unclear whether an individual died or left the health plan; it is only apparent that the person is no longer in the dataset, which limits studies of mortality.
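Because claims-based follow-up simply stops when a person leaves the dataset, whether through disenrollment or death, person-time in such analyses is typically censored at the end of enrollment. The sketch below shows that calculation on a toy cohort. The `Enrollee` structure and its fields are hypothetical, and the sketch assumes the recorded diagnosis time is a true first diagnosis, which claims data alone cannot guarantee.

```python
from typing import NamedTuple, Optional


class Enrollee(NamedTuple):
    # Times are in years since a common, hypothetical study origin.
    enroll_start: float
    enroll_end: float                # end of enrollment: death and disenrollment look the same in claims
    dx_time: Optional[float] = None  # time of first IBD diagnosis, if any (assumed to be truly first)


def incidence_per_100k_py(enrollees):
    """Incidence rate per 100,000 person-years, censoring follow-up at the end of enrollment."""
    cases = 0
    person_years = 0.0
    for e in enrollees:
        if e.dx_time is not None and e.dx_time < e.enroll_start:
            continue  # diagnosed before entry: a prevalent case, not at risk
        if e.dx_time is not None and e.dx_time <= e.enroll_end:
            cases += 1
            person_years += e.dx_time - e.enroll_start  # follow-up stops at diagnosis
        else:
            person_years += e.enroll_end - e.enroll_start  # censored at end of enrollment
    if person_years == 0:
        raise ValueError("no person-time at risk")
    return cases * 100_000 / person_years


cohort = [
    Enrollee(0.0, 2.5),
    Enrollee(0.0, 3.0, dx_time=1.0),
    Enrollee(0.5, 2.0),
    Enrollee(1.0, 4.0, dx_time=0.5),  # prevalent at entry, excluded
]
print(incidence_per_100k_py(cohort))  # 20000.0 (1 case over 5.0 person-years; toy numbers)
```

Short enrollment spans shrink the denominator quickly, which is one reason the 2.5-year median enrollment limits long-term studies.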
Health plan data are excellent for estimating the prevalence of disease. Various definitions of IBD within administrative data have been used across these studies. In a northern California population, Liu et al52 found that a definition consisting of any 2 separate visits associated with an ICD-9 code for IBD had a positive predictive value (PPV) of 95% when compared with chart review. Herrinton et al22 investigated a random sample of 1.8 million members of 9 integrated healthcare organizations, using several different combinations of ICD-9 codes and medications to identify patients with IBD. The final algorithm required at least 1 ICD-9 code for CD or UC and at least 1 dispensing of a 5-aminosalicylate medication. The prevalence per 100,000 in 1999 to 2001 was 129 for CD, 191 for UC, and 388 for IBD overall, with a 3-fold variation in prevalence by plan.22 Prevalence will vary with the characteristics used to define IBD, including the run-in period. In addition, the Kaiser-insured population (an all-inclusive, closed plan) differs from other U.S.-insured populations, which limits the generalizability of these definitions across all U.S. commercial health plan data.
An updated prevalence study by Kappelman et al used data from the Pharmetrics Choice Database (now IMS Health), which covered over 44 million lives. A somewhat more specific algorithm was used, requiring at least 3 healthcare contacts with a diagnosis code for IBD, or 1 ICD-9 code for IBD in combination with 1 IBD-specific medication. Prevalence was standardized to the age, gender, and regional distribution of the 2009 U.S. census. Kappelman et al found the prevalence of pediatric and adult CD to be 58 and 241 per 100,000, respectively. Prevalence estimates for UC were broadly similar, with a somewhat lower pediatric rate and a higher adult rate, as expected (34 and 263/100,000, respectively). In the various studies of IBD prevalence in the United States since 2000, overall reported prevalence rates have been in a similar range (96–190/100,000 for CD and 156–214/100,000 for UC). Determining incidence within health plan data is more difficult, particularly because of the short median duration of health plan coverage (2.5 yr), which is too brief to generate reliable incidence estimates. Because an extended period with no previous claims for IBD is needed to establish truly incident disease (rather than prevalent disease with low utilization), only individuals with longer periods of coverage can be selected, which reduces the power of these datasets. Kappelman et al limited their analysis to those with longer coverage and found an incidence of 20 per 100,000 person-years for CD and 34 per 100,000 person-years for UC (unpublished data). These rates are similar to those reported by Loftus et al4 in their population-based cohort in Olmsted County from 2001 (31–71/100,000 person-years for CD and 18–31/100,000 person-years for UC). Therefore, it may be possible to use administrative claims data to establish incidence with the appropriate methodological considerations.
In summary, de-identified health plan data are readily available, reasonably current (12-mo delay), and fairly inexpensive to access and work with. Misclassification is inevitable with these data, so case definitions need to account for the likely direction of the resulting bias. Algorithms for defining incident IBD could potentially be informed and calibrated by data from other sources.
These types of data could be leveraged as a component of a surveillance system for IBD. A surveillance system could partner with selected health plans to achieve the desired population sampling. In essence, this could become “real-time” surveillance, beginning with the first claim. However, the infrastructure and costs to support such a system would be quite high, and it may not be feasible given the need for chart review to validate cases. Therefore, administrative data would likely need to be combined with other approaches to reach the goals of surveillance in IBD.
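A case definition of the kind Herrinton et al used (a diagnosis code plus a 5-aminosalicylate dispensing), together with the PPV calculation used to validate definitions against chart review, can be sketched as follows. The drug list, code prefixes, function names, and validation numbers are hypothetical illustrations, not the cited studies' actual specifications or data.

```python
# Illustrative list of aminosalicylates; not the drug list used in the cited study.
FIVE_ASA_DRUGS = {"mesalamine", "sulfasalazine", "balsalazide", "olsalazine"}
CD_UC_ICD9_PREFIXES = ("555", "556")  # ICD-9 555 = CD (regional enteritis), 556 = UC


def meets_code_plus_5asa_definition(dx_codes, dispensed_drugs):
    """At least 1 ICD-9 code for CD or UC and at least 1 5-ASA dispensing."""
    has_ibd_code = any(code.startswith(CD_UC_ICD9_PREFIXES) for code in dx_codes)
    has_5asa = any(drug.lower() in FIVE_ASA_DRUGS for drug in dispensed_drugs)
    return has_ibd_code and has_5asa


def positive_predictive_value(algorithm_positive_ids, chart_confirmed_ids):
    """PPV = chart-confirmed cases among algorithm positives / all algorithm positives."""
    positives = set(algorithm_positive_ids)
    if not positives:
        raise ValueError("no algorithm-positive cases to validate")
    true_positives = positives & set(chart_confirmed_ids)
    return len(true_positives) / len(positives)


print(meets_code_plus_5asa_definition(["555.9"], ["Mesalamine"]))  # True
print(meets_code_plus_5asa_definition(["555.9"], ["prednisone"]))  # False

# Toy validation sample: 20 algorithm-positive members, 19 confirmed on chart review.
flagged = [f"m{i}" for i in range(20)]
confirmed = flagged[:19]
print(positive_predictive_value(flagged, confirmed))  # 0.95
```

PPV alone says nothing about missed cases; estimating sensitivity would also require chart review of members the algorithm did not flag.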
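Standardizing prevalence to a census distribution is usually done by direct standardization: stratum-specific rates are weighted by each stratum's share of a standard population. The exact procedure used by Kappelman et al is not described here, so the sketch below is a generic illustration with hypothetical strata (age group only); cross-classified age, sex, and region strata could be passed in the same way.

```python
def directly_standardized_rate(strata, per=100_000):
    """Direct standardization.

    `strata` is a list of (cases, study_population, standard_population) tuples.
    Each stratum-specific rate is weighted by that stratum's share of the
    standard population, and the weighted rates are summed.
    """
    total_standard = sum(std for _, _, std in strata)
    rate = sum((cases / pop) * (std / total_standard) for cases, pop, std in strata)
    return rate * per


# Toy strata: (prevalent cases, study population, standard population weight).
strata = [
    (10, 20_000, 25),  # children: stratum rate 50 per 100,000
    (60, 20_000, 75),  # adults:   stratum rate 300 per 100,000
]
crude = sum(c for c, _, _ in strata) / sum(p for _, p, _ in strata) * 100_000
print(round(crude, 1))                               # 175.0
print(round(directly_standardized_rate(strata), 1))  # 237.5
```

The standardized rate exceeds the crude rate here because the standard population contains proportionally more adults, the higher-prevalence group, than the toy study population does.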
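The requirement of an extended period with no prior IBD claims before a case can be counted as incident is often implemented as a washout (lookback) window. The sketch below assumes continuous enrollment from `enrollment_start` up to the first observed IBD claim; the function names and the 2-year (730-day) default are hypothetical choices, not values reported by Kappelman et al.

```python
from datetime import date, timedelta


def is_incident_case(enrollment_start, first_ibd_claim, washout_days=730):
    """Count a first observed IBD claim as incident only if it follows at least
    `washout_days` of (assumed continuous) enrollment with no IBD claims."""
    return first_ibd_claim - enrollment_start >= timedelta(days=washout_days)


def select_incident_cases(people, washout_days=730):
    """`people` is a list of (person_id, enrollment_start, first_ibd_claim) tuples."""
    return [pid for pid, start, first in people
            if is_incident_case(start, first, washout_days)]


people = [
    ("A", date(2005, 1, 1), date(2008, 6, 1)),    # over 3 years of prior enrollment
    ("B", date(2009, 3, 1), date(2009, 9, 1)),    # 184 days
    ("C", date(2006, 1, 1), date(2007, 12, 31)),  # 729 days
]
print(select_incident_cases(people))                    # ['A']
print(select_incident_cases(people, washout_days=365))  # ['A', 'C']
```

Lengthening the washout reduces the misclassification of prevalent disease as incident but discards people with shorter enrollment, which is the loss of power noted above.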
Administrative Data: Medicare and Medicaid
Medicare and Medicaid are organized broadly under the Center for Medicaid and Medicare Services. Medicare is organized under the federal government and Medicaid is organized individually under the state governments within federal guidelines and oversight. Therefore, these data do represent a central repository of national data. However, these data only cover a segment of the population.
In 2000, there were 39.7 million people enrolled in Medicare and this increased to 50.7 million in 2012. Criteria for coverage under Medicare include those older than 65 years, those younger than 65 years with certain disabilities, and all those with end-stage renal disease. The selected breakdown within the 2012 data includes 8.8 million disabled younger than 65 years, 17 million older than 75 years, 5.1 million African American, and 3.4 million of other race/ethnicity. Medicare is divided into 4 components: part A is hospital insurance; part B is outpatient insurance; part C is Medicare Advantage, with services through provider organization; and part D is prescription coverage, which started in 2006.53
Medicaid is a highly used program and is steadily growing. For example, in 2000, there were 34.5 million enrollees, and in 2012, there were 56.6 million enrollees. After the implementation in 2014 of the Patient Protection and Affordable Care Act, an additional 17 million enrollees are expected. Overall, Medicaid enrollees represent 22% of the U.S. population.53 Regarding Medicaid eligibility, all groups include low income as a criterion, and other qualifications include disability, pregnancy, children, and elderly individuals. Regarding race and ethnicity, there were 13.6 million African Americans, 15.5 million Hispanic, and 1.9 million Asian enrollees. Each state determines the type, duration, and scope of services it will provide, within broad federal guidelines. Delivery of care occurs in 2 separate models, including fee for service and through managed care organizations.
Medicare and Medicaid data are used for a broad range of projects. An important feature of these data is that they can be linked to other registries to enrich them. For example, they can be linked to the Surveillance, Epidemiology, and End Results (SEER) program to study cancer outcomes. Shaukat et al linked claims data from Medicare parts A and B to a SEER catchment area and reported a 93% linkage success rate. The authors also investigated IBD as a risk factor for colorectal cancer (CRC) in older individuals54 and the risk of small bowel adenocarcinoma among older individuals with CD.55 Other linkage studies, although not focused on IBD, have linked Medicare or Medicaid data to other U.S. government registries or disease-based registries.56–58
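To make the linkage step concrete, here is a minimal Python sketch of deterministic (exact-match) linkage and of how a linkage success rate, such as the 93% reported above, is computed. The field names in `KEY_FIELDS` and the function `link_exact` are hypothetical; this is not the algorithm used by Shaukat et al, and real linkages may need fuzzy or probabilistic matching to tolerate errors in identifiers.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
# Hypothetical linkage key fields (illustrative only; the identifiers used
# in actual Medicare-registry linkages are not specified here).
KEY_FIELDS = ("person_id", "sex", "birth_date")


def link_exact(registry: List[Record],
               claims: List[Record]) -> Tuple[List[Tuple[Record, Record]], float]:
    """Link registry cases to claims enrollees on an exact match of KEY_FIELDS.

    Returns the linked (registry, claims) pairs and the linkage rate, i.e.
    the share of registry cases that found a claims match.
    """
    def key(rec: Record) -> Tuple[str, ...]:
        return tuple(rec[f] for f in KEY_FIELDS)

    # Index claims records by key for constant-time lookup.
    # If two claims records share a key, the last one wins (a simplification).
    index = {key(r): r for r in claims}
    pairs = [(r, index[key(r)]) for r in registry if key(r) in index]
    rate = len(pairs) / len(registry) if registry else 0.0
    return pairs, rate
```

With 4 registry cases of which 3 have an exact match in the claims file, the function returns 3 pairs and a rate of 0.75.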
Medicare and Medicaid data can also be combined with other data sources to enrich the data or increase statistical power. For example, 1 group combined 4 data sources (Medicare/Medicaid databases, Tennessee Medicaid, pharmacy benefit data for Medicare beneficiaries in certain states, and Kaiser Permanente data) to investigate cancer risks associated with anti-tumor necrosis factor-alpha (anti-TNF) agents in chronic immune-mediated diseases. The authors identified patients using ICD-9 codes and used a variety of methods to combine data on cancers and on infectious complications.59,60 State-based data can also be leveraged for research. For example, California has a research institute (the California Medicaid Research Institute) within the Medi-Cal program that focuses on research relevant to health policy and healthcare decision makers. As yet, state programs such as this one do not focus on surveillance or on specific diseases. IBD researchers have used state government Medicaid repositories to investigate the incidence and risk of intestinal and extraintestinal complications of IBD over a 5-year period.61 Patients were identified by ICD-9 codes, with 2 claims for IBD required, similar to previous studies using administrative data.
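The "2 claims for IBD" requirement is a simple counting rule. The sketch below flags patients with at least 2 claims carrying an ICD-9-CM code in category 555 (regional enteritis, i.e., CD) or 556 (UC). The input format and the function name `ibd_cases` are hypothetical, and published definitions may add conditions not modeled here (for example, claims on distinct dates or within a time window).

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# ICD-9-CM categories: 555 = regional enteritis (Crohn's disease),
# 556 = ulcerative colitis.
IBD_PREFIXES = ("555", "556")


def ibd_cases(claims: Iterable[Tuple[str, str]], min_claims: int = 2) -> Set[str]:
    """Return IDs of patients with at least `min_claims` IBD-coded claims.

    `claims` holds (patient_id, icd9_code) pairs, e.g. ("p1", "555.9").
    Every qualifying claim counts, even if two fall on the same day.
    """
    counts = Counter(pid for pid, code in claims if code.startswith(IBD_PREFIXES))
    return {pid for pid, n in counts.items() if n >= min_claims}
```

For example, a patient with a single 556.9 claim would not qualify, whereas a patient with claims coded 555.1 and 555.9 would.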
These data currently have an undefined role in surveillance, but they could form part of a broad approach to surveillance. CMS data are the closest thing we currently have to a "national" population-based database. Because the data already exist, the costs may be lower than those of building a surveillance program from scratch. In addition, these data include historically underrepresented groups, including vulnerable populations such as children. Limitations include those inherent to administrative data, notably the inability to validate diagnoses unless the data are linked to another source. Clinical characteristics such as disease phenotype are not available. The covered population is restricted and not representative of the U.S. population at large, with a high proportion of elderly and low-income individuals. If these data are combined with commercial administrative claims data that include Medicare or Medicaid managed care programs, individuals could be double counted or miscounted. Overall, these data could not stand alone as a surveillance program for IBD but could play a role within one.
Closed Panel Health Maintenance Organization Data (Kaiser Permanente)
Kaiser Permanente Northern California (KPNC) is a closed, staff model, integrated healthcare delivery system with capitated payment, founded in 1945 and serving Sacramento and the Bay Area. Today, it is 1 of the largest healthcare delivery organizations in the United States, providing healthcare services to approximately 3.3 million members, about 30% of the population in 14 Northern California counties. Based on data from the 2003 California Health Interview Survey, the KPNC adult membership is very similar to the non-KPNC population regarding education and health but has lower percentages of non-Hispanic whites and of members with very low or very high household incomes. Compared with the general population, the KPNC membership is similar regarding race/ethnicity but appears to be better educated, in better overall health (as perceived by survey participants), and less likely to smoke.
The KPNC Division of Research (DOR) was established in 1961 with a mission to understand the determinants of illness and well-being and to improve the quality and cost-effectiveness of health care for Kaiser Permanente members and society at large. The DOR conducts epidemiological and health services research, including technology assessment, and serves as the principal consulting resource for health services, health disparities, and clinical studies for KPNC. The DOR has algorithms for IBD case finding within its computerized clinical data that were previously validated against chart review. Cases of IBD have been identified and followed, resulting in published studies on natural history, incidence and prevalence, disease complications and outcomes, medication safety, and mortality.20,22,23,62–65
Large computerized data systems are essential for studying relatively uncommon diseases such as IBD, and they reflect real-world experience. Kaiser Permanente’s population is large, diverse, and generally representative of urban Northern California. Many of the variables used in the IBD research program have been validated. In addition, the automated healthcare data have relatively long follow-up, allowing greater capture of each patient’s history. The setting and resources are excellent for assessing incidence and prevalence and for studying healthcare processes and outcomes. This data source also has important limitations, including incomplete information on the patient’s clinical history; interval censoring, with patients cycling into and out of enrollment; and case-finding algorithms whose validity may not be adequate for a given study’s aims. In addition, the incidence date and disease severity and activity are not recorded as structured data in the computerized system, although this information can be obtained by interpreting the text of progress notes.
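Interval censoring matters because person-time, the denominator of an incidence rate, accrues only while a member is enrolled. The minimal sketch below assumes, hypothetically, that each member’s enrollment is available as inclusive (start, end) date spans; it merges overlapping or back-to-back spans, counts enrolled days, and converts cases and person-time into a rate per 100,000 person-years. It is an illustration, not the DOR’s method, and the function names are invented for this example.

```python
from datetime import date
from typing import List, Optional, Tuple

Interval = Tuple[date, date]  # (start, end), both days inclusive


def enrolled_days(intervals: List[Interval]) -> int:
    """Count enrolled days, merging overlapping or back-to-back intervals."""
    total = 0
    cur_start: Optional[date] = None
    cur_end: Optional[date] = None
    for start, end in sorted(intervals):
        if cur_end is not None and (start - cur_end).days <= 1:
            # Overlapping or contiguous with the current run: extend it.
            cur_end = max(cur_end, end)
        else:
            # A gap in enrollment: close the current run and start a new one.
            if cur_start is not None and cur_end is not None:
                total += (cur_end - cur_start).days + 1
            cur_start, cur_end = start, end
    if cur_start is not None and cur_end is not None:
        total += (cur_end - cur_start).days + 1
    return total


def rate_per_100k_person_years(cases: int, person_days: int) -> float:
    """Incidence rate per 100,000 person-years, using 365.25-day years."""
    return cases / (person_days / 365.25) * 100_000
```

A member enrolled for all of 2010 and all of 2012, but not 2011, contributes 731 days rather than 3 full years, which is exactly the kind of gap that cycling in and out of enrollment introduces.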
Kaiser Permanente’s computerized data have an important role to play in understanding the incidence and prevalence of IBD, including patterns by sex, age, race/ethnicity, and calendar time. However, this model is not feasible as the basis for a national IBD surveillance program, because most U.S. health plans are not closed, capitated plans like Kaiser Permanente. Other forms of administrative data within the United States are not identifiable and, therefore, cannot be linked to medical records for validation. A national IBD surveillance program could nonetheless include a component of administrative data, such as that within Kaiser Permanente.
Regional Longitudinal Cohort (Olmsted County)
The Rochester Epidemiology Project (REP) is a medical records linkage system that has existed since 1966 and captures healthcare information on all residents of Olmsted County, MN.66–69 The project provides computer linkage of medical diagnoses and surgical procedures (through ICD-9 and other codes) for all healthcare encounters, including hospitalizations, emergency room visits, outpatient visits, and nursing home care, and it has access to all records within the county, including ambulatory care. The REP exploits the fact that virtually all health care in the county is provided by 1 of 2 integrated healthcare systems. The project now includes over 500,000 individual patients (1966–2010), with over 6 million person-years of follow-up, and these data capture the full spectrum of disease. There are over 1800 epidemiologic reports on acute and chronic diseases. An individual REP study typically proceeds by retrieving records using codes and dates, reviewing the medical records, and then conducting the study within the validated, abstracted data. This methodology has been used to develop an inception cohort of IBD.4,70,71
Because highly reliable clinical data are available, case definitions of IBD within this data source can include clinical characteristics. For example, the case definition for CD requires findings from at least 2 of the following categories: specific symptoms, endoscopic findings, radiologic findings, surgical findings, or pathologic findings. For UC, the definition requires 2 studies separated by 6 months showing characteristic findings, such as diffuse granular or friable mucosa with continuous involvement on endoscopy or barium study. IBD studies within this data source have included several incidence and prevalence studies. Loftus et al4 showed that incidence rates of both CD and UC in Olmsted County from 1940 to 2000 peaked in the 20- to 29-year-old age group. Temporal trends showed an increase in the incidence of IBD over time in this cohort, particularly for CD. Other topics addressed within this cohort include costs,72 malignancy risks,73 fistula risks,74 bone fracture risks,75,76 responses to medical therapies (such as corticosteroids),77 progression of intestinal complications,78 requirement for surgery,79 and overall survival.80 In a survival study, Jess et al demonstrated that Olmsted County residents with UC had survival similar to expected rates, with a trend toward better than expected survival and significantly fewer cardiovascular deaths, perhaps related to smoking status.80 Residents with CD also had survival similar to expected rates, but with a trend toward worse survival; approximately one-third of deaths were related to GI causes.
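The CD rule above is a simple "at least 2 of 5 categories" count. The sketch below encodes it as a predicate over hypothetical category labels; deciding whether a given symptom or finding qualifies remains a clinical judgment made during chart review and is not modeled here.

```python
from typing import AbstractSet

# Hypothetical labels for the 5 evidence categories in the CD definition.
CD_CATEGORIES = frozenset(
    {"symptoms", "endoscopic", "radiologic", "surgical", "pathologic"}
)


def meets_cd_definition(categories_with_findings: AbstractSet[str]) -> bool:
    """True if CD-supporting findings come from at least 2 distinct categories."""
    unknown = set(categories_with_findings) - CD_CATEGORIES
    if unknown:
        # Reject labels outside the definition rather than silently ignoring them.
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return len(set(categories_with_findings)) >= 2
```

A patient with qualifying symptoms and supportive pathology meets the definition; a patient with endoscopic findings alone does not.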
The REP data have been shown to be representative of Minnesota and the upper Midwestern United States but not of the United States as a whole: the REP population includes fewer non-white and Hispanic individuals and is more highly educated and of higher income than the U.S. population.68 Although the data are rich in clinical characteristics, they cover a limited geographic area and cannot alone form the basis of a surveillance program. This type of data source is ideal for population-based studies that describe the full spectrum of disease, and the level of clinical detail offers the potential to recognize "high-risk" patients early in disease. Because of the relatively small sample size, however, low-frequency complications can be difficult to study. Although these data could not stand alone as a surveillance program in IBD, they would play an integral role in such a program by providing detailed clinical data and the necessary information on disease prognosis. They could also be combined with other surveillance methods to answer important questions and enhance personalized medicine in IBD.
The Ocean State Crohn’s and Colitis Area Registry (OSCCAR) is a prospective inception cohort of IBD. The broad spectrum of IBD has not been well represented in convenience cohorts from tertiary care centers, and retrospective cohorts do not permit unbiased assessment of exposures, risk factors, and outcomes or biobanking of baseline specimens. Rhode Island was chosen as the site of this inception cohort because it is densely populated and care is generally delivered within the state. Its population is 10-fold larger than that of Olmsted County, MN, and somewhat more ethnically diverse. The goals of this cohort were to estimate incidence, create models of risk stratification, and enhance studies of disease causation and susceptibility.
A complex interplay of biopsychosocial factors influences prognosis in IBD, and these factors can likely be understood only in a prospective fashion. To understand prognosis, data collection needs to include detailed genetic, phenotypic, environmental, and therapeutic factors (Fig. 1). For example, factors associated with prognosis in the literature include corticosteroid use at first flare, age at diagnosis, smoking, perianal fistulas, disease extent, disease location, endoscopic severity, and serologic reactivity to microbial antigens.
The study design includes academic detailing of practices, home visits to patients, a combination of prospective and retrospective data collection, and central reading of pathology and imaging. Data are collected from the patient and confirmed through direct abstraction of medical records. Collecting data directly from patients allows capture of patient-reported outcomes, such as health-related quality of life scales, work productivity, and food frequency questionnaires. Specimens are also collected at baseline and follow-up visits.
Thus far, data from this cohort show the age-specific incidence of IBD in Rhode Island from 2008 to 2010 to be similar to that reported in Olmsted County, with a peak at age 20 to 29 years. The advantages of this type of inception cohort are myriad, including detailed information on clinical characteristics, patient-reported outcomes, validated diagnoses, and exposures from clinical record review. The cohort is expected to yield important results on disease prognosis. However, there are numerous limitations as well: the cohort is small, capture may be incomplete because some patients seek care out of state, the population is not necessarily generalizable to the rest of the United States, and the effort is costly. Even so, as long as a regional cohort covers the range of relevant variables, representativeness of the entire United States is not a necessary goal. This type of inception cohort answers a series of very important questions but cannot stand alone as a surveillance system in IBD.
Nationwide Inpatient Sample
The Nationwide Inpatient Sample (NIS) is part of the Healthcare Cost and Utilization Project sponsored by the Agency for Healthcare Research and Quality. Since 1988, it has comprised all discharge records from a 20% stratified sample of acute care hospitals in the United States. This sample represents approximately 5000 hospitals included in the American Hospital Association Survey but excludes Veterans Affairs and other federal hospitals. Hospitals are sampled within 60 strata based on geographic region, urban versus rural location, teaching status, hospital bed size, and ownership. The number of states contributing data to the NIS increased gradually from 8 in 1988 to 45 in 2010. Demographic data collected in the NIS include age, sex, race and ethnicity, health insurance payer, and neighborhood income. Clinical fields include type of admission (emergent versus elective), primary and secondary diagnoses and procedures, length of stay, inpatient charges, and disposition at discharge. The NIS also contains data on hospital characteristics such as geographic location, rural status, teaching status, and hospital bed size.
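Because the NIS is a stratified sample of hospitals, national estimates come from weighting sampled discharges up to the hospital universe. The sketch below illustrates the idea with a simplified inverse-sampling-fraction weight (hospitals in the universe divided by hospitals sampled within each stratum). The stratum labels and function names are hypothetical, and analyses of the actual NIS should use the discharge weights distributed with the data rather than this approximation.

```python
from collections import Counter
from typing import Dict, List


def stratum_weights(universe: Dict[str, int], sampled: Dict[str, int]) -> Dict[str, float]:
    """Simplified weight per stratum: hospitals in the universe / hospitals sampled."""
    return {h: universe[h] / sampled[h] for h in sampled}


def weighted_total(discharge_strata: List[str], weights: Dict[str, float]) -> float:
    """Estimate a national count from sampled discharges (one stratum label each)."""
    counts = Counter(discharge_strata)
    return sum(n * weights[h] for h, n in counts.items())
```

For instance, if stratum A has 50 hospitals of which 10 are sampled (weight 5) and stratum B has 20 of which 5 are sampled (weight 4), then 2 sampled IBD discharges from A and 1 from B represent an estimated 14 discharges nationally.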
The NIS has been used in many IBD-related studies, including descriptive epidemiology of hospitalizations and surgery.1 Such analyses have included underserved populations, such as the uninsured, who are captured in few other databases.2 The NIS has been used to describe variation in the use of surgery among geographic regions and racial/ethnic groups3,4 and to characterize the impact of admission and procedural volume on clinical outcomes in the IBD population.5,6 In addition, NIS data have been used to show that Clostridium difficile infection, methicillin-resistant Staphylococcus aureus infection, and venous thromboembolism are more prevalent in hospitalized patients with IBD than in general medical inpatients, and associations between these factors and mortality in IBD have also been reported.7–11
Among the key strengths of the NIS are its large national sample size spanning over 2 decades, which enables time-trend and subgroup analyses even for a relatively uncommon condition such as IBD, and its racially and ethnically diverse data from all insurance payer types, including the uninsured. However, it has several important limitations stemming from the lack of personal identifiers. Consequently, the validity of IBD diagnosis codes in the NIS cannot be confirmed against medical records. Although inpatient diagnosis codes for IBD have performed well in Canadian and Veterans Affairs (VA) databases, this validation is not necessarily generalizable to the NIS.12 Any misclassification, even if nondifferential, may compromise findings from descriptive epidemiology studies. An additional limitation is the high rate of missing racial and ethnic data, which approaches 25%. However, these missing data are concentrated in certain states that do not collect them, so excluding those states from analyses does not necessarily introduce bias.
Despite these limitations, the NIS may be useful for identifying time trends in IBD-related hospitalizations, surgeries, and other clinical outcomes. Because systematic errors in IBD coding are not expected to change from year to year, relative changes in hospitalizations and surgeries are likely to be valid and can be used to identify sentinel increases in those outcomes. Increases in hospitalizations, however, do not necessarily reflect increases in IBD prevalence; they may also be attributable to increased disease severity or decreased access to ambulatory IBD care. Because these data are limited to hospitalizations, they cannot be used to estimate the incidence and prevalence of disease. Instead, they would play an important role in understanding the burden of disease.
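The reasoning about relative changes can be made explicit under one simplifying assumption: that coding error scales the true count by a factor $k$ that is stable across years. Writing $T_t$ for the true number of IBD hospitalizations in year $t$ and $O_t$ for the number observed in the NIS (notation introduced here for illustration):

```latex
O_t = k\,T_t
\quad\Longrightarrow\quad
\frac{O_t}{O_{t-1}} = \frac{k\,T_t}{k\,T_{t-1}} = \frac{T_t}{T_{t-1}}
```

The factor $k$ cancels, so year-over-year ratios are unbiased even when absolute counts are not. If the error were instead a fixed additive offset, differences $O_t - O_{t-1}$ rather than ratios would be preserved, and if $k$ itself drifted (for example, after a change in coding practice), neither would hold.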
National Health Interview Survey
The National Health Interview Survey (NHIS) is an annual cross-sectional household health survey conducted by the National Center for Health Statistics of the CDC. Its target population is the civilian noninstitutionalized population of the United States. The survey uses a multistage clustered sample design, with oversampling of black, Hispanic, and Asian persons, and produces nationally representative data on health insurance coverage, healthcare access and utilization, health status, health behaviors, and other health-related topics. In-person interviews are conducted by roughly 750 trained U.S. Census Bureau interviewers using computer-assisted personal interviewing; telephone interviewing is permitted to complete missing sections of the interview when in-person interviewing is not possible.
The NHIS questionnaire consists of a core set of questions that remains relatively unchanged from year to year and supplemental questions that vary from year to year to collect additional data on current health issues of national importance. The core survey instrument contains 4 main modules: Household Composition, Family, Sample Child, and Sample Adult. In the Household Composition module, a household respondent provides basic sociodemographic information on all members of the household. Within each family, the Family module is completed by a family respondent, who provides health information on each family member; all available adults in the family are invited to participate in the family interview. Additional health information is then collected from 1 randomly selected adult aged 18 years or older (the "sample adult") and from the parent or guardian of 1 randomly selected child younger than 18 years (the "sample child").
IBD was last included in the NHIS in 1999, as a supplement to the sample adult interview. Interviewees were asked whether a physician or other health professional had given them a diagnosis of CD or UC, their age at diagnosis, and whether they had IBD-related symptoms within the previous 12 months. Longobardi et al used the 1999 NHIS data to demonstrate increased healthcare utilization among patients with IBD compared with non-IBD controls.81 From their analysis, we can also estimate the prevalence of self-reported IBD to be 876 per 100,000. This figure is nearly 2- to 3-fold higher than recent estimates of IBD prevalence in North America (UC, 249/100,000 persons; CD, 319/100,000 persons).5 This discrepancy underscores the difference between prevalence estimates based on self-report and those ascertained from IBD-related health encounters. Self-reported IBD in the NHIS is not confirmed with the medical record and may reflect overreporting. However, patients with IBD in long-term remission who do not seek IBD-related health care may be undercounted in health administrative data. Another potential limitation of the NHIS is that the final sample adult response rate in 1999 was 70%, and patients with IBD may have been more likely to agree to the survey, leading to response bias. However, age is a correlate of both response and many health outcomes, and the sample adult weights are calibrated to Census-based population controls that include age, which helps address nonresponse bias in sample adult estimates. Finally, the exclusion of children may partly explain the higher prevalence estimate from the NHIS.
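The prevalence estimate above is a survey-weighted proportion. A minimal sketch of the point estimate, assuming each sample adult carries a survey weight (the weights in the example are made up): prevalence per 100,000 is the sum of the weights of respondents reporting IBD divided by the sum of all weights, times 100,000. Confidence intervals for NHIS estimates require design-based variance estimation that accounts for strata and clusters, which this sketch does not attempt.

```python
from typing import Sequence


def weighted_prevalence_per_100k(has_ibd: Sequence[bool], weights: Sequence[float]) -> float:
    """Survey-weighted prevalence per 100,000: sum(w_i * y_i) / sum(w_i) * 100,000."""
    if len(has_ibd) != len(weights):
        raise ValueError("has_ibd and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Each respondent stands in for `weight` people in the target population.
    cases = sum(w for y, w in zip(has_ibd, weights) if y)
    return cases / total * 100_000
```

Because weights differ across respondents, the weighted estimate generally differs from the raw proportion of respondents who report IBD.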
The NHIS is potentially a very useful tool for assessing IBD burden in the United States, because it represents a nationwide population-based sample and uses an existing infrastructure to collect data. Its oversampling of Hispanics, non-Hispanic blacks, and non-Hispanic Asians also enables descriptive epidemiology for important minority groups. Personal identifiers also enable linkage to databases such as the National Death Index and the Medical Expenditure Panel Survey. The survey sample is redesigned every 10 years, with the next redesigned sample due to be released in 2016, and a content redesign is scheduled for implementation in 2017. Inclusion of an IBD component would represent an invaluable opportunity. This could be expensive: adding a 3- to 4-minute series of core questions, such as those specific to IBD, costs approximately 1 million dollars. An additional challenge is the need to validate self-reported IBD, which could be achieved by asking interviewees for permission to contact their treating physicians. The infrastructure for such validation is already in place: the survey currently validates self-reported immunizations in children against the medical record. Finally, the total IBD sample size was relatively small (N = 271 in 1999) and may be insufficient for subgroup analyses. This limitation could be addressed by including IBD-related questions in the NHIS in multiple years.
Population-based Surveys or Cohort Studies (Harvard Professional Health Studies)
The Harvard Professional Health Studies are a group of prospective cohorts of women (Nurses’ Health Study I and II) and men (Health Professionals Follow-up Study) that were initiated in 1976, 1989, and 1986, respectively. Participants are followed through biennial questionnaires ascertaining diet, lifestyle, and health history, including new medical diagnoses. With follow-up rates exceeding 90%, these cohorts have contributed significantly to understanding chronic disease epidemiology. Diagnoses are ascertained through initial self-report, followed by a supplemental questionnaire and medical record review of every reported case.
The infrastructure needed to use these cohorts for IBD studies is difficult to isolate, because questionnaire development and administration are part of the larger cohort follow-up effort, which includes collecting data on multiple exposures and monitoring the incidence of several disease outcomes. IBD-specific studies generally require funding for research assistants to contact participants, obtain medical records, and create case files. Currently, natural history data, including complications, need for surgery, and treatments, are obtained through self-report on a supplemental questionnaire and then validated on medical record review. However, this assessment is obtained only at the time of initial self-report; follow-up questionnaires are not currently administered. Biospecimens (buffy coat for DNA and stored plasma, collected before or after diagnosis) are available for a subset of approximately one-third of the cohort.
There are several examples of IBD studies from within these cohorts, particularly focused on risk factors for developing disease (vitamin D, stress/anxiety, medication use, early life factors, smoking, and dietary factors).82–86 A major advantage of these cohorts is accurate ascertainment of incident IBD through physician confirmation of every self-reported case. The cohorts are ideal for studying environmental exposures in relation to disease risk and possible interactions between genetic and environmental factors. Another key advantage is that, by simultaneously assessing multiple exposures and outcomes, "disease-specific" costs are kept lower than they would otherwise be. The high degree of medical knowledge among participants is also a substantial advantage. There are also some distinct limitations. Women enrolled in the initial Nurses’ Health Study are now an older population, and each cohort is largely restricted to one sex. However, ongoing expansion of the target population through newer cohorts could mitigate some of this skewed distribution, and results from these cohorts have been consistent with other published epidemiologic studies, supporting their continued utility. Use of this and other established cohorts in IBD surveillance would require additional infrastructure to obtain periodic follow-up data, including timing of medication initiation, surgery, hospitalization, and other disease-specific factors.
This form of surveillance is ideal in many ways for a U.S. surveillance program: structured assessment of exposures before disease onset allows environmental risks to be identified in a high-quality prospective cohort, and incident disease, which is otherwise challenging to capture, can be tracked. Because the cohorts assess multiple diseases, the approach is efficient and less expensive. Studies from within these cohorts have taken a structured approach to validation, which would be key in any future surveillance program. Participants need to remain motivated to ensure high follow-up rates; in these cohorts, this has been accomplished through periodic feedback of study results (e.g., through newsletters). The Harvard Professional Health Studies also have central data storage and a structured protocol for proposing a study, executing it, and publishing results. These features ensure data integrity and would be needed if similar methodology were used in a national IBD surveillance program.
Veterans Affairs Data
The Department of Veterans Affairs (VA) runs the largest healthcare system in the United States and has kept computerized records of all its patients for more than 30 years. These records include the patient treatment file, which contains inpatient data since 1970. From 1970 until 1980, diagnoses in the patient treatment file were recorded using ICD-8; from 1981 until 2006, ICD-9 was used. IBD codes have been validated within the VA system and are reasonably reliable for disease status. The positive predictive value (PPV) of CD codes (88%–100%) was superior to that of UC codes (50%–93%), but the PPV for extraintestinal manifestations and surgeries was poor (0%–29%).87
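The PPV ranges above are proportions from chart review: of the patients flagged by a code, the share whose records confirm the condition. A small sketch with hypothetical counts follows; note that PPV says nothing about cases the codes miss, which would require estimating sensitivity against a reference set of true cases.

```python
def ppv(confirmed: int, not_confirmed: int) -> float:
    """Positive predictive value = confirmed / all code-positive records reviewed."""
    reviewed = confirmed + not_confirmed
    if reviewed == 0:
        raise ValueError("no code-positive records were reviewed")
    return confirmed / reviewed
```

For example, if chart review confirmed CD in 88 of 100 hypothetical patients carrying a CD code, the PPV would be 0.88.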
There are numerous examples of studies using the VA population on IBD etiology, geographic variation of disease within the VA system, complications, medication effects, and quality assessment of preventive measures.88–94 For example, a study comparing CRC in patients with and without IBD found that CRC in patients with IBD occurred at a younger age and was more proximally located.88 In a study of etiologic risk factors for IBD in the veteran population, military duty in Vietnam and prisoner-of-war status were both protective against the development of CD but not UC (odds ratio, 0.84; confidence interval, 0.75–0.96; and odds ratio, 0.60; confidence interval, 0.41–0.87, respectively).89 Khan et al recently studied adherence to mesalamine and its influence on UC flares: long-term high adherence was lower than previously reported (in the 40% range), and adherence was a significant predictor of disease flares.92 Other studies from this group have investigated metabolic bone disease, the prevalence of corticosteroid use, and lymphoma risk in individuals with UC on thiopurines.95–97 Studies directly related to quality of care within the VA have also been performed. For example, Hou et al reviewed monitoring for myelosuppression in veterans taking thiopurines for IBD and found that monitoring after medication initiation was low and varied widely across facilities.91 As these few examples suggest, VA data can be used for a wide variety of research and quality improvement purposes.
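The odds ratios and confidence intervals cited above summarize exposure-disease associations. As an illustration of where such numbers come from, the sketch below computes a crude odds ratio and a Wald interval on the log scale from a hypothetical 2 × 2 table; z = 1.96 gives a 95% interval. Published estimates such as those from the VA study may come from adjusted models (for example, logistic regression), so this is not a reconstruction of that analysis.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases (all counts must be > 0).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper
```

An interval that lies entirely below 1, as in both VA estimates above, indicates an association with lower odds of disease among the exposed.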
A distinct advantage of this database is that individuals can be followed over a long period of coverage, allowing cohort studies and studies of outcomes that take years to develop (such as malignancies). Because medications are recorded within the database, pharmacoepidemiologic studies are also possible; one example is the recent study by Khan et al95 demonstrating an increased risk of lymphoma in individuals with UC on thiopurines. This data source also has several disadvantages. For each patient, the previous medical history must be reassembled from multiple individual annual data files, including separate inpatient and outpatient files, medical and surgical procedure files, a medication file, and a death file. Programming with these data is labor intensive, as is the approval process. Because test results (laboratory, pathology, radiology, and endoscopy) are unavailable within the database, the investigator knows only that a test was performed. In addition, some veterans use the VA for prescription medication coverage but are hospitalized or obtain much of their care outside the VA system; in these instances, external records and outcomes are not available. Ongoing changes in eligibility for VA benefits result in an unknown population base (or denominator), which limits the ability to perform disease surveillance. Historically, the VA system has predominantly cared for male patients, limiting the generalizability of its data to women. Finally, although the VA system covers a large population, the sample size remains too small for some epidemiologic questions, such as the seasonal variation of IBD, its distribution among ethnic minorities, or its underlying birth-cohort patterns.
Data from the VA system could not stand alone as a surveillance system for IBD but could serve as a valuable component of a disease surveillance system.
Surveillance, Epidemiology, and End Results Program
The SEER program is sponsored by the National Cancer Institute and collects data on cancer incidence from population-based registries covering approximately 28% of the U.S. population. The program is designed to oversample traditionally underrepresented populations such as Hispanics, American Indians and Alaska Natives, Asians, and Pacific Islanders. The sample is comparable with the U.S. population regarding income and education. The data are updated annually. SEER registries are located in selected states in various regions of the country.
The process for capturing data within SEER relies on mandatory reporting of cancer diagnoses. Within each SEER area, healthcare providers who diagnose cancer or provide the first course of treatment are required to report cancer cases to the SEER registry. Some healthcare facilities have on-site cancer registrars to aid in this process. After a cancer report is received, a cancer registrar is sent to the facility to abstract data on the case within 6 months. This form of data collection can be time consuming and expensive, although the data are of high quality.
The goals of the SEER program are to collect complete and accurate cancer data. The program periodically reports on the burden of cancer in the United States. Researchers and public health policy makers also use these data to describe temporal changes in cancer diagnoses. The program can identify unusual patterns in cancer incidence, including iatrogenic cancers. Data are also provided as a resource to researchers for a variety of other study purposes. The data currently collected within SEER include demographics, primary tumor site, tumor morphology, stage at diagnosis, first course of treatment, survival, and cause of death. Decennial census data are also included, such as median household income, percent with a high school education, percent speaking English, population density, and urban versus rural status based on zip code. There are also data on the availability of health resources (e.g., radiation therapy facilities).
Importantly, SEER data can be linked to other sources for research and public health purposes. For example, data can be linked to Medicare data or to other national data repositories such as the Medicare Health Outcomes Survey (MHOS) or the National Longitudinal Mortality Study. SEER–Medicare data include information on 100% of cancer patients in SEER areas. These data also include a 5% random sample of Medicare beneficiaries without cancer residing in these areas. Longitudinal Medicare claims data are also included, such as short-stay hospitalizations, physician and laboratory services, hospital outpatient claims, and Part D prescription drug claims. SEER–Medicare data have several strengths, including a large number of patients within a diverse geographic base, longitudinal follow-up, data on non-cancer controls for comparison, and the population-based nature of the cohort. There are also several limitations. These data include only individuals aged 65 years and older, and they lack data on noncovered services, reasons for tests or services, the results of testing, and confounders associated with treatment. SEER–Medicare data are well suited for healthcare utilization research or for studies of late outcomes of cancer treatments. Types of research that are more challenging with these data include treatment effectiveness studies, studies of long-term symptoms, and studies of cancer recurrence or disease progression.
MHOS is a supplemental questionnaire administered to a random sample of 1000 beneficiaries from each managed care organization that participates in Medicare. It includes 95 questions related to demographics, socioeconomic status, activities of daily living, and quality of life. A new MHOS cohort is selected every year, and follow-up data are obtained 2 years later. A total of 8 MHOS cohorts have been linked to SEER (approximately 55,000 individuals).
The strengths of this linkage include the ability to evaluate health-related quality of life before and after a cancer diagnosis and after initial treatment. The impact of cancer on activities of daily living can also be determined. The weaknesses of these data are that the sample is small and comes entirely from Medicare managed care beneficiaries. These data also do not contain information on recurrence or on the specifics of cancer treatment. SEER–National Longitudinal Mortality Study data combine socioeconomic data collected by the Current Population Survey with cause of death. These data include characteristics such as race/ethnicity, marital status, education, income, employment status, occupation, household size, immigrant status, limited smoking data, and health insurance status. This resource includes 3 million people, with cause of death available for approximately 250,000. These data have been linked to SEER for approximately 35,000 individuals with cancer diagnoses through 2003.
SEER data have been used in previous publications studying IBD-associated cancer. For example, several studies on the risk of lymphoma in IBD have used this data source.98–100 Sultan et al101 compared the prognosis of cancer in individuals with IBD with survival outcomes in SEER. Other studies have compared IBD-associated CRC with sporadic CRC102,103 or rates of early/missed cancers after colonoscopy in older patients with and without IBD.104 Although SEER itself could not be used as a surveillance system for IBD, its model could be extended to IBD-related outcomes and complications, which are an important part of disease surveillance. Established IBD cohorts could be linked to SEER registries to capture IBD-associated cancer. Developing a SEER-type infrastructure to capture incident cases of IBD would be costly and would require that IBD be made a reportable condition (i.e., subject to mandatory reporting within a defined population). There are additional challenges as well, such as deciding who would do the reporting and the fact that an IBD diagnosis can be more complicated than a cancer diagnosis, requiring multiple components such as clinical symptoms and endoscopic, pathologic, and radiologic findings. Because IBD is a relatively rare condition, a SEER-like registry would benefit less from economies of scale.
Most longitudinal cohort studies of CD and UC in the United States have relied on administrative data or retrospective review of clinical records. Such studies often lack important clinical data that can only be collected directly from the patient, such as detailed smoking history, depression, sleep, physical activity, and diet. Traditionally, prospective cohort studies have been extraordinarily expensive to conduct because of the large personnel resources needed to recruit and follow patients over time. With the widespread availability of the Internet, patients can now be followed prospectively using electronic methods at a dramatically reduced cost. The CCFA Partners research program is a cohort study that uses Internet-based recruitment and data collection to assemble detailed cross-sectional and longitudinal data on large numbers of patients with IBD.105
The objective of CCFA Partners is to establish a large and diverse registry of patients with IBD who can be identified, enrolled, and followed up using the Internet. To date, more than 12,000 subjects have been enrolled, most of whom joined after recruitment from the nearly 400,000-person e-mail roster maintained by the CCFA.
Detailed information is collected at baseline on disease type, location, medications, family history, disease activity, prevention activities (such as influenza vaccine), exercise, quality of life, and patient-reported outcomes. This information is updated every 6 months. Reminders sent at 3-month intervals are designed to help maintain interest and deliver educational messages. Rotating modules are introduced to address specific questions developed by the data management team or by outside investigators with peer-reviewed ancillary studies. A validation study comparing patient-reported disease type with physician-confirmed diagnosis has been completed and shows >95% specificity for IBD diagnosis (unpublished data).
Several studies have been published using data from this cohort, including studies of the role of diet in exacerbation of IBD symptoms, the perception of chronic illness care, and sleep disturbance as a risk factor for disease relapse.105–107 This online cohort has several advantages. It was specifically designed to focus on patient-reported outcomes such as anxiety, depression, fatigue, pain, and sleep. It offers large numbers of potential study subjects, low cost, and rapid results. Because study subjects are surveyed every 6 months and questionnaires do not require printing, new questions can be introduced and results obtained quickly. One central disadvantage of this cohort is that the population is highly selected and may not be representative of the general IBD population. The study is also threatened by drop-outs, which will reduce study power and may introduce bias if drop-out is differential.
Unfortunately, this type of study cannot be used for traditional population surveillance. For example, it is not possible to determine incidence and prevalence, because the current design has no underlying population denominator. However, the study can be used to conduct surveillance of enrolled subjects to determine the course of their disease over time and the impact of the disease on quality of life and function. This unique resource could therefore help to estimate the burden of disease within the IBD population.
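The standard epidemiologic definitions (general textbook formulas, not taken from the CCFA Partners design) make the denominator problem explicit: both measures divide by the size of, or the person-time contributed by, a defined source population.

```latex
\[
\text{Prevalence}(t) = \frac{\text{number of persons with IBD in the population at time } t}{\text{number of persons in the population at time } t} \times 100{,}000
\]
\[
\text{Incidence rate} = \frac{\text{number of new IBD diagnoses in the population during the period}}{\text{person-years at risk in the population during the period}} \times 100{,}000
\]
```

An Internet-recruited cohort supplies only part of a numerator (the patients who chose to enroll), and no defined source population exists from which either denominator could be counted.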
Medical Informatics Approaches to Electronic Health Records
The capabilities of electronic health records (EHRs) are changing constantly; these systems have evolved from simply storing records to tracking quality measures and generating reports. Data entered at the front end of the system are stored as discrete data elements in the back end. These systems can also facilitate order entry, through which they can provide decision-support features (e.g., IBD-specific patient order sets to improve compliance with preventive measures such as thromboembolism prophylaxis during hospitalizations). In the outpatient setting, decision support can take the form of clinical reminders.
After information is entered into an EHR, it is stored as structured and unstructured data elements. Structured data elements include objective data such as medications, laboratory values, vital signs, demographics, and diagnoses. Unstructured data consist of chart elements such as the chief complaint and free-text procedure and radiology reports. Currently, queries can be run against EHR data to determine the total number of patients with a condition, such as IBD. Other structured data, such as number of visits or medications, are also available. These systems can also be used to track quality process measures. There are, however, many things that EHRs cannot do. For example, a population denominator cannot be generated from an EHR. In theory, if EHRs could eventually be combined throughout the United States, they could form the start of a national surveillance program. However, EHR adoption across the United States is limited (only 60% by some estimates). Currently, there are 550 separate ambulatory vendors with complete EHRs approved by CMS. There are also currently no widely adopted standards for data exchange. In addition, without a unique patient identifier, these EHRs cannot reliably recognize the same patient seen in more than 1 practice.
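The unique identification problem can be illustrated with a minimal Python sketch. It assumes two hypothetical practice extracts that each assign their own local record numbers (`mrn`); summing per-practice counts double counts a patient seen in both practices, whereas a linkage key built from normalized name and date of birth collapses the duplicate. This deterministic key is a simplified stand-in chosen for illustration: it will miss matches when names are misspelled or changed and can wrongly merge different people, which is why a true unique identifier matters.

```python
import hashlib


def linkage_key(first, last, dob):
    """Hypothetical deterministic linkage key from normalized name and date of birth."""
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{dob}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def count_ibd_patients(practice_extracts):
    """Return (naive_count, linked_count) of patients flagged with IBD.

    Each extract is a list of dicts with a practice-local 'mrn', 'first',
    'last', 'dob', and a boolean 'ibd' flag (all hypothetical field names).
    """
    naive_count = 0
    linked_keys = set()
    for extract in practice_extracts:
        ibd_rows = [row for row in extract if row["ibd"]]
        # Local MRNs are unique only within one practice, so adding them up double counts.
        naive_count += len({row["mrn"] for row in ibd_rows})
        for row in ibd_rows:
            linked_keys.add(linkage_key(row["first"], row["last"], row["dob"]))
    return naive_count, len(linked_keys)


practice_a = [
    {"mrn": "001", "first": "Ann", "last": "Lee", "dob": "1980-02-01", "ibd": True},
    {"mrn": "002", "first": "Bo", "last": "Kim", "dob": "1975-06-30", "ibd": False},
]
practice_b = [
    {"mrn": "9001", "first": "ann", "last": "Lee ", "dob": "1980-02-01", "ibd": True},
]
print(count_ibd_patients([practice_a, practice_b]))  # (2, 1)
```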
There may be value in exploring the role of ancillary health information systems in disease surveillance. Other non-EHR systems with information about IBD care include laboratory information systems; imaging, pathology, and endoscopy reporting systems; and systems for billing and for communication among patients, pharmacies, and insurance providers. Laboratory information systems are the most commonly used systems for infectious disease and microbiology surveillance. There is a well-defined system in place for mandatory reporting of certain diseases to state public health agencies and the CDC. This system would be inadequate for IBD, because no specific laboratory test is ordered in all patients with IBD and no specific laboratory value can confirm the diagnosis. An imaging system would also have limitations if used for surveillance: practice patterns for ordering imaging tests in IBD vary, there are no structured reports for IBD, and, in general, imaging is inadequate to diagnose isolated upper GI or colonic disease. Surgical pathology systems generally do not use structured data, which can be a challenge when using these data for surveillance. There are a number of different electronic endoscopic record systems.108 Data commonly available through these systems include scheduling data, monitoring of vital signs during a procedure, the endoscopic record, correspondence with referring physicians, and billing codes. Advantages of these systems include rich procedural detail and excellent yield for upper GI tract and colonic IBD. Disadvantages include low yield for isolated small bowel disease and the problem of unique patient identification.
There are numerous examples of IBD studies using EHRs. Reddy et al109 examined EHR data to determine whether patients with IBD were receiving optimal care (such as addition of steroid-sparing agents or colon cancer surveillance) and found high levels of noncompliance with important IBD quality measures. EHRs have also been used to study medication adherence in IBD. Kane et al110 demonstrated that medication nonadherence correlated significantly with relapse of disease in patients with UC on 5-aminosalicylate therapy. Many other single-center retrospective studies describe disease outcomes and complications; these studies are greatly facilitated by EHRs.
How data from EHRs could contribute to a surveillance system in IBD remains to be seen. As technology advances, it may become possible to embed cloud-based registries within the EHR. Any such effort would also need to address the unique identification issue. One model for leveraging healthcare system electronic records for surveillance is the Food and Drug Administration Mini-Sentinel program. This database contains information on over 130 million individuals from a combined total of 18 partner organizations, encompassing nearly 400 million person-years of observation time. The Mini-Sentinel Common Data Model standardizes administrative and clinical information across the participating Data Partners. Data Partners execute standardized computer programs (e.g., modular programs) provided by the Operations Center or project workgroups within their own institutions’ firewalls. These data form one basis for active surveillance of medical product safety in the United States. Certainly, there are challenges to using these data in their current form for IBD surveillance, but as technologies emerge, the EHR may play an important role in future IBD surveillance programs.
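The distributed design described above, in which partners run standardized programs locally and return results rather than patient-level data, can be sketched in Python as follows. The record layout (a `diagnoses` set per member), the function names, and the choice to return only counts are illustrative assumptions; the actual Mini-Sentinel Common Data Model and modular programs are far more detailed.

```python
from dataclasses import dataclass


@dataclass
class PartnerResult:
    """Aggregate counts returned by one data partner; no patient-level rows leave the site."""
    partner: str
    members: int
    cases: int


def run_local_query(partner, member_rows, case_codes):
    """Runs inside the partner's firewall against its own common-data-model tables."""
    # A member is a case if any of their diagnosis codes is in the case definition.
    cases = sum(1 for row in member_rows if row["diagnoses"] & case_codes)
    return PartnerResult(partner, len(member_rows), cases)


def pool_results(results):
    """Operations-center step: combine partner aggregates into a pooled rate per 100,000 members."""
    members = sum(r.members for r in results)
    cases = sum(r.cases for r in results)
    return cases, members, cases * 100_000 / members


# Illustrative ICD-9 codes: 555.9 (Crohn's disease) and 556.9 (ulcerative colitis), unspecified.
ibd_codes = {"555.9", "556.9"}
site_a = [{"diagnoses": {"555.9"}}, {"diagnoses": {"401.9"}}, {"diagnoses": set()}]
site_b = [{"diagnoses": {"556.9", "250.00"}}, {"diagnoses": set()}]
results = [run_local_query("A", site_a, ibd_codes), run_local_query("B", site_b, ibd_codes)]
print(pool_results(results))  # (2, 5, 40000.0)
```

Returning aggregates keeps identifiable data behind each partner's firewall; the trade-off is that any question requiring patient-level linkage across partners cannot be answered this way.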
POSSIBLE SOLUTIONS TO IBD SURVEILLANCE IN THE UNITED STATES
The healthcare system in the United States presents unique challenges to the development of a surveillance system. Although we can learn important lessons from other countries that have succeeded in this endeavor, the fragmentation of the U.S. system and the lack of a unique personal identification number for healthcare encounters make IBD surveillance an even greater challenge. Unlike other disease surveillance programs in the United States, in which a laboratory value alone can trigger a cascade of events, IBD diagnosis (which requires endoscopic, pathologic, and radiologic findings) does not lend itself to the current surveillance model.
A combination of tools is therefore needed to accomplish the goals of a disease surveillance program (Fig. 2); in essence, the right tool must be used for the right job. To provide estimates of prevalence, Medicare/Medicaid data, closed-panel health maintenance organization data, and commercial U.S. administrative claims data may be used and, with appropriate sampling and weighting, can produce reasonable national estimates. Other national surveys and data sources, such as the NHIS, could be leveraged to determine incidence and prevalence if the sample were large enough and representative of the overall population. IBD could be added to the screened conditions within this questionnaire at its next revision (2017), with preparations made to validate self-reported diagnoses of IBD. National-level data might focus particularly on estimating incidence and prevalence in previously undercounted groups (such as minorities, immigrants, and children). For race in particular, these data would need to be captured under the auspices of a program such as NHIS, because administrative data are not sufficiently reliable in this regard. To complement these data, regional estimates of incidence, prevalence, and other factors such as natural history of disease, etiology, and burden should also be drawn from established population-based cohorts such as Olmsted County and OSCCAR. These data sources, with their richer clinical data, could significantly enhance the counts available from administrative data and could answer important questions not amenable to administrative data. New regional population-based cohorts might be needed to broaden population coverage, including the southern and western regions and African American, Latino, and Asian populations.
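The phrase "appropriate sampling and weighting" can be made concrete with post-stratification, one common weighting approach; the text does not specify a method, so this choice is an assumption. The Python sketch below reweights stratum-specific prevalence observed in a claims sample to the composition of a reference population. The age strata and all counts are invented for illustration and are not real estimates.

```python
def poststratified_prevalence(sample, population):
    """Weight stratum-specific prevalence in a sample to a reference population.

    sample: stratum -> (cases, enrollees) observed in the claims data.
    population: stratum -> size of that stratum in the reference population.
    Returns the weighted prevalence as a proportion.
    """
    if set(sample) != set(population):
        raise ValueError("sample and population must cover the same strata")
    total = sum(population.values())
    estimate = 0.0
    for stratum, (cases, enrollees) in sample.items():
        stratum_prevalence = cases / enrollees
        # Each stratum counts in proportion to its share of the reference population.
        estimate += (population[stratum] / total) * stratum_prevalence
    return estimate


# Invented numbers for illustration only; population sizes are in millions.
claims_sample = {"0-17": (10, 100_000), "18-64": (400, 100_000), "65+": (300, 50_000)}
reference_population = {"0-17": 70, "18-64": 200, "65+": 50}
rate = poststratified_prevalence(claims_sample, reference_population)
print(round(rate * 100_000, 2))  # 345.94
```

The weights correct only for the stratifying variables; differences between the claims population and the general population in coding practices or in characteristics not used for stratification are not addressed by this step.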
Perhaps it is time to move beyond assessments of incidence and prevalence. By not limiting the charge to monitoring these measures, much more can be accomplished. In general, the traditional surveillance model does not fit a disease such as IBD well. IBD is not subject to outbreaks, as infectious diseases are, for which real-time surveillance systems are needed. Reported incidence and prevalence rates of IBD in various U.S. populations, arrived at from several different data systems and approaches, remain in similar ranges. These estimates may be good enough at this point. Future incidence and prevalence estimates should likely focus on previously undercounted populations.
Perhaps a broader definition of surveillance could incorporate many other important aspects of disease (Fig. 2). Rather than merely estimating the incidence and prevalence of disease, we should be conducting surveillance of the burden of disease to society. Factors contributing to this burden include costs, loss of work productivity, morbidity, disease-related cancers, and other factors downstream from diagnosis. At this point, it remains unknown whether the burden of disease is shared equally by the estimated 1.4 million Americans with IBD or whether it is concentrated in a subset of those affected. This surveillance program would be a new model for disease surveillance, in essence focusing on “surveillance of the burden of disease.” This system would focus separately on (1) the natural history of disease and (2) outcomes and complications of the disease and/or its treatments.
Because we cannot prevent IBD, we should focus on preventing its potentially avoidable complications. A multifaceted model of surveillance of the burden of disease (such as that seen in Fig. 2) could accomplish this important goal. However, research into the etiology of disease is still of great importance, particularly among populations with rapidly rising rates of IBD diagnosis. These populations, such as immigrants, may provide important etiologic clues as to disease development. It is possible that some of these etiologic clues are shared across autoimmune diseases and may aid in future preventive efforts outside the direct arena of IBD.
Therefore, after a thorough review of epidemiologic data sources available in the United States and a review of successful international surveillance programs, we have concluded that the United States should broaden the goals of surveillance to include surveillance of the burden of disease. As outlined in this document, this can only be accomplished through a combination of approaches, using unique data resources ranging from administrative claims data to national surveys to inception cohorts of IBD. With this model in place, we will be able to direct appropriate resources to the IBD community to begin to decrease the burden of disease.
1. Kappelman MD, Rifas-Shiman SL, Porter CQ, et al. Direct health care costs of Crohn’s disease and ulcerative colitis in US children and adults. Gastroenterology. 2008;135:1907–1913.
2. Park KT, Bass D. Inflammatory bowel disease-attributable costs and cost-effective strategies in the United States: a review. Inflamm Bowel Dis. 2011;17:1603–1609.
3. Kappelman MD, Rifas-Shiman SL, Kleinman K, et al. The prevalence and geographic distribution of Crohn’s disease and ulcerative colitis in the United States. Clin Gastroenterol Hepatol. 2007;5:1424–1429.
4. Loftus CG, Loftus EV Jr, Harmsen WS, et al. Update on the incidence and prevalence of Crohn’s disease and ulcerative colitis in Olmsted County, Minnesota, 1940-2000. Inflamm Bowel Dis. 2007;13:254–261.
5. Molodecky NA, Soon IS, Rabi DM, et al. Increasing incidence and prevalence of the inflammatory bowel diseases with time, based on systematic review. Gastroenterology. 2012;142:46–54.e42; quiz e30.
6. Ng SC, Bernstein CN, Vatn MH, et al. Geographical variability and environmental risk factors in inflammatory bowel disease. Gut. 2013;62:630–649.
7. Benchimol EI, Fortinsky KJ, Gozdyra P, et al. Epidemiology of pediatric inflammatory bowel disease: a systematic review of international trends. Inflamm Bowel Dis. 2011;17:423–439.
8. Loftus EV Jr. Clinical epidemiology of inflammatory bowel disease: incidence, prevalence, and environmental influences. Gastroenterology. 2004;126:1504–1517.
9. Bernstein CN, Shanahan F. Disorders of a modern lifestyle: reconciling the epidemiology of inflammatory bowel diseases. Gut. 2008;57:1185–1191.
10. Calkins BM, Lilienfeld AM, Garland CF, et al. Trends in incidence rates of ulcerative colitis and Crohn’s disease. Dig Dis Sci. 1984;29:913–920.
11. Sonnenberg A, Wasserman IH. Epidemiology of inflammatory bowel disease among U.S. military veterans. Gastroenterology. 1991;101:122–130.
12. Sonnenberg A, McCarty DJ, Jacobsen SJ. Geographic variation of inflammatory bowel disease within the United States. Gastroenterology. 1991;100:143–149.
13. Kurata JH, Kantor-Fish S, Frankl H, et al. Crohn’s disease among ethnic groups in a large health maintenance organization. Gastroenterology. 1992;102:1940–1948.
14. Sonnenberg A. Demographic characteristics of hospitalized IBD patients. Dig Dis Sci. 2009;54:2449–2455.
15. Sewell JL, Yee HF Jr, Inadomi JM. Hospitalizations are increasing among minority patients with Crohn’s disease and ulcerative colitis. Inflamm Bowel Dis. 2010;16:204–207.
16. Carr I, Mayberry JF. The effects of migration on ulcerative colitis: a three-year prospective study among Europeans and first- and second-generation South Asians in Leicester (1991-1994). Am J Gastroenterol. 1999;94:2918–2922.
17. Malaty HM, Fan X, Opekun AR, et al. Rising incidence of inflammatory bowel disease among children: a 12-year study. J Pediatr Gastroenterol Nutr. 2010;50:27–31.
18. Adamiak T, Walkiewicz-Jedrzejczak D, Fish D, et al. Incidence, clinical characteristics, and natural history of pediatric IBD in Wisconsin: a population-based epidemiological study. Inflamm Bowel Dis. 2013;19:1218–1223.
19. Abramson O, Durant M, Mow W, et al. Incidence, prevalence, and time trends of pediatric inflammatory bowel disease in Northern California, 1996 to 2006. J Pediatr. 2010;157:233–239.e1.
20. Herrinton LJ, Liu L, Lewis JD, et al. Incidence and prevalence of inflammatory bowel disease in a Northern California managed care organization, 1996-2002. Am J Gastroenterol. 2008;103:1998–2006.
21. Weng X, Liu L, Barcellos LF, et al. Clustering of inflammatory bowel disease with immune mediated diseases among members of a northern California-managed care organization. Am J Gastroenterol. 2007;102:1429–1435.
22. Herrinton LJ, Liu L, Lafata JE, et al. Estimation of the period prevalence of inflammatory bowel disease among nine health plans using computerized diagnoses and outpatient pharmacy dispensings. Inflamm Bowel Dis. 2007;13:451–461.
23. Hutfless SM, Weng X, Liu L, et al. Mortality by medication use among patients with inflammatory bowel disease, 1996-2003. Gastroenterology. 2007;133:1779–1786.
24. Sands BE, LeLeiko N, Shah SA, et al. OSCCAR: Ocean State Crohn’s and Colitis Area Registry. Med Health R I. 2009;92:82–85, 88.
25. Askling J, Dickman PW, Karlen P, et al. Colorectal cancer rates among first-degree relatives of patients with inflammatory bowel disease: a population-based cohort study. Lancet. 2001;357:262–266.
26. Ekbom A, Helmick C, Zack M, et al. Ulcerative colitis and colorectal cancer. A population-based study. N Engl J Med. 1990;323:1228–1233.
27. Soderlund S, Granath F, Brostrom O, et al. Inflammatory bowel disease confers a lower risk of colorectal cancer to females than to males. Gastroenterology. 2010;138:1697–1703.
28. Ma C, Crespin M, Proulx MC, et al. Postoperative complications following colectomy for ulcerative colitis: a validation study. BMC Gastroenterol. 2012;12:39.
29. Kaplan GG, Seow CH, Ghosh S, et al. Decreasing colectomy rates for ulcerative colitis: a population-based time trend study. Am J Gastroenterol. 2012;107:1879–1887.
30. de Silva S, Ma C, Proulx MC, et al. Postoperative complications and mortality following colectomy for ulcerative colitis. Clin Gastroenterol Hepatol. 2011;9:972–980.
31. Soon IS, Wrobel I, deBruyn JC, et al. Postoperative complications following colectomy for ulcerative colitis in children. J Pediatr Gastroenterol Nutr. 2012;54:763–768.
32. Rezaie A, Quan H, Fedorak RN, et al. Development and validation of an administrative case definition for inflammatory bowel diseases. Can J Gastroenterol. 2012;26:711–717.
33. Kaplan GG, Hubbard J, Panaccione R, et al. Risk of comorbidities on postoperative outcomes in patients with inflammatory bowel disease. Arch Surg. 2011;146:959–964.
34. Bernstein CN, Wajda A, Svenson LW, et al. The epidemiology of inflammatory bowel disease in Canada: a population-based study. Am J Gastroenterol. 2006;101:1559–1568.
35. Bernstein CN, Rawsthorne P, Cheang M, et al. A population-based case control study of potential risk factors for IBD. Am J Gastroenterol. 2006;101:993–1002.
36. Bernstein CN, Singh S, Graff LA, et al. A prospective population-based study of triggers of symptomatic flares in IBD. Am J Gastroenterol. 2010;105:1994–2002.
37. Bernstein CN, Nugent Z, Blanchard JF. 5-aminosalicylate is not chemoprophylactic for colorectal cancer in IBD: a population based study. Am J Gastroenterol. 2011;106:731–736.
38. Graff LA, Vincent N, Walker JR, et al. A population-based study of fatigue and sleep difficulties in inflammatory bowel disease. Inflamm Bowel Dis. 2011;17:1882–1889.
39. Rawsthorne P, Clara I, Graff LA, et al. The Manitoba Inflammatory Bowel Disease Cohort Study: a prospective longitudinal evaluation of the use of complementary and alternative medicine services and products. Gut. 2012;61:521–527.
40. Singh H, Nugent Z, Demers AA, et al. Increased risk of nonmelanoma skin cancers among individuals with inflammatory bowel disease. Gastroenterology. 2011;141:1612–1620.
41. Singh H, Nugent Z, Demers AA, et al. Screening for cervical and breast cancer among women with inflammatory bowel disease: a population-based study. Inflamm Bowel Dis. 2011;17:1741–1750.
42. Targownik LE, Leslie WD, Carr R, et al. Longitudinal change in bone mineral density in a population-based cohort of patients with inflammatory bowel disease. Calcif Tissue Int. 2012;91:356–363.
43. Targownik LE, Singh H, Nugent Z, et al. The epidemiology of colectomy in ulcerative colitis: results from a population-based cohort. Am J Gastroenterol. 2012;107:1228–1235.
44. Boualit M, Salleron J, Turck D, et al. Long-term outcome after first intestinal resection in pediatric-onset Crohn’s disease: a population-based study. Inflamm Bowel Dis. 2013;19:7–14.
45. Charpentier C, Salleron J, Savoye G, et al. Natural history of elderly-onset inflammatory bowel disease: a population-based cohort study. Gut. 2013.
46. Chouraki V, Savoye G, Dauchet L, et al. The changing pattern of Crohn’s disease incidence in northern France: a continuing increase in the 10- to 19-year-old age bracket (1988-2007). Aliment Pharmacol Ther. 2011;33:1133–1142.
47. Crombe V, Salleron J, Savoye G, et al. Long-term outcome of treatment with infliximab in pediatric-onset Crohn’s disease: a population-based study. Inflamm Bowel Dis. 2011;17:2144–2152.
48. Gower-Rousseau C, Vasseur F, Fumery M, et al. Epidemiology of inflammatory bowel diseases: new insights from a French population-based registry (EPIMAD). Dig Liver Dis. 2013;45:89–94.
49. Lesage S, Zouali H, Cezard JP, et al. CARD15/NOD2 mutational analysis and genotype-phenotype correlation in 612 patients with inflammatory bowel disease. Am J Hum Genet. 2002;70:845–857.
50. Savoye G, Salleron J, Gower-Rousseau C, et al. Clinical predictors at diagnosis of disabling pediatric Crohn’s disease. Inflamm Bowel Dis. 2012;18:2072–2078.
51. Vernier-Massouille G, Balde M, Salleron J, et al. Natural history of pediatric Crohn’s disease: a population-based cohort study. Gastroenterology. 2008;135:1106–1113.
52. Liu L, Allison JE, Herrinton LJ. Validity of computerized diagnoses, procedures, and drugs for inflammatory bowel disease in a northern California managed care organization. Pharmacoepidemiol Drug Saf. 2009;18:1086–1093.
54. Shaukat A, Virnig DJ, Salfiti NI, et al. Is inflammatory bowel disease an important risk factor among older persons with colorectal cancer in the United States? A population-based case-control study. Dig Dis Sci. 2011;56:2378–2383.
55. Shaukat A, Virnig DJ, Howard D, et al. Crohn’s disease and small bowel adenocarcinoma: a population-based case-control study. Cancer Epidemiol Biomarkers Prev. 2011;20:1120–1123.
56. Sanoff HK, Carpenter WR, Sturmer T, et al. Effect of adjuvant chemotherapy on survival of patients with stage III colon cancer diagnosed after age 75 years. J Clin Oncol. 2012;30:2624–2634.
57. Braithwaite D, Zhu W, Hubbard RA, et al. Screening outcomes in older US women undergoing multiple mammograms in community practice: does interval, age, or comorbidity score affect tumor characteristics or false positive rates? J Natl Cancer Inst. 2013;105:334–341.
58. Baxter NN, Warren JL, Barrett MJ, et al. Association between colonoscopy and colorectal cancer mortality in a US cohort according to site of cancer and colonoscopist specialty. J Clin Oncol. 2012;30:2664–2669.
59. Haynes K, Beukelman T, Curtis JR, et al. Tumor necrosis factor alpha inhibitor therapy and cancer risk in chronic immune-mediated diseases. Arthritis Rheum. 2013;65:48–58.
60. Grijalva CG, Chen L, Delzell E, et al. Initiation of tumor necrosis factor-alpha antagonists and the risk of hospitalization for infection in patients with autoimmune diseases. JAMA. 2011;306:2331–2339.
61. Arora G, Singh G, Vadhavkar S, et al. Incidence and risk of intestinal and extra-intestinal complications in Medicaid patients with inflammatory bowel disease: a 5-year population-based study. Dig Dis Sci. 2010;55:1689–1695.
62. Herrinton LJ, Curtis JR, Chen L, et al. Study design for a comprehensive assessment of biologic safety using multiple healthcare data systems. Pharmacoepidemiol Drug Saf. 2011;20:1199–1209.
63. Herrinton LJ, Liu L, Fireman B, et al. Time trends in therapies and outcomes for adult inflammatory bowel disease, Northern California, 1998-2005. Gastroenterology. 2009;137:502–511.
64. Herrinton LJ, Liu L, Weng X, et al. Role of thiopurine and anti-TNF therapy in lymphoma in inflammatory bowel disease. Am J Gastroenterol. 2011;106:2146–2153.
65. Velayos FS, Liu L, Lewis JD, et al. Prevalence of colorectal cancer surveillance for ulcerative colitis in an integrated health care delivery system. Gastroenterology. 2010;139:1511–1518.
66. St Sauver JL, Grossardt BR, Yawn BP, et al. Use of a medical records linkage system to enumerate a dynamic population over time: the Rochester epidemiology project. Am J Epidemiol. 2011;173:1059–1068.
67. St Sauver JL, Grossardt BR, Yawn BP, et al. Data resource profile: the Rochester Epidemiology Project (REP) medical records-linkage system. Int J Epidemiol. 2012;41:1614–1624.
68. St Sauver JL, Grossardt BR, Leibson CL, et al. Generalizability of epidemiological findings and public health decisions: an illustration from the Rochester Epidemiology Project. Mayo Clin Proc. 2012;87:151–160.
69. Rocca WA, Yawn BP, St Sauver JL, et al. History of the Rochester Epidemiology Project: half a century of medical records linkage in a US population. Mayo Clin Proc. 2012;87:1202–1213.
70. Loftus EV Jr, Silverstein MD, Sandborn WJ, et al. Crohn’s disease in Olmsted County, Minnesota, 1940-1993: incidence, prevalence, and survival. Gastroenterology. 1998;114:1161–1168.
71. Loftus EV Jr, Silverstein MD, Sandborn WJ, et al. Ulcerative colitis in Olmsted County, Minnesota, 1940-1993: incidence, prevalence, and survival. Gut. 2000;46:336–343.
72. Silverstein MD, Loftus EV, Sandborn WJ, et al. Clinical course and costs of care for Crohn’s disease: Markov model analysis of a population-based cohort. Gastroenterology. 1999;117:49–57.
73. Jess T, Loftus EV Jr, Velayos FS, et al. Risk of intestinal cancer in inflammatory bowel disease: a population-based study from Olmsted County, Minnesota. Gastroenterology. 2006;130:1039–1046.
74. Schwartz DA, Loftus EV Jr, Tremaine WJ, et al. The natural history of fistulizing Crohn’s disease in Olmsted County, Minnesota. Gastroenterology. 2002;122:875–880.
75. Loftus EV Jr, Crowson CS, Sandborn WJ, et al. Long-term fracture risk in patients with Crohn’s disease: a population-based study in Olmsted County, Minnesota. Gastroenterology. 2002;123:468–475.
76. Loftus EV Jr, Achenbach SJ, Sandborn WJ, et al.. Risk of fracture in ulcerative colitis: a population-based study from Olmsted County, Minnesota. Clin Gastroenterol Hepatol. 2003;1:465–473.
77. Faubion WA Jr, Loftus EV Jr, Harmsen WS, et al.. The natural history of corticosteroid therapy for inflammatory bowel disease: a population-based study. Gastroenterology. 2001;121:255–260.
78. Thia KT, Sandborn WJ, Harmsen WS, et al.. Risk factors associated with progression to intestinal complications of Crohn’s disease in a population-based cohort. Gastroenterology. 2010;139:1147–1155.
79. Peyrin-Biroulet L, Harmsen WS, Tremaine WJ, et al.. Surgery in a population-based cohort of Crohn’s disease from Olmsted County, Minnesota (1970-2004). Am J Gastroenterol. 2012;107:1693–1701.
80. Jess T, Loftus EV Jr, Harmsen WS, et al.. Survival and cause specific mortality in patients with inflammatory bowel disease: a long term outcome study in Olmsted County, Minnesota, 1940-2004. Gut. 2006;55:1248–1254.
81. Longobardi T, Jacobs P, Bernstein CN. Work losses related to inflammatory bowel disease in the United States: results from the National Health Interview Survey. Am J Gastroenterol. 2003;98:1064–1072.
82. Ananthakrishnan AN, Higuchi LM, Huang ES, et al. Aspirin, nonsteroidal anti-inflammatory drug use, and risk for Crohn disease and ulcerative colitis: a cohort study. Ann Intern Med. 2012;156:350–359.
83. Ananthakrishnan AN, Khalili H, Higuchi LM, et al. Higher predicted vitamin D status is associated with reduced risk of Crohn’s disease. Gastroenterology. 2012;142:482–489.
84. Ananthakrishnan AN, Khalili H, Pan A, et al. Association between depressive symptoms and incidence of Crohn’s disease and ulcerative colitis: results from the Nurses’ Health Study. Clin Gastroenterol Hepatol. 2013;11:57–62.
85. Khalili H, Ananthakrishnan AN, Higuchi LM, et al. Early life factors and risk of inflammatory bowel disease in adulthood. Inflamm Bowel Dis. 2013;19:542–547.
86. Khalili H, Higuchi LM, Ananthakrishnan AN, et al. Oral contraceptives, reproductive factors and risk of inflammatory bowel disease. Gut. 2013;62:1153–1159.
87. Thirumurthi S, Chowdhury R, Richardson P, et al. Validation of ICD-9-CM diagnostic codes for inflammatory bowel disease among veterans. Dig Dis Sci. 2010;55:2592–2598.
88. Bansal P, Sonnenberg A. Risk factors of colorectal cancer in inflammatory bowel disease. Am J Gastroenterol. 1996;91:44–48.
89. Delco F, Sonnenberg A. Military history of patients with inflammatory bowel disease: an epidemiological study among U.S. veterans. Am J Gastroenterol. 1998;93:1457–1462.
90. Hou JK, Chang M, Nguyen T, et al. Automated identification of surveillance colonoscopy in inflammatory bowel disease using natural language processing. Dig Dis Sci. 2013;58:936–941.
91. Hou JK, Kramer JR, Richardson P, et al. Myelosuppression monitoring after immunomodulator initiation in veterans with inflammatory bowel disease: a national practice audit. Aliment Pharmacol Ther. 2012;36:1049–1056.
92. Khan N, Abbas AM, Bazzano LA, et al. Long-term oral mesalazine adherence and the risk of disease flare in ulcerative colitis: nationwide 10-year retrospective cohort from the veterans affairs healthcare system. Aliment Pharmacol Ther. 2012;36:755–764.
93. Khan N, Abbas AM, Moehlen M, et al. Methotrexate in ulcerative colitis: a nationwide retrospective cohort from the veterans affairs health care system. Inflamm Bowel Dis. 2013;19:1379–1383.
94. Sonnenberg A, Richardson PA, Abraham NS. Hospitalizations for inflammatory bowel disease among US military veterans 1975-2006. Dig Dis Sci. 2009;54:1740–1745.
95. Khan N, Abbas AM, Lichtenstein GR, et al. Risk of lymphoma in patients with ulcerative colitis treated with thiopurines—a nationwide retrospective cohort study. Gastroenterology. 2013;145:1007–1015.
96. Khan N, Abbas AM, Almukhtar RM, et al. Prevalence and predictors of low bone mineral density in males with ulcerative colitis. J Clin Endocrinol Metab. 2013;98:2368–2375.
97. Khan N, Abbas A, Williamson A, et al. Prevalence of corticosteroids use and disease course after initial steroid exposure in ulcerative colitis. Dig Dis Sci. 2013;58:2963–2969.
98. Deepak P, Sifuentes H, Sherid M, et al. T-cell non-Hodgkin’s lymphomas reported to the FDA AERS with tumor necrosis factor-alpha (TNF-alpha) inhibitors: results of the REFURBISH study. Am J Gastroenterol. 2013;108:99–105.
99. Chiorean MV, Pokhrel B, Adabala J, et al. Incidence and risk factors for lymphoma in a single-center inflammatory bowel disease population. Dig Dis Sci. 2011;56:1489–1495.
100. Siegel CA, Marden SM, Persing SM, et al. Risk of lymphoma associated with combination anti-tumor necrosis factor and immunomodulator therapy for the treatment of Crohn’s disease: a meta-analysis. Clin Gastroenterol Hepatol. 2009;7:874–881.
101. Sultan K, Korelitz BI, Present D, et al. Prognosis of lymphoma in patients following treatment with 6-mercaptopurine/azathioprine for inflammatory bowel disease. Inflamm Bowel Dis. 2012;18:1855–1858.
102. Gearhart SL, Nathan H, Pawlik TM, et al. Outcomes from IBD-associated and non-IBD-associated colorectal cancer: a Surveillance Epidemiology and End Results Medicare study. Dis Colon Rectum. 2012;55:270–277.
103. Shaukat A, Salfiti NI, Virnig DJ, et al. Is ulcerative colitis associated with survival among older persons with colorectal cancer in the US? A population-based case-control study. Dig Dis Sci. 2012;57:1647–1651.
104. Wang YR, Cangemi JR, Loftus EV Jr, et al. Rate of early/missed colorectal cancers after colonoscopy in older patients with or without inflammatory bowel disease in the United States. Am J Gastroenterol. 2013;108:444–449.
105. Long MD, Kappelman MD, Martin CF, et al. Development of an internet-based cohort of patients with inflammatory bowel diseases (CCFA Partners): methodology and initial results. Inflamm Bowel Dis. 2012;18:2099–2106.
106. Cohen AB, Lee D, Long MD, et al. Dietary patterns and self-reported associations of diet with symptoms of inflammatory bowel disease. Dig Dis Sci. 2013;58:1322–1328.
107. Ananthakrishnan AN, Long MD, Martin CF, et al. Sleep disturbance and risk of active disease in patients with Crohn’s disease and ulcerative colitis. Clin Gastroenterol Hepatol. 2013;11:965–971.
108. Atreja A, Achkar JP, Jain AK, et al. Using technology to promote gastrointestinal outcomes research: a case for electronic health records. Am J Gastroenterol. 2008;103:2171–2178.
109. Reddy SI, Friedman S, Telford JJ, et al. Are patients with inflammatory bowel disease receiving optimal care? Am J Gastroenterol. 2005;100:1357–1361.
110. Kane S, Huo D, Aikens J, et al. Medication nonadherence and the outcomes of patients with quiescent ulcerative colitis. Am J Med. 2003;114:39–43.