The use of large clinical and administrative databases for orthopaedic research has increased exponentially over the last decade (Fig. 1) [11, 12]. Access to extremely large volumes of patient data has allowed researchers to answer questions previously difficult to evaluate using smaller single-institution cohort studies. Additionally, these large databases have allowed for more robust analysis of trends in procedures, complications, and outcomes after surgery. Furthermore, administrative claims databases are now being utilized for the public reporting of surgical outcomes with subsequent penalties for underperforming institutions [6, 14].
Although each database is powerful in its ability to analyze large cohorts, each summarizes the experiences of a unique patient population and varies in its methodology for data acquisition [11, 12]. These intrinsic differences may result in inconsistencies in reported comorbidities and surgical complications. Recent studies have evaluated differences in reported comorbidities and surgical complications among multiple large databases across many surgical specialties, including orthopaedic surgery [2-4, 7-10]. For example, Bohl et al. demonstrated differences in inpatient adverse events after hip fracture surgery between the National Inpatient Sample (NIS) and National Surgical Quality Improvement Program (NSQIP) databases, with frequencies of acute kidney injury and urinary tract infection in NIS being more than twice those in NSQIP [3]. Understanding the differences among these databases is important for appropriately evaluating research utilizing them.
Therefore, we asked: (1) What are the differences in reported demographics, comorbidities, and complications for patients undergoing primary TKA among four databases commonly used in orthopaedic research? (2) How does the difference in reported complication rates vary depending on whether only inpatient data or 30-day postoperative data are analyzed?
Patients and Methods
A retrospective study of patients who had undergone primary TKA was performed with four databases commonly used in orthopaedic research: NSQIP, NIS, Medicare Standard Analytic Files (MED), and the Humana Administrative Claims database (HAC). Only procedures that occurred between 2010 and 2012 were evaluated because these were the years of data available across all data sets. Patients undergoing primary TKA were identified using Current Procedural Terminology (CPT) code 27447 and International Classification of Diseases, 9th Revision (ICD-9) code 81.54. All data within these databases are deidentified and Health Insurance Portability and Accountability Act-compliant and were thus exempt from institutional review board approval.
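The cohort selection described above can be sketched as a simple filter. This is only an illustration: the codes (CPT 27447, ICD-9 81.54) and the 2010 to 2012 window come from the text, whereas the record layout and the names `ProcedureRecord`, `is_primary_tka`, and `select_cohort` are hypothetical and do not reflect the PearlDiver or NSQIP query interfaces.

```python
from dataclasses import dataclass

# Codes and study window as stated in the text.
TKA_CPT = "27447"
TKA_ICD9_PROC = "81.54"
STUDY_YEARS = range(2010, 2013)  # 2010 through 2012 inclusive


@dataclass
class ProcedureRecord:
    """Hypothetical flat record; real database schemas differ."""
    patient_id: str
    year: int
    cpt_codes: tuple = ()
    icd9_proc_codes: tuple = ()


def is_primary_tka(rec):
    """True if the record carries either TKA code and falls in the study window."""
    has_code = TKA_CPT in rec.cpt_codes or TKA_ICD9_PROC in rec.icd9_proc_codes
    return has_code and rec.year in STUDY_YEARS


def select_cohort(records):
    """Keep only records that qualify as primary TKA in 2010 to 2012."""
    return [r for r in records if is_primary_tka(r)]
```

Databases that support only ICD-9 codes (such as NIS) would match on the ICD-9 branch alone, which is why the check accepts either code.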
The NSQIP database is maintained by the American College of Surgeons and uses trained clinical reviewers to perform data collection through chart review and patient/surgeon contact, utilizing strict definitions for each comorbidity and complication variable catalogued [1]. This database captures both inpatient and outpatient events up to 30 days after surgery, and there is roughly an equal mix of public and private hospitals in the cohort. Routine auditing of the NSQIP database is performed to ensure standardized data collection. Audits have demonstrated high data reliability with disagreement rates of < 2% [13]. Additionally, the database reports if any data are missing for a given variable to allow researchers to appropriately address the missing data. The three additional databases (NIS, HAC, and MED) were queried utilizing the PearlDiver Research Program (www.pearldiverinc.com; PearlDiver Inc, Fort Wayne, IN, USA). All three of these databases are administrative claims data sets with their comorbidity and adverse event data defined by reimbursement data in the form of ICD-9 and/or CPT codes. The HAC and MED databases utilize both ICD-9 and CPT codes and capture both inpatient and outpatient events. For the MED and HAC databases, there is no time limit after which outpatient events are no longer captured, other than the years included in the respective databases. In this study, outpatient events were analyzed only out to 30 days after TKA to allow for comparison with NSQIP. The NIS consists of a 20% sample of all inpatient discharges and includes only inpatient data. Similar to HAC and MED, NIS comorbidity and adverse event data are defined by reimbursement data for the inpatient admission of interest; however, only ICD-9 codes are supported by NIS.
Although the absence of CPT codes in NIS does somewhat limit the accuracy of identifying complications that required a return to the operating room (eg, infection), this is unlikely to greatly limit the findings of this study for three reasons: return to the operating room during the inpatient stay was overall a rare event; ICD-9 procedure codes do capture operations; and only the infection variables and the cardiac arrest variable had CPT codes included (for HAC and MED) in their definitions, along with ICD-9 diagnosis codes that are individually quite specific for these particular complications. The validity of administratively coded comorbidity and complication data in the total joint arthroplasty population has been studied by Bozic et al. [5]. They reported varied concordance between administrative claims and the clinical record, citing a high degree of specificity (> 92%) for all comorbidities and complications but a lower degree of sensitivity (29%-100%), suggesting that comorbidities and complications in the administrative claims record are accurate but often incomplete [5]. The number of patients undergoing primary TKA analyzed in this study was 48,248 in HAC, 783,546 in MED, 393,050 in NIS, and 43,220 in NSQIP.
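To make the "accurate but often incomplete" interpretation concrete, the following sketch computes sensitivity and specificity from a two-by-two comparison of claims coding against the clinical record. The counts are invented for illustration only and are not taken from Bozic et al. or from this study; the function names are likewise hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Share of chart-confirmed cases that the claims record also codes."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of chart-confirmed non-cases that the claims record leaves uncoded."""
    return true_neg / (true_neg + false_pos)


# Invented example: 1000 charts, 100 with the condition on chart review.
# Claims code 40 of those 100 and wrongly code 10 of the 900 without it.
tp, fn = 40, 60
fp, tn = 10, 890

sens = sensitivity(tp, fn)   # low: many true cases are missing from claims
spec = specificity(tn, fp)   # high: a coded case is rarely a false alarm

coded_prevalence = (tp + fp) / 1000   # what a claims query would report
chart_prevalence = (tp + fn) / 1000   # what chart review would report
```

With high specificity and low sensitivity, a code that is present is usually correct, but the prevalence reported from claims understates the chart-review prevalence.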
Definitions from the NSQIP user manual for seven comorbidities (morbid obesity, obesity, coagulopathy, diabetes, hypertension, chronic obstructive pulmonary disease [COPD], and smoking; Table 1) and nine postoperative complications (deep vein thrombosis [DVT], pneumonia, stroke, myocardial infarction, cardiac arrest, pulmonary embolism [PE], wound dehiscence, deep surgical site infection [SSI], and any SSI; Table 2) were matched to corresponding ICD-9 and CPT codes and compared for all patients across all databases. These particular comorbidities and complications were chosen because they were thought to be most relevant to patients undergoing TKA and they were felt to be most accurately matched to corresponding ICD-9 codes based on their NSQIP definitions. NSQIP variables not included were either unrelated to patients undergoing TKA or the definitions were extremely specific and did not have an adequate corresponding ICD-9 code. Additionally, patient demographics (age at the time of surgery and sex) were also compared across data sets.
Postoperative complications were evaluated and compared over two different time periods: those that occurred during the inpatient stay (NIS, NSQIP, HAC, MED) and those that occurred within 30 days of the primary TKA (NSQIP, HAC, MED). Lastly, the differences in rates of complications occurring during the inpatient stay versus those occurring within 30 days after surgery were compared for databases with both of these time points available (NSQIP, HAC, MED).
Demographic characteristics were compared between databases with use of the Pearson chi-square test. Given the manner in which age data are provided by the PearlDiver research program for HAC, NIS, and MED, we compared age of patients between databases based on which age group the median age at the time of TKA fell within. A p value of < 0.05 was considered significant. The prevalence of comorbidities and rates of complications were compared among databases with use of relative risk (RR) and corresponding 95% confidence intervals. However, consistent with other database comparison studies, the large sample sizes and associated high power allowed for detection of statistical significance for small, clinically insignificant differences [3, 4]. Thus, the focus of these comparisons was on the magnitude of the differences, specifically utilizing a threshold of a greater than twofold difference to signify an important clinical difference.
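As an illustration of the RR comparison and the twofold threshold, the sketch below computes a relative risk with a 95% confidence interval. The text does not state how the intervals were derived, so the log-transformed (Katz) interval used here is an assumption. The cohort sizes are those reported for HAC and NSQIP, but the event counts are invented, and the function names are hypothetical.

```python
import math


def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """RR of group A versus group B with a log-method (Katz) confidence interval.

    Assumption: the interval method is not specified in the text; this is one
    common choice. Requires at least one event in each group.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def exceeds_twofold(rr):
    """Twofold threshold applied in either direction (RR above 2 or below 0.5)."""
    return rr > 2.0 or rr < 0.5


# Cohort sizes from the text; event counts are invented for illustration.
rr, lower, upper = relative_risk(events_a=200, n_a=48_248, events_b=80, n_b=43_220)
important = exceeds_twofold(rr)
```

Because the cohorts are so large, the interval around even a small RR tends to exclude 1, which is why the threshold on the magnitude of RR, rather than the p value, carries the interpretation.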
Results
Demographic Comparisons Among Databases
Age distribution was clinically similar among databases with median age falling within the 70- to 74-year-old age group for NSQIP, MED, and HAC and 65 to 69 years for NIS (Fig. 2). The female-to-male ratio was 1.7 to one for NIS and NSQIP and 1.8 to one for HAC and MED. Despite observed statistical differences (p < 0.001 for age breakdown and sex ratios), these differences were small and well below our predetermined thresholds defining clinically important differences.
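Because ages were available only as grouped counts for some databases, the comparison rests on locating the age group that contains the median. A minimal sketch of that step follows; the group labels and counts are invented, and `median_age_group` is a hypothetical helper rather than part of any database tool.

```python
def median_age_group(counts_by_group):
    """Return the label of the age group containing the median patient.

    counts_by_group: list of (label, count) pairs ordered youngest to oldest.
    If the cumulative count lands exactly on the midpoint, the median sits on
    the boundary between two groups; this sketch reports the lower group.
    """
    total = sum(count for _, count in counts_by_group)
    if total == 0:
        raise ValueError("no patients in any age group")
    midpoint = total / 2
    running = 0
    for label, count in counts_by_group:
        running += count
        if running >= midpoint:
            return label


# Invented counts for illustration only.
example_counts = [
    ("<65", 100),
    ("65-69", 300),
    ("70-74", 350),
    ("75-79", 200),
    ("80+", 50),
]
example_group = median_age_group(example_counts)
```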
Differences in Comorbidities and Complications
Overall, there was some variation in the prevalence of comorbidities (Fig. 3) and large variation in rates of inpatient complications (Fig. 4) and postoperative complications occurring within 30 days of TKA (Fig. 5) among the databases compared. However, given that many small and clinically unimportant differences were statistically significant in the RR analysis, the focus of these comparisons was on the magnitude of the differences in comorbidity prevalence and complication rates, using a threshold of a twofold difference to define clinical importance. Comparison of comorbidities demonstrated a greater than twofold increase in RR of COPD and coagulopathy in both HAC and MED compared with NIS and NSQIP (RR for COPD: MED versus NIS 3.1 [3.0-3.1], MED versus NSQIP 4.5 [4.3-4.7], HAC versus NIS 3.6 [3.6-3.7], HAC versus NSQIP 5.3 [5.0-5.6]; RR for coagulopathy: MED versus NIS 3.9 [3.8-4.0], MED versus NSQIP 3.1 [2.9-3.2], HAC versus NIS 3.3 [3.2-3.4], HAC versus NSQIP 2.7 [2.5-2.8]; p < 0.001 for all). Additionally, NSQIP had twice the proportion of obese patients compared with NIS (RR 0.4 [0.3-0.4], p < 0.001). The prevalence of all other comorbidities differed by less than twofold among databases (Table 3).
The occurrence of inpatient complications was compared among all four databases included in this study (MED, NIS, HAC, NSQIP) and revealed HAC, MED, and NIS to have at least a twofold increase in RR of any SSI and wound dehiscence compared with NSQIP (RR any SSI: HAC: 17.9 [12.3-26.3], MED: 3.45 [2.38-5.01], NIS: 2.73 [1.88-3.99]; RR wound dehiscence: HAC: 2.64 [1.61-4.34], MED: 3.45 [2.38-5.01], NIS: 2.73 [1.88-3.99]; p < 0.001 for all). Additionally, there was a greater than fivefold increase in RR of any SSI for HAC compared with MED (RR 5.21 [4.74-5.73], p < 0.001) or NIS (RR 6.57 [5.88-7.34], p < 0.001). For the inpatient complications stroke and pneumonia, HAC and MED had more than a twofold RR of these complications relative to NSQIP (RR for stroke: HAC: 3.73 [2.41-5.77], MED: 2.23 [1.50-3.32]; RR for pneumonia: HAC: 2.62 [2.11-3.27], MED: 2.38 [1.96-2.88]; p < 0.001 for all). Lastly, HAC had an RR of 2.62 [2.10-3.27] for prevalence of stroke compared with NIS (p < 0.001). Prevalence of all other inpatient complications was not different among databases (Table 4).
Prevalence of complications occurring within 30 days after TKA was compared among HAC, MED, and NSQIP (NIS includes only inpatient data) and varied greatly across databases, with HAC having more than a twofold greater prevalence of every complication than NSQIP (p < 0.001 for all). Additionally, relative to MED, HAC had an RR greater than 2 for stroke (2.62 [2.34-2.93]), deep SSI (10.72 [9.37-12.29]), and any SSI (2.04 [1.94-2.15]) (p < 0.001 for all). Although MED had more than twice the prevalence of pneumonia (3.25 [2.77-3.81]), stroke (3.48 [2.48-4.89]), and wound dehiscence (2.66 [2.15-3.29]) compared with NSQIP, MED had less than half the prevalence of deep SSI relative to NSQIP (0.22 [0.18-0.27]; p < 0.001 for all). The prevalence of all other complications occurring within 30 days of surgery was not different among databases (Table 5).
When the number of complications captured by each database was compared between the two time periods analyzed (inpatient versus 30 days after surgery), over half of any SSI, deep SSI, wound dehiscence, and DVT events were not captured by NSQIP if followup was limited to only inpatient events. Similarly, over half of all complication endpoints except deep SSI would not have been captured by HAC if limited to the inpatient time period. Additionally, MED would have missed over half of any SSI, deep SSI, wound dehiscence, PE, stroke, and DVT events if only inpatient data were included.
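The inpatient versus 30-day comparison reduces to the share of 30-day events that had not yet occurred, or been recorded, by discharge. A minimal sketch follows, with invented counts and a hypothetical function name.

```python
def fraction_missed_inpatient_only(inpatient_events, events_within_30_days):
    """Share of 30-day events that an inpatient-only analysis would miss.

    Inpatient events are a subset of 30-day events, so the first argument
    must not exceed the second.
    """
    if events_within_30_days <= 0:
        raise ValueError("need at least one event within 30 days")
    if not 0 <= inpatient_events <= events_within_30_days:
        raise ValueError("inpatient events must be between 0 and the 30-day total")
    return (events_within_30_days - inpatient_events) / events_within_30_days


# Invented counts: 30 of 100 events within 30 days occurred before discharge.
missed = fraction_missed_inpatient_only(30, 100)
more_than_half_missed = missed > 0.5
```

A value above 0.5 corresponds to the "over half not captured" finding reported for several endpoints.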
Discussion
In an era of increasing use of large clinical registries and administrative claims databases for orthopaedic research and assessment of hospital quality, our study demonstrated that among four commonly utilized databases, there is considerable variation in complication rates after primary TKA despite relatively similar demographic and comorbidity profiles across these data sets. Additionally, a large percentage of complications is not captured if only inpatient events are analyzed and included within the database. These findings highlight the importance of understanding the methodology utilized to create each respective database.
Limitations to this study include the inability to link specific patients across the databases used in this analysis. If we were able to specifically identify the same patient within each database, we would have been able to provide a true measure of validity and determine the most accurate database for analyzing complication rates after TKA. It is likely the same patient may be represented in multiple databases because HAC does contain patients with Medicare Advantage plans and both NSQIP and NIS represent a sample of the population not defined by an insurance provider. However, the degree to which there is overlap between these databases is not known. Although we are unable to determine the effect of overlap between databases from this study, we would assume a lesser degree of overlap would result in a lower degree of concordance among the findings in each database. Additionally, there was the potential for bias resulting from limitations of ICD-9/CPT coding and the strict nature of NSQIP definitions of comorbidities and complications, possibly resulting in imperfect matching of some variables. For example, in the case of DVT, the NSQIP variable requires both diagnosis and treatment, whereas in administrative claims data, treatment is not required for the diagnosis to be coded. Although variables were matched as closely as possible, the differences between NSQIP definitions and ICD-9/CPT coding for a given variable likely played a large role in the reported differences between NSQIP and the administrative claims databases. Lastly, a twofold cutoff was used as a threshold for signifying excessive differences in the prevalence of comorbidities and complications. This threshold may not have identified clinically important differences in more common comorbidities/complications with a higher baseline prevalence because the prevalence would still have needed to double or halve to meet this cutoff.
For example, 66% of NSQIP patients undergoing TKA had hypertension (HTN); thus, the prevalence of HTN in another database would have had to be 33% or lower to meet this threshold. However, we believed that a high cutoff was imperative to highlight the large, clinically important differences within these data sets.
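The arithmetic behind this limitation can be sketched as follows. The 66% figure is from the text; the second baseline and the function name `twofold_bounds` are hypothetical and used only for illustration.

```python
def twofold_bounds(reference_prevalence):
    """Prevalence another database must reach to differ twofold from the reference.

    Returns (lower, upper). The upper bound is None when doubling the
    reference would exceed 100%, meaning only a halving can meet the cutoff.
    """
    if not 0 < reference_prevalence <= 1:
        raise ValueError("prevalence must be a proportion in (0, 1]")
    lower = reference_prevalence / 2
    doubled = reference_prevalence * 2
    upper = doubled if doubled <= 1 else None
    return lower, upper


# Hypertension in NSQIP (66%, from the text): doubling is impossible,
# so another database would need a prevalence of 33% or lower.
htn_lower, htn_upper = twofold_bounds(0.66)

# A rare complication (hypothetical 1% baseline) can meet the cutoff either way.
rare_lower, rare_upper = twofold_bounds(0.01)
```

For a common comorbidity, a twofold difference therefore corresponds to a very large gap in percentage points, whereas for a rare complication it corresponds to a small one.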
Despite differences in age and sex observed between databases, the magnitude of these differences was quite small, and all of them fell well below our a priori definitions of what would represent a clinically important difference. Similar findings have been reported in other database comparison studies with small, likely clinically unimportant differences found when comparing both patients with hip fracture and those undergoing lumbar spine fusion in NIS and NSQIP [3, 4]. These findings are important because they demonstrate that despite similar demographic profiles, patients undergoing TKA have wide variability in the prevalence of comorbidities and postoperative complications depending on the specific database utilized for analysis.
The differences in prevalence of complications for the various databases in this analysis are likely a result of the differences in methodology used to categorize complications for each respective data set. Additional reasons for differences in complication rates may be the manner in which the data are collected (trained clinical reviewers in NSQIP versus coders for administrative claims databases) and the primary reason for the data collection (quality assessment for NSQIP versus billing for administrative claims databases). These differences seem to apply to both inpatient and outpatient complications. We suspect that NSQIP, which utilizes chart abstraction by trained clinical reviewers with strict definitions for complications, is likely more accurate than administrative claims databases given limitations in ICD-9/CPT codes and the need for accurate documentation. Concern about the accuracy of administrative claims data relative to clinical registries such as NSQIP has recently emerged in the literature across multiple surgical specialties [2-4, 7-10]. Two of these studies, performed by Bohl et al., compared NSQIP with NIS for patients with hip fracture and patients who underwent lumbar spine surgery utilizing methodology similar to that of the present study [3, 4]. Both studies demonstrated variation in rates of inpatient adverse events depending on the database after either hip fracture or lumbar spine surgery despite having similar patient populations [3, 4]. Additionally, similar to the current study, limitations of data sets that only included inpatient data were highlighted in these articles because over half of SSIs, DVTs, and urinary tract infections after hip fracture surgery and over half of SSIs and mortalities after lumbar spine surgery are not captured by NSQIP if the time period analyzed is limited to only inpatient data [3, 4].
However, despite the data cited and the results of this study, the differences in prevalence of complications among databases cannot be entirely attributed to methods of data acquisition (specifically, administrative claims versus clinical registry) because even comparisons among multiple administrative claims databases in this study had drastically different results for the complications of stroke and SSI (Figs. 4, 5) despite querying the database with the exact same methodology.
Although we found a number of clinically important differences across the four databases we evaluated, we still believe each of these databases has an important role to play in orthopaedic research. Clinical registries such as NSQIP likely would be best utilized to evaluate the impact of comorbidities on specific complications after surgical procedures given their prospective collection of data, robust review of multiple sources to obtain clinical information (chart abstraction, medical providers, and patients), and their clear definitions for each included variable. However, NSQIP is limited to a short followup period (30 days) and is just a small sample of voluntarily participating hospitals, which limits the ability of this type of database to estimate disease prevalence or to examine trends. Prevalence of disease or trends over time are probably best studied with administrative claims databases such as NIS, MED, or HAC because they are more representative of the general population and generally have more years of data included. Additionally, administrative claims databases appear to be better suited for financial analysis, for analysis of questions outside the scope of the strict clinical definitions often associated with clinical registries, and for evaluation of complications that occur further in time after the index procedure (eg, revision of a TKA) because specific patients can be followed for the entire time period they remain within a given claims database. Although in this study inpatient-only analysis missed a number of clinically important complications that occur in the 30 days after surgery, inpatient-only databases such as NIS can still be very useful because they often allow for better analysis of length of stay issues and are usually less geographically limited compared with other databases.
The strengths and weaknesses of these databases must be considered when evaluating literature that utilizes a large administrative claims database or clinical registry.
In conclusion, this study demonstrates that among clinical and administrative databases commonly used in orthopaedic research, there is considerable variation in the prevalence of comorbidities and rates of complications after primary TKA depending on the database and postoperative time period used for analysis. This variation is likely driven by major differences in variable definitions, collection methods, and patient cohorts. When evaluating research utilizing large databases, one must pay particular attention to the type of database used (administrative claims, clinical registry, or other kinds of databases), the duration of followup, and the population captured in that data set to ensure it is best suited for the specific research question. Furthermore, attention must also be paid to the definitions used for comorbidities and complications to ensure they accurately represent the variable of interest. Lastly, in the era of value-based health care, these differences must be considered when developing risk adjustment models for initiatives such as bundled payments. In the development of these programs, policymakers must carefully consider the data sources used to ensure the data analytics match historical sources. For example, if administrative claims data will be used actively within a bundle, then expected rates and risk adjustment models should be built from those data.
References
1. American College of Surgeons National Surgical Quality Improvement Program. Data collection, analysis, and reporting. Available at: http://site.acsnsqip.org/program-specifics/data-collection-analysis-and-reporting/. Accessed May 15, 2016.
2. Awad MI, Shuman AG, Montero PH, Palmer FL, Shah JP, Patel SG. Accuracy of administrative and clinical registry data in reporting postoperative complications after surgery for oral cavity squamous cell carcinoma. Head Neck. 2015;37:851–861.
3. Bohl DD, Basques BA, Golinvaux NS, Baumgaertner MR, Grauer JN. Nationwide Inpatient Sample and National Surgical Quality Improvement Program give different results in hip fracture studies. Clin Orthop Relat Res. 2014;472:1672–1680.
4. Bohl DD, Russo GS, Basques BA, Golinvaux NS, Fu MC, Long WD 3rd, Grauer JN. Variations in data collection methods between national databases affect study results: a comparison of the Nationwide Inpatient Sample and National Surgical Quality Improvement Program databases for lumbar spine fusion procedures. J Bone Joint Surg Am. 2014;96:e193.
5. Bozic KJ, Bashyal RK, Anthony SG, Chiu V, Shulman B, Rubash HE. Is administratively coded comorbidity and complication data in total joint arthroplasty valid? Clin Orthop Relat Res. 2013;471:201–205.
6. Centers for Medicare & Medicaid Services. CMS dry run hospital-specific report for hospital-wide all-cause unplanned readmission (HWR) measure. 2012. Available at: http://www.qualitynet.org. Accessed August 15, 2016.
7. Enomoto LM, Hollenbeak CS, Bhayani NH, Dillon PW, Gusani NJ. Measuring surgical quality: a national clinical registry versus administrative claims data. J Gastrointest Surg. 2014;18:1416–1422.
8. Kulaylat AN, Engbrecht BW, Rocourt DV, Rinaldi JM, Santos MC, Cilley RE, Hollenbeak CS, Dillon PW. Measuring surgical site infections in children: comparing clinical, electronic, and administrative data. J Am Coll Surg. 2016;222:823–830.
9. Lawson EH, Louie R, Zingmond DS, Brook RH, Hall BL, Han L, Rapp M, Ko CY. A comparison of clinical registry versus administrative claims data for reporting of 30-day surgical complications. Ann Surg. 2012;256:973–981.
10. Lawson EH, Louie R, Zingmond DS, Sacks GD, Brook RH, Hall BL, Ko CY. Using both clinical registry and administrative claims data to measure risk-adjusted surgical outcomes. Ann Surg. 2016;263:50–57.
11. Pugely AJ, Martin CT, Harwood J, Ong KL, Bozic KJ, Callaghan JJ. Database and registry research in orthopaedic surgery: part I: claims-based data. J Bone Joint Surg Am. 2015;97:1278–1287.
12. Pugely AJ, Martin CT, Harwood J, Ong KL, Bozic KJ, Callaghan JJ. Database and registry research in orthopaedic surgery: part 2: clinical registry data. J Bone Joint Surg Am. 2015;97:1799–1808.
13. Shiloach M, Frencher SK Jr, Steeger JE, Rowell KS, Bartzokis K, Tomeh MG, Richards KE, Ko CY, Hall BL. Toward robust information: data quality and inter-rater reliability in the American College of Surgeons National Surgical Quality Improvement Program. J Am Coll Surg. 2010;210:6–16.
14. US Senate. HR 3590: The Patient Protection and Affordable Care Act. 2009. Available at: https://www.govtrack.us/congress/bills/111/hr3590/text. Accessed August 25, 2016.