Data, Big Data, and Metadata in Anesthesiology

Levin, Matthew A. MD*; Wanderer, Jonathan P. MD, MPhil†‡; Ehrenfeld, Jesse M. MD, MPH†‡§∥

doi: 10.1213/ANE.0000000000000716
The Open Mind

The last decade has seen an explosion in the growth of digital data. Since 2005, the total amount of digital data created or replicated on all platforms and devices has been doubling every 2 years, from an estimated 132 exabytes (132 billion gigabytes) in 2005 to 4.4 zettabytes (4.4 trillion gigabytes) in 2013, and a projected 44 zettabytes (44 trillion gigabytes) in 2020.a This growth has been driven in large part by the rise of social media along with more powerful and connected mobile devices, with an estimated 75% of information in the digital universe generated by individuals rather than entities. Transactions and communications including payments, instant messages, Web searches, social media updates, and online posts are all becoming part of a vast pool of data that live “in the cloud” on clusters of servers located in remote data centers. The amount of accumulating data has become so large that it has given rise to the term Big Data. In many ways, Big Data is just a buzzword, a phrase that is often misunderstood and misused to describe any sort of data, no matter the size or complexity. However, there is truth to the assertion that some data sets truly require new management and analysis techniques.

From the *Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, New York, New York; Departments of †Anesthesiology, ‡Biomedical Informatics, §Health Policy, and ∥Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee.

Accepted for publication December 10, 2014.

Funding: No external sources.

Conflict of Interest: See Disclosures at the end of the article.

Reprints will not be available from the authors.

Address correspondence to Matthew A. Levin, MD, Department of Anesthesiology, Mount Sinai School of Medicine, 1 Gustave L. Levy Place, Box 1010, New York, NY 10029. Address e-mail to matthew.levin@mssm.edu.

Increasingly, health care data are becoming a part of this continuous stream of digital data, driven in part by the mandates included in the Health Information Technology for Economic and Clinical Health (HITECH) Act, which was part of the larger American Recovery and Reinvestment Act passed in 2009. The HITECH Act established financial incentives and penalties to encourage the “meaningful use” of electronic health records (EHRs). This has undoubtedly increased the volume of electronic health care data by an order of magnitude over the past 5 years. This leads to the question: have health care data become Big Data? If so, can health care Big Data provide new insights that help improve outcomes on both an individual and a population level? In this article, we define Big Data, discuss whether anesthesiology has Big Data, and determine whether there is truly a need to use new infrastructure and analytic techniques to manage data in anesthesiology.

DEFINING BIG DATA

The term Big Data is not new, but its use has only recently become widespread (Fig. 1). Broadly speaking, Big Data is data that are so large and complex, and generated from such a wide variety of sources at such a high rate, that they exceed the ability of traditional tools and infrastructure to capture, store, and analyze them. The defining characteristics of Big Data, as originally put forth by Laney in 2001, are the “3 Vs”: Volume, Velocity, and Variety (Table 1).b The combination of these 3 attributes is what makes Big Data so challenging to work with, although the difficulty most often arises more from data volume and velocity than from variety. For example, an average of 500 million “tweets” is posted on Twitter every day, and Google indexes over 20 billion sites daily to process >3.5 billion searches per day.c This volume and velocity of data are simply too great for most conventional database systems.

The 3 Vs are not the only definition of Big Data. Some have suggested that a fourth V, Veracity, is important. If sources cannot be trusted and data are not reliable, they cannot be acted upon. Veracity, however, is a necessary (and inherently subjective) property of all data, regardless of size or complexity, and is not unique to Big Data. A recent review of Big Data definitions found strong ties between Big Data and certain infrastructure, specifically technologies such as “NoSQL” databases that organize and store data using simple key-value pairs rather than the tables and columns used by modern relational databases.1 In fact, the best definition of Big Data might simply be “too big to fit on my computer.” Conversely, many technologists would say, “If the data fit in a database, they are not Big Data.”d

Another proposed definition of Big Data is “to describe ‘big’ in terms of the number of useful permutations of sources making useful querying difficult ... and complex interrelationships making purging difficult ... Big Data can be small and not all large datasets are big.”e This somewhat contradictory statement captures the concept that, as much as size, what can make Big Data difficult to manage is the lack of structure and the complexity of the relationships among data elements.

One issue that has generally been poorly addressed in discussions of Big Data is data quality.f In fact, there is an attitude that “‘good enough’ is good enough,” meaning that some degree of data loss and inaccuracy is an acceptable trade-off for the insight gained from massive data sets.2 Although this may be true for nonclinical applications, adopting this approach in the health care setting can be problematic. Data loss and inaccuracy are clearly not acceptable for granular analyses at the individual patient level. However, statistical analysis techniques designed to deal with “noisy data” may still render Big Data of suboptimal quality useful. Using such methods requires early and close collaboration with formally trained statisticians and bioinformaticists because these techniques are mathematically complex and well beyond the level of statistical education of most clinicians.3 The analysis process is no longer highly linear (as in a classic randomized controlled trial), but iterative and even branching. Early collaboration helps ensure success in both the interpretation and presentation of results.

DOES ANESTHESIOLOGY HAVE BIG DATA OR NEED BIG DATA?

Having defined Big Data, we can now ask: does the medical specialty of anesthesiology have Big Data? By looking at each of the Vs in turn, a more detailed answer can be developed. First, however, it is worthwhile to ask, does anesthesiology really need Big Data?

Potential of Big Data

The current generation of perioperative research involves analyses of tens of thousands of anesthetic cases comprising megabytes of data. The next generation of perioperative research will involve millions of anesthetic cases, a sample size currently restricted to research that uses large administrative data sets.4 Perhaps the biggest argument for Big Data in anesthesiology is that there remain important clinical problems for which we do not have good answers, because we do not have enough statistical power to perform meaningful analyses. This conundrum has been recognized by the anesthesia community for some time.5 Examples of such problems and questions are:

  • True root causes of ischemic optic neuropathy (the largest study to date has only looked at dozens of cases).6
  • True incidence of and risk factors for postoperative pulmonary complications.7–9 The recently completed PERISCOPE trial, which externally validated a previously developed postoperative pulmonary complications risk score, showed wide variation in predictive power even within a fairly large sample of over 5000 patients drawn from 63 centers in 21 European countries.10 This suggests that even larger and more diverse data are needed to develop globally applicable risk scores.
  • The true value of processed electroencephalography monitoring (e.g., Bispectral Index) in preventing awareness. The largest studies thus far have had only a handful of cases of awareness in each arm.11–13

In the examples above, it is not only that more raw numbers of cases may be needed to answer the question, it is also that more detail per case is likely needed. This can lead to Big Data even if the number of cases remains relatively modest. For example, having access to intraoperative blood pressure waveform recordings might facilitate better understanding of whether rapid and transient blood pressure changes (not recorded by conventional monitoring) play any role in causing ischemic optic neuropathy.

Anesthesiology Big Data—Volume

There are 2 aspects to consider for volume: (1) data for an individual case, and (2) aggregate data across practices (i.e., institutional and/or national level data). Individual case data consist of patient demographics, physiologic data (vital signs), event data, medication data, fluid data, and any associated information describing these data. Most of these elements involve minimal amounts of data, on the order of kilobytes. Table 2 shows an example of how much storage might be required to record one physiologic parameter, in various formats. It is evident that individual anesthesia records generated by the current generation of anesthesia information management systems (AIMSs) are not Big Data. Full waveform capture, however, begins to generate a significant volume of data. As shown in Table 2, waveform data for a 5-lead electrocardiogram for a 2-hour case would generate 37 MB of data. Add capnography, arterial blood pressure and central venous pressure, pulse oximetry, electroencephalograph traces, airway pressure and volume waveforms, and the data volume explodes. For example, Liu et al.14 from the University of Queensland recorded waveform data with 10 millisecond resolution (100 Hz) from 32 patients undergoing anesthesia. This generated approximately 5.5 GB of data, or about 170 MB per case. (The researchers have made this data set freely available.g)

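The arithmetic behind such storage estimates is straightforward. The sketch below is a minimal illustration; the sampling rate, channel count, and bytes per sample are assumptions (not values taken from Table 2), chosen as one plausible combination that happens to land near the ~37 MB figure quoted in the text for a 2-hour 5-lead electrocardiogram.

```python
def waveform_storage_mb(sampling_hz: float, n_channels: int,
                        bytes_per_sample: int, case_hours: float) -> float:
    """Rough storage estimate (in MB) for uncompressed waveform capture."""
    seconds = case_hours * 3600
    total_samples = sampling_hz * n_channels * seconds
    return total_samples * bytes_per_sample / 1e6

# Assumed parameters (not taken from Table 2): 5 ECG channels sampled at
# 500 Hz, 2 bytes per sample, over a 2-hour case.
print(f"{waveform_storage_mb(500, 5, 2, 2):.0f} MB")  # -> 36 MB
```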

Another perspective from which to consider volume is at the aggregate institutional or national level. A modern AIMS, sampling physiologic data once every minute, and including all other patient and case data, will generate a file approximately 1 MB in size. A large tertiary care center might perform 200 cases per day, generating 200 MB of data. If the center performs 50,000 anesthesia cases per year, 50 GB of data are generated. In the United States in 2010, approximately 51 million inpatient surgical procedures were performed.h If data from all of these cases were captured, this would result in approximately 51 terabytes of anesthesia case data per year. The National Anesthesia Clinical Outcomes Registry (NACOR), a nationwide anesthesia database maintained by the Anesthesia Quality Institute, has the stated goal of capturing data on all anesthetics administered in the United States. Since starting operations in 2010, NACOR has collected data on over 21 million cases through November 2014.i Of these, only an estimated 1 million cases have detailed intraoperative physiologic data. Using the estimates above, this is only about 1 terabyte of data, the equivalent of about 400 full-length DVD movies, or 33 movies in the higher-quality Blu-ray format, an amount that can easily fit onto a modern consumer hard drive. Additionally, by design, NACOR is focused on breadth rather than depth of data capture, with the result that the quality and completeness of case data may vary widely among contributing sites. Further, current data use agreements limit reporting to one’s own data and benchmarking, and site and provider identities are masked in the Participant User File (the research extract made available to participants). Probability sampling cannot be done and, therefore, no conclusions can be drawn about incidence. This limits NACOR’s current utility for research purposes.
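The institutional and national volume figures above follow directly from the ~1 MB-per-case estimate. A short sanity check of that arithmetic, using the case counts given in the text:

```python
MB_PER_CASE = 1                       # approximate AIMS record size, as stated above

daily_cases = 200                     # large tertiary care center, per day
annual_cases = 50_000                 # same center, per year
us_inpatient_procedures = 51_000_000  # US inpatient surgical procedures, 2010

print(daily_cases * MB_PER_CASE, "MB per day")                  # 200 MB
print(annual_cases * MB_PER_CASE / 1_000, "GB per year")        # 50.0 GB
print(us_inpatient_procedures * MB_PER_CASE / 1_000_000, "TB")  # 51.0 TB per year
```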

Another possible source of Big Data in anesthesiology is the Multicenter Perioperative Outcomes Group (MPOG).j This initiative, started by the University of Michigan in 2008, aims to aggregate EHR, administrative, and outcome data into a single unified source that can be used for perioperative research. To date, 17 sites from the United States and the Netherlands are actively contributing data to this effort. The MPOG database contains over 2 million patient cases representing 1.4 million unique patients, with over 5 billion vital signs and 125 million laboratory values.k MPOG limits access to contributing members, and use of the data for research projects requires approval from its Perioperative Clinical Research Committee. Access to data is restricted to the subset relevant to each research project, which limits the use of MPOG as a true large-scale data source.

Other possible sources of anesthesia Big Data are national perioperative databases such as the National Surgical Quality Improvement Program (NSQIP) and the Society of Thoracic Surgeons (STS) National Database.15 Both the STS and NSQIP registries rely on manual data collection, which is time consuming, costly, labor intensive, and inflexible.16 STS and NSQIP data entry is performed via structured forms with prespecified values that do not allow free-text input. Updating the data capture forms requires administrative review and consensus, which cannot be done by individual reporters. Because these registries are surgically oriented, they generally do not contain detailed intraoperative anesthesia data. STS participation is voluntary and only captures cardiothoracic surgical cases, and NSQIP relies on sampling and thus only captures a small fraction of all surgical cases performed in the United States. These registries remain small and cannot be considered Big Data.

Anesthesiology Big Data—Velocity

At first glance, it appears obvious that anesthesiology has high-velocity data. Intraoperative monitoring is continuous, and every minute of every day there are thousands of cases occurring simultaneously across the United States and the world. Yet, the vast majority of these data are never captured because waveform data are not stored. Therefore, in reality, the current velocity of anesthesia data is quite low, with the typical AIMS recording data only once per minute (Table 2). Additionally, data are often not available for use in near real time, but are only made available for reporting the next day. This is not true of many of the older, more established AIMSs, but it is the case for some of the newer AIMSs provided by the large EHR vendors, such as Epic Anesthesia (Epic Systems Corp., Madison, WI). This further decreases the velocity of data.

While many practitioners may not currently benefit from real-time data analysis, there are growing examples of ways in which real-time predictive analysis of intraoperative trends can lead to improved outcomes. For example, at Vanderbilt University Medical Center, real-time data including vital signs and operating room video are streamed wirelessly to mobile devices to allow supervising anesthesiologists to remotely monitor their cases and manage clinical workflow.17 Others have proposed using real-time waveform data analysis to drive clinical decision support for fluid responsiveness and fluid management in both the operating room and the intensive care unit.18

Anesthesiology Big Data—Variety

The last dimension of Big Data is variety. There are clearly a variety of data types present in perioperative data. Physiologic data can be continuous or discrete numerical data. Demographic data are numeric, text, and categorical data (e.g., ASA Physical Status Classification). Medication data are a mixture of data types, representing medication name, units, dose, and administration time stamps. Allergy data can be either structured and mapped to a standardized nomenclature, or can be represented as unstructured free text.19,20 Intraoperative events are represented by time stamps or time series. Cases may include imaging data such as transesophageal echocardiographic images, video laryngoscopy images, and intraoperative radiographs. There are also a variety of data sources. Data can come from an AIMS, an EHR, an anesthesia workstation (via an AIMS or directly), a picture-archiving and communication system, an ultrasound device, an intraoperative video-recording device, etc. On a regional or national level, data could come from a wide variety of different institutions (e.g., community hospitals, ambulatory surgical centers, freestanding imaging centers, or tertiary medical centers). The challenge of integrating data from all of these sources and care settings is daunting, particularly given the current lack of widespread interoperability among systems.

Anesthesiology Big Data—Summary

In summary, does the field of anesthesiology really have Big Data? The answer is: not yet. There is definitely variety. There is increasing velocity, especially as the field moves toward more real-time analysis of intraoperative data. The volume of data, however, is modest. Looking toward the near future, if full waveform data were captured and used for real-time signal processing (e.g., heart rate variability, entropy analysis), not only would the volume of data suddenly become very large, the analysis would also become computationally complex and resource intensive. This might truly push anesthesiology into the realm of Big Data. The payoff of such real-time waveform analysis might be better prediction of impending clinical decompensation (e.g., postoperative hemodynamic instability) before it occurs, in time for preemptive intervention. On a national level, NACOR will become an increasingly important resource for perioperative research, and in time, as more providers begin sending their data to NACOR and the frequency of submission increases, the volume and velocity may begin to approach a scale that could be called Big Data. It must be noted again, however, that as long as NACOR provides only aggregate data to researchers, no studies of incidence can be undertaken using NACOR data.
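As one concrete illustration of the kind of real-time signal processing mentioned above, heart rate variability can be summarized from beat-to-beat R-R intervals with simple time-domain statistics. The sketch below is illustrative only: the interval values are invented, and the metrics (SDNN, RMSSD) are standard definitions rather than a method described in this article.

```python
import numpy as np

def hrv_time_domain(rr_ms: np.ndarray) -> dict:
    """Basic time-domain heart rate variability metrics from R-R intervals (ms)."""
    diffs = np.diff(rr_ms)
    return {
        "mean_hr_bpm": 60_000 / rr_ms.mean(),      # average heart rate
        "sdnn_ms": rr_ms.std(ddof=1),              # overall variability
        "rmssd_ms": np.sqrt(np.mean(diffs ** 2)),  # beat-to-beat variability
    }

# Invented example: roughly 75 beats/min with modest beat-to-beat variation.
rr = np.array([800, 812, 790, 805, 798, 820, 785, 802], dtype=float)
print(hrv_time_domain(rr))
```

A streaming implementation would recompute such metrics over a sliding window of the most recent beats, which is where the computational and storage burden discussed above begins to accumulate.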

METADATA

We have described the volume, velocity, and variety of data in anesthesiology and provided information describing each of these 3 attributes. Another important attribute of Big Data (or any data) is metadata, defined as data about data. The National Information Standards Organization specifies 3 types of metadata: descriptive, structural, and administrative.21 Descriptive metadata are data points used to assist with the discovery and identification of data, including elements such as content authorship. In the context of an AIMS, for instance, descriptive metadata associated with a case comment would indicate who entered the comment, at what time, and from which device. Structural metadata specify how data are ordered and linked together. For an AIMS, this would include a database schema that describes how a case might be stored as one patient identifier record linked to many physiologic data entry records. Administrative metadata are used to manage data and may specify elements such as data access permissions and data access logs. An example of this is an audit log that records individual provider accesses of AIMS records.
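To make the 3 categories concrete, the sketch below shows how a single AIMS case comment might carry descriptive, structural, and administrative metadata alongside the comment text itself. The field names and values are hypothetical and are not drawn from any particular AIMS.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CaseComment:
    # The data themselves
    text: str
    # Descriptive metadata: who entered the comment, when, and from which device
    author_id: str = ""
    entered_at: datetime = field(default_factory=datetime.now)
    device_id: str = ""
    # Structural metadata: how this record links to the rest of the case
    case_id: str = ""       # hypothetical foreign key to the anesthetic case record
    patient_id: str = ""    # hypothetical foreign key to the patient record
    # Administrative metadata: access control and audit trail
    access_role_required: str = "clinician"
    access_log: list = field(default_factory=list)  # (user, timestamp) tuples

comment = CaseComment(text="Patient extubated, following commands.",
                      author_id="anes_042", device_id="OR7-workstation",
                      case_id="CASE-123456", patient_id="MRN-000789")
comment.access_log.append(("qa_reviewer_17", datetime.now()))
print(comment.author_id, comment.case_id, len(comment.access_log))
```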

United States law mandates retention of these records. The Health Insurance Portability and Accountability Act of 1996 requires that covered entities, such as hospitals, “implement hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information” (45 C.F.R. § 164.312). These records must be maintained for 6 years, and a process must be in place to examine these logs and generate compliance reports. The volume of data needed to meet this requirement can be very large. At Vanderbilt University Medical Center, approximately 4.5 million audit records are generated daily for one of the non-AIMS EHR systems, which scales to 10 billion records extrapolated over 6 years. Maintaining these data stores in an accessible fashion can be challenging and may require some techniques associated with Big Data, even without AIMS records.
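The scale of that audit-log requirement follows from simple arithmetic; a quick check of the figures quoted above:

```python
audit_records_per_day = 4_500_000  # figure quoted for one non-AIMS EHR system
retention_years = 6                # HIPAA retention requirement

total_records = audit_records_per_day * 365 * retention_years
print(f"{total_records:,}")        # ~9.9 billion, i.e., on the order of 10 billion records
```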

AIMS records were featured in a case report by Vigoda and Lubarsky in 2006, where an unrecognized failure of the AIMS to record intraoperative vital signs likely led to increased medical liability in a procedure that resulted in patient harm.22 While not the focus of the case report, the authors also noted that the plaintiff’s attorney had requested the metadata associated with the AIMS entries. These entries included the attending anesthesiologist’s attestation of being present at emergence, which was entered soon after the surgery’s start. Thus, metadata revealed the temporal context of the attestation, which undermined the credibility of the anesthesia team. This episode highlighted the importance of metadata to anesthesiologists specifically.

In summary, metadata are an important component of health care data that provide the context for data generated in perioperative care and can be leveraged in nontraditional ways to provide insight into health care workflow. Metadata have substantial volume and velocity, but not significant variety. However, there is another form of high-volume health care Big Data that may become increasingly important: genomics.

EMERGING BIG DATA: GENOMICS AND ANESTHESIOLOGY

As our understanding of genomics and our ability to deliver personalized medicine grow, there will be a growing need to incorporate genomic data within the context of perioperative medicine. The explosive growth in genomics has been driven by next-generation sequencing machines, which have the ability to perform whole-genome sequencing at an unprecedented resolution and price point. The cost to sequence the entire human genome has fallen from $100 million in 2001 to about $10,000 in 2014.l Genome-Wide Association Studies, which attempt to link several genes to a single phenotype, and Phenome-Wide Association Studies, which attempt to link several phenotypes to a single gene, are ushering in a new understanding of biology and medicine, along with unprecedented amounts of clinical data. The complete human genome is approximately 3 GB. If even 1% of these data were used during perioperative management, that would represent a 30-fold increase in the amount of perioperative data potentially generated and stored per patient.
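The 30-fold figure can be verified from the numbers in the text (a ~3 GB genome and the ~1 MB-per-case AIMS estimate from the Volume section):

```python
genome_bytes = 3e9        # complete human genome, ~3 GB as stated above
fraction_used = 0.01      # suppose 1% is relevant to perioperative management
aims_case_bytes = 1e6     # ~1 MB per AIMS case, from the Volume discussion

print(genome_bytes * fraction_used / aims_case_bytes)  # -> 30.0, i.e., ~30-fold increase
```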

Some centers (including the authors’) are already prospectively genotyping patients and using that information to provide personalized therapeutics such as initial dosing of clopidogrel.23,24 The shift from population-based to patient-centered care will require the development of new approaches to managing the data that are generated as a part of this new care process. Within anesthesiology, there is great promise as we begin to understand the genomic underpinnings of drug metabolism and response, pain susceptibility, and wound healing.25–29 To be successful in this area, investigators will need to develop new tools that can combine, manage, and analyze the growing genomic and physiologic data generated during the perioperative period, a task that may require a Big Data framework. An example of such a new approach was recently described in which investigators used a high-throughput, unbiased next-generation sequencing pipeline to identify Leptospira in a cerebrospinal fluid sample from a critically ill patient in whom the conventional diagnostic workup had been negative.30 Over 10 million raw DNA sequence reads from the patient’s cerebrospinal fluid were compared to over 40 gigabases of reference sequences obtained from the National Center for Biotechnology Information. This was a massive computational problem. The rapid turnaround time (within 48 hours) was fast enough to enable clinicians to successfully treat the infection within the same hospitalization, with a near-complete recovery. The pipeline architecture is specifically intended to be cloud deployable.31 It is not hard to envision a future in which such real-time sequencing will be routinely used in clinical practice, although its specific role within anesthesia practice remains to be defined.

STRENGTHS AND LIMITATIONS OF TRADITIONAL ANALYTIC TECHNIQUES

While waveform or genomic data may necessitate new tools and frameworks, the majority of current perioperative data sets do not require new analytic techniques or infrastructure. A statistical analysis of postanesthesia care unit staffing performed by Dexter et al.32 in 2001 analyzed approximately 580 billion shift permutations (a Big Data number of permutations) on a low-power personal computer in approximately 7 hours. Computer hardware continues to follow “Moore’s Law,” roughly doubling in processing power every 2 years.33 In combination with advances in algorithmic optimization techniques, this has resulted, in some cases, in a 200 billion-fold speedup in processing time over the last 20 years.34 A modern relational database, with fast disk arrays, adequate memory (typically 128 GB or more), properly indexed tables, and an intelligently constructed query that avoids table scans and unrestricted joins, can easily scale to terabytes of data. Partitioning, which splits 1 large table into multiple smaller tables, and sharding, which distributes the partitions across multiple servers, are 2 techniques commonly used by modern databases to handle very large data. These technologies are available in both proprietary (e.g., Microsoft SQL Serverm) and free (e.g., MySQLn) databases. Statistical tools such as SAS (SAS Institute, Cary, NC) and R (R Foundation for Statistical Computing, Vienna, Austria) can also handle very large data sets, with essentially no limitation on file size other than that imposed by the underlying hardware and software. In combination, these platforms and programs can easily handle most perioperative data sets.
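Partitioning and sharding syntax differs across database engines, so rather than quote any particular product's SQL, the sketch below illustrates the underlying idea in Python: hash-based sharding, in which each case record is routed to 1 of several servers based on its key. The server names and routing function are illustrative assumptions, not the API of any specific database.

```python
import hashlib

SHARDS = ["db-server-1", "db-server-2", "db-server-3", "db-server-4"]  # hypothetical servers

def shard_for(case_id: str) -> str:
    """Route a case record to a shard by hashing its key.

    A conceptual sketch of what partitioning/sharding features in relational
    databases do internally; not the interface of any particular product.
    """
    digest = hashlib.md5(case_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for case_id in ["CASE-000001", "CASE-000002", "CASE-000003"]:
    print(case_id, "->", shard_for(case_id))
```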

This does not mean that analysis of perioperative data is immune to the limitations of traditional tools, especially if those tools are not up to date. A recent observational study of outcome after hip fracture surgery in the United Kingdom was later found by the authors to have inadvertently excluded 8 months of data.35 The error was not discovered until months after publication, when the senior author read an editorial on Big Data and noticed the similarity between 65,536 (2^16, the number of rows addressable with a 16-bit index) and the number of patients in their data set (65,535). Further investigation revealed that a very old version of Microsoft Excel (Excel 2003), which limits worksheets to 65,536 rows, had been used for data analysis. This resulted in the data set being truncated at 65,535 patients (plus 1 header row, for 65,536 rows in total).36 The authors subsequently issued a correction that redefined the time period for the original article to include only the analyzed cases, leaving the conclusions unaffected. This highlights the importance of careful data analysis and familiarity with basic computer science concepts for those involved in the analysis of large data sets, as well as in the review of any resultant manuscripts.
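A simple defensive check would flag this kind of silent truncation: a data set whose row count sits at, or one below, a power-of-2 boundary such as 65,536 deserves suspicion. The snippet below is a hypothetical sanity check of our own devising, not part of the original analysis.

```python
def suspicious_row_count(n_rows: int, bits: int = 16) -> bool:
    """Flag row counts at or just below a 2**bits boundary (e.g., Excel 2003's
    65,536-row limit), which can indicate silent truncation on import."""
    limit = 2 ** bits
    return n_rows in (limit, limit - 1)

print(suspicious_row_count(65_535))  # True  -> 65,535 data rows + 1 header = 65,536 rows
print(suspicious_row_count(59_191))  # False -> an ordinary row count
```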

Use of large data sets can present statistical problems for both researchers and readers. Statistical tests such as the Student t test, for instance, can yield “statistically significant” results with minuscule P values when applied to large data sets, even when the actual differences are clinically insignificant. This requires approaches that establish clinically meaningful differences a priori and statistical testing that reports effect sizes with confidence intervals. These approaches are bolstered by carefully planned statistical analyses that are registered with an institutional or governmental research entity before data access, which is the approach currently taken by MPOG’s Perioperative Clinical Research Committee. Readers should remain aware of these implications and interpret the significance of small effect sizes cautiously.
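The point about large samples is easy to demonstrate by simulation: with enough observations, a clinically trivial difference produces a vanishingly small P value. The sketch below uses synthetic data and standard NumPy/SciPy routines; the blood pressure values and group sizes are invented, not an analysis from this article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500_000                                      # very large sample per group
a = rng.normal(loc=120.0, scale=15.0, size=n)    # e.g., systolic BP (mm Hg), group A
b = rng.normal(loc=120.3, scale=15.0, size=n)    # group B: a trivial 0.3 mm Hg higher

t, p = stats.ttest_ind(a, b)
cohens_d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)

print(f"p = {p:.2e}")                 # tiny P value despite a trivial difference
print(f"Cohen's d = {cohens_d:.3f}")  # effect size ~0.02: clinically negligible
```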

Other issues with statistical analysis of Big Data are noise accumulation, spurious correlation, and measurement errors.37 “Noise accumulation” refers to the increasing amount of corrupt, missing, or spurious data that accumulates as the size and dimension (number of variables) of a data set become very large. This can decrease the signal-to-noise ratio and make it hard to identify true positives. The high dimensionality of Big Data sets can also lead to spurious correlations, in which unrelated random variables appear to be strongly, even causally, related when in fact they are not.
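Spurious correlation in high-dimensional data is likewise easy to reproduce by simulation: screen enough unrelated random variables against an outcome and some will correlate "strongly" by chance alone. The snippet below is a generic illustration with pure noise, not an analysis from this article.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients, n_variables = 100, 10_000   # few cases, many candidate predictors

outcome = rng.normal(size=n_patients)
predictors = rng.normal(size=(n_patients, n_variables))  # pure noise by construction

# Correlation of each (unrelated) predictor with the outcome
corrs = np.array([np.corrcoef(predictors[:, j], outcome)[0, 1]
                  for j in range(n_variables)])

print(f"max |r| among pure-noise predictors: {np.abs(corrs).max():.2f}")
# Typically around 0.4 here: an apparently 'strong' correlation arising by chance.
```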

Machine-learning and data-mining methods are commonly applied to Big Data sets to help overcome these issues.3 The 2 terms are often conflated, and in many ways overlap, but can be distinguished roughly as follows: machine learning is focused on making predictions about new data based on known properties learned from existing data, whereas data mining is concerned with the discovery of previously unknown properties.o Data mining is sometimes referred to disparagingly as “fishing” because it usually involves analyzing data without an a priori hypothesis.p There is no doubt, however, that mining can provide valuable insight into massive data sets and may be useful for hypothesis generation. Data mining can encompass summarization, outlier detection, dependency modeling, classification, clustering, and regression fitting.3 These techniques are helpful in addressing the low signal-to-noise ratio mentioned above.

Some of the commonly used machine-learning algorithms are Bayesian networks, cluster analysis, and support vector machines.38,39 In medicine, machine learning has found particular application in genetics and genomics, although it has also been used in the perioperative arena, particularly the intensive care unit.18,40,41 Work in anesthesiology has used support vector machines to predict the depth of anesthesia in rats42 and entropy analysis to discriminate awake versus asleep states during recovery from anesthesia.43 Tighe et al.44 explored the use of a machine-learning classifier to predict the need for femoral nerve block after anterior cruciate ligament repair and found that machine-learning techniques outperformed the more traditional logistic regression.
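A minimal comparison along the lines described by Tighe et al., sketched here with scikit-learn on synthetic data, illustrates the workflow; the data set, features, and model parameters are assumptions of ours and do not reproduce their analysis.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a binary perioperative prediction problem
# (e.g., "will this patient need a postoperative nerve block?").
X, y = make_classification(n_samples=2_000, n_features=20, n_informative=8,
                           random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "support vector machine": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# 5-fold cross-validated discrimination (area under the ROC curve) for each model
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {auc.mean():.3f}")
```

Which model "wins" depends entirely on the data; the point is that the comparison itself is routine with existing open-source tools.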

NEW TECHNOLOGIES FOR BIG DATA

The real limitation of traditional relational databases and analytic tools becomes apparent when the volume of data to be analyzed becomes very large and highly dynamic. Typically, this occurs with Internet-scale data (i.e., search data, social media posts, etc.), where the daily volume of data is billions or even trillions of data points. The fundamental approach to dealing with such data sets is not new: split them into smaller pieces, analyze each piece, and then reassemble the results. What is new over the past decade is applying this approach using thousands or even millions of machines. The resulting issues of availability, fault tolerance, and load balancing are truly challenging. Tools such as MapReduce45,q and Hadoopr were designed to address such problems. These tools provide a programming framework and infrastructure to streamline and automate the use of massively distributed computing clusters for data processing. They typically do not provide a relational framework but act as a more primitive key-value pair store, and are optimized for tasks such as processing large Web server log files rather than for ad hoc queries. It is important to understand that these tools are programmer-intensive and not turnkey solutions. In truth, there are likely no perioperative data sets extant today that require such advanced techniques.
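The split-analyze-reassemble idea behind MapReduce can be illustrated without any cluster at all. The toy example below counts event types across chunks of records using only Python built-ins; it is a conceptual sketch, not the Hadoop or MapReduce API, and a real job would distribute the same 2 phases across many machines.

```python
from collections import Counter
from functools import reduce

# Toy "log" of intraoperative events, pre-split into chunks the way a
# cluster would split a large file across nodes (event names are invented).
chunks = [
    ["induction", "intubation", "hypotension"],
    ["hypotension", "phenylephrine", "hypotension"],
    ["emergence", "extubation"],
]

# Map phase: each chunk is summarized independently (in parallel on a real cluster).
partial_counts = [Counter(chunk) for chunk in chunks]

# Reduce phase: the partial results are merged into the final answer.
total = reduce(lambda a, b: a + b, partial_counts, Counter())
print(total.most_common(3))   # e.g., [('hypotension', 3), ...]
```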

CONCLUSIONS

Anesthesiology is on the threshold of a change in scale that is affecting all of medicine and health care. While at an individual case level we do not have Big Data, the demand for national-level metrics, personalized medicine (genomics), and population-scale outcomes research will be key drivers for the creation of large, collaborative anesthesia data sets. Unless they incorporate waveform data, these data sets may never become big enough to truly be called Big Data. The challenges, however, in standardization, quality control, and linking data across institutions will be considerable and will require a keen understanding of how to manipulate very large data sets. The reward will be new insights that will allow our specialty to remain relevant in the health care ecosystem and improve the care of our patients.

DISCLOSURES

Name: Matthew A. Levin, MD.

Contribution: This author contributed to manuscript preparation.

Attestation: Matthew A. Levin approved the final manuscript.

Conflicts of Interest: This author declares no conflicts of interest.

Name: Jonathan P. Wanderer, MD, MPhil.

Contribution: This author contributed to manuscript preparation.

Attestation: Jonathan P. Wanderer approved the final manuscript.

Conflicts of Interest: Jonathan P. Wanderer is supported by the Foundation for Anesthesia Education and Research (FAER)’s Mentored Research Training Grant in Health Services Research (MRTG-HSR).

Name: Jesse M. Ehrenfeld, MD, MPH.

Contribution: This author contributed to manuscript preparation.

Attestation: Jesse M. Ehrenfeld approved the final manuscript.

Conflicts of Interest: This author declares no conflicts of interest.

This manuscript was handled by: Franklin Dexter, MD, PhD.

FOOTNOTES

a http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm. Accessed November 26, 2014.

b http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf. Accessed November 26, 2014.

c http://www.internetlivestats.com. Accessed October 20, 2014.

d http://ask.slashdot.org/story/14/11/08/0139248/ask-slashdot-choosing-a-data-warehouse-server-system. Accessed November 24, 2014.

e http://mike2.openmethodology.org/wiki/Big_Data_Definition. Accessed July 21, 2014.

f http://www.techrepublic.com/article/data-quality-the-ugly-duckling-of-big-data/. Accessed November 24, 2014.

g http://dropbox.eait.uq.edu.au/uqdliu3/uqvitalsignsdataset/index.html. Accessed November 7, 2014.

h http://www.cdc.gov/nchs/fastats/inpatient-surgery.htm. Accessed June 3, 2014.

i https://www.aqihq.org/introduction-to-nacor.aspx. Accessed October 20, 2014.

j https://mpog.med.umich.edu/.

k Personal correspondence, author MAL, September 1, 2014.

l http://www.genome.gov/sequencingcosts/. Accessed November 7, 2014.

m http://technet.microsoft.com/en-us/library/ms345599(v=sql.105).aspx. Accessed November 19, 2014.

n https://github.com/greenlion/swanhart-tools/blob/master/shard-query/README.md. Accessed November 26, 2014.

o http://en.wikipedia.org/wiki/Machine_learning. Accessed August 20, 2014.

p http://en.wikipedia.org/wiki/Data_mining. Accessed August 20, 2014.

q http://en.wikipedia.org/wiki/Map_reduce. Accessed November 26, 2014.

r http://hadoop.apache.org/. Accessed November 26, 2014.

REFERENCES

1. Ward JS, Barker A. Undefined by data: a survey of Big Data definitions. arXiv. 2013 cs.DB
2. Helland P. If you have too much data, then “good enough” is good enough. Queue. 2011;9
3. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. 2009. New York: Springer Science & Business Media
4. Sessler DI, Sigl JC, Manberg PJ, Kelley SD, Schubert A, Chamoun NG. Broadly applicable risk stratification system for predicting duration of hospitalization and mortality. Anesthesiology. 2010;113:1026–37
5. Kheterpal S, Woodrum DT, Tremper KK. Too much of a good thing is wonderful: observational data for perioperative research. Anesthesiology. 2009;111:1183–4
6. Postoperative Visual Loss Study Group. Risk factors associated with ischemic optic neuropathy after spinal fusion surgery. Anesthesiology. 2012;116:15–24
7. Ramachandran SK, Nafiu OO, Ghaferi A, Tremper KK, Shanks A, Kheterpal S. Independent predictors and outcomes of unanticipated early postoperative tracheal intubation after nonemergent, noncardiac surgery. Anesthesiology. 2011;115:44–53
8. Brueckmann B, Villa-Uribe JL, Bateman BT, Grosse-Sundrup M, Hess DR, Schlett CL, Eikermann M. Development and validation of a score for prediction of postoperative respiratory complications. Anesthesiology. 2013;118:1276–85
9. Canet J, Gallart L. Predicting postoperative pulmonary complications in the general population. Curr Opin Anaesthesiol. 2013;26:107–15
10. Mazo V, Sabaté S, Canet J, Gallart L, de Abreu MG, Belda J, Langeron O, Hoeft A, Pelosi P. Prospective external validation of a predictive score for postoperative pulmonary complications. Anesthesiology. 2014;121:219–31
11. Myles PS, Leslie K, McNeil J, Forbes A, Chan MT. Bispectral index monitoring to prevent awareness during anaesthesia: the B-Aware randomised controlled trial. Lancet. 2004;363:1757–63
12. Avidan MS, Jacobsohn E, Glick D, Burnside BA, Zhang L, Villafranca A, Karl L, Kamal S, Torres B, O’Connor M, Evers AS, Gradwohl S, Lin N, Palanca BJ, Mashour GA; BAG-RECALL Research Group. Prevention of intraoperative awareness in a high-risk surgical population. N Engl J Med. 2011;365:591–600
13. Avidan MS, Zhang L, Burnside BA, Finkel KJ, Searleman AC, Selvidge JA, Saager L, Turner MS, Rao S, Bottros M, Hantler C, Jacobsohn E, Evers AS. Anesthesia awareness and the bispectral index. N Engl J Med. 2008;358:1097–108
14. Liu D, Görges M, Jenkins SA. University of Queensland vital signs dataset. Anesth Analg. 2012;114:584–9
15. Sessler DI. Big Data–and its contributions to peri-operative medicine. Anaesthesia. 2014;69:100–5
16. Ramachandran SK, Kheterpal S. Outcomes research using quality improvement databases: evolving opportunities and challenges. Anesthesiol Clin. 2011;29:71–81
17. Lane JS, Sandberg WS, Rothman B. Development and implementation of an integrated mobile situational awareness iPhone application VigiVU™ at an academic medical center. Int J Comput Assist Radiol Surg. 2012;7:721–35
18. Pinsky MR. Functional haemodynamic monitoring. Curr Opin Crit Care. 2014;20:288–93
19. Epstein RH, St Jacques P, Stockin M, Rothman B, Ehrenfeld JM, Denny JC. Automated identification of drug and food allergies entered using non-standard terminology. J Am Med Inform Assoc. 2013;20:962–8
20. Levin MA, Krol M, Doshi AM, Reich DL. Extraction and mapping of drug names from free text to a standardized nomenclature. AMIA Annu Symp Proc. 2007:438–42
21. National Information Standards Organization. Understanding Metadata. 2004. Bethesda, MD: NISO Press. Available at: http://www.niso.org/publications/press/UnderstandingMetadata.pdf
22. Vigoda MM, Lubarsky DA. Failure to recognize loss of incoming data in an anesthesia record-keeping system may have increased medical liability. Anesth Analg. 2006;102:1798–802
23. Pulley JM, Denny JC, Peterson JF, Bernard GR, Vnencak-Jones CL, Ramirez AH, Delaney JT, Bowton E, Brothers K, Johnson K, Crawford DC, Schildcrout J, Masys DR, Dilks HH, Wilke RA, Clayton EW, Shultz E, Laposata M, McPherson J, Jirjis JN, Roden DM. Operational implementation of prospective genotyping for personalized medicine: the design of the Vanderbilt PREDICT project. Clin Pharmacol Ther. 2012;92:87–95
24. Gottesman O, Scott SA, Ellis SB, Overby CL, Ludtke A, Hulot JS, Hall J, Chatani K, Myers K, Kannry JL, Bottinger EP. The CLIPMERGE PGx Program: clinical implementation of personalized medicine through electronic health records and genomics-pharmacogenomics. Clin Pharmacol Ther. 2013;94:214–7
25. Kitzmiller JP, Groen DK, Phelps MA, Sadee W. Pharmacogenomic testing: relevance in medical practice: why drugs work in some patients but not in others. Cleve Clin J Med. 2011;78:243–57
26. Choi EM, Lee MG, Lee SH, Choi KW, Choi SH. Association of ABCB1 polymorphisms with the efficacy of ondansetron for postoperative nausea and vomiting. Anaesthesia. 2010;65:996–1000
27. Edwards RR. Genetic predictors of acute and chronic pain. Curr Rheumatol Rep. 2006;8:411–7
28. Lötsch J, Geisslinger G. Current evidence for a genetic modulation of the response to analgesics. Pain. 2006;121:1–5
29. Candiotti KA, Birnbach DJ, Lubarsky DA, Nhuch F, Kamat A, Koch WH, Nikoloff M, Wu L, Andrews D. The impact of pharmacogenomics on postoperative nausea and vomiting: do CYP2D6 allele copy number and polymorphisms affect the success or failure of ondansetron prophylaxis? Anesthesiology. 2005;102:543–9
30. Wilson MR, Naccache SN, Samayoa E, Biagtan M, Bashir H, Yu G, Salamat SM, Somasekar S, Federman S, Miller S, Sokolic R, Garabedian E, Candotti F, Buckley RH, Reed KD, Meyer TL, Seroogy CM, Galloway R, Henderson SL, Gern JE, DeRisi JL, Chiu CY. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N Engl J Med. 2014;370:2408–17
31. Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, Bouquet J, Greninger AL, Luk KC, Enge B, Wadford DA, Messenger SL, Genrich GL, Pellegrino K, Grard G, Leroy E, Schneider BS, Fair JN, Martínez MA, Isa P, Crump JA, DeRisi JL, Sittler T, Hackett J Jr, Miller S, Chiu CY. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 2014;24:1180–92
32. Dexter F, Epstein RH, Penning DH. Statistical analysis of postanesthesia care unit staffing at a surgical suite with frequent delays in admission from the operating room–a case study. Anesth Analg. 2001;92:947–9
33. Moore GE. Cramming more components onto integrated circuits. Electronics. 1965;38:114–7
34. Bertsimas D. Statistics and machine learning via a modern optimization lens. INFORMS Annual Meeting. 2014. Catonsville, MD: Institute for Operations Research and the Management Sciences (INFORMS). Available at: https://www.informs.org/content/.../2014+Morse+McCord+Lecture.pdf. Accessed March 25, 2015
35. White SM, Moppett IK, Griffiths R. Outcome by mode of anaesthesia for hip fracture surgery. An observational audit of 65 535 patients in a national dataset. Anaesthesia. 2014;69:224–30
36. White SM, Moppett IK, Griffiths R. Big data and big numbers. Anaesthesia. 2014;69:389–90
37. Fan J, Han F, Liu H. Challenges of Big Data analysis. arXiv. 2013
38. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97
39. Bal M, Amasyali MF, Sever H, Kose G, Demirhan A. Performance evaluation of the machine learning algorithms used in inference mechanism of a medical decision support system. Sci World J. 2014;2014:1–15
40. Yoo C, Ramirez L, Liuzzi J. Big data analysis using modern statistical and machine learning methods in medicine. Int Neurourol J. 2014;18:50–7
41. Pinsky MR, Dubrawski A. Gleaning knowledge from data in the intensive care unit. Am J Respir Crit Care Med. 2014;190:606–10
42. Shi L, Li X, Wan H. A predictive model of anesthesia depth based on SVM in the primary visual cortex. Open Biomed Eng J. 2013;7:71–80
43. Nicolaou N, Houris S, Alexandrou P, Georgiou J. Entropy measures for discrimination of ‘awake’ Vs ‘anaesthetized’ state in recovery from general anesthesia. Conf Proc IEEE Eng Med Biol Soc. 2011;2011:2598–601
44. Tighe P, Laduzenski S, Edwards D, Ellis N, Boezaart AP, Aygtug H. Use of machine learning theory to predict the need for femoral nerve block following ACL repair. Pain Med. 2011;12:1566–75
45. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Communications of the ACM. 2008;51:107–13
© 2015 International Anesthesia Research Society