

Reinventing Radiology: Big Data and the Future of Medical Imaging

Morris, Michael, A., MD, MS*,†,‡; Saboury, Babak, MD, MPH*,†; Burkett, Brian, MD, MPH§; Gao, Jackson, MD; Siegel, Eliot, L., MD*,†

doi: 10.1097/RTI.0000000000000311
Symposium Review Articles

Purpose: Today, data surrounding most of our lives are collected and stored. Data scientists are beginning to explore applications that could harness this information and make sense of it.

Materials and Methods: In this review, the topic of Big Data is explored, and applications in modern health care are considered.

Results: Big Data is a concept that has evolved from the modern trend of “scientism.” One of the primary goals of data scientists is to develop ways to discover new knowledge from the vast quantities of increasingly available information.

Conclusions: Current and future opportunities and challenges with respect to radiology are provided with emphasis on cardiothoracic imaging.

*Diagnostic Radiology and Nuclear Medicine, University of Maryland, School of Medicine

†Diagnostic Imaging, Baltimore Veterans Affairs Medical Center

‡Internal Medicine, Mercy Medical Center, Baltimore, MD

§Diagnostic Radiology, Mayo Clinic School of Medicine, Rochester, MN

Albert Einstein College of Medicine, Bronx, NY

Michael A. Morris and Babak Saboury contributed equally.

The authors declare no conflicts of interest.

Correspondence to: Eliot L. Siegel, MD, Department of Diagnostic Radiology and Nuclear Medicine, 22 S Greene Street, Baltimore, MD 21201 (e-mail:

Health care today is an evolving technological landscape in which data-generating devices and social communication platforms are increasingly ubiquitous, creating what is commonly referred to as “Big Data.” The Oxford English Dictionary defines Big Data as “computing data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data.”1

A recent review of 1437 sources with “Big Data” in the title or keywords (a substantial but still not exhaustive collection of literature) revealed a heterogeneous set of definitions that fall into several broad groups. The frequently cited “V’s” of Big Data originated with the volume, velocity, and variety described by Laney in the early 21st century and were later extended. They now include volume, velocity, variety (unstructuredness), uncertain veracity, and value.2 Unfortunately, these terms are vague and difficult to define precisely. What volume of data is large enough? What speed of data generation, sharing, and consumption is fast enough to qualify as Big Data? What level of structural variety is complex and unpredictable enough? Is there a threshold, and if so, what is it?

A second broad group of definitions addresses these questions by defining Big Data as information that requires special methodology to handle.2 A National Institute of Standards and Technology definition exemplifies these: “Big Data consists of extensive data sets—primarily in the characteristics of volume, variety, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis.”3

Some definitions, such as that of the McKinsey Global Institute, extend this idea by stating that Big Data implies “data sets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”4 This raises a question. Computing technology and methodology continue to advance to handle increasingly complex information structures, and these techniques may eventually become conventional database tools. At that point, does the same information cease to be Big Data?

A consensus definition that can be applied in health care states that Big Data “represents the Information assets characterized by such a High Volume, Velocity, and Variety to require specific Technology and Analytical Methods for its transformation into Value.”2 In health care and radiology, one important way to generate added value is through decision support. Technologically augmented decision support has tremendous potential to enhance and transform the way physicians, and radiologists in particular, bring value to patient care. Big Data represents an opportunity to mobilize reservoirs of information to make medical imaging more valuable to patients. Achieving value through medical decision support does not remove the clinician or radiologist from the process. Instead, it gives the consulting physician easier access to information that might otherwise be inaccessible, inefficient to retrieve, or difficult to integrate in real time. When this information is distilled and made available to the radiologist, it becomes knowledge that can positively influence the clinician’s judgment in a personalized way, in real time (Fig. 1).4



Society has seen major efforts in business, advertising, government, science, and health care to amass vast data collections of uncertain significance. These efforts are driven in general by the belief that such collections have some value. One author has called this belief, that data can be exploited to reveal profitable insights, “scientism.” The phenomenon is not particularly new; Dr. Donoho remarked on it in a talk on the state of data science given in 2000.5 Yet, 17 years later, this expansive view of data science is still in its infancy. The Human Genome Project is one example in which a huge investment was made in a complex data set. The aim was to generate insights about biology, particularly in the area of protein expression, and to yield discoveries for treating human diseases.6 In a sense, the genome sequence of a single individual (n = 1) constitutes Big Data. Sequencing a patient’s genome for medical information is far from routine clinical practice today, largely because of the uncertain significance of this information; however, health care data are increasingly available to computational resources in electronic formats. The purpose of this review is to provide an understanding of the concept of Big Data, to explore opportunities to improve the accuracy of diagnosis and management recommendations for patients, to emphasize the importance of the radiologist’s involvement in this emerging area, and to discuss some of the challenges ahead.



There is a great deal of variability in the way medical information in the electronic health record is collected and formatted. Various software programs use and share information in proprietary formats. These include free-text notes, billing statements, laboratory values, data generated from monitors and automatic sensors, and images and videos, and some of these data may be missing or inaccurate. This makes it even more challenging to determine how to appropriately analyze and compare information from inconsistent sources. Efforts to create large, complete databases such as the Million Veteran Program aim in part to minimize this limitation of Big Data in health care.7 New data analysis techniques have been developed to address missing values. These include imputation algorithms (substitution of missing values based on statistical methods such as regression) and machine learning methods.8,9 Images exemplify the complex, multidimensional data that have presented new challenges and opportunities in data science. With images and videos, there is an element of “hyper-informative detail” about even a single observation; that is, thousands to billions of measurable variables can be generated from it.5 Just as the genome sequence of an individual is Big Data, a single imaging study can yield Big Data because of the high-dimensional nature of this visual information. In mathematics and computer science, this creates unique obstacles. Modeling multidimensional data as a function, and developing methods to query images for this information, are sophisticated challenges.5
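The following Python is a minimal sketch of the regression-based imputation described above. It fits a single-predictor least-squares line on the complete records and uses that line to fill the missing values. The field names (`age`, `systolic_bp`) and all values are hypothetical. Real pipelines typically use many predictors and multiple imputation, because a single regression fill understates the uncertainty in the filled values.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def regression_impute(records, predictor, target):
    """Fill missing (None) target values from a line fitted on the complete records."""
    complete = [r for r in records if r[target] is not None]
    intercept, slope = fit_line([r[predictor] for r in complete],
                                [r[target] for r in complete])
    filled = []
    for record in records:
        row = dict(record)  # copy so the caller's data are not modified
        row["imputed"] = row[target] is None
        if row["imputed"]:
            row[target] = intercept + slope * row[predictor]
        filled.append(row)
    return filled

# Hypothetical records; the last one is missing its systolic blood pressure
rows = [
    {"age": 40, "systolic_bp": 120},
    {"age": 50, "systolic_bp": 130},
    {"age": 60, "systolic_bp": 140},
    {"age": 70, "systolic_bp": None},
]
print(regression_impute(rows, "age", "systolic_bp")[-1])
# {'age': 70, 'systolic_bp': 150.0, 'imputed': True}
```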

Major corporations such as Facebook have embraced the explosion of information (250 billion photos and 350 million more each day) by committing to open-source projects and information sharing.10,11 Radiologists and researchers should strive to make progress toward harnessing Big Data in imaging, which has the potential to lead to new, advanced clinical decision support, personalized diagnostic and prognostic tools, and the ability to optimize individual patient outcomes in ways previously not possible.



Big Data is appealing for its promise of generating breakthroughs in medicine. The idea has gained momentum with the National Institutes of Health and National Cancer Institute’s Cancer Moonshot initiative, which includes a plan to “Unleash the Power of Data” as a strategic goal, aiming to increase access to electronic medical records, genomic data, and large data sets of medical information.12

To understand the significance that better decision support could have for the radiologist, it is useful to review some classic statistical concepts, such as the importance of pretest probability. Take, for example, a hypothetical imaging study that could diagnose lung cancer with 98% sensitivity and 99% specificity. In other words, it detects cancer in 98% of patients who have the disease, and 99% of patients who do not have lung cancer test negative. This hypothetical test would be far more accurate than virtually any other study in diagnostic imaging. Now consider applying it in a hypothetical population in which 10 of every 50,000 people have lung cancer. Intuitively, it may seem reasonable that a positive result would equate to a high likelihood of disease. Nevertheless, a patient who tests positive has only a 1.9% chance of having lung cancer. The reason is the low prevalence of disease. Because only 10 of 50,000 people have the disease, only about 9.8 true positives are expected. By contrast, the 1% false-positive rate means that about 500 of the 49,990 people without the disease will test positive. As this example shows, pretest probability can have a substantial impact on the interpretation of any diagnostic test. Bayes’ theorem formalizes this relationship, combining the pretest probability with the characteristics of a test to give the posttest probability. Unfortunately, currently deployed computer-assisted detection and diagnosis algorithms for mammography, lung nodules, and coronary artery disease (CAD) do not take into account data that could help determine the probability of disease.
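The arithmetic above is Bayes’ theorem applied to a positive test result. A short Python sketch makes the dependence on pretest probability explicit. The 20% pretest probability in the second call is a hypothetical value chosen only for contrast.

```python
def posttest_probability(pretest, sensitivity, specificity):
    """Probability of disease given a positive test (positive predictive value), via Bayes' theorem."""
    true_positive = sensitivity * pretest               # P(test+ and disease)
    false_positive = (1 - specificity) * (1 - pretest)  # P(test+ and no disease)
    return true_positive / (true_positive + false_positive)

# Hypothetical screening population from the text: 10 cancers per 50,000 people
low_risk = posttest_probability(10 / 50_000, sensitivity=0.98, specificity=0.99)
# Same test, for a hypothetical patient whose record implies a 20% pretest probability
high_risk = posttest_probability(0.20, sensitivity=0.98, specificity=0.99)

print(f"{low_risk:.1%}")   # 1.9%
print(f"{high_risk:.1%}")  # 96.1%
```

The test itself is identical in both calls. Only the pretest probability changes, and that alone moves the posttest probability from about 2% to about 96%.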

Radiology is a diagnostic field. A sophisticated, reliable system for incorporating the pretest probability of disease could enhance the value of the information obtained from imaging studies. Interpreting images without the pretest probability is a crude, one-size-fits-all approach. It is the opposite of the personalized, precision medicine that the American College of Radiology’s “Imaging 3.0” philosophy emphasizes developing for our patients. It is not realistic to expect any physician to be aware of the entire context provided by every piece of data available for each patient. This problem is compounded by ever-increasing individual productivity demands. Instead, Big Data methods could one day glean information about pretest probability from the electronic health record in real time to refine the actionable results of diagnostic imaging studies.

Currently, expert/consensus-based appropriate use criteria and screening guidelines exist to reduce inappropriate utilization of imaging studies;13 however, no systems are currently available to suggest an appropriate test or a diagnosis on the basis of a real, patient-specific pretest probability derived from the medical record. Consider some of the tools currently available for the evaluation of pulmonary nodules. The American College of Radiology Lung-RADS version 1.0 classification provides risk stratification criteria. These are based on whether the patient’s history meets screening guidelines and on how nodule features, including size and attenuation, compare with prior imaging.14 The Fleischner Society guidelines perform a similar function for incidentally discovered nodules, including a generalized consideration of the patient’s age, smoking history, and environmental exposures. These guidelines place patients with pulmonary nodules into broad risk categories; however, a personalized pretest probability is not widely available for every patient.15,16 Data made available by the National Lung Screening Trial (NLST) and emerging lung cancer screening registries can help personalize these diagnostic investigations.17 Beyond the stratification offered by Lung-RADS or Fleischner criteria, web-based and picture archiving and communication system (PACS)-integrated tools will increasingly allow physicians and patients to obtain more personalized risk assessments. With these tools, the clinician selects the patient’s relevant risk factors to inform analysis of the pulmonary nodule characteristics.17–20 These tools work by comparing patient characteristics with a matched cohort from the NLST data set or other data sets.
Image databases and health registries should strive to incorporate data elements such as images with structured annotation, relevant clinical data, and the final pathologic diagnosis. This would allow the development of similar interfaces for customized risk assessment tools.21,22
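The matched-cohort idea can be sketched as a k-nearest-neighbor lookup. The sketch finds the prior cases most similar to the current patient and reports how often those cases turned out to have cancer. The features, distance weights, `k`, and data below are all hypothetical. It illustrates the concept and is not the method used by any published NLST-based tool.

```python
from dataclasses import dataclass

@dataclass
class Case:
    age: int
    pack_years: float
    nodule_mm: float
    cancer: bool = False  # known outcome for archived cases; unused for the query patient

def matched_cohort_risk(patient, cohort, k=4):
    """Estimate risk as the fraction of cancers among the k most similar prior cases."""
    def distance(case):
        # Hypothetical scaling so each feature contributes on a comparable order
        return (abs(case.age - patient.age) / 10
                + abs(case.pack_years - patient.pack_years) / 20
                + abs(case.nodule_mm - patient.nodule_mm) / 4)
    nearest = sorted(cohort, key=distance)[:k]
    return sum(case.cancer for case in nearest) / len(nearest)

# Synthetic reference cohort (not NLST data)
cohort = [
    Case(62, 40, 5, False), Case(65, 45, 6, False), Case(70, 60, 14, True),
    Case(68, 55, 12, True), Case(58, 30, 4, False), Case(72, 50, 16, True),
]
print(matched_cohort_risk(Case(69, 55, 13), cohort))  # 0.75
print(matched_cohort_risk(Case(60, 30, 4), cohort))   # 0.25
```

A larger archive and more features (drawn, for example, from the electronic health record) would sharpen the estimate. The choice of distance function and `k` would then need formal validation.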

Many tools have been developed to stratify patients by pretest probability for CAD, generalizing them into low-risk, medium-risk, and high-risk categories. Examples include the Diamond and Forrester method, the Duke Clinical Score, and the Framingham Risk Score. These incorporate prior history of cardiac events, characteristics of the chest pain, family history, medical history, age, sex, and lipid panel results.23–26 Imaging findings, such as coronary calcium scores, have also been used in this type of risk stratification.27 These tools assist with the interpretation of diagnostic tests for CAD. However, they have inconsistencies and limitations that lead to variable pretest probability estimates (ACR Appropriateness Criteria Chronic Chest Pain).28,29 In cardiac imaging, personalized risk assessment tools using large, cross-referenced data sources are not yet available. Big Data may ultimately provide a vehicle for achieving personalized, precision medicine in cardiac imaging; however, much work remains before such tools are widely available.

Cardiac image interpretation may be enhanced by a Big Data approach to visual data. Cardiac single-photon emission computed tomography and positron emission tomography studies are high-dimensional visual data sets. In these studies, images and other data from regions of the heart can be digitized, quantified, and analyzed using artificial intelligence approaches. Regional abnormalities in the heart can be displayed as signal intensity maps. These maps are generated by comparing the patient’s signal in several hundred regions of the heart with images from sex-matched and radiotracer-matched patients.30 Even more personalized regional signal maps could be based on patients matched on a broad array of information available in the electronic health record. Personalized image-augmentation techniques could enhance the role of the radiologist in multiple ways. Such tools could enable the radiologist to provide a more specific diagnostic evaluation by assisting in the annotation of abnormalities. They could also improve the radiologist’s ability to collaborate with other clinicians by providing more data-driven information about prognosis, informing more precise, personalized treatment plans. In addition, technology could highlight especially subtle abnormalities and provide decision support for ambiguous visual patterns. This could speed interpretation while preserving clinical judgment. It could also free the radiologist to devote more energy to consulting with other physicians or to discussing results with patients as an expert in diagnostic imaging technology.
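The comparison against matched patients can be sketched as a per-region z-score. Each z-score expresses how many standard deviations the patient’s uptake in a region lies from the mean of a matched normal database. The region names, uptake values, and −2.5 threshold below are hypothetical. The text describes several hundred regions rather than the three used here.

```python
from statistics import mean, stdev

def regional_z_scores(patient, normals):
    """z-score of the patient's uptake in each region versus a matched normal database.

    patient: dict mapping region name -> normalized uptake (%)
    normals: list of dicts with the same regions, one per matched normal study
    """
    scores = {}
    for region, value in patient.items():
        reference = [study[region] for study in normals]
        scores[region] = (value - mean(reference)) / stdev(reference)
    return scores

def flag_abnormal(scores, threshold=-2.5):
    """Regions whose uptake falls at or below the (hypothetical) z-score threshold."""
    return sorted(region for region, z in scores.items() if z <= threshold)

# Three hypothetical sex- and radiotracer-matched normal studies, three regions each
normals = [
    {"anterior": 80, "inferior": 70, "lateral": 78},
    {"anterior": 84, "inferior": 74, "lateral": 82},
    {"anterior": 82, "inferior": 72, "lateral": 80},
]
patient = {"anterior": 81, "inferior": 60, "lateral": 79}
scores = regional_z_scores(patient, normals)
print(scores)                 # {'anterior': -0.5, 'inferior': -6.0, 'lateral': -0.5}
print(flag_abnormal(scores))  # ['inferior']
```

To personalize the map further, the `normals` list would simply be drawn from patients matched on additional electronic health record attributes.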

Reaching the full potential of “Big Data” in medical imaging will rely on increased information sharing in health care. The problem of medical data sharing is complex. Patients, radiologists, departments, administrators, and institutions must work together to democratize anonymized health care information and to promote the adoption of safe data-sharing standards. Effective data sharing will involve three things. The first is strategies for creating image data sets from multiple large cohorts that can be integrated and compared. The second is decisions about the security and accessibility of this information for researchers and clinicians. The third is the promotion of standardized information formats to facilitate this process.18–20



Unrefined risk stratification wastes health care resources on unnecessary diagnostic tests. As of 2010, at least 8.6 million people in the United States were eligible for lung cancer screening.31 As of 2008, >10 million nuclear stress tests were performed annually.32 At least 20% of stress tests with imaging are either false positive or false negative.33 In the NLST, 96.4% of positive screening results were false positives, and the false-negative rate was between 0% and 20%.34 In this context, a false positive is a suspicious imaging finding, such as a lung nodule, that was later determined to be benign. A false negative is an interval cancer that developed without detection of a nodule. The cardiothoracic imaging community has an opportunity to improve diagnostic tools to avoid unnecessary diagnostic procedures and reduce wasteful health care spending. The 2016 Medicare physician fee schedule assigned 13.74 relative value units to single-photon emission computed tomography cardiac imaging (CPT code 78452-G), for a reimbursement of $492.65 per study.35 Conservatively, nearly $1 billion per year can therefore be attributed to inaccurate nuclear cardiac stress test results. Similarly, suppose all people eligible for lung cancer screening had received the service in 2016. Based on the reimbursement of $254.93 for CPT code 71250 G0297, over $2 billion would then have been spent on false-positive studies.35 These economic costs should be viewed as incentives and opportunities to incorporate more of the available personal patient risk data into the diagnostic algorithm.
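The cost estimates above are simple products of the quoted figures. The sketch below reproduces them so the inputs can be varied. The constant names are illustrative; the values are the ones quoted in the text. The lung screening line follows the text in applying the 96.4% figure to every eligible person.

```python
# Inputs quoted in the text (constant names are illustrative)
STRESS_TESTS_PER_YEAR = 10_000_000   # ">10 million" nuclear stress tests per year (2008)
STRESS_INACCURATE_FRACTION = 0.20    # "at least 20%" false positive or false negative
SPECT_PAYMENT = 492.65               # 2016 payment per SPECT cardiac study (USD)

LCS_ELIGIBLE = 8_600_000             # people eligible for lung cancer screening (2010)
NLST_FALSE_POSITIVE = 0.964          # NLST false-positive proportion
LDCT_PAYMENT = 254.93                # 2016 payment per screening CT (USD)

stress_cost = STRESS_TESTS_PER_YEAR * STRESS_INACCURATE_FRACTION * SPECT_PAYMENT
lcs_cost = LCS_ELIGIBLE * NLST_FALSE_POSITIVE * LDCT_PAYMENT

print(f"Inaccurate stress tests: ${stress_cost / 1e9:.2f} billion")  # $0.99 billion
print(f"False-positive LDCT:     ${lcs_cost / 1e9:.2f} billion")     # $2.11 billion
```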


Data Integration

The most pragmatic way forward is to focus on the toolset needed to manage Big Data. The traditional method for validating a radiologic diagnosis is to test a sample of observations against some form of gold standard. Sensitivity and specificity can be calculated as measures of the strength of the test. Positive and negative predictive values can also be calculated if the prevalence of disease is known in the population to which the test will be applied. This traditional approach has also been applied to very large databases. Traditional data warehousing methodology has evolved to handle large data sets. It offers analytical processing operations such as slice-and-dice, drill down, drill up, and pivoting, along with online analytical processing techniques. It also offers data mining methods that use pattern and association finding algorithms, analytic model construction, classification, and predictive modeling to make discoveries. These methods work well for relatively static data sets.36 Big Data instead requires data integration methods that allow greater interoperability and user interaction. These methods emphasize decision support software applications, multidimensional data formats, and distributed data storage, as illustrated in Figure 2.36
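The warehouse-style operations named above can be illustrated on a tiny in-memory fact table. This Python sketch uses hypothetical fields and study counts. It shows a slice (fixing one dimension), a roll-up or drill up (aggregating to a coarser level), and a drill down (aggregating to a finer level). Production systems would run these operations in an OLAP engine or in SQL rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (year, quarter, modality, site) with a study count
studies = [
    {"year": 2015, "quarter": "Q1", "modality": "CT", "site": "A", "count": 120},
    {"year": 2015, "quarter": "Q2", "modality": "CT", "site": "A", "count": 130},
    {"year": 2015, "quarter": "Q1", "modality": "NM", "site": "A", "count": 40},
    {"year": 2015, "quarter": "Q1", "modality": "CT", "site": "B", "count": 90},
    {"year": 2016, "quarter": "Q1", "modality": "CT", "site": "A", "count": 150},
    {"year": 2016, "quarter": "Q1", "modality": "NM", "site": "B", "count": 35},
]

def aggregate(rows, dims):
    """Sum 'count' grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["count"]
    return dict(totals)

def slice_rows(rows, **fixed):
    """Keep only rows whose dimensions match the fixed values (a slice or dice)."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

by_year = aggregate(studies, ["year"])                              # roll-up
by_year_quarter = aggregate(studies, ["year", "quarter"])           # drill down
ct_by_site = aggregate(slice_rows(studies, modality="CT"), ["site"])  # slice, then aggregate
print(by_year)      # {(2015,): 380, (2016,): 185}
print(ct_by_site)   # {('A',): 400, ('B',): 90}
```

This pattern assumes a fixed schema and a relatively static table. That assumption is what the distributed, interoperable approach described for Big Data relaxes.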



Looking at Figure 2,36 one can imagine how the Big Data approach could change the radiologist’s workflow. Imagine finding a lung nodule on a low-dose computed tomography (CT) screening examination. The radiologist would no longer need to measure the nodule and manually determine its category. Instead, the imaging software could be integrated with a distributed data system that automatically queries the patient’s electronic information. The query would return a specific estimate of the pretest probability. For reference, a distributed data management system would also extract the outcomes of thousands of other patients matched on many clinical and demographic features. The significance of the nodule could then be communicated with much greater clinical relevance. High-dimensional data formats remove the limit of a few frequently available data points. Such limited inputs are used by several of today’s data modeling tools in cardiothoracic disease (ACR Lung-RADS version 1.0, Fleischner guidelines, Framingham Risk Calculation, the Duke Score, and the Wells Score).14,17,23,25,26,37 Instead, the pretest probability could be evaluated on as many variables as are available in the electronic record. Missing data points can be problematic for more traditional models of pretest probability. They could, however, be addressed with imputation techniques in more sophisticated data management systems.

Data management systems that can process complex visual data could offer the radiologist unique applications, such as queries of imaging databases for similar visual patterns. For example, while interpreting a CT scan with a lung nodule, the radiologist could query for nodules with similar visual patterns. The query could retrieve their final pathologic diagnoses, the patients’ prognoses, and even their responses to treatment. Real-time decision support using data integration could assist radiologists by highlighting visual patterns likely to be significant. Alternatively, in unique and ambiguous situations, these tools could produce a collection of visually similar cases to support more informed, sophisticated interpretations. Achieving this near-future scenario relies on a decentralized approach to data integration rather than traditional data modeling, as described in Figure 2. With a flexible data integration platform, these capabilities would grow stronger as new data are incorporated over time.
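Querying for similar visual patterns is essentially content-based image retrieval. The following is a minimal sketch. It assumes that each archived nodule has already been reduced to a numeric feature vector; in practice, these might come from radiomic measurements or a neural network. The sketch ranks prior cases by cosine similarity and returns their recorded diagnoses. All features, identifiers, and diagnoses are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def most_similar(query, archive, k=2):
    """Rank archived cases by cosine similarity of their feature vectors to the query."""
    ranked = sorted(archive, key=lambda case: cosine_similarity(query, case["features"]),
                    reverse=True)
    return ranked[:k]

# Hypothetical nodule features, pre-scaled to 0-1: [size, spiculation, calcification]
archive = [
    {"id": "case-1", "features": [0.9, 0.8, 0.1], "diagnosis": "adenocarcinoma"},
    {"id": "case-2", "features": [0.2, 0.1, 0.9], "diagnosis": "granuloma"},
    {"id": "case-3", "features": [0.8, 0.9, 0.2], "diagnosis": "adenocarcinoma"},
    {"id": "case-4", "features": [0.1, 0.2, 0.8], "diagnosis": "hamartoma"},
]
for case in most_similar([0.85, 0.85, 0.15], archive):
    print(case["id"], case["diagnosis"])
```

A linear scan like this is fine for a toy archive. Searching millions of studies would require an approximate nearest-neighbor index.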


Language Processing

Gleaning knowledge from electronic medical data and developing new automated tools will require extracting key information from diagnostic imaging reports and electronic notes. Natural language processing (NLP) software allows semantic information to be automatically extracted from unstructured, text-based reports. Keyword-based queries using open-source NLP software are available to aid clinicians in diagnostic evaluation. Natural language understanding goes further. It harnesses domain-specific knowledge representation so that computer algorithms can process human language and extract meaning from it. These techniques have been applied to chest radiograph reports using multiple strategies in a variety of settings, including critical care, with only modest misclassification.38–41 Machine learning–based strategies were found to be superior to keyword algorithms in classifying chest radiograph reports by the likelihood of an acute lung injury diagnosis.42 In that example, the keywords were proposed by domain experts as a conjecture, whereas the machine learning strategies used empirical data to improve classification accuracy. These approaches aim to organize medical imaging and report data into a structure that can be studied more easily. Machine learning approaches to language processing will facilitate the growth of large imaging data sets from more varied clinical settings. They will also allow more flexible analysis of radiology report text for data mining and the development of new image-based clinical tools.
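The contrast between expert-chosen keywords and machine learning can be illustrated with a minimal multinomial naive Bayes classifier. This is one of the simplest learned text classifiers and is not necessarily the method used in the cited study. It is trained on a handful of hypothetical, labeled report sentences. Rather than relying on a fixed keyword list, it learns from the examples which words are associated with each label.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_reports):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in labeled_reports:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return word_counts, doc_counts, vocab

def classify(text, model):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training sentences; a real system would need thousands of labeled reports
model = train([
    ("Bilateral diffuse airspace opacities consistent with edema or ARDS", "ali_suspected"),
    ("Diffuse bilateral infiltrates, worsening", "ali_suspected"),
    ("Lungs are clear. No acute disease.", "no_ali"),
    ("No focal consolidation. Clear lungs.", "no_ali"),
])
print(classify("Diffuse bilateral opacities", model))     # ali_suspected
print(classify("Clear lungs, no acute findings", model))  # no_ali
```

Bag-of-words models like this ignore word order, so phrases such as “no opacities” can mislead them. Practical report classifiers therefore add negation handling or more expressive models.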

Advances in natural language generation technology for producing automated structured reports (SRs) may also enhance clinical workflow and reduce the ambiguity of radiology reports.43 Documentation is time consuming and competes for physicians’ time with activities that could provide more value to patient care.30,44 Software can generate standardized reports automatically from options the radiologist selects. This may significantly expedite some of the redundant tasks of documentation and image interpretation, such as incorporating quantitative features, while reducing errors and creating more structured data for analysis. Natural language generation has been used in mammography reporting through many commercially available applications, such as MagView, and was recently described in cardiothoracic imaging. In that work, a custom software program helped radiologists generate standardized reports of screening chest radiographs for tuberculosis.43 The interpreting radiologist selects aspects of the image through a feature integrated into the user interface, and standardized reports are generated from those selections. High productivity was demonstrated, with sensitivity within reference ranges for screening radiographs, showing that this workflow is feasible in cardiothoracic imaging. Over 9 years at a busy major academic medical center, residents interpreted an average of 7580 relative value units of chest radiographs per full-time equivalent per year, in addition to their usual workload.43 The increased use of structured imaging reports may substantially aid the collection of clinical information for data mining, simplifying some of the natural language understanding challenges in interpreting image reports.43,45
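The select-then-generate workflow described above can be sketched as template filling. Each selection maps to a sentence template, and the same selections are kept as structured data for later mining. The templates, section names, and options below are hypothetical and are not taken from the cited software.

```python
# Hypothetical sentence templates keyed by report section and the option the radiologist selects
TEMPLATES = {
    "lungs": {
        "clear": "The lungs are clear.",
        "nodule": "There is a {size_mm} mm nodule in the {location}.",
    },
    "pleura": {
        "normal": "No pleural effusion or thickening.",
        "effusion": "There is a {side} pleural effusion.",
    },
    "impression": {
        "negative_tb": "No radiographic findings suggestive of active tuberculosis.",
        "further_evaluation": "Findings warrant further evaluation.",
    },
}

def generate_report(selections):
    """Build report text and a structured record from (section, option, params) selections."""
    sentences, structured = [], []
    for section, option, params in selections:
        sentences.append(f"{section.upper()}: {TEMPLATES[section][option].format(**params)}")
        structured.append({"section": section, "finding": option, **params})
    return "\n".join(sentences), structured

text, data = generate_report([
    ("lungs", "nodule", {"size_mm": 6, "location": "right upper lobe"}),
    ("pleura", "normal", {}),
    ("impression", "further_evaluation", {}),
])
print(text)
print(data[0])  # {'section': 'lungs', 'finding': 'nodule', 'size_mm': 6, 'location': 'right upper lobe'}
```

Because the structured record is produced alongside the prose, it can be used for data mining without any language processing.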


Image Processing

Comparing and combining data from multiple image databases is generally difficult. An image database may include image reports along with pathologic, genetic, and other clinical information. Images are frequently segregated from this nonimage information, which is often referred to in imaging informatics as “metadata.” Extracting pertinent free-text information and combining it with image data is difficult and laborious. Images are almost universally stored in the native Digital Imaging and Communications in Medicine (DICOM) format in which they were acquired. A typical DICOM object contains image data and technical acquisition details but does not routinely include interpretive information. Image annotations largely remain in proprietary forms that do not transfer easily from one PACS environment, vendor-neutral archive, or enterprise viewer to another. This has created major challenges when switching from one PACS vendor to another.

A standard for image annotation would allow computer programs to access and search images and metadata. This would facilitate large-scale data mining, integration of multiple clinical research initiatives, algorithm development, testing of new applications, and even communication with providers and patients through the enterprise viewer.46 Annotation interoperability is still an area of active development in radiology. DICOM-SR is a well-established but underutilized standard, supported by several PACS vendors, that stores structured interpretive information together with the image and the usual DICOM information.46,47 DICOM-SR expands on the traditional DICOM standard by incorporating codified elements of the diagnostic report. Text, links to images, spatial and temporal coordinates, computer-aided detection and diagnosis results, waveforms, and other composite objects can be stored along with information about the acquisition of the imaging study, all within the DICOM file. This extends the interoperability of DICOM and creates opportunities for data mining. Although DICOM-SR provides a framework for storage, it still requires mechanisms to capture complex report elements. A standardized, universally accepted annotation transfer syntax may facilitate the integration of image databases on a large scale, and progress is being made in developing structures to achieve this.
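DICOM-SR organizes a report as a tree of typed content items, such as CONTAINER, TEXT, NUM, and CODE. The items are linked by relationships such as CONTAINS and HAS PROPERTIES. Modeling this tree in plain Python shows why it helps data mining: the tree flattens into searchable rows. This is a simplified sketch, not the API of pydicom or any other DICOM toolkit. Concept names are plain strings standing in for coded concepts, and the report content is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ContentItem:
    value_type: str                 # e.g., "CONTAINER", "TEXT", "NUM", "CODE"
    concept_name: str               # plain-text stand-in for a coded concept name
    value: object = None
    relationship: str = "CONTAINS"  # relationship of this item to its parent
    children: list = field(default_factory=list)

def flatten(item, path=()):
    """Yield (concept path, value type, value) for every non-container item in the tree."""
    path = path + (item.concept_name,)
    if item.value_type != "CONTAINER":
        yield path, item.value_type, item.value
    for child in item.children:
        yield from flatten(child, path)

report = ContentItem("CONTAINER", "Imaging Report", children=[
    ContentItem("CONTAINER", "Finding", children=[
        ContentItem("CODE", "Finding type", "Lung nodule"),
        ContentItem("NUM", "Diameter (mm)", 7, relationship="HAS PROPERTIES"),
        ContentItem("TEXT", "Comment", "Solid; unchanged from prior", relationship="HAS PROPERTIES"),
    ]),
])

# Rows like these could be loaded into a database and queried across many reports
for row in flatten(report):
    print(row)
```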

Annotation and Image Markup (AIM), initially developed as part of the National Cancer Institute’s caBIG program, allows the user to annotate images, and features within those images, with graphical symbols containing descriptive information. Standardized templates are being developed for certain disease processes. For example, a glioblastoma multiforme template was designed to include 26 features comprehensively describing tumor morphology.48 Templates can be used to establish common data elements that a computer can search or analyze. A similar use of AIM was recently demonstrated for lung cancer screening to translate annotations into structured report text.45 These templates leverage standardized lexicons. The Systematized Nomenclature of Medicine (SNOMED) and the Unified Medical Language System cover core medical terminology, and RadLex covers radiologic vocabulary specifically.46,47 The recent incorporation of AIM into the DICOM standard encourages a vendor-neutral approach to complex annotation design; AIM annotations can be serialized to XML or DICOM-SR.

Technology-driven breakthroughs in medical imaging information systems will require two things: (1) accessible databases of clinical patient descriptions, laboratory values, and radiology reports and (2) sharing of large medical image data sets. Real-time processing of Big Data in integrated clinical databases may allow patterns and trends to emerge that will help personalize the management of individual patients.49 Some large data sets of cardiothoracic images are currently available. Most are in cancer imaging, an area of medicine with great enthusiasm for progressing toward personalized medicine. Organizing these clinical data sets is challenging. It requires balancing clear structure and accessibility against information security to protect individual patients. Internet image libraries of radiology teaching files have used functions to deidentify patient data.49–51 This concept has expanded to include other types of patient medical data in a “virtual hospital” where users can explore real patient data for educational or research purposes. Publicly available, deidentified imaging data sets (and eventually entire anonymous virtual hospital data sets) help reduce barriers and democratize the large amount of medical data available today so that new knowledge can be discovered.52–59
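Deidentification for teaching files or shared data sets typically removes direct identifiers while keeping records linkable. The following Python is a minimal sketch that assumes a flat record with hypothetical field names. It drops the listed identifiers and replaces the patient identifier with a keyed hash (HMAC-SHA-256), so studies from the same patient still group together. It is an illustration only and falls well short of a compliant deidentification process.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers; real deidentification (e.g., the DICOM
# attribute confidentiality profiles or HIPAA rules) covers many more elements,
# including identifiers burned into pixel data or embedded in free text.
DIRECT_IDENTIFIERS = {"patient_id", "name", "address", "phone", "birth_date"}

def pseudonym(patient_id, secret_key):
    """Keyed hash: the same patient always maps to the same code, and the code
    cannot be linked back to the identifier without the secret key."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def deidentify(record, secret_key):
    """Drop direct identifiers and add a stable pseudonymous subject code."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["subject"] = pseudonym(record["patient_id"], secret_key)
    return clean

record = {"patient_id": "MRN-001", "name": "Jane Doe", "birth_date": "1950-01-01",
          "age": 67, "finding": "6 mm right upper lobe nodule"}
print(deidentify(record, secret_key=b"site-secret"))
```

A keyed hash is used rather than a plain hash because identifiers such as medical record numbers come from a small, guessable space. Without a key, a plain hash could be reversed by brute force.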


Publicly Available Clinical Data Sets in Cardiothoracic Imaging

The National Cancer Institute offers the Cancer Data Access System, which processes requests to access the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial and the NLST data. Available information includes clinical data from the trial, images, and biological specimens.60 The Cancer Imaging Archive is another large database for cancer imaging that includes freely accessible image sets, many of them including cases of patients with lung cancer. Genomic and pathologic data are integrated into these databases, which serve as a valuable source for data mining. The applicable cardiothoracic image sets within The Cancer Imaging Archive are summarized in Table 1.61 Descriptions of the imaging data from NLST and PLCO are included in Table 2.60 Steps for accessing the NLST and PLCO data sets are described in Figure 3.60







The Cardiac Atlas Project is a large database of normal and pathologic cardiac images developed for clinical, research, and educational purposes; access requires approval through a user review process. The Cardiac Atlas Project includes 3000 cardiac imaging examinations (along with other clinical data) and uses exclusively open-source software.62 Shared databases will likely need to grow by orders of magnitude, to millions of studies, to truly reach the scale of other Big Data initiatives in data science and to be most useful for machine learning. That scale would support deep learning approaches that could vastly improve the capabilities of existing algorithms to provide added value.

Shared, accessible databases of cardiothoracic images with multisystem interoperability will be essential to advances in cardiothoracic imaging informatics.47 Currently, talented innovators in data science interested in medical applications face barriers in accessing imaging and clinical data, which limits progress.40 In addition to solving the technical challenges of amassing effective shared databases, as discussed above, there is a need for enhanced image registry reporting, accessibility, and health policy facilitating the secure, ethical exchange of such data.28

The Centers for Medicare and Medicaid Services (CMS) requires providers of CT lung cancer screening to report data to a registry such as the Lung Cancer Screening Registry (LCSR) established by the American College of Radiology.63 The express purpose of this reporting requirement is to confirm compliance with the coverage criteria, although the data also have great potential for research. The required information includes the facility, reading radiologist, patient identifier, ordering practitioner, CT scanner manufacturer and model, indication for the study, details of smoking history, effective radiation dose, screen date, whether the screen is initial or subsequent, and details of the lung nodule identification, classification, and reporting system. The data requirements reflect the minimum needed to verify coverage of the services and omit additional, potentially useful information; the requirement is deliberately limited out of concern that excessively burdensome reporting could decrease access to CT lung cancer screening.48 Although not required by CMS, inclusion of information on the size, location, and number of lung nodules, including benign nodules, is encouraged in the LCSR. Reporting of incidental but clinically significant findings, follow-up diagnostic tests, long-term outcomes, and adverse events is also encouraged but not required.48 Amassing this type of data would benefit further testing and refinement of the screening algorithm. Imaging data such as DICOM tags are not currently included in the required registry data.
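The distinction between required and encouraged registry elements can be expressed as a simple completeness check. In this Python sketch the field names are hypothetical paraphrases of the elements listed above, not the LCSR’s actual data dictionary.

```python
# Hypothetical field names paraphrasing the required elements listed above.
REQUIRED_FIELDS = (
    "facility", "reading_radiologist", "patient_id", "ordering_practitioner",
    "scanner_manufacturer", "scanner_model", "indication", "smoking_history",
    "effective_dose_msv", "screen_date", "screen_type", "nodule_category",
)
# Elements the registry encourages but CMS does not require (also hypothetical names).
ENCOURAGED_FIELDS = (
    "nodule_sizes_mm", "nodule_locations", "incidental_findings",
    "follow_up_tests", "outcomes", "adverse_events",
)


def check_record(record: dict) -> dict:
    """Report which required fields are absent or empty, and which
    encouraged fields were supplied."""
    def present(name: str) -> bool:
        return record.get(name) not in (None, "", [])

    return {
        "missing_required": [f for f in REQUIRED_FIELDS if not present(f)],
        "encouraged_supplied": [f for f in ENCOURAGED_FIELDS if present(f)],
    }


record = {
    "facility": "Example Imaging Center",
    "reading_radiologist": "R-001",
    "patient_id": "P-0001",
    "ordering_practitioner": "O-042",
    "scanner_manufacturer": "VendorX",
    "scanner_model": "ModelY",
    "indication": "lung cancer screening",
    "smoking_history": {"pack_years": 35, "current_smoker": False},
    "effective_dose_msv": 1.2,
    "screen_date": "2017-03-01",
    "screen_type": "initial",
    "nodule_category": "",        # left blank, so it should be flagged
    "nodule_sizes_mm": [4.5],     # an encouraged element that was supplied
}
print(check_record(record))
```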

Although some are concerned that cumbersome reporting requirements will limit access to cancer screening services, large-scale informatics initiatives may lead to breakthroughs in screening efficiency. Better ways to risk-stratify patients for additional work-up may ultimately increase access to screening examinations by allowing resources to be used more efficiently. To maximize technology-driven advancement in radiology, screening registries could include a database of image files annotated with significant findings. Automated standardized reports for screening examinations, such as those introduced for tuberculosis,43 could be applied to lung cancer screening to increase the efficiency of report generation and help form shared, multi-institutional databases.43,64 Registries should be pursued for other frequently performed medical imaging studies beyond oncologic imaging as well.
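As a rough illustration of automated standardized reporting, the Python sketch below renders a fixed-template findings section from structured nodule data. The `Nodule` fields, the template wording, and `generate_report` are hypothetical; the sketch does not assign a Lung-RADS or any other category, which is supplied by the radiologist and passed through unchanged.

```python
from dataclasses import dataclass


@dataclass
class Nodule:
    location: str     # e.g., "right upper lobe"
    size_mm: float    # mean diameter in millimetres
    composition: str  # e.g., "solid", "part-solid", "ground-glass"


def generate_report(nodules: list[Nodule], category: str) -> str:
    """Render a fixed-template findings section from structured data.
    `category` is inserted verbatim; this sketch does not compute it."""
    if not nodules:
        findings = "No pulmonary nodules identified."
    else:
        findings = "\n".join(
            f"{i}. {n.size_mm:.1f} mm {n.composition} nodule, {n.location}."
            for i, n in enumerate(nodules, start=1)
        )
    return f"FINDINGS:\n{findings}\nASSESSMENT CATEGORY: {category}"


print(generate_report([Nodule("right upper lobe", 5.0, "solid")], category="2"))
```

Because every report produced this way has the same structure, the underlying fields can be pooled across institutions without natural language processing of free text.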



The advancement of the Big Data toolset described above presents unique challenges and ethical implications. Like other clinicians attempting to harness Big Data for value, radiologists and imaging vendors are relatively inexperienced in gleaning knowledge from unstructured masses of information. Unless we are extremely careful, we could be led astray by the hype surrounding Big Data, which has already had notable failures in health care applications.

Google Flu Trends (GFT) used internet search data to predict influenza surveillance data from the Centers for Disease Control and Prevention (CDC). GFT had generally been regarded as an excellent model of the potential of Big Data in epidemiology. Yet GFT had some high-profile disappointments, including completely missing the nonseasonal 2009 H1N1 influenza pandemic and severely overestimating peak influenza levels during the 2013 outbreak. Interestingly, for the initial version of GFT that missed the 2009 pandemic, developers reported actively excluding some search terms that correlated strongly with CDC influenza data but were merely seasonal (eg, high school basketball), suggesting that GFT was homing in on seasonal trends and partly just detecting the occurrence of winter.65

With Big Data, there is also the potential for “Big Error.” Poor-quality data will still yield poor-quality results; having more data will not change this.66,67 Systematic errors or bias in how data are generated must be confronted to improve the analytic process. Automated processing does not free us from questioning the data; instead, these questions become even more important. In this sense, the idea of “theory-free data” or complete empiricism is an illusion. A theoretical framework, with value-based assumptions made by someone, always underlies data acquisition.68–70 The evolving role of the radiologist must not be one in which we are blindly fed knowledge by a machine, but one in which we are responsible for clinical and scientific oversight of the flood of information. Previously, one would make targeted observations on the basis of expert judgment, for instance, reading extensively through the electronic medical record and measuring a finding in as similar a manner as possible across multiple consecutive imaging studies: in essence, manually locating the data. In the era of Big Data, a plethora of information will be available for radiologists to receive and evaluate on the basis of value judgments, prior understanding, context, and rational thought. The need for sound judgment remains paramount; it is displaced, not replaced.
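The point that more data cannot fix poor data can be shown with a small simulation. In this Python sketch, every measurement of a hypothetical 8 mm nodule carries the same 1.5 mm systematic overestimate plus random measurement variation; all three numbers are assumptions.

```python
import random
import statistics

TRUE_SIZE_MM = 8.0   # true nodule diameter (hypothetical)
BIAS_MM = 1.5        # systematic overestimate shared by every measurement (hypothetical)
NOISE_SD_MM = 2.0    # random measurement variation (hypothetical)


def measured_mean(n: int, rng: random.Random) -> float:
    """Mean of n simulated measurements, each = truth + bias + random noise."""
    return statistics.fmean(
        TRUE_SIZE_MM + BIAS_MM + rng.gauss(0, NOISE_SD_MM) for _ in range(n)
    )


rng = random.Random(42)
for n in (10, 1_000, 100_000):
    error = measured_mean(n, rng) - TRUE_SIZE_MM
    print(f"n = {n:>7,}: error of the mean = {error:+.2f} mm")
```

Random error in the mean shrinks roughly as 1/√n, so as n grows the error settles near the +1.5 mm bias rather than near zero.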

Genomics research has attempted to improve its signal-to-noise ratio with more stringent validation standards, requiring stronger statistical significance criteria and replication of findings. Yet the signal-to-noise ratio in bioinformatics remains low: in genomics, fewer than 1% of published research findings address validation and outcomes in the “real world.”67 Compared with genomics research, Big Data in radiology is at an early stage, and creative solutions for managing the signal-to-noise ratio in imaging informatics will have to be developed. A strong foundation in traditional validation techniques and clinical trials is essential for moving Big Data–generated hypotheses forward. However, Big Data could still help by generating real-time data in clinical scenarios in which better-validated studies have not been performed and better information is not available.
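One common way of imposing stronger significance criteria is the Bonferroni correction, which divides the family-wise error rate α by the number of tests m. The worked arithmetic below is illustrative; the conventional genome-wide threshold of 5 × 10^-8 corresponds to roughly one million independent tests. The second expression shows the expected number of false positives at an uncorrected 5% threshold if every null hypothesis is true.

```latex
\alpha_{\text{per test}} = \frac{\alpha}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8},
\qquad
\mathbb{E}[\text{false positives, uncorrected}] = m\,\alpha = 10^{6} \times 0.05 = 50{,}000 .
```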

Big Data applications intended for beneficial purposes could be used inappropriately in ways that present ethical problems. For example, Global Positioning System applications could be used to provide traffic predictions to an individual, or to stalk that individual. A mobile app could use a patient’s purchasing habits and activity level to generate a fitness score that helps the patient track and improve healthy lifestyle goals in collaboration with health care providers; the same fitness score could also become a basis for insurance or employment discrimination.71 Indeed, civil liberty violations have resulted from the pervasiveness of technological surveillance.72 Standards of data stewardship must be established to govern the use of data, and these standards should be especially robust for health care data. Standards for Big Data have been proposed for industry that may be applicable to health care as well, including transparent data stewardship practices; special policies and consent for the handling of personally identifiable information; the designation of specific institutional professionals and/or boards to oversee Big Data analytics and the integrity of data processes; and transparency about any transfer or sharing of data.71

In medical imaging, important medicolegal issues surround Big Data. Informed consent and privacy must be addressed. Technology to assist with the deidentification of images may prove helpful, but as more information is incorporated into data management systems, features that may make a patient personally identifiable must constantly be reconsidered. There are also issues of data ownership for research purposes. The development of proprietary methods and technology can be thought of as contributing to a “data divide”: the lack of data sharing, open-source software, and open, agreed-upon standards results in a disparity between institutions that have the access and means to analyze larger and more complex data sets and those that do not. As stewards of Big Data, radiologists must consider the unintentional or intentional discrimination that may result from large, isolated aggregates of data.73
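One established way to quantify how added attributes erode anonymity, swapped in here rather than taken from the article, is k-anonymity: the size of the smallest group of records sharing the same quasi-identifier values. In this Python sketch the records, attributes, and `k_anonymity` helper are hypothetical.

```python
from collections import Counter


def k_anonymity(records: list[dict], quasi_identifiers: tuple[str, ...]) -> int:
    """Return the size of the smallest group of records that share identical
    values for every quasi-identifier (the data set's k). A result of 1 means
    at least one record is unique on those attributes."""
    if not records:
        raise ValueError("records must be non-empty")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


records = [
    {"age_band": "60-64", "zip3": "212", "sex": "F", "site": "A"},
    {"age_band": "60-64", "zip3": "212", "sex": "F", "site": "B"},
    {"age_band": "65-69", "zip3": "212", "sex": "M", "site": "A"},
    {"age_band": "65-69", "zip3": "212", "sex": "M", "site": "A"},
]
print(k_anonymity(records, ("age_band", "zip3", "sex")))          # two groups of 2
print(k_anonymity(records, ("age_band", "zip3", "sex", "site")))  # a record becomes unique
```

Adding the site attribute drops k from 2 to 1, showing how incorporating one more field can leave a record unique and therefore easier to re-identify.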

Although medical information sharing can be difficult to regulate from ethical and legal perspectives, secure health information exchange programs can provide great value to patient care by helping providers obtain more complete information about a patient’s medical history, which could reduce redundant or unnecessary tests. Multi-institution health information exchange programs, such as the Chesapeake Regional Information System for our Patients (CRISP) clinical query portal used in Maryland, can increase the availability of information to providers in real time.74 The Radiological Society of North America (RSNA) Image Share is a health information exchange utility that allows diagnostic images to be shared securely among imaging providers, patients, and clinicians.75 Ready access to a patient’s prior imaging studies is an important part of effective medical management, especially for patients who receive care across multiple institutions. Secure multi-institution image-sharing networks are one way in which “Big Data” concepts of interoperability directly benefit an individual patient at the point of care. Further development of the data structures in these networks may lead to regional or national registries that could be queried for research or personalized medicine.



The National Cancer Institute has highlighted chest imaging as a potential area of interest for the creative application of data. The National Cancer Institute Division of Cancer Prevention and Division of Cancer Treatment and Diagnosis sponsored the third annual Data Science Bowl in 2017, a competition challenging data scientists to develop machine learning algorithms that improve the detection of lung cancer on low-dose screening CT.75 Initiatives such as this illustrate the growing, and likely inevitable, role of harnessing medical records and imaging data. Cardiothoracic radiologists have the opportunity to lead in this regard.

There is a focus in health care policy on establishing the added value of medical services on the basis of evidence. This focus has become a national agenda through the creation of organizations such as the Patient-Centered Outcomes Research Institute, which promotes comparative effectiveness research to “assist patients, clinicians, purchasers, and policy makers in making informed health decisions by advancing the quality and relevance of evidence concerning the manner in which diseases, disorders, and other health conditions can effectively and appropriately be prevented, diagnosed, treated, monitored, and managed through research and evidence synthesis.”76,77 Radiology must demonstrate specifically where imaging adds value to patient outcomes and health care expenditures. If the radiology community does not use this opportunity to clarify its value-added equation and minimize wasteful practices, radiologists run the risk of commoditization and marginalization. Harnessing Big Data is one approach to generating evidence-based solutions.

Although early Big Data applications in health care face challenges and problems, the diagnostic imaging community is making steady progress. There is historical precedent: diagnostic ultrasound research also encountered setbacks related to a low signal-to-noise ratio. Some early investigations of diagnostic ultrasound in the 1950s were curtailed for several years after researchers at the Massachusetts Institute of Technology found that ultrasound signals suggestive of brain tissue could be seen in an empty skull, concluding that “compensated ultrasonograms may contain some information on brain structure” but “are too sharply ‘noise’ limited” and “of unqualified clinical value.”78 Ultrasound signals were once dismissed as mostly artifact, yet these challenges were overcome, and ultrasound is a prominent diagnostic tool today. The data science trend will ultimately push forward with or without radiology. Although well-validated Big Data applications in diagnostic imaging are not yet in general use, researchers and physicians should keep an open mind toward the unique challenges of today’s data landscape in order to foster innovation and prepare to fully embrace health care’s modern digital revolution.



How do we know what we know? How do we learn from observation? In other words, how do we extract “knowledge” from “data”? This question has occupied thinkers since the dawn of human civilization and has been studied extensively in epistemology (from Greek ἐπιστήμη, epistēmē, “knowledge,” and λόγος, logos, “logical discourse”), the branch of philosophy concerned with the “Theory of Knowledge.” With the emergence of machines with computational power in the 20th century, this theoretical, philosophical question became a technical, pragmatic one: How do we use computational power to discover or extract knowledge from data?1 (Data is Latin, the plural of datum, “(thing) given,” the neuter past participle of dare, “to give.”) The similarities between this new technical necessity and the well-established mining industry gave rise to many new terms, such as “data mining,” “knowledge extraction,” and “knowledge discovery.” The metaphor is one of extracting valuable minerals (knowledge or wisdom) from ore bodies in the earth (data).79

As computing power has grown and daily life has come to involve the constant generation of complex, electronically transmitted and stored information, new epistemological questions arise. Discussion of “Big Data” raises the question of what constitutes data. What level of uncertainty are we willing to accept from the new types of electronic information that have become available? In the use of Big Data, the focus is shifting from hypothesis-based causal models to correlation-based discovery. In classical statistics, the data serve, in a way, as supporting evidence that validates or refutes human-initiated hypotheses. With machine learning, the data themselves can independently lead to discovery: the data are the starting point, analyzed in the belief that they contain valuable discoveries. This, in turn, raises concerns about uncertainty. How much uncertainty can we accept about the accuracy of the data themselves, or about bias in how the data are collected? Will coincidental correlations lead us astray by suggesting false relationships?

From a historical perspective, the concept of Big Data is relatively new, but these questions are not new in epistemology. In the Middle Ages, rationalism, the idea that reason rather than experience is the basis for certainty in knowledge,1 was prevalent. Aristotle had contrasted rationalism with an emphasis on sensory experience, measurements, and observations, which formed the basis of empiricism.80 Acceptance of empiricism as a valid theory of knowledge required centuries of development.81–83 Galileo Galilei, whose contributions to astronomy, physics, mathematics, and philosophy are legendary, was placed under house arrest and forced by the Roman Inquisition to recant his observation-based support for heliocentrism, set out in his Dialogue Concerning Two Chief World Systems; he was found “vehemently suspected of heresy.”84–86 In the 17th century, John Locke advanced the idea that all knowledge must be based on measurable experiences and observations.42

In turn, some data scientists have recently begun to argue that Big Data marks the end of rationalism and hypothesis-driven knowledge. The editor of Wired generated much discussion by arguing that when you have petabytes of data, “correlation is enough.”87 So is Big Data the death of theory? Social scientists have since grappled with this question, with the insight that Big Data primarily points us toward what we should decide to test. Although Big Data may free us from the a priori requirement of a hypothesis, it does not obviate the role of mechanisms, underlying processes, and theory. David Jensen articulated this idea with regard to social science research: “we don’t just want to find statistical associations, we actually want to uncover the underlying causal processes by which social systems work … The data themselves don’t tell you about cause and effect, there’s actually (an) often very complex inferential process you have to go through in order to extract from the data the things that you really care about.”87 In medicine, the same holds true for the foreseeable future. A contextless correlation is of little value: it neither explains why we should care about the data nor identifies the truly important questions.

Yet diagnostic radiology currently operates largely on the basis of established knowledge, uninformed by the vast amount of observational data, stored electronic health information, and imaging data. Some radiologists may be skeptical about mining these imaging data with artificial intelligence algorithms to yield diagnostic information, out of concern that this will be ineffective, lead to spurious assumptions, or marginalize radiologists professionally. In a broad sense, the current practice of radiology is a model of rationalism, and these concerns largely reflect a fear of the unknown. The best way to avoid being marginalized by the impact of Big Data on radiology is to embrace these improvements and demonstrate that diagnostic radiologists provide valuable judgment in understanding and applying such evidence-based tools to diagnostic imaging. Radiologists can manage the information and synthesize knowledge as experts in imaging technology with a close understanding of the impact on patient care.

The belief that diagnostic imaging is best served by a fundamental understanding of logical principles alone has led to problems in radiology as well. Consider functional magnetic resonance imaging (fMRI) software, which analyzes imaging data on the basis of statistical assumptions that had not been fully validated empirically on real data sets. A recent paper investigated the rate of false-positive findings from brain fMRI in a sample of 499 healthy control subjects, in which the standard 5% significance threshold should yield a 5% incidence of false positives. Surprisingly, 3 commonly used fMRI software packages produced false-positive rates of up to 70%, calling into question years of fMRI research on the brain.88 Relying on untested rationalist assumptions in diagnostic imaging can therefore be tremendously problematic.
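The logic of that study, testing an analysis pipeline on data in which no true effect exists, can be illustrated with a toy analogue in Python. The sketch applies a z-test that assumes a noise standard deviation of 1; when the real noise is larger, the nominal 5% false-positive rate is badly exceeded. This is a deliberately simple stand-in with assumed parameters, not the cluster-inference methods evaluated in the fMRI study.

```python
import random
from statistics import NormalDist

rng = random.Random(3)
ASSUMED_SD = 1.0                      # noise level the analysis assumes (hypothetical)
Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, about 1.96


def false_positive_rate(true_sd: float, n_trials: int = 10_000, n: int = 20) -> float:
    """Apply a z-test of 'mean = 0' to null data n_trials times and return the
    fraction of trials declared significant at the nominal 5% level."""
    hits = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0, true_sd) for _ in range(n)]
        z = (sum(sample) / n) / (ASSUMED_SD / n ** 0.5)
        hits += abs(z) > Z_CRIT
    return hits / n_trials


print(f"assumption holds (true SD 1):    {false_positive_rate(1.0):.3f}")
print(f"assumption violated (true SD 2): {false_positive_rate(2.0):.3f}")
```

With a true standard deviation of 2, |z| exceeds 1.96 whenever the correctly scaled statistic exceeds 0.98, so the expected false-positive rate is about 2(1 − Φ(0.98)) ≈ 0.33 rather than 0.05. Only running the test on null data reveals the problem.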

In general, the science of medical imaging stands to gain from empirical analysis of patient imaging data in large, shared data sets. Previously unknown associations between imaging findings and patient outcomes could be discovered empirically from available imaging data, and known associations could be validated. Although we may be unknowingly led astray by inaccurate data or false correlations, in many situations Big Data methods can highlight areas of discovery for more conventional methods of validation, such as randomized trials. For clinical situations in which rigorous prospective testing is not possible, there is currently no model for validating correlational findings. Advancing medical science requires us to meet that challenge with new models for dealing with uncertainty. Considered in the context of the historical development of empiricism and deductive reasoning, this challenge may ultimately call for a new or hybrid scientific method suited to the modern demands of data science.



Big Data presents unique challenges in imaging and in health care generally, but the trajectory is toward increasing capability to access and use expanding, complex types of digitized information. In medical imaging, much work remains before Big Data bears fruit in routine clinical practice. Potential results include more efficient, personalized, data-driven decision support tools; more sophisticated, precise imaging reports and clinical consultations; and a new emphasis on the radiologist as a valuable communicator and consultant for clinician colleagues and patients.

In the future, large shared data sets, publicly or commercially available, will be necessary to fully realize the potential of Big Data in cardiothoracic imaging and in radiology more broadly. Connecting researchers to large-scale resources may accelerate the use of Big Data to mine for new, previously unexplored knowledge. Institutions, private industry, and government are encouraged to continue shifting from individual data silos toward a philosophy of data democratization in order to further progress. Cardiothoracic radiology is particularly well positioned to play a leadership role in this regard.

We are now on the cusp of using Big Data to revolutionize the practice of health care. Validation and justification of each application with more traditional experimental methods are still necessary to ensure safe and beneficial outcomes. Efficacy, safety, and survival benefits will certainly continue to influence Food and Drug Administration clearance of these techniques. The process will likely be labor intensive and will require collaboration between data scientists and radiologists. Although Big Data poses technical and ethical challenges, radiologists should not shy away from them. Instead, we should challenge ourselves to contribute creative solutions to these problems. The role of imagers will one day be transformed by Big Data. An exciting, game-changing opportunity exists to create new methods that radiologists can harness to improve patient care, to become more efficient in the face of increasing practice demands and growing volumes of information, and to redefine the role of the next generation of radiologists.



1. Oxford English Dictionary. Available at: Accessed February 26, 2017.
2. De Mauro A, Greco M, Grimaldi M. A formal definition of Big Data based on its essential features. Libr Rev. 2016;65:122–135.
3. NIST. NIST Big Data Interoperability Framework: Volume 1, Definitions (NIST SP 1500-1). 2015. Available at: Accessed June 11, 2017.
4. Wang H, Xu Z, Fujita H, et al. Towards felicitous decision making: an overview on challenges and trends of Big Data. Inf Sci. 2016;367–368:747–765.
5. Donoho D. High-dimensional data analysis: the curses and blessings of dimensionality in Mathematical Challenges of the 21st Century. 2000. Available at:
6. National Human Genome Research Institute (NHGRI). All About The Human Genome Project (HGP). 2015. Available at: Accessed June 12, 2017.
7. Million Veteran Program (MVP). Available at: Accessed February 26, 2017.
8. Luo J, Wu M, Gopukumar D, et al. Big Data application in biomedical research and health care: a literature review. Biomed Inform Insights. 2016;8:1–10.
9. Seffens W, Evans C, Taylor H. Minority Health-GRID Network. Machine learning data imputation and classification in a multicohort hypertension clinical study. Bioinforma Biol Insights. 2015;9:43–54.
10. Kotenko J. Facebook reveals we upload a whopping 350 million photos to the network daily. Digital Trends. 2013. Available at: Accessed February 26, 2017.
11. Our Projects. Facebook Code. Available at: Accessed February 27, 2017.
12. The Cancer Moonshot. Report of the Cancer Moonshot Task Force. Available at: Accessed October 17, 2016.
13. American College of Radiology. ACR Appropriateness Criteria. Available at: Accessed July 29, 2017.
14. McKee BJ, Regis SM, McKee AB, et al. Performance of ACR lung-RADS in a clinical CT lung screening program. J Am Coll Radiol. 2015;12:273–276.
15. MacMahon H, Austin JH, Gamsu G, et al. Guidelines for management of small pulmonary nodules detected on CT scans: a statement from the Fleischner Society. Radiology. 2005;237:395–400.
16. Naidich DP, Bankier AA, MacMahon H, et al. Recommendations for the management of subsolid pulmonary nodules detected at CT: a statement from the Fleischner Society. Radiology. 2013;266:304–317.
17. Pinsky PF, Church TR, Izmirlian G, et al. The National Lung Screening Trial: results stratified by demographics, smoking history, and lung cancer histology. Cancer. 2013;119:3976–3983.
18. Hostetter JM, Morrison JJ, Morris M, et al. Personalizing lung cancer risk prediction and imaging follow-up recommendations using the National Lung Screening Trial Dataset. J Am Med Inform Assoc. 2017;24:1046–1051.
19. Morris MA, Hostetter JM, Saboury B, et al. Personalized Patient Cancer Risk in Lung-Rads 1.0 and Fleischner Guidelines from NLST data. 2015.
20. Morrison JJ, Hostetter J, Wang K, et al. Data-driven decision support for radiologists: re-using the National Lung Screening Trial Dataset for pulmonary nodule management. J Digit Imaging. 2015;28:18–23.
21. Marcus PM, Pashayan N, Church TR, et al. Population-based precision cancer screening: a symposium on evidence, epidemiology, and next steps. Cancer Epidemiol Biomarkers Prev. 2016;25:1449–1455.
22. Dehavenon A. CT screening for lung cancer. N Engl J Med. 2007;356:743–747.
23. D’Agostino RB, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008;117:743–753.
24. Diamond GA, Forrester JS. Analysis of probability as an aid in the clinical diagnosis of coronary-artery disease. N Engl J Med. 1979;300:1350–1358.
25. Mark DB, Hlatky MA, Harrell FE, et al. Exercise treadmill score for predicting prognosis in coronary artery disease. Ann Intern Med. 1987;106:793–800.
26. Mark DB, Shaw L, Harrell FE, et al. Prognostic value of a treadmill exercise score in outpatients with suspected coronary artery disease. N Engl J Med. 1991;325:849–853.
27. Agatston AS, Janowitz WR, Hildner FJ, et al. Quantification of coronary artery calcium using ultrafast computed tomography. J Am Coll Cardiol. 1990;15:827–832.
28. Earls JP, White RD, Woodard PK, et al. ACR appropriateness criteria® chronic chest pain-high probability of coronary artery disease. J Am Coll Radiol. 2011;8:679–686.
29. Hoffmann U, Akers SR, Brown RK, et al. ACR appropriateness criteria® acute nonspecific chest pain-low probability of coronary artery disease. J Am Coll Radiol. 2015;12:1266–1271.
30. Dilsizian SE, Siegel EL. Artificial intelligence in medicine and cardiac imaging: harnessing Big Data and advanced computing to provide personalized medical diagnosis and treatment. Curr Cardiol Rep. 2014;16:441.
31. Ma J, Ward EM, Smith R, et al. Annual number of lung cancer deaths potentially avertable by screening in the United States. Cancer. 2013;119:1381–1385.
32. Berrington de Gonzalez A, Kim K-P, Smith-Bindman R, et al. Myocardial perfusion scans: projected population cancer risks from current levels of use in the United States. Circulation. 2010;122:2403–2410.
33. Arbab-Zadeh A. Stress testing and non-invasive coronary angiography in patients with suspected coronary artery disease: time for a new paradigm. Heart Int. 2012;7:e2.
34. Moyer VA. Screening for lung cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med. 2014;160:330–338.
35. Ahlman JT. AMA CPT® 2017 Standard Edition. Chicago, IL: American Medical Association; 2016.
36. Kune R, Konugurthi PK, Agarwal A, et al. The anatomy of Big Data computing. Softw Pract Exp. 2016;46:79–105.
37. Wolf SJ, McCubbin TR, Feldhaus KM, et al. Prospective validation of Wells Criteria in the evaluation of patients with suspected pulmonary embolism. Ann Emerg Med. 2004;44:503–510.
38. Asatryan A, Benoit S, Ma H, et al. Detection of pneumonia using free-text radiology reports in the BioSense system. Int J Med Inf. 2011;80:67–73.
39. Dublin S, Baldwin E, Walker RL, et al. Natural Language Processing to identify pneumonia from radiology reports. Pharmacoepidemiol Drug Saf. 2013;22:834–841.
40. Hripcsak G, Austin JHM, Alderson PO, et al. Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology. 2002;224:157–163.
41. Liu V, Clark MP, Mendoza M, et al. Automated identification of pneumonia in chest radiograph reports in critically ill patients. BMC Med Inform Decis Mak. 2013;13:90.
42. Locke J, Seth Pringle-Pattison A. An essay concerning human understanding. Harvester Press; Humanities Press, 1978.
43. Morris M, Saboury B, Bandla N, et al. Computer-aided reporting of chest radiographs: efficient and effective screening in the value-based imaging era. J Digit Imaging. 2017;30:589–594.
44. Dugdale DC, Epstein R, Pantilat SZ. Time and the patient-physician relationship. J Gen Intern Med. 1999;14(suppl 1):S34–S40.
45. Morris M, Saboury B, Hostetter J, et al. Web Interface for Computer Assisted Reporting of Structured Lung-RADS Reports Through Natural Language Generation. Radiological Society of North America Annual Meeting. 2015.
46. Channin DS, Mongkolwat P, Kleper V, et al. The caBIG™ Annotation and Image Markup Project. J Digit Imaging. 2010;23:217–225.
47. Clunie DA. DICOM Structured Reporting. Bangor, PA: PixelMed Publishing; 2001.
48. Mongkolwat P, Channin DS, Kleper V, Rubin DL. Informatics in radiology: an open-source and open-access cancer biomedical informatics grid annotation and image markup template builder. RadioGraphics. 2012;32:1223–1232.
49. Simonite T. IBM’s automated radiologist can read images and medical records. MIT Technology Review. 2016. Available at: Accessed March 23, 2017.
50. RSNA Medical Imaging Resource Community (MIRC). Available at: Accessed March 23, 2017.
51. Pacsbin. Your personal cloud PACS. Available at: Accessed March 23, 2017.
52. D’Alessandro MP, Galvin JR, Santer DM, et al. Hand-held digital books in radiology: convenient access to information. Am J Roentgenol. 1995;164:485–488.
53. D’Alessandro MP, Galvin JR, Erkonen WE, et al. The Virtual Hospital: an IAIMS integrating continuing education into the work flow. MD Comput. 1996;13:323–329.
54. D’Alessandro MP, Galvin JR, D’Alessandro DM, et al. The virtual hospital: the digital library moves from dream to reality. Acad Radiol. 1999;6:78–80.
55. D’Alessandro MP, D’Alessandro DM, Bakalar RS, et al. The Virtual Naval Hospital: the digital library as knowledge management tool for nomadic patrons. J Med Libr Assoc. 2005;93:16–20.
56. Galvin J. The Virtual Hospital: the future of information distribution in medicine. J Am Med Inform Assoc. 1995;2:933.
57. Galvin JR, D’Alessandro MP, Erkonen WE, et al. The virtual hospital: a link between academia and practitioners. Acad Med. 1994;69:130.
58. Galvin JR, D’Alessandro MP, Erkonen WE, et al. The virtual hospital: a new paradigm for lifelong learning in radiology. RadioGraphics. 1994;14:875–879.
59. Galvin JR, D’Alessandro MP, Kurihara Y, et al. Distributing an electronic thoracic imaging teaching file using the internet, mosaic, and personal computers. Am J Roentgenol. 1995;164:475–478.
60. NCI. Learn—PLCO—The Cancer Data Access System. Available at: Accessed June 13, 2017.
61. The Cancer Imaging Archive (TCIA). A growing archive of medical images of cancer. Available at: Accessed February 27, 2017.
62. Fonseca CG, Backhaus M, Bluemke DA, et al. The Cardiac Atlas Project—an imaging database for computational modeling and statistical atlases of the heart. Bioinformatics. 2011;27:2288–2295.
63. American College of Radiology. Lung Cancer Screening Registry. Available at: Accessed June 12, 2017.
64. Reiner BI, Knight N, Siegel EL. Radiology reporting, past, present, and future: the radiologist’s perspective. J Am Coll Radiol. 2007;4:313–319.
65. Lazer D, Kennedy R, King G, et al. Big Data. The parable of Google Flu: traps in Big Data analysis. Science. 2014;343:1203–1205.
66. Chiolero A. Big Data in epidemiology: too big to fail? Epidemiology. 2013;24:938–939.
67. Khoury MJ, Ioannidis JPA. Big Data meets public health. Science. 2014;346:1054–1055.
68. Salmond JA, Tadaki M, Dickson M. Can Big Data tame a “naughty” world?: environmental Big Data. Can Geogr. 2017;61:52–63.
69. Odoni NA, Lane SN. Knowledge-theoretic models in hydrology. Prog Phys Geogr. 2010;34:151–171.
70. Rhoads BL, Thorn CE. The role and character of theory in geomorphology. In: Gregory K, Goudie A, eds. The SAGE Handbook of Geomorphology. Thousand Oaks, CA: SAGE Publications Ltd; 2011:59–77. doi:10.4135/9781446201053.n4.
71. Martin K. Ethical issues in the Big Data industry. MIS Quarterly Exec. 2015;14:67–85.
72. Sadowski J, Pasquale FA. The spectrum of control: a social theory of the smart city. First Monday. 2015;20.
73. Mittelstadt BD, Floridi L. The ethics of Big Data: current and foreseeable issues in biomedical contexts. Sci Eng Ethics. 2016;22:303–341.
74. CRISP Clinical Query Portal—CRISP. Available at: Accessed March 23, 2017.
75. RSNA. RSNA Image Share. Available at: Accessed March 25, 2017.
76. Division of Cancer Prevention. Data Science Bowl Launched to Improve Lung Cancer Screening. 2017. Available at: Accessed June 12, 2017.
77. Kang SK, Lee CI, Pandharipande PV, et al. Residents’ introduction to comparative effectiveness research and Big Data analytics. J Am Coll Radiol. 2017;14:534–536.
78. Selby JV, Beal AC, Frank L. The Patient-Centered Outcomes Research Institute (PCORI) national priorities for research and initial research agenda. JAMA. 2012;307:1583–1584.
79. Woo J. History of ultrasound in obstetrics and gynecology, Part 1. A short history of the development of Ultrasound in Obstetrics and Gynecology. 2010. Available at: Accessed June 12, 2017.
80. Lakoff G, Johnson M. Metaphors We Live By. Chicago, IL: University of Chicago Press; 2003.
81. Aristotle, Lawson-Tancred H. De Anima (On the Soul). Penguin Books; 1986.
82. Ibn Tufayl M, Goodman LE. Ibn Tufayl’s The Improvement of Human Reason: A Philosophical Tale. Chicago, IL: University of Chicago Press; 2009.
83. Long AA, Sedley DN. The Hellenistic Philosophers. Cambridge University Press; 1987.
84. Rizvi S. Avicenna (Ibn Sina). Internet Encyclopedia of Philosophy. Available at: Accessed June 12, 2017.
85. Galilei G, Drake S. Dialogue Concerning the Two Chief World Systems, Ptolemaic and Copernican. Modern Library; 2001.
86. Santillana G. The Crime of Galileo. Recording for the Blind & Dyslexic; 2010.
87. Anderson C. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. WIRED. 2008. Available at: Accessed June 12, 2017.
88. Eklund A, Nichols TE, Knutsson H. Cluster failure: why fMRI inferences for spatial extent have inflated false-positive rates. Proc Natl Acad Sci U S A. 2016;113:7900–7905.
89. NCI. CT Images—Learn—NLST—The Cancer Data Access System. Available at: Accessed June 13, 2017.

    big data; data science; informatics; data mining; machine learning; quantitative imaging; cardiothoracic imaging; oncology; nuclear cardiology; nuclear medicine and molecular imaging

    Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved