ARTICLE IN BRIEF
THE FIGURE shows an example of a hypothetical work queue. The first row shows critical studies placed randomly within the work queue, waiting for a radiology interpretation. The second row shows a neural network flagging several studies as “high risk,” and the third row shows the work queue re-organized so that the “high risk” studies are interpreted first by humans, leading to a notable decrease in time to diagnosis (12 min to 3 min, and 27 min to 12 min).
Investigators developed a deep learning algorithm to provide rapid diagnosis of clinical head CT-scan images. The goal? To help triage and prioritize urgent neurological events, potentially accelerating time to diagnosis and care in clinical settings.
Scientists at the Icahn School of Medicine at Mount Sinai in New York have taught a computer to read head computed tomography (CT) imaging for serious brain events. While the computer program was less accurate than a neuroradiologist in correctly diagnosing a serious event, it was able to pull out the serious cases much faster than is humanly possible and to assist humans in identifying critical cases sooner, according to a study published August 13 in Nature Medicine.
The study authors, led by Eric K. Oermann, MD, a neurosurgeon and mathematician by training, said that the machine-learning algorithm could help neuroradiologists better triage the most serious cases. Once these cases are flagged, the neurologist can quickly diagnose and treat the patients who need immediate attention.
“The computer is faster at identifying critical events, which means the neurologists can start treatment earlier,” said Dr. Oermann. “Decreasing time to treatment can improve outcomes.”
Dr. Oermann began studying deep learning about a decade ago while working on his undergraduate degree in mathematics. One of his early interests was convolutions in higher spatial dimensions, such as 3D convolutional neural networks (3D-CNNs), a technology similar to that used for supervised classification of 3D models and light detection and ranging data.
Three years ago, he and a colleague saw a string of clinical cases in which they felt the outcome would have been better had the imaging results been brought to their attention earlier. That got them wondering whether a 3D-CNN algorithm could be used to help classify CT images for acute neurologic events. (Dr. Oermann has always been interested in number crunching and even took a break during his neurosurgery residency to do a fellowship at Google.)
Could they teach a computer to identify critical problems? They began their investigation with tens of thousands of radiological medical records. They discovered that the language that radiologists use is highly structured. They compared the language in the reports to British novels, Amazon reviews, and Reuters' news stories and found that from a linguistic standpoint, the radiological reports were actually much simpler than the standard fare on Amazon, Reuters, or in novels. They taught the computer to read the reports — and then it was time to see how the program did with the scans. Would there be a match?
STUDY METHODS, FINDINGS
The investigators collected 37,236 head CTs that were annotated with a semi-supervised natural-language processing framework. There were also an additional 96,303 radiological reports. They wanted to simulate a clinical environment: a bustling emergency department, a busy radiological suite, neurologists and neurosurgeons trying to balance the patients ready to be diagnosed and treated. They wanted to test the machine's ability to accurately diagnose a critical event — an ischemic stroke, hemorrhage, or hydrocephalus — and the time it took to identify these serious incidents. They designed a randomized, double-blind, prospective trial, collaborating with Mount Sinai radiologists, neurosurgeons, and orthopedic surgeons, as well as bioengineers and bioinformatics specialists from Boston University.
The first challenge was to label everything on the images and test whether the machine could learn what the labels represented and whether the labels could help the computer correctly identify critical events. They created two types of labels: strong supervision, in which the label shows the location of the ischemic stroke, hemorrhage, or hydrocephalus, and weak supervision, in which the label represents a broader picture of ischemic stroke, hemorrhage, or hydrocephalus.
The labels were flagged as critical or non-critical and then ordered based on a probability of a critical event, explained Dr. Oermann. To test whether it works, they did a trial comparing image interpretation between radiologists and the computer to see how quickly each could identify a critical versus non-critical event.
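The re-ordering step described above can be illustrated with a minimal sketch. It assumes each study already has a predicted probability of a critical finding; the study IDs, probabilities, and the 0.5 flagging threshold are hypothetical and are not taken from the paper.

```python
# Minimal sketch of queue re-prioritization (illustrative only).
# Each study is a (study_id, p_critical) pair; the values are made up.

def triage_order(queue, threshold=0.5):
    """Move flagged studies to the front, most probable first.

    Unflagged studies keep their original arrival order.
    The threshold is an assumed parameter, not one reported in the study.
    """
    flagged = [s for s in queue if s[1] >= threshold]
    unflagged = [s for s in queue if s[1] < threshold]
    # Sort flagged studies by descending probability of a critical finding.
    flagged.sort(key=lambda s: s[1], reverse=True)
    return flagged + unflagged

# Hypothetical queue in arrival order.
queue = [("A", 0.10), ("B", 0.92), ("C", 0.30), ("D", 0.75), ("E", 0.05)]
reordered = triage_order(queue)
print([study_id for study_id, _ in reordered])
```

In this toy queue, studies B and D move to the front, while A, C, and E stay in the order they arrived.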
They created two datasets. In one, with silver-standard labels (weak supervision), the cases were split into critical or non-critical. The other, with gold-standard labels, included a manual review of the patient record to glean more information on what was going on with the patient, including information from multiple scans. (The average patient age in the samples tested was 60 years, with an equal number of men and women. The cases came from the emergency department, inpatient units, and outpatient clinics.) The top three symptoms were headache (27 percent), altered mental state (17 percent), and dizziness/ataxia (9 percent).
The computer was trained to recognize labels that were set to flag a critical finding. The computer's response was tested against the findings from two radiologists and Dr. Oermann, a neurosurgeon. Not surprisingly, the machine-learner did much worse than the human experts. For gold-standard labels, the computer did slightly better than chance with around 73 percent accuracy. The physicians' accuracy was between 79 and 85 percent.
If the idea is to triage the cases, could the speed of splitting critical from non-critical cases make up for accuracy? In the simulated clinical environment, the computer was roughly 150 times faster than the humans at processing an image and flagging the urgent cases: 1.2 seconds versus 177 seconds, respectively. Even with the high percentage of false positives (approximately one in five), the computer triaging led to decreased time in recognizing an urgent event.
“By choosing the right neural events to tackle, there was still a benefit in getting the critical events to the doctors faster,” said Dr. Oermann. “This is a surveillance tool. It will enable doctors to see the most critical events first and ensure they can make the proper diagnosis. Even if they get to see events that are not critical, they will still be able to triage those cases that demand immediate attention.”
“With a total processing and interpretation time of roughly one second, such a triage system can alert physicians to a critical finding that may otherwise remain in a queue for minutes to hours,” the scientists wrote in the paper. “Medical decisions are not made in isolation based solely on imaging, and the diagnostic work of radiologists requires careful tailoring of models to the prior probabilities of disease.”
“Computers can give you a great rough draft and a human can tweak it,” added Dr. Oermann.
He said that there is still a lot of work to do before this kind of modeling is put to use in the hospital. He also noted that the sample comes from a single hospital, Mount Sinai. The research team is now working on a way to include a visualization layover classifier that can be used to label areas of activation. They also want to improve the “quality and nature of the labels,” said Dr. Oermann.
Dr. Oermann said he also learned from his tenure at Google that computers, like people, learn how to cheat at classification tasks. This is one of his worries with artificial intelligence, he explained.
“In computer science, datasets have higher quality control and can be distributed more normally,” he said. “In medicine, we don't have that luxury. Artificial intelligence can catch on to these idiosyncrasies and cheat. This may be why AI is still so bad at making an accurate decision.”
Dr. Oermann's laboratory is funded in part by Intel.
“The idea of neural networks is to process data by modeling biological neural systems,” said Arthur W. Toga, PhD, professor of ophthalmology, neurology, psychiatry and the behavioral sciences, radiology and engineering, and director of the USC Stevens Neuroimaging and Informatics Institute.
“By giving neural networks sample data, it can teach itself sets of rules to derive an answer with other data,” Dr. Toga said. “In theory, it gets better and better at doing this. These strategies have not been that successful with clinical diagnoses from images alone. In this paper, the authors did not focus on computer-aided diagnosis but rather applied a neural network to test whether it could be used to triage cases more efficiently.”
“We will find over time that these computer-aided applications in radiology and medicine will continue to improve accuracy and efficiency,” he added. “A computer can remember everything, and we can't, so comparing past observations with new ones is vastly improved. The question is, when employing computational methods to triage cases how do you want to weigh your errors? It would be better to have false positives rather than false negatives. You want to err on the false positives to identify the cases that need immediate attention. False negatives might result in delays diagnosing something that needs immediate attention.”
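Dr. Toga's point about weighing errors can be made concrete with a small sketch. It assumes a missed critical case (false negative) is assigned a higher cost than an unnecessary flag (false positive) and then picks the flagging threshold with the lowest total cost. The cost values, scores, and candidate thresholds are hypothetical and do not come from the study.

```python
# Illustrative sketch: choosing a flagging threshold under unequal error costs.
# All numbers below are invented for the example.

def total_cost(threshold, scored_cases, fn_cost, fp_cost):
    """Sum the cost of missed critical cases and needless flags."""
    cost = 0.0
    for p_critical, is_critical in scored_cases:
        flagged = p_critical >= threshold
        if is_critical and not flagged:
            cost += fn_cost  # critical case left in the queue
        elif flagged and not is_critical:
            cost += fp_cost  # non-critical case pushed to the front
    return cost

def best_threshold(scored_cases, candidates, fn_cost, fp_cost):
    """Return the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(t, scored_cases, fn_cost, fp_cost),
    )

# (predicted probability, truly critical?) for seven hypothetical studies.
cases = [
    (0.90, True), (0.60, True), (0.35, True),
    (0.45, False), (0.40, False), (0.20, False), (0.10, False),
]
candidates = [0.3, 0.5, 0.7]

# Treating a miss as ten times worse than a false alarm favors a low threshold.
print(best_threshold(cases, candidates, fn_cost=10.0, fp_cost=1.0))
# With equal costs, a higher threshold is preferred for the same data.
print(best_threshold(cases, candidates, fn_cost=1.0, fp_cost=1.0))
```

When misses are penalized more heavily, the chosen threshold drops, so more studies are flagged and fewer critical cases wait in the queue, at the price of more false positives.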
“My bias is that I am a neurologist, and the issue is that we do things to make a clinical diagnosis based on history, symptoms, the exam, and imaging. Imaging plays a big part, especially in the acute setting,” said David S. Liebeskind, MD, FAAN, professor of neurology and director of the neurovascular imaging core at the University of California, Los Angeles. “But it is only one part. As the authors of the study point out, a diagnosis is not just based on an image.”
“CT scans are used most commonly to rule out ischemic stroke,” Dr. Liebeskind noted, “but the findings on a scan are often quite subtle. The implications of a wrong diagnosis are quite significant. It is imperative that you have clinical oversight. You don't want to miss something.”
“AI is going to be helpful,” he added. “The study provides a good first step in developing an alert system to add information to the clinical paradigm. But neurologists have to understand that such an algorithm is not a diagnosis by itself. We have to be careful not to let this technology replace what we do as clinical diagnosticians.”
“The main hope for AI in radiology is that it will improve our accuracy, and unfortunately this study does not reach that goal,” said Manu S. Goyal, MD, assistant professor in radiology and neurology at Washington University School of Medicine in Saint Louis. “However, the study authors found that their machine learning algorithm can help with triaging cases so that patients with critical results get seen sooner. In a busy practice, this could certainly be important since it is possible that patients with a large subdural hemorrhage, for example, will be evaluated for surgical intervention sooner.
“That said, some additional clinical input would still be necessary to triage cases appropriately. For example, it is critical to read head CTs, even if they are normal, on hyperacute stroke patients and level 1 trauma patients as soon as possible, because a normal head CT has immediate importance in deciding on what to do next.”
“Overall, I am excited about these efforts because I think they will soon substantially improve my abilities as a neuroradiologist and ultimately improve patient outcomes. The field is evolving rapidly, and hopefully we will soon be able to incorporate such tools into our daily practice,” Dr. Goyal said.