Machine Learning in Medicine

MUSC is developing machine learning algorithms that harness big data to transform patient care

By Kimberly McGhee

The avalanche of medical data now available to clinicians and researchers represents an unprecedented opportunity to transform health care and realize precision medicine. High-throughput screening of patient samples yields multiomics data (e.g., genomics, proteomics, glycomics), the electronic health record (EHR) provides information on clinical care and history, and wearables provide real-time updates on a patient’s activity and vital signs. However, the sheer volume of these data is daunting and can overwhelm and frustrate clinicians.

Machine and deep learning, subsets of artificial intelligence (AI) that have been waiting in the wings for decades, are now taking center stage because they not only can handle these burgeoning datasets but also need them if they are to continue to grow “smarter.” In supervised machine learning, for example, labeled pairs of inputs and outputs are used to train an algorithm, which infers the relationship between the pairs and predicts outputs for new input values.
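
As a minimal sketch of the supervised setup described above, the following Python fits a straight line to labeled input-output pairs and then predicts the output for an input it has not seen. The training values and the function names (`fit_line`, `predict`) are invented for illustration and are not drawn from any MUSC model.

```python
def fit_line(pairs):
    """Least-squares fit of y = slope * x + intercept to labeled (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned relationship to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training pairs: (input, labeled output).
training_pairs = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
model = fit_line(training_pairs)
print(predict(model, 5.0))
```

Real clinical models use many more inputs and far more flexible functions, but the pattern is the same: learn from labeled examples, then predict for new cases.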

In unsupervised learning, the algorithm is provided unstructured, unlabeled data and is left to its own devices to classify and order the data. Eventually, however, machine learning’s ability to learn from the data plateaus. Deep learning avoids that “plateau” effect, continuing to grow smarter as it parses more data by transforming them through a series of layers of nodes before integrating them again. Machine and deep learning are widely thought to hold the key to unlocking the promise of big data. In their book Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville contend that machine learning and deep learning are “the only viable approach[es] to building AI systems that can operate in complicated real-world environments” precisely because they can learn from experience.1
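
To make the unsupervised case mentioned above concrete, here is a small sketch of k-means clustering on one-dimensional data, one common unsupervised technique among many. No labels are supplied; the algorithm groups the values on its own. The measurements and starting centers are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Group unlabeled numbers around the nearest of the given starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled measurements (for example, heart rates in beats per minute).
values = [58, 60, 62, 61, 95, 98, 102, 99]
centers, clusters = k_means_1d(values, centers=[50, 110])
print(centers)
print(clusters)
```

The algorithm is never told which values belong together; the two groups emerge from the data alone.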

Recent achievements by deep learning algorithms lend credence to that assessment. When trained on appropriate datasets, deep learning algorithms have made referral recommendations for more than 50 sight-threatening diseases as accurately as a panel of ophthalmologists,2 distinguished malignant melanomas from benign moles with better sensitivity than dermatologists (95 vs. 88 percent),3 and detected metastases in sentinel lymph nodes better than pathologists.4 None of these algorithms is intended to replace clinicians but rather to support them in detecting disease earlier so that appropriate treatment can begin as soon as possible.

An electronic safety net

At MUSC, a data science team in Information Solutions, including Chief Data Officer Matthew Turner, Director of Analytics Brady Alsaker and lead data scientist Matthew Davis, uses machine and deep learning to predict patient risk and outcomes. Their analytics predict which patients are at highest risk of dying while in the hospital, of being readmitted or of visiting an ER. They can also identify, in real time, patients who are likely to deteriorate or to develop sepsis, so that preventive measures can be taken.

“We see this as an AI safety net,” says Turner. “The model is able to look at thousands of features in real time versus a human being that looked at 40 or 50 variables. It looks at all of those micro-correlations in real time and puts together a risk prediction of how likely it is that this patient will descend into sepsis.”

The data scientists work closely with academic researchers in the Biomedical Informatics Center (BMIC) to develop their predictive analytics. These researchers are drawing on diverse data, including the unstructured data residing in clinicians’ notes in the EHR, to develop predictive analytics for precision medicine; predict patient preference; decide whether a patient would benefit from a targeted therapy; and revolutionize the way that research is conducted. Chief Research Information Officer Les A. Lenert, M.D., directs BMIC, which is home to the biomedical informatics program of the South Carolina Clinical and Translational Research Institute. “We want any investigator on campus to know that they can come to us to build a predictive analytics solution and test it at the point of care using Epic,” says Lenert. “And we also want to make it possible for researchers to use the health system as a tool or environment for testing algorithms for predictive analytics.”

Unlocking the power of clinical notes

To be robust, predictive analytics must tap into the richest data in the electronic health record: those that reside in clinicians’ notes. Traditionally, however, computers could “read” only structured data that had been entered as code. Natural language processing, another subset of AI, enables computers to begin to understand human language. This is important because clinicians, pressed for time, may not do an exhaustive search of the EHR for a relevant clinical detail, such as an allergy, and instead rely on the latest clinical note, which may or may not be comprehensive and accurate. If data relevant to allergy were immediately available via a clinician dashboard, they would be far less likely to be missed. Stephane Meystre, M.D., Ph.D., an associate professor at BMIC and in the Departments of Public Health Sciences and Psychiatry and Behavioral Sciences, has developed a program that he hopes will one day be able to do just that. “This is not simple keyword searching,” explains Meystre. “It involves syntax and complex semantics.” In other words, the algorithm can recognize that a word’s meaning is changed by its context. For example, it must recognize negation so that it does not mistakenly attribute to the patient diseases excluded by the differential diagnosis in the clinician’s note.

To teach the program what information to extract and how to interpret context, Meystre asks specialists to read clinicians’ notes and annotate the information they want to be able to find (e.g., noting that the symptom in “denies chest pain” is negated). Those annotations are then used to train the algorithm to recognize what to look for in clinical notes. The resulting data can then be presented to physicians in a dashboard for validation. They can also be used for research, quality improvement, or other secondary uses of clinical information.
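
The negation problem described above can be illustrated with a deliberately crude, rule-based sketch. This is not Meystre’s program, which is trained on specialist annotations; the cue list, vocabulary and function name here are hypothetical and serve only to show why a finding’s surrounding words matter.

```python
import re

# Hypothetical cue words; a trained system would learn far richer context.
NEGATION_CUES = ("denies", "no", "without", "negative for")


def find_findings(sentence, vocabulary):
    """Return (finding, is_negated) pairs for vocabulary terms found in a sentence."""
    text = sentence.lower()
    results = []
    for term in vocabulary:
        match = re.search(r"\b" + re.escape(term) + r"\b", text)
        if not match:
            continue
        # Look only at the words that precede the finding within this sentence.
        preceding = text[:match.start()]
        negated = any(
            re.search(r"\b" + re.escape(cue) + r"\b", preceding)
            for cue in NEGATION_CUES
        )
        results.append((term, negated))
    return results


vocabulary = ["chest pain", "shortness of breath"]
print(find_findings("Patient denies chest pain.", vocabulary))
print(find_findings("Patient reports shortness of breath.", vocabulary))
```

A rule this simple would wrongly mark “shortness of breath” as negated in “no chest pain but has shortness of breath,” which hints at why syntax and semantics, rather than keyword matching, are needed.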

Natural language processing could also improve voice recognition, perhaps enabling the creation of a digital “medical scribe” that could record what was said by whom during a patient visit and enter those data into the EHR, to be later validated by the physician.

Precision medicine bioinformatics

According to Lewis J. Frey, Ph.D., an associate professor in the Department of Public Health Sciences and a member of BMIC, precision medicine, at its simplest, is “wrapping data around a patient at an individual level to improve outcomes.” Precision medicine informatics, the field in which Frey specializes, analyzes all of that information in order to recommend optimal treatment that is personalized for each patient. The patient’s response to a certain treatment can be “predicted” based on how well groups of patients with similar features have responded to that therapy in the past. On the basis of these predictive analytics, patients can be pointed toward the therapies most likely to help them and be spared the rigors of therapies that likely will not.
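
One simple way to “predict” a response from similar patients, as described above, is a nearest-neighbor estimate: find the past patients whose features most resemble the new patient’s and report how often they responded. The sketch below assumes made-up, already-normalized features and is not the method used by Frey’s group.

```python
import math


def predict_response(new_patient, past_patients, k=3):
    """Estimate response as the response rate among the k most similar past patients."""
    ranked = sorted(past_patients, key=lambda p: math.dist(p["features"], new_patient))
    nearest = ranked[:k]
    return sum(p["responded"] for p in nearest) / len(nearest)


# Hypothetical records: two normalized features per patient and whether therapy helped.
past_patients = [
    {"features": (0.20, 0.30), "responded": 1},
    {"features": (0.25, 0.35), "responded": 1},
    {"features": (0.30, 0.20), "responded": 0},
    {"features": (0.80, 0.90), "responded": 0},
    {"features": (0.85, 0.80), "responded": 0},
    {"features": (0.90, 0.95), "responded": 1},
]

print(predict_response((0.22, 0.30), past_patients))
print(predict_response((0.85, 0.90), past_patients))
```

The first hypothetical patient resembles a group in which two of three responded, the second a group in which one of three did, so the therapy looks more promising for the first.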

Frey also helped develop a predictive analytics platform, the Clinical Personalized Pragmatic Prediction of Outcomes (C3PO), that was used at Ralph H. Johnson VA Medical Center to cluster patients with diabetes who were undergoing knee or hip surgery by A1C levels and predict outcomes for each group. Because it was designed as infrastructure-as-code, it could then be reused and expanded to examine how chronic stress affects prostate cancer outcomes for the MUSC Transdisciplinary Collaborative Center (TCC), the goal of which is to support precision medicine in minority men’s health. Infrastructure-as-code enables the computer infrastructure required to run a predictive analytics algorithm to be cloned in the cloud or on as many computers as are necessary at a new institution. “Infrastructure-as-code is a new way of programming architectures,” explains Frey. “You write the commands that create the machines that run the analyses. You set up all the scripts and then it instantiates the machine as needed in the new environment. So you have coded the infrastructure.”
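
The idea Frey describes can be sketched in a few lines: the machine is described in code, and the same description produces the same setup steps wherever it is run. Everything below (the `MachineSpec` fields, the step strings, the site names) is hypothetical and does not reflect how C3PO is actually built; real projects typically use dedicated provisioning tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineSpec:
    """Declarative description of one analysis machine (all fields are hypothetical)."""
    name: str
    cpus: int
    memory_gb: int
    packages: tuple = ()


def provisioning_plan(spec, site):
    """Turn a spec into an ordered list of setup steps for a given site."""
    steps = [f"create machine {spec.name}@{site} cpus={spec.cpus} memory={spec.memory_gb}GB"]
    steps += [f"install {pkg} on {spec.name}@{site}" for pkg in spec.packages]
    steps.append(f"run analysis on {spec.name}@{site}")
    return steps


analysis_node = MachineSpec("analysis-node", cpus=4, memory_gb=16, packages=("python3", "r-base"))

# The same code yields an equivalent plan at each institution.
plan_a = provisioning_plan(analysis_node, "site-a")
plan_b = provisioning_plan(analysis_node, "site-b")
for step in plan_a:
    print(step)
```

Because the plan is generated from the specification rather than assembled by hand, a new institution gets the same environment by running the same code.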

Predicting patient preference

Lenert and Brian Neelon, Ph.D., associate professor in the Department of Public Health Sciences, are working together to create a collaborative filtering recommender system for patient treatment preference, much like those that Netflix or Amazon uses to recommend new products to you based on the past purchasing history of people like you. There are three stages to developing such a system: first, eliciting patient preferences via surveys; second, clustering patients into preference phenotypes based upon their responses; and, third, using satisfaction and quality-of-life data from patients in the same preference phenotype who have already undergone treatment to recommend the therapies most aligned with a new patient’s preferences. “Eventually you can envision developing an app for physicians and patients to use in clinical settings,” says Neelon. “Give patients a small survey instrument to help group them, load that into a system while they are waiting and then review recommendations with them during the clinic visit or at a future visit.”
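
The three stages above can be sketched as follows. This is an illustrative simplification, not Lenert and Neelon’s system: the survey questions, phenotype names and satisfaction scores are invented, and fixed prototype answers stand in for the clustering step that would normally be learned from many patients’ responses.

```python
from collections import defaultdict


def assign_phenotype(survey, prototypes):
    """Stage 2: place a patient in the phenotype whose prototype answers are closest."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(survey, prototypes[name]))
    return min(prototypes, key=distance)


def recommend(survey, prototypes, outcomes):
    """Stage 3: rank treatments by mean satisfaction within the patient's phenotype."""
    phenotype = assign_phenotype(survey, prototypes)
    scores = defaultdict(list)
    for record in outcomes:
        if record["phenotype"] == phenotype:
            scores[record["treatment"]].append(record["satisfaction"])
    ranked = sorted(scores, key=lambda t: sum(scores[t]) / len(scores[t]), reverse=True)
    return phenotype, ranked


# Stage 1 (hypothetical): survey answers on a 1-5 scale for
# [willingness to accept surgery, importance of avoiding a long recovery].
prototypes = {"prefers_surgery": (5, 2), "prefers_conservative": (1, 4)}

# Hypothetical satisfaction scores from patients who have already been treated.
outcomes = [
    {"phenotype": "prefers_surgery", "treatment": "surgery", "satisfaction": 9},
    {"phenotype": "prefers_surgery", "treatment": "physical therapy", "satisfaction": 5},
    {"phenotype": "prefers_conservative", "treatment": "surgery", "satisfaction": 4},
    {"phenotype": "prefers_conservative", "treatment": "physical therapy", "satisfaction": 8},
    {"phenotype": "prefers_conservative", "treatment": "physical therapy", "satisfaction": 7},
]

print(recommend((2, 5), prototypes, outcomes))
```

A new patient whose answers sit closest to the conservative prototype is shown the options that similar patients rated most highly, with physical therapy ranked ahead of surgery in this made-up data.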

Enabling citizen science

The ubiquity of smart phones and watches that can continuously track heart rates, steps taken and other health-relevant data could revolutionize clinical trials, making citizen science feasible. Compared with traditional trials that rely on study coordinators at medical centers to enroll patients from the surrounding region, citizen science trials aim to virtually enroll large numbers of participants from across the country. The much larger datasets that result from these high-enrolling trials could yield new insights into diseases and how best to treat them.

Christopher Metts, M.D., an assistant professor of Pathology and Laboratory Medicine and member of BMIC, has created a software platform that is being used to facilitate citizen science. The platform weds Apple’s ResearchKit and HealthKit to REDCap, a widely available, HIPAA-compliant data collection and management tool for researchers, via an API, basically creating a secure repository for data collected from health care apps. “By marrying those three pieces of software together, you get something that is greater than the sum of its parts,” says Metts. Using his platform technology, Metts has created more than 40 apps for specific health care use cases for a fraction of what one such app typically costs to develop.

At the request of Chief Nursing Officer Jerry Mansfield, Metts used one of those apps, RN Wellness, to shed light on nursing burnout and turnover. Interested nurses were asked to answer screening questions and give informed consent on the iPhone app and then were required to “virtually” enroll in two of four available studies. Each was given an Apple Watch to continuously track heart rate. At the end of each shift, participants were asked to answer a single-question survey about their stress level, and at the midpoint and endpoint of the study to complete more detailed surveys. The goal of the study was to “quantify” the stress that leads to burnout so that measures can be established for assessing the efficacy of burnout interventions in the future.

Metts hopes that the studies will also demonstrate that his platform technology can enroll and track patients in multiple clinical trials at once. His goal is to eventually make thousands of studies available via his platform. As participants answer screening questions for one study, their answers could trigger recommendations for other studies. Although his platform technology, which rigorously anonymizes its data, is primarily a research tool for now, Metts hopes it could be used to implement evidence-backed interventions as well.

Future state

In the not-too-distant future, artificial intelligence, including machine and deep learning and natural language processing, is poised to begin transforming medicine. To realize its promise, many more data scientists with expertise in health care will need to be trained. MUSC and Clemson University have begun jointly offering a doctoral program in data sciences, directed by Alexander V. Alekseyenko, Ph.D., an associate professor in the Department of Public Health Sciences and a member of BMIC. The program, which welcomed its second cohort of students this year, provides them with training in mathematical and statistical modeling, “hacking” skills that enable them to implement algorithms, and an understanding of the data science needs of medicine. This joint Ph.D. degree in Biomedical Data Science and Informatics will help ensure that South Carolina has the workforce it needs to remain competitive in the health care of tomorrow and to make available to its citizens the many medical breakthroughs that machine and deep learning are predicted to bring.


1. Goodfellow I, et al. Deep Learning. Cambridge, MA: MIT Press, 2016.
2. De Fauw J, et al. Nat Med. 2018 Sep;24(9):1342-1350.
3. Haenssle HA, et al. Ann Oncol. 2018 Aug 1;29(8):1836-1842.
4. Ehteshami Bejnordi B, et al. JAMA. 2017 Dec 12;318(22):2199-2210.