Big Data: Health Care’s Knight in Shining Armor?

organization | healthcare | business | Lynn Heidmann

For years, people have been talking about big data’s potential to change health care, particularly in the United States. But with days growing shorter again, the end of the year looms closer, and it seems another year will pass without any major progress.


The amount of news about health care in the last few months leaves us wondering: where’s the big data health care revolution we’ve all been dreaming of? And will it really solve all our problems?

To be sure, U.S. health care faces a host of issues, and it’s a very complicated topic (to put it mildly). But we have more data in health care than ever before! Even so, it might be unrealistic to think that big data will suddenly have a noticeable, sweeping, save-the-day impact rather than small, more subtle victories. But let’s take a look at where we are today. We'll start with the good news.


The good news is that there are already some areas of medicine and health care benefiting immensely from big data, machine learning, and predictive analytics.

Clinical trials are one such area. For example, this year the Mayo Clinic started using machine learning to match patients to clinical trials, and a match now takes less than 10 minutes. Previously, the process was done manually and could take 30 minutes or more.

Also, promising “medtech” startups continue to lead the way in applying machine learning and artificial intelligence to health care. This is exciting, with one caveat: many of them focus on particular issues or conditions rather than on the health care system as a whole. That is still a great thing, but these are baby steps.

So even as health care remains top of mind (an August Gallup poll found that 17 percent of Americans consider health care the biggest of all the issues we’re facing), there’s still a lot of work to be done.

What's the Holdup?

Transforming an entire industry is a huge undertaking, especially one as complicated as health care, where so many parties have their own vested interests. Costs are rising everywhere, and health care is taking up a greater share of the U.S. economy than ever before. With so many issues in play, making changes is proving difficult. More specifically, big data has been slower to change health care than other large industries with sensitive data (like finance) because:

1. Lots of Health Care Data Is Unstructured.

The vast majority of health care data (some estimate up to 80 percent) is unstructured. Think patient notes, claims notes, images from procedures, even scholarly texts that would help physicians. Unstructured data is more difficult to work with at scale than structured data: picture the difference between analyzing the text of 100 emails from your inbox and analyzing 100 rows of Excel data. While it’s certainly possible to derive meaningful insights from unstructured data, including with machine learning, it is more challenging, and that’s a barrier for the health care industry.
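To make the emails-versus-Excel comparison concrete, here is a minimal Python sketch. Everything in it is hypothetical and only for illustration: the records, the note texts, the field names, and the helper functions are all made up. It computes the same figure (average systolic blood pressure) twice, once from structured rows and once from free-text notes, where a hand-written pattern has to guess how clinicians phrased the reading.

```python
import re
from statistics import mean

# Structured data: every value sits in a named field (hypothetical records).
structured_rows = [
    {"patient_id": 1, "systolic_bp": 120},
    {"patient_id": 2, "systolic_bp": 135},
    {"patient_id": 3, "systolic_bp": 150},
]

# Unstructured data: the same kind of fact buried in free text (hypothetical notes).
clinical_notes = [
    "Pt seen for follow-up. BP 120/80, feeling well.",
    "Blood pressure was 135 over 85 today; advised less salt.",
    "Patient anxious, pressure elevated (150/95). Recheck in 2 weeks.",
    "No vitals recorded, patient left early.",
]


def mean_systolic_structured(rows):
    """With structured rows, the analysis is a one-liner."""
    return mean(row["systolic_bp"] for row in rows)


# Assumed phrasings only: "120/80" or "135 over 85". Real notes vary far more.
BP_PATTERN = re.compile(r"(\d{2,3})\s*(?:/|over)\s*(\d{2,3})")


def extract_systolic(note):
    """Return the systolic value found in a note, or None if nothing matches."""
    match = BP_PATTERN.search(note)
    return int(match.group(1)) if match else None


def mean_systolic_unstructured(notes):
    """Extract what we can from each note, skip notes with no reading."""
    values = [v for v in map(extract_systolic, notes) if v is not None]
    return mean(values)


print(mean_systolic_structured(structured_rows))
print(mean_systolic_unstructured(clinical_notes))
```

The structured version needs no knowledge of how anyone writes. The unstructured version only works for the phrasings the pattern anticipates and silently drops notes it cannot parse, which is the extra effort and risk the paragraph above describes.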


Data in health care is largely unstructured, which is a barrier to overcome in the big data revolution

The industry has recognized this and made some attempts to rectify it by forcing data into structured formats: think check boxes instead of open text boxes for clinicians. Unfortunately, this has had some negative effects on data quality. Medicine isn’t black and white, and when clinicians are forced to treat it that way, data quality can suffer. For example, a clinician forced to choose between two check boxes where neither is technically correct will simply pick one and move on with the day. In poorly designed systems, this can happen often enough to render structured data inaccurate.

2. Information is Disparate.

Within a single provider or health care company, data is everywhere, stored in different places and systems. Some organizations are working to rectify this by unifying their data in one data lake.


Health care data comes from a variety of sources stored in many different places, making it difficult to get insights

But that doesn’t fully resolve the problem, because people also move around, which scatters the data covering their medical history even further. Partly due to the sensitivity of medical data, there is no central place where the data about one patient sits, so the ability to use big data to uncover insights or predictions about specific people in an impactful way is extremely limited.

Additionally, the fact that data is stored all over the place means more overhead in administration and paperwork for providers, further putting them behind in efficiency. As an aside, in case you didn’t hear, France recently launched an initiative for a centralized health care database system that’s very exciting and could, maybe, pave the way for others to do the same.

3. Lack of Real-Time Data Analysis.

Perhaps due to the previous two items (lots of unstructured data, siloed in different places), there is a lack of emphasis on real-time data analysis in health care. Until we stop thinking retroactively and start thinking in terms of prediction, machine learning, and (eventually) artificial intelligence (AI), big data in health care will continue to win only small victories.

If you want to learn more about how to best leverage data in health care, including use cases for predictive analytics, download our white paper or check out this case study on predicting patient no-shows that helped Intermedix achieve significant cost savings in one month.
