Big Data: Healthcare’s Knight in Shining Armor?

Use Cases & Projects Lynn Heidmann

For years, people have been talking about big data’s potential to change healthcare, particularly in the U.S. But with days growing shorter again, the end of the year looms closer, and it seems another year will pass without any major progress.


The amount of news about healthcare in the last few months leaves us wondering: where’s the data-driven healthcare revolution we’ve all been dreaming of? And will it really solve all our problems?

To be sure, U.S. healthcare faces a host of issues, and it’s a very complicated topic (to put it mildly). But we have more data in healthcare than ever before! Even so, it might be unrealistic to think that big data will suddenly have a noticeable, sweeping, save-the-day impact rather than small, more subtle victories. But let’s take a look at where we are today. We'll start with the good news.


The good news is that there are already some areas of medicine and healthcare benefiting immensely from data, machine learning, and predictive analytics.

Clinical trials are a particularly good example. This year, the Mayo Clinic started using machine learning to better match patients to clinical trials, and a match now takes less than 10 minutes. This process was previously done manually and could take 30 minutes or more.

Promising “medtech” startups also continue to lead the way in applying machine learning and artificial intelligence to healthcare. This is exciting, with the slight caveat that many of them focus on particular issues or conditions rather than on the healthcare system as a whole. That is still a great thing, but these are baby steps.

So even as healthcare remains top of mind (an August Gallup poll reveals that 17% of Americans believe healthcare is the largest of all the issues we are facing), there’s still a lot of work to be done.

What's the Holdup?

Transforming an entire industry is a huge undertaking, especially one as complicated as healthcare, where so many parties have their own vested interests. Costs are rising everywhere, and healthcare is taking up a greater share of the U.S. economy than ever before. With so many issues in play, making changes is proving difficult. More specifically, data and AI are lagging behind in healthcare compared to other large industries with sensitive data (like finance) because:

1. Lots of Healthcare Data Is Unstructured

The vast majority of healthcare data, some estimate up to 80%, is unstructured. Think patient notes plus claims notes, images from procedures, even scholarly texts that would help physicians. Unstructured data is more difficult to work with on a large scale than structured data: picture the difference between having the text of 100 emails from your inbox to analyze versus 100 rows of Excel data. While it’s certainly possible to derive meaningful insights from unstructured data and to apply machine learning to it, doing so is more challenging, and that’s a barrier for the healthcare industry.


Data in healthcare is largely unstructured, which is a barrier to overcome in the big data revolution
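To make the email-versus-spreadsheet comparison concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the records, the note text, the field names, and the regular expression are made up, and a real clinical text pipeline would need far more than one pattern.

```python
import re

# Structured: blood pressure is already a typed field (hypothetical records).
structured_rows = [
    {"patient_id": 1, "systolic": 142, "diastolic": 91},
    {"patient_id": 2, "systolic": 118, "diastolic": 76},
]

# Unstructured: the same facts buried in free-text notes (made-up examples).
notes = [
    "Pt reports headaches. BP 142/91 today, advised follow-up.",
    "Routine visit; blood pressure was 118 over 76. No complaints.",
]


def high_bp_structured(rows, threshold=140):
    """With structured data, the question is a single comparison per row."""
    return [row["patient_id"] for row in rows if row["systolic"] >= threshold]


# This pattern only covers the two phrasings above; real notes vary far more.
BP_PATTERN = re.compile(r"(\d{2,3})\s*(?:/|over)\s*(\d{2,3})", re.IGNORECASE)


def extract_bp(note):
    """Return (systolic, diastolic) found in a note, or None if nothing matches."""
    match = BP_PATTERN.search(note)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


structured_result = high_bp_structured(structured_rows)
extracted = [extract_bp(note) for note in notes]
```

The structured query needs no knowledge of how anyone writes. The free-text version has to anticipate every phrasing a clinician might use, and it silently returns nothing for the ones it misses.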

The industry has recognized this and made some attempts to rectify it by forcing data into structured formats (think check boxes instead of open text boxes for clinicians). Unfortunately, this has had some negative effects on data quality. Medicine isn’t black and white, and when clinicians are forced to record it as if it were, data quality can suffer. For example, a clinician forced to choose between two check boxes where neither is technically correct will pick one just to move on with his or her day. In poorly designed systems, this can happen often enough to render structured data inaccurate.

2. Information Is Disparate

Even within a single provider or healthcare company, data is scattered across different places and systems. Some are working to rectify this by unifying their data into one data lake.


Healthcare data comes from a variety of sources stored in many different places, making it difficult to get insights

But that doesn’t really resolve the problem, because people also move around, which leaves the data covering their medical history even more scattered. Partly because of the sensitivity of medical data, there isn’t a central place where all the data about one patient sits, so the ability to use data to uncover insights or make predictions about specific people in an impactful way is extremely limited.
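As a rough illustration of why scattered records are hard to bring together, the Python sketch below links one patient across two systems that share no common ID. The systems, IDs, names, and fields are all hypothetical, and matching on name plus date of birth is a deliberately crude assumption; it is not how any particular provider does it.

```python
# Two hypothetical systems that describe the same person with different IDs.
clinic_records = [
    {"id": "C-17", "name": "Doe, Jane", "dob": "1980-04-02"},
]
hospital_records = [
    {"id": "H-903", "name": "jane doe", "dob": "1980-04-02"},
    {"id": "H-411", "name": "John Roe", "dob": "1975-01-15"},
]


def match_key(record):
    """Build a crude key: lowercased name parts in sorted order, plus birth date."""
    cleaned = record["name"].lower().replace(",", " ")
    name_parts = sorted(cleaned.split())
    return (" ".join(name_parts), record["dob"])


def link_records(left, right):
    """Pair up records from two systems whose match keys are identical."""
    index = {match_key(record): record for record in right}
    pairs = []
    for record in left:
        other = index.get(match_key(record))
        if other is not None:
            pairs.append((record["id"], other["id"]))
    return pairs


linked = link_records(clinic_records, hospital_records)
```

Even this toy version breaks on a misspelled name, a changed surname, or two patients who share a name and birthday, which hints at how much work a shared view of one patient really takes.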

Additionally, the fact that data is stored all over the place means more overhead in administration and paperwork for providers, further putting them behind in efficiency. As an aside, in case you didn’t hear, France recently launched an initiative for a centralized healthcare database system that’s very exciting and could, maybe, pave the way for others to do the same.

3. Lack of Real-Time Data Analysis

Perhaps because of the previous two items (lots of unstructured data, siloed in different places), there is a lack of emphasis on real-time data analysis in healthcare. Until we stop thinking retroactively and start thinking in terms of prediction, machine learning, and (eventually) artificial intelligence (AI), data usage in healthcare will continue to gain only small victories.
