Put Your Data to Work in Health Care

Use Cases & Projects | Romain Doutriaux

Quantified self, population health, patient engagement, telehealth, interoperability… the health care IT industry is buzzing with plenty of opportunities but is missing a few basic standards to implement them. 

Here, we'll walk through a complete methodology for developing data projects in the health care industry, using patient no-shows as the example use case.


Step One: Order Out of Chaos; Collecting & Making Sense of Data

1) Define your Goal(s)

To keep costs within budget and deliver realistic results, the project goal must be defined precisely. For this example, our goal is to score the likelihood of patient no-shows in real time. The score is used to identify high-risk patients and to schedule the time slots that suit them best, decreasing the likelihood of subsequent no-shows.

2) Collect Historical Data (Appointment Dataset)

In order to create an algorithm, the predictive analytics solution needs data to work with. If possible, provide three months' worth of historical show/no-show data; if that is not possible because this type of historical data is unavailable, you may need to collect it for three months before beginning the predictive modeling process.

3) Gather Workable and Clean Datasets

Next, we need to determine the datasets that will be used to establish patient scoring: in other words, the factors that determine whether or not a patient is likely to appear for a given time slot. Some possibilities (combined in the sketch after this list) include:

  • Appointment Dataset: Historical data of shows and no-shows
  • Patient Datasets: Age, location, health problems, diseases, children, status, etc.
  • External Sources: Social mapping of the geographic area, transportation data, disease classification (i.e., how the disease affects the patient's lifestyle and mobility, such as whether the patient is wheelchair-bound), holiday calendars, weather, and so on
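
To make this concrete, here is a minimal sketch of assembling such a feature table with pandas. The file names, column names, and the `no_show` label below are hypothetical placeholders, not part of any specific project.

```python
import pandas as pd

# Hypothetical file and column names -- placeholders for your own sources.
appointments = pd.read_csv("appointments.csv", parse_dates=["appointment_time"])
patients = pd.read_csv("patients.csv")                 # age, location, conditions, ...
weather = pd.read_csv("daily_weather.csv", parse_dates=["date"])

# Key external data on the appointment date so it can be joined in.
appointments["date"] = appointments["appointment_time"].dt.normalize()

features = (
    appointments
    .merge(patients, on="patient_id", how="left")   # enrich with patient attributes
    .merge(weather, on="date", how="left")          # enrich with external context
)

# The target the model will learn: 1 = no-show, 0 = showed up.
labels = features.pop("no_show")
```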


Some key questions to answer: How frequently are these datasets updated? Are the updates automated? Is accurate, up-to-date data available?

4) Combine and Clean your Sources

Datasets are commonly available in different formats (.xls, calendar files, etc.), so one of the challenges of data collection is shaping them all into a common, processing-friendly format.
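
As an illustration, here is a hedged sketch of pulling two hypothetical exports (an .xls schedule and a legacy CSV) into a single schema; every file, sheet, and column name is an assumption made for the example.

```python
import pandas as pd

# Hypothetical exports: file, sheet, and column names are assumptions.
xls_visits = pd.read_excel("clinic_schedule.xls", sheet_name="appointments")
csv_visits = pd.read_csv("legacy_export.csv")

# Map each source's columns onto one shared schema before combining.
xls_visits = xls_visits.rename(columns={"Patient ID": "patient_id",
                                        "Appt Date": "appointment_time"})
csv_visits = csv_visits.rename(columns={"pid": "patient_id",
                                        "slot_start": "appointment_time"})

combined = pd.concat([xls_visits, csv_visits], ignore_index=True)
combined["appointment_time"] = pd.to_datetime(combined["appointment_time"])
combined = combined.drop_duplicates(subset=["patient_id", "appointment_time"])

# Persist one processing-friendly file for the modeling steps that follow.
combined.to_parquet("appointments_clean.parquet")
```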


 

Step Two: A Predictive Model to Test your Hypothesis

1) Highlight and Pinpoint Distinct Features

The process of building a predictive model involves a series of normalization and optimization steps designed to improve and measure model accuracy. Key steps in this process include feature normalization, testing and optimization of candidate models, measurement of model accuracy, and the specification of a user strategy. After the model is defined, the data scientist needs to fit it, evaluate it while guarding against overfitting, and ultimately validate it in order to isolate the most predictive features.
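
A minimal sketch of that loop with scikit-learn, assuming the `features` table and `labels` from the earlier steps have already been reduced to numeric columns:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# `features` and `labels` stand for the table and no-show flag assembled
# earlier, assumed here to already be numeric.
X_train, X_valid, y_train, y_valid = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

# Normalize features, then fit a simple baseline classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Held-out validation accuracy guards against overfitting to the training set.
print("validation accuracy:", model.score(X_valid, y_valid))
```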

Accuracy is determined by testing the underlying strategy in practice: for patients scored as likely to appear for a specific time slot, do they actually show up as expected? How accurate is the time-slot scoring for patients who do appear? If overbooking is implemented, is it being applied correctly? Answering these questions means comparing real-world results with the corresponding predictions, and this additional analysis lets the analytics solution further refine the model's accuracy if needed.
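
Continuing the illustrative variables above, comparing predictions against actual outcomes can be as simple as:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Compare real-world outcomes with the model's predictions on held-out data
# (continuing the hypothetical `model`, `X_valid`, `y_valid` from above).
predicted = model.predict(X_valid)
print(confusion_matrix(y_valid, predicted))       # actual vs. predicted shows/no-shows
print(classification_report(y_valid, predicted))  # precision and recall per class
```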

Save a data scientist; use Dataiku Data Science Studio (DSS)

Of course, if you are using an advanced analytics software solution, many of the above steps can be automated: it can clean datasets, isolate relevant features, and automatically score the likelihood of patient no-shows.

2) Train Machine Learning Models on Test Datasets

If new features are added, the models need to be retrained. Data visualization is also needed to determine whether the features are relevant.
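
For instance, still with the hypothetical pipeline above, retraining after adding features and plotting the model's weights gives a first view of feature relevance:

```python
import matplotlib.pyplot as plt
import pandas as pd

# After adding new columns to the feature table, retrain and inspect which
# features the model actually relies on (illustrative only).
model.fit(X_train, y_train)

weights = pd.Series(model.named_steps["logisticregression"].coef_[0],
                    index=X_train.columns).sort_values()
weights.plot(kind="barh", title="Feature weights (illustrative)")
plt.tight_layout()
plt.show()
```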

 
