Managing one model at a time is pretty easy. But how do you go about managing tens of models, or even more? Vincent Gallmann, Senior Data Scientist at French bank FLOA, answered this question in a 2021 Product Days Session on managing data science projects with Dataiku.
Let’s Start With the Problem
Machine learning models are often designed to make predictions about future data. However, a model's predictive performance tends to decrease over time as it is applied to new data in a rapidly evolving environment. This is known as model degradation.
Unfortunately, unlike wines, predictive models don't get better with age. That is why it is important to refresh models in order to maintain high predictive performance. Without a constant source of new data, a static model cannot catch up with patterns that are emerging or evolving.
We can identify two main causes of this degradation:
- Changes in the characteristics of new customers
- External factors like COVID-19
Fortunately, every problem has a solution. In our case, this problem has three solutions.
1. Drift Detection
Drift detection is the first way to tell if a model’s performance is degrading. You may face two types of drift:
The first one is concept drift: a change in the statistical properties of the target variable. For example, let's say a company wants to detect spammers. At first, the rule was: if the number of messages sent is greater than five, then you're a spammer. But after a few months, the company developed a chatbot. As a result, the number of messages per user increased drastically, and so did the number of users flagged as spammers. So now the definition of a spammer has to evolve.
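This kind of concept drift can be illustrated with a tiny simulation. The message counts and the five-message threshold below are invented for illustration: a fixed rule that worked before the chatbot launch suddenly flags most users afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_spammer_v1(n_messages):
    """Original rule: more than five messages means spammer."""
    return n_messages > 5

# Before the chatbot: typical users send only a couple of messages
before = rng.poisson(2, 1000)
# After the chatbot launch, every user's message count is inflated
after = before + rng.poisson(8, 1000)

rate_before = np.mean([is_spammer_v1(n) for n in before])
rate_after = np.mean([is_spammer_v1(n) for n in after])
print(f"flagged before: {rate_before:.1%}, after: {rate_after:.1%}")
```

The rule itself hasn't changed, but what it *means* has: once most users exceed the threshold, the label no longer captures the concept of "spammer."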
The second type of drift, the most common type, is data drift: It's about detecting changes in the statistical properties of the independent variables. To do so, you need to compare the training dataset with the new up-to-date data. If the comparison shows distinct differences, then there is a strong likelihood that the model performance is compromised.
But be careful! High drift does not necessarily mean a drop in your model's performance; it depends on how important the drifted feature is to the model. With Dataiku, you can quantify how high the drift is thanks to a plugin that uses a domain classifier approach. It trains a model that tries to discriminate between the original dataset and the new dataset. In other words, it stacks the two datasets and trains a classifier that aims at predicting each row's origin. The performance of that classifier can be used as a metric for the drift level.

Once you have detected drift on your models, you can trigger two actions using the Dataiku scenario feature. First, you can get an email alert. Second, you can trigger a model retraining on up-to-date data.
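The plugin's internals aren't covered in the session, but the domain classifier idea can be sketched with scikit-learn. Names like `drift_score` are illustrative here, not the plugin's API:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def drift_score(train_data, new_data):
    """Domain classifier drift score: AUC of a model trained to tell
    original training rows (label 0) apart from new rows (label 1)."""
    X = np.vstack([train_data, new_data])
    y = np.hstack([np.zeros(len(train_data)), np.ones(len(new_data))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

rng = np.random.default_rng(0)
original = rng.normal(0, 1, size=(500, 4))
shifted = rng.normal(0.8, 1, size=(500, 4))     # simulated drifted data

score_same = drift_score(original, rng.normal(0, 1, size=(500, 4)))
score_shifted = drift_score(original, shifted)
print(f"no drift: {score_same:.2f}, drifted: {score_shifted:.2f}")
```

An AUC near 0.5 means the classifier cannot tell the two datasets apart (little drift), while a value near 1.0 signals heavy drift.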
2. AutoML
The idea behind AutoML is to automate the process of training a model. Using scenarios in Dataiku, you can set up your model to retrain automatically by including more recent data in your training dataset. This allows you to capture changes in your customers' behavior, for example.
With a new model version built, the next step is to compare its metrics with those of the current live model version. This is where a new feature of Dataiku comes into play: Metrics and Checks. It is very useful when you want to trigger an action based on a metric. Here, for example, if the metrics and checks reveal a wide gap between the two models, you may want to update the model in the production environment.
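As a rough sketch of such a champion/challenger check, here is a generic Python illustration, not Dataiku's actual scenario API; `retrain_and_check` and the `min_gain` margin are made up for the example:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def retrain_and_check(live_model, X_new, y_new, min_gain=0.01):
    """Retrain on recent data, then 'check' the challenger against the
    live model on a held-out slice; promote only if it clearly wins."""
    X_tr, X_te, y_tr, y_te = train_test_split(X_new, y_new, random_state=0)
    challenger = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc_live = roc_auc_score(y_te, live_model.predict_proba(X_te)[:, 1])
    auc_new = roc_auc_score(y_te, challenger.predict_proba(X_te)[:, 1])
    promote = auc_new - auc_live >= min_gain
    return (challenger if promote else live_model), auc_live, auc_new, promote

# Simulated history: the live model was trained on last year's data,
# while the feature/label relationship has since changed
X_old, y_old = make_classification(n_samples=1000, n_features=8, random_state=0)
X_new, y_new = make_classification(n_samples=1000, n_features=8, random_state=1)
live = LogisticRegression(max_iter=1000).fit(X_old, y_old)

model, auc_live, auc_new, promote = retrain_and_check(live, X_new, y_new)
print(f"live AUC={auc_live:.2f}, challenger AUC={auc_new:.2f}, promote={promote}")
```

The promotion margin guards against swapping models over noise: a challenger that is only marginally better on one evaluation slice is not worth the operational risk of a deployment.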
3. A/B Testing
A/B testing is a statistical process in which you measure the impact of a modification to a baseline condition on certain metrics. You do so with a control group (the baseline), a treatment group (the modification), and a metric you want to follow. For instance, let’s say you want to measure the effectiveness of a vaccine. You will have a control group with no injection and a treatment group with the vaccine injection, and then you can compare the cure rates.
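For a rate metric like the cure rate above, the two groups are commonly compared with a two-proportion z-test. A minimal sketch, with invented numbers:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-statistic comparing two rates (control vs. treatment)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)        # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Illustrative numbers: cure rate 60% in control, 75% in treatment
z = two_proportion_z(600, 1000, 750, 1000)
print(f"z = {z:.1f}")    # |z| > 1.96 -> significant at the 5% level
```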
In practice, there is often a substantial discrepancy between the offline and online performance of models. Therefore, it's critical to take the testing to the prediction environment. This online evaluation gives the most truthful feedback about the behavior of the candidate model when facing real data. Indeed, you have no guarantee that today's customers will behave like those of yesterday.
For instance, let's assume you have a predictive model that has been making predictions for the last six months, and you want to test whether a new model makes better predictions than the current one. You can do this with A/B testing. You would deploy both models into the live environment, but split requests between the two models (50/50). You can run the test for as long as necessary and, if the results are conclusive and your new model is found to be more effective, you can replace the older model. This strategy is easy to set up with Dataiku, as you can choose the parameters (or the probabilities) and Dataiku will log the results from the two models.
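The 50/50 request split can be sketched as a small router that picks an arm at random for each request and logs the result per arm. This is an illustrative toy, not how Dataiku implements it; `ABRouter` is a made-up name:

```python
import random
from collections import defaultdict

class ABRouter:
    """Route each scoring request to model A or B with a fixed
    probability, and keep a per-arm log of requests and predictions."""

    def __init__(self, model_a, model_b, p_b=0.5, seed=42):
        self.models = {"A": model_a, "B": model_b}
        self.p_b = p_b                      # probability of routing to B
        self.rng = random.Random(seed)
        self.log = defaultdict(list)

    def predict(self, request):
        arm = "B" if self.rng.random() < self.p_b else "A"
        prediction = self.models[arm](request)
        self.log[arm].append((request, prediction))   # result per arm
        return prediction

# Toy models: each returns a constant score for any request
router = ABRouter(model_a=lambda r: 0.3, model_b=lambda r: 0.7, p_b=0.5)
for i in range(1000):
    router.predict({"customer_id": i})
print({arm: len(rows) for arm, rows in router.log.items()})
```

After the test period, the per-arm logs are what you would compare (with a significance test on your chosen metric) before deciding whether the challenger replaces the champion.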