One of the most common lines of questioning from data scientists today goes something like this: “How do I know if my model is good? Should I just look at the R² score? Should I look at AUC? What does it even mean for a given model to be good?”
According to Jed Dougherty, VP of Field Engineering at Dataiku, the best way to decide whether a model is good is to plug the business’s interests directly into that decision.
That means instead of saying:
“My accuracy is very high.”
The data scientist should be asking something like:
“How much money do I gain (or lose) from each prediction?”
Data scientists who can’t answer whether the business is gaining or losing money from a given prediction, Dougherty says, shouldn’t even think about putting that model into production.
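To make the contrast between accuracy and money concrete, here is a minimal sketch in Python. It is an illustration, not something from the interview: the dollar values attached to each outcome, the `OutcomeValues` and `value_per_prediction` names, and the toy data are all hypothetical assumptions. It shows how a model with higher accuracy can still lose money on every prediction when the thing being predicted is rare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeValues:
    """Hypothetical dollar value of each kind of prediction outcome."""
    true_positive: float
    false_positive: float
    true_negative: float
    false_negative: float


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)


def value_per_prediction(y_true, y_pred, values):
    """Average money gained (or lost, if negative) per prediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if predicted and actual:
            total += values.true_positive
        elif predicted and not actual:
            total += values.false_positive
        elif not predicted and actual:
            total += values.false_negative
        else:
            total += values.true_negative
    return total / len(y_true)


# Assumed, made-up business values: catching a real case earns $200,
# a false alarm costs $20, and a missed case costs $500.
values = OutcomeValues(true_positive=200.0, false_positive=-20.0,
                       true_negative=0.0, false_negative=-500.0)

# Toy ground truth: 5 positive cases out of 100.
y_true = [1] * 5 + [0] * 95

# Model A never flags anything.
always_negative = [0] * 100
# Model B catches 4 of the 5 positives and raises 10 false alarms.
flags_some = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

for name, y_pred in [("A", always_negative), ("B", flags_some)]:
    print(f"model {name}: accuracy={accuracy(y_true, y_pred):.2f}, "
          f"value per prediction={value_per_prediction(y_true, y_pred, values):.2f}")
# model A: accuracy=0.95, value per prediction=-25.00
# model B: accuracy=0.89, value per prediction=1.00
```

Under these assumed values, the model with 95% accuracy loses $25 per prediction while the model with 89% accuracy earns $1. The real numbers have to come from the business, which is the point of plugging its interests into the decision.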
Another component of knowing whether a model is good or not comes down to proper monitoring. Often, when putting together a dataset that will be used to build a model, data scientists think about building the model that one time, but not about how they will keep track of how well it is actually doing in the future. How does one get the information (the ground truth) to compare the model’s predictions against?
It’s a tricky feedback loop that must be built into workflows, and it gets even harder:
- The rarer the thing you’re trying to predict is, or
- The longer it takes for your prediction to be proven right or wrong
So for example, when trying to predict whether somebody will default on a loan, the data scientist might not know for 10 years whether the prediction was correct. Nevertheless, they must think about how to feed that outcome back into the model.
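One way to picture this feedback loop is a log that stores each prediction when it is made and attaches the ground truth whenever it eventually arrives. The sketch below is a hypothetical, in-memory illustration in Python; the `PredictionLog` class, its method names, and the loan IDs are assumptions made for this example, and a real workflow would persist the log in a database or data platform.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PredictionLog:
    """Hypothetical log pairing predictions with ground truth that arrives later."""
    predictions: Dict[str, bool] = field(default_factory=dict)
    outcomes: Dict[str, bool] = field(default_factory=dict)

    def record_prediction(self, case_id: str, predicted: bool) -> None:
        """Store the model's prediction at the time it is made."""
        self.predictions[case_id] = predicted

    def record_outcome(self, case_id: str, actual: bool) -> None:
        """Attach the ground truth once it is known, possibly years later."""
        if case_id not in self.predictions:
            raise KeyError(f"no prediction was logged for {case_id!r}")
        self.outcomes[case_id] = actual

    def resolved_fraction(self) -> float:
        """Share of logged predictions whose ground truth is known so far."""
        if not self.predictions:
            return 0.0
        return len(self.outcomes) / len(self.predictions)

    def accuracy_on_resolved(self) -> Optional[float]:
        """Accuracy over resolved cases only; None if nothing is resolved yet."""
        if not self.outcomes:
            return None
        correct = sum(self.predictions[case_id] == actual
                      for case_id, actual in self.outcomes.items())
        return correct / len(self.outcomes)


# Made-up example: four loans scored today, only two outcomes known so far.
log = PredictionLog()
log.record_prediction("loan-1", True)   # predicted to default
log.record_prediction("loan-2", False)
log.record_prediction("loan-3", False)
log.record_prediction("loan-4", True)

log.record_outcome("loan-1", True)      # did default: prediction was right
log.record_outcome("loan-2", True)      # did default: prediction was wrong

print(log.resolved_fraction())      # 0.5
print(log.accuracy_on_resolved())   # 0.5
```

Reporting the resolved fraction alongside the metric matters here: when outcomes are rare or slow to arrive, a score computed on a small resolved subset can look far more certain than it is.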
The bottom line: build models within a framework that accounts for monitoring and deployment from the start, instead of building in isolation and then figuring out monitoring, or even the push to production, after the fact.
Note: This blog post is an adaptation of an interview between Roger Magoulas, VP of Radar at O'Reilly Media, Inc., and Jed Dougherty, VP of Field Engineering at Dataiku. Watch the full video below.
Watch the Interview
Catch the 7-minute interview from Strata Data Conference 2019 in New York City: