Fail Fast Without All the Annoying Technical Debt

Dataiku Product, Scaling AI Doug Bryan

A company recently told me that their AI initiative was very successful. After three years, it had zero project failures and their team of seven data scientists put two AI use cases into production each year. Is that failure rate too low?

Suppose your organization has 1,000 potential AI use cases and 25% of them have the potential for significant positive ROI. How do you find the good ones at a decent cost? Too many companies start with a handful of use cases that they’re confident are winners, but they lack the experience to justify that confidence. They put all their resources into a few use cases and often take years to find ROI. Many AI leaders take an alternative, lean, fast-fail approach. The IEEE micromouse competition from the previous (third) industrial revolution provides an interesting analogy.

1970s Micromouse Competition

micromouse competition

In the late 1970s, microprocessors enabled automation in many industries and use cases, just like AI today. To promote innovation, the Institute of Electrical and Electronics Engineers (IEEE) held a competition where teams built small autonomous robots called micromice that ran a maze they’d never seen before and the team with the fastest run won. Each team got multiple runs. There were 6,000 entrants from industry, government, and academia. (Micromouse maze running competitions are still held today, 45 years later. See YouTube for some amazing winners.) 

IEEE hoped that mice would be developed to “learn” the maze on early runs and then go through the shortest path in later runs, illustrating the power of microprocessors. It didn’t turn out that way.  

A “dumb” mouse from Battelle Institute won by simply hugging the right-hand wall and going very fast. Consider the maze above. The right-hand-wall approach makes eight “wrong” turns and travels 120% further than the shortest path. Yet it’s optimal if deciding not to make eight wrong turns takes longer than making them. Same with AI use cases: It’s faster to try many than to try to learn the best few. Dataiku has many platform features to support that approach.

Fail-Fast Development Features

Early adopters of AI often developed crawl-walk-run plans and now have hundreds of use cases in production. We’re now increasingly working with pragmatic, late adopters. One told me that crawl-walk-run is too fast for him so he uses roll-over-crawl-stand-up-walk-run per use case. Dataiku has platform features that support this approach without writing a lot of expensive, opaque code you have to maintain before you know whether it’s worth it. Here are examples.

Roll-Over: The first question that's often asked about a use case is, “How feasible is it to develop a training dataset?” Use our data catalog and feature store to find what’s available. Explore it in minutes with no-code statistical summaries, charts, and dashboards. Save any of them for future refresh with just a few clicks. Chat with data owners and users to answer questions. If it looks like there’s enough data then quickly develop a no-code data pipeline to aggregate it into a training dataset, ignoring quality and performance for now. How many rows and features are there? How many rows are labeled? Run a quick AutoML experiment to see if an accurate model is feasible. 

Dataiku automatically picks a few algorithms (Lasso regression, random forests, XGBoost, etc.) for the experiment and you can add (or remove) others if you like. If model accuracy is too low, then collaborate with subject-matter experts (SMEs) to create more features. SMEs don’t need to know R, Python, or SQL to help out. They can easily add to your no-code pipeline. If accuracy is still low, then estimate the cost of labeling more rows. Lastly, review what the model found with our built-in feature importance charts to verify that it makes sense and then make a go/no-go decision.

Crawl: Get the model in front of some real users. Write a wiki explaining the model and register it in the model catalog. For users who prefer real-time scoring, create an API and push it to production in minutes. If others prefer interactive apps, then create a no-code app that anyone can use. Monitor usage, get user feedback, and estimate the value generated. Make another go/no-go decision.

Stand Up: Significant ROI looks possible, so systematize what you’ve developed so far and push for adoption. Automate the data aggregation and training pipeline to run periodically or to be triggered when new training data is available. Add simple metrics and checks to the pipeline to ensure quality, such as the maximum age of training data, minimum number of rows, and maximum null value rates. Set up alerts to be sent via email, Slack, and Teams when checks fail. Add statistical notebooks and dashboards to the project so new users can quickly understand it. Evangelize it, track usage, and if value generation still looks good then on to the next phase.

Walk: Harden the pipeline by adding more metrics and checks. Experiment with new features to improve model accuracy. Hand-tune the model by manually setting some hyperparameters. Run A/B or champion/challenger tests in production to measure model improvements in the real world. Estimate value and if it’s still generation acceptable ROI. 

Run: Push for widespread adoption by developing a sophisticated app in Dash, Shiny, Bokeh, Streamlet, Custom Flash + HTML, or Javascript. Add data source syncs and select the infrastructure to use (Spark, Snowflake, Snowpark, etc.) to improve efficiency or manage costs. Continuously monitor input data drift and trigger alerts or model retraining when your thresholds are exceeded. Continuously monitor model accuracy and do the same. Identify sensitive subgroups that could yield model bais that you care about. Monitor model accuracy and feature importance across those subgroups and, again, retrain or alert when thresholds are passed. Create visual, user-friendly dashboards for new users to understand the model, document the value generated in the model catalog, and evangelize them.

Dataiku, Like Battelle's Micromouse, Allows You to (Safely) Go Fast

Dataiku’s full lifecycle platform for systematizing AI allows teams to manage tech debt every step of the way so that cost doesn’t significantly surpass value. It allows teams to become product focused rather than project focused, to generate value in order to get development resources rather than getting development resources to generate value. That deceptively simple switch drives big AI ROI.

You May Also Like

Large Language Models in Banking: What Are They Good For?

Read More

Building a Modern AI Platform Strategy in the Era of Generative AI

Read More

The AI Governance Challenge: How to Foster Trust

Read More

5 Surprising Stats About the State of AI Today

Read More