
Decrease Frustration and Increase Anticipation With AutoML

Dataiku Product, Featured Marie Merveilleux du Vignaux

During the 2021 Dataiku Product Days, Krishna Vadakattu, senior product manager at Dataiku, and Christopher Peter Makris, lead data scientist at Dataiku, presented some of Dataiku’s AutoML features and applied them to a predictive analysis use case. This blog post walks you through that case study to show how AutoML can help you anticipate outcomes instead of reacting to them.

→ Check Out This 2021 Dataiku Product Days AutoML Demo: Data Best Practices & Guardrails for Beginners

Before I Keep Reading, How Complex Will This Get?

With Dataiku's AutoML features, the intricacies of data modeling become quicker and easier to handle. Whether you're a seasoned data scientist or diving into data analysis for the first time, Dataiku is there to guide you every step of the way.

Christopher Peter Makris collaborated with teammates in a Dataiku flow to predict whether flights leaving the airports of New York City will be delayed by more than 15 minutes. To keep things from getting overwhelming, the team kept the flow organized by incorporating flow zones and tags.
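If you're curious what that prediction target looks like under the hood, here is a minimal Python sketch. It assumes a hypothetical `Flight` record with a `delay_minutes` field (these names and the sample rows are illustrative, not taken from the Dataiku project) and labels a flight as delayed when it is more than 15 minutes late.

```python
from dataclasses import dataclass

# Threshold from the use case: "delayed by more than 15 minutes".
DELAY_THRESHOLD_MINUTES = 15


@dataclass
class Flight:
    # Hypothetical record layout, for illustration only.
    origin: str
    destination: str
    delay_minutes: float


def is_delayed(flight: Flight, threshold: float = DELAY_THRESHOLD_MINUTES) -> bool:
    """Return True when the flight is late by strictly more than the threshold."""
    return flight.delay_minutes > threshold


# Made-up sample rows, not real flight data.
flights = [
    Flight("JFK", "BOS", 4.0),
    Flight("LGA", "ATL", 15.0),  # exactly 15 minutes is not "more than 15"
    Flight("JFK", "DEN", 42.0),
]

labels = [is_delayed(f) for f in flights]
print(labels)
```

The resulting true/false column is the kind of target variable a binary classification task would learn to predict.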


Plus, Dataiku's guided AutoML is built right in, so you can easily follow what is happening.

How Do I Model Data on Flights We’ve Already Seen?

This is exactly where guided AutoML comes into play, and it works with any dataset. You can perform both supervised prediction and unsupervised clustering by opening the Dataiku Lab.
First, you type in your target variable. Then you choose among quick prototypes, interpretable models for business analysts, and high-performance models.

How Do I Know Which Option to Use? How Can I Compare and Evaluate All the Models I Built?

Dataiku makes it easy to design, build, and compare models through an intuitive user interface. It's nicely organized too, especially when you've got a lot of different sessions.

How Do I Configure Each of These Algorithms?

You can head over to the design tab to find a step-by-step walkthrough of all the tools at your disposal for constructing models. Even better, you can do all of this without having to write a single line of code! This tab is also where you choose the metric to optimize your models against and where you deal with feature handling and generation. You can tell Dataiku which variables to include or exclude, and even how to rescale them or impute missing values.
Each algorithm operates in its own special way and has different parameter choices available. Dataiku enables you to easily explore the various parameter options one at a time and even learn more about how each algorithm works.
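Dataiku handles these steps through its visual interface, so you never need code for them. Still, if you want a feel for what "impute" and "rescale" mean, the plain-Python sketch below may help. It assumes mean imputation and standard (zero mean, unit variance) rescaling, which are only two of many possible choices, and the function names and numbers are made up for illustration.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up feature column with one missing value.
raw = [10.0, None, 30.0, 20.0]

imputed = impute_mean(raw)
scaled = standardize(imputed)

print(imputed)
print(scaled)
```

Imputing before rescaling matters here: the mean and standard deviation used for rescaling are computed on a column with no gaps.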

Alright, I’m Ready to Start!

Once you’re ready to go, Dataiku will train your customized models and help you compare them based on runtime and various other aspects. Dataiku keeps everything organized by sessions, individual models, and even an evaluation table of metrics.
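To make the idea of an evaluation table concrete, here is a small sketch in plain Python. The two "models" are deliberately trivial stand-ins (a baseline that always predicts on time, and a single threshold rule), the validation rows are made up, and accuracy is just one assumed choice of metric. None of this reflects how Dataiku trains or scores models internally.

```python
import time

# Made-up validation rows: (average arrival delay in minutes, was the flight delayed?)
validation = [
    (2.0, False), (5.0, False), (12.0, False), (18.0, True),
    (25.0, True), (40.0, True), (8.0, True), (30.0, False),
]


def always_on_time(avg_delay):
    """Baseline stand-in: never predicts a delay."""
    return False


def threshold_rule(avg_delay):
    """Stand-in model: predicts a delay when the average arrival delay exceeds 15."""
    return avg_delay > 15.0


def evaluate(name, predict, rows):
    """Score one candidate and record its accuracy and prediction runtime."""
    start = time.perf_counter()
    predictions = [predict(x) for x, _ in rows]
    runtime = time.perf_counter() - start
    correct = sum(p == y for p, (_, y) in zip(predictions, rows))
    return {"model": name, "accuracy": correct / len(rows), "runtime_s": runtime}


results = [
    evaluate("always_on_time", always_on_time, validation),
    evaluate("threshold_rule", threshold_rule, validation),
]

# Best-scoring candidate first, like a leaderboard.
leaderboard = sorted(results, key=lambda r: r["accuracy"], reverse=True)
for row in leaderboard:
    print(f'{row["model"]:<16} accuracy={row["accuracy"]:.2f} runtime={row["runtime_s"]:.6f}s')
```

Keeping the metric and the runtime side by side is what lets you weigh a slightly better score against a much slower model.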

You can then use Dataiku’s AutoML features to analyze what’s actually driving your predictions. These features make results easy to digest by breaking the details down into smaller parts and helping you investigate your model from both interpretation and performance standpoints.

For the random forest in this example, the average arrival delay is strongly associated with whether or not the next flight will be delayed.
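One common, model-agnostic way to ask "which feature matters?" is permutation importance: shuffle one feature's values and see how much the score drops. The sketch below illustrates that idea only. It is an assumption that this resembles what any given importance chart computes, and the stand-in model, the `day_of_month` feature, and the rows are all invented.

```python
import random


def model(row):
    """Stand-in model that only looks at the average arrival delay."""
    return row["avg_arrival_delay"] > 15.0


# Made-up rows; labels agree with the stand-in model so the baseline accuracy is 1.0.
rows = [
    {"avg_arrival_delay": 3.0, "day_of_month": 4, "delayed": False},
    {"avg_arrival_delay": 7.0, "day_of_month": 19, "delayed": False},
    {"avg_arrival_delay": 11.0, "day_of_month": 27, "delayed": False},
    {"avg_arrival_delay": 22.0, "day_of_month": 8, "delayed": True},
    {"avg_arrival_delay": 35.0, "day_of_month": 15, "delayed": True},
    {"avg_arrival_delay": 48.0, "day_of_month": 30, "delayed": True},
]


def accuracy(data, predict):
    return sum(predict(r) == r["delayed"] for r in data) / len(data)


def permutation_importance(data, predict, feature, seed=0):
    """Accuracy drop after shuffling one feature column across rows."""
    baseline = accuracy(data, predict)
    shuffled_values = [r[feature] for r in data]
    random.Random(seed).shuffle(shuffled_values)
    shuffled_rows = [{**r, feature: v} for r, v in zip(data, shuffled_values)]
    return baseline - accuracy(shuffled_rows, predict)


for feature in ("avg_arrival_delay", "day_of_month"):
    print(feature, permutation_importance(rows, model, feature))
```

A feature the model ignores, like `day_of_month` here, always scores zero, because shuffling it cannot change any prediction.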

With Dataiku 9, you can also test assumptions and see predictions for different scenarios with a new interactive scoring feature. You can ask hypothetical questions about your outcomes and see how the predictions change as your data changes. It's kind of like asking, “What if?”

You could ask, “What if my destination city is Boston? What if it's Atlanta? How about if it's Denver? How do my predictions vary?” You can quickly add each of these scenarios to your comparator and investigate how the predicted outcomes differ. In this case, among these three destination cities, the comparison shows that Denver has the highest likelihood of delay, whereas Atlanta has the lowest.
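The what-if idea itself is simple enough to sketch in a few lines of Python: keep one base record, change a single feature, and re-score each variant. Everything in the scoring function below is hypothetical. The weights and per-destination offsets are invented, chosen only so the toy output mirrors the ordering described above, and they are not taken from the real model.

```python
import math

# Hypothetical per-destination offsets, NOT learned from data.
DESTINATION_OFFSET = {"Boston": 0.0, "Atlanta": -0.4, "Denver": 0.5}


def predict_delay_probability(record):
    """Toy logistic scorer standing in for a trained model."""
    score = (
        -1.0
        + 0.05 * record["avg_arrival_delay"]
        + DESTINATION_OFFSET[record["destination"]]
    )
    return 1.0 / (1.0 + math.exp(-score))


def what_if(base_record, feature, candidate_values, predict):
    """Re-score the base record once per candidate value of a single feature."""
    results = {}
    for value in candidate_values:
        scenario = {**base_record, feature: value}  # copy, so the base stays untouched
        results[value] = predict(scenario)
    return results


base = {"destination": "Boston", "avg_arrival_delay": 10.0}
comparison = what_if(
    base, "destination", ["Boston", "Atlanta", "Denver"], predict_delay_probability
)

for city, probability in sorted(comparison.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{city:<8} {probability:.3f}")
```

Because only one feature changes between scenarios, any difference in the predicted probability can be attributed to that feature alone.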

There you go! You've built yourself a strong predictive model and know how to interpret and analyze its predictions. You can now apply these skills to your very own use case.
