Explaining AutoML: What It Is and How Dataiku Can Help

Dataiku Product, Scaling AI Lauren Anderson

Until the past several years, developing machine learning (ML) models was limited to a handful of experts with specialized skills and deep knowledge. With advancements in automated machine learning (AutoML), the barrier to entry into ML has lowered and, with just a basic understanding of key concepts, less technical profiles can now get involved in model creation. Not only that, data scientists can accelerate projects and increase their impact by automating the key, time-consuming elements of ML model creation. Ultimately, AutoML is a key piece in enabling companies to scale AI projects and increase business impact.

What Is AutoML?

In ML, there is a usual set of steps involved in model creation, from processing a dataset to evaluating models prior to deployment. The goal of AutoML is to automate and simplify key steps of the ML model development process, from beginning to end, both to gain process efficiencies and to open up ML to non-ML experts.

You can think of the difference between the traditional ML process and AutoML as the difference between an expert painter creating a self-portrait and someone using an AI photo generator like the currently trending MyHeritage AI Time Machine. Both might have the same goal (create a self-portrait), but in the traditional process, a painter has to painstakingly combine reference images with the selection of the right material, paints, colors, and techniques to build a portrait.

In the other, someone with little knowledge of portraiture can input reference images and get out a similar stylized image. Both could potentially be considered works of art, and both are self-portraits, but one is created in seconds with little understanding by the “artist” and the other is created in days or weeks by someone who can explain every detail and step. And, just as the auto-generated image will be sufficient in many cases (like sharing on social media) while the painted portrait will be required in others (like hanging on your wall), AutoML will not be the answer for every ML model.

Thanks to this Dataiku webinar for the original painting metaphor. Check it out!

Common Limitations of AutoML

While AutoML is a great way to accelerate work, there are usually limits to what AutoML tools can do, and points where you need to bring in an expert artist (aka a data scientist) to build a fully customized solution. Some common limitations and criticisms include:

    • Lack of precision and flexibility: AutoML tools can be limited in how far they can be customized, creating a lack of flexibility that could impact model performance in some situations.
    • Lack of explainability: Often, to comply with internal controls or regulatory reporting, or to ensure a lack of bias, many steps of the ML process are scrutinized. With AutoML, users may not be able to explain these steps or fully understand the impact of different variables.
    • Lack of control: In giving non-experts access to ML, companies may feel like they’re giving up control for scale. How do leaders ensure that ML models built with AutoML are up to par before they’re deployed and have business impact?
    • Most of the work is elsewhere: Often AutoML is isolated from broader projects, and it is difficult to connect the frameworks and tools used for different parts of the ML process (e.g., data preparation, analysis, deployment, governance). This makes it difficult to see the bigger picture and can create silos that impact performance.

AutoML in Dataiku

Dataiku’s capabilities help to solve many of the common challenges associated with AutoML, while also empowering expert users to build complex models in the same solution. 

Easily Customizable: To extend the painting metaphor, with Dataiku you can create an AI-generated image, but with many digital filters and brushes to edit the portrait into what you need. AutoML in Dataiku provides automatic feature generation and reduction techniques and applies handling strategies for feature selection, missing values, variable encoding, and rescaling based on data type. Accept the default settings or easily modify any part for your specific objectives. You can easily choose the level of AutoML you’d prefer based on your level of technical understanding — whether prioritizing speed, interpretability, or performance.

Choose the AutoML template based on your objectives and regulatory environment.
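To make the idea of type-based defaults with optional overrides concrete, here is a minimal sketch in plain Python. It is not Dataiku's API or implementation: the function names (`default_strategy`, `fill_missing`, `transform`, `prepare`), the strategy keys, and the chosen defaults (mean imputation plus standard rescaling for numeric columns, mode imputation plus one-hot encoding for categorical ones) are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median, pstdev


def is_numeric(values):
    """A column counts as numeric if every non-missing value is a number."""
    present = [v for v in values if v is not None]
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present)


def default_strategy(values):
    """Hypothetical defaults chosen from the column's data type."""
    if is_numeric(values):
        return {"missing": "mean", "rescale": "standard"}
    return {"missing": "mode", "encode": "one_hot"}


def fill_missing(values, how):
    """Replace None with the mean, median, or most frequent value."""
    present = [v for v in values if v is not None]
    if how == "mean":
        fill = mean(present)
    elif how == "median":
        fill = median(present)
    elif how == "mode":
        fill = Counter(present).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown missing-value strategy: {how}")
    return [fill if v is None else v for v in values]


def transform(values, strategy):
    """Turn one raw column into a list of feature rows."""
    filled = fill_missing(values, strategy["missing"])
    if strategy.get("rescale") == "standard":
        mu, sigma = mean(filled), pstdev(filled)
        return [[(v - mu) / sigma if sigma else 0.0] for v in filled]
    if strategy.get("encode") == "one_hot":
        levels = sorted(set(filled))
        return [[1.0 if v == level else 0.0 for level in levels] for v in filled]
    return [[v] for v in filled]  # no rescaling or encoding requested


def prepare(columns, overrides=None):
    """Apply the defaults, letting the user override any setting per column."""
    overrides = overrides or {}
    strategies = {
        name: {**default_strategy(values), **overrides.get(name, {})}
        for name, values in columns.items()
    }
    features = {name: transform(values, strategies[name]) for name, values in columns.items()}
    return strategies, features


# Toy data: one numeric and one categorical column, each with a missing value.
columns = {
    "age": [20, None, 40, 30],
    "tier": ["pro", "basic", None, "pro"],
}

# Accept every default...
strategies, features = prepare(columns)

# ...or override a single setting and keep the rest.
custom_strategies, custom_features = prepare(columns, {"age": {"rescale": "none"}})
```

In this sketch the override only touches the `rescale` setting for `age`; the missing-value default for that column and everything about `tier` stay as chosen automatically, which mirrors the "accept the defaults or modify any part" workflow described above.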

Interpretability and Explainability: Dataiku features various ways to probe into models and understand variable importance and subpopulation impact, all in a visually guided way that empowers non-technical users while saving time for seasoned data scientists.

Individual prediction explanations allow you to explore the most influential features for a single prediction, just one of several built-in interpretability options.
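As a rough illustration of what "variable importance" can mean, the sketch below computes permutation importance in plain Python: shuffle one feature at a time and measure how much the model's accuracy drops. This is a generic technique swapped in for illustration, not a description of how Dataiku computes importance or individual explanations. The toy model, the feature names (`tenure`, `noise`), and the function names are all hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, features, repeats=5, seed=0):
    """Average drop in accuracy when each feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importance = {}
    for feature in features:
        drops = []
        for _ in range(repeats):
            shuffled = [row[feature] for row in rows]
            rng.shuffle(shuffled)
            # Copy each row, replacing only the shuffled feature.
            permuted = [{**row, feature: value} for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importance[feature] = sum(drops) / repeats
    return importance


# Toy churn data: the label depends only on tenure; "noise" is irrelevant.
rows = [
    {"tenure": t, "noise": n}
    for t, n in [(1, 5), (2, 3), (3, 8), (4, 1), (10, 4), (11, 7), (12, 2), (13, 6)]
]
labels = ["churn"] * 4 + ["stay"] * 4


def predict(row):
    """Stand-in for a trained model."""
    return "stay" if row["tenure"] >= 6 else "churn"


importance = permutation_importance(predict, rows, labels, ["tenure", "noise"])
```

Because the toy model never reads `noise`, shuffling it cannot change any prediction, so its importance is exactly zero, while shuffling `tenure` breaks the link between the feature and the label and lowers accuracy.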

Built-In Controls: Guardrails like debugging and built-in assertions alert you if your model behaves unexpectedly, while automated model documentation outlines what the model does, how the model was built (algorithms, features, processing, etc.), how the model was tuned, and performance.
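One way to picture an assertion-style guardrail is as a sanity check of the form "for this subpopulation, at least this fraction of predictions should be this class." The sketch below is a plain-Python illustration under that assumption, not Dataiku's API; the `Assertion` class, `check_assertions` function, and the toy loan model are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


@dataclass
class Assertion:
    name: str
    condition: Callable[[Row], bool]  # selects the subpopulation to check
    expected: str                     # class the model should predict there
    min_fraction: float               # minimum share of matching predictions


def check_assertions(
    rows: List[Row],
    predict: Callable[[Row], str],
    assertions: List[Assertion],
) -> Dict[str, Optional[bool]]:
    """Return True/False per assertion, or None if no row matched its condition."""
    results: Dict[str, Optional[bool]] = {}
    for assertion in assertions:
        subset = [row for row in rows if assertion.condition(row)]
        if not subset:
            results[assertion.name] = None
            continue
        hits = sum(1 for row in subset if predict(row) == assertion.expected)
        results[assertion.name] = hits / len(subset) >= assertion.min_fraction
    return results


def predict(row: Row) -> str:
    """Stand-in for a trained model: it looks at income only and ignores age."""
    return "approve" if row["income"] > 50000 else "reject"


rows = [
    {"age": 45, "income": 120000},
    {"age": 16, "income": 80000},
    {"age": 30, "income": 30000},
    {"age": 17, "income": 10000},
]

assertions = [
    Assertion("high_income_approved", lambda r: r["income"] >= 100000, "approve", 0.9),
    Assertion("minors_rejected", lambda r: r["age"] < 18, "reject", 1.0),
]

results = check_assertions(rows, predict, assertions)
failed = [name for name, ok in results.items() if ok is False]
```

Here the second assertion fails because the toy model approves a 16-year-old with a high income, which is exactly the kind of unexpected behavior a guardrail is meant to surface before deployment.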

Collaboration and Connection to Broader Work: It’s easy to connect to the bigger picture and establish best practices with the collaborative visual flow, which shows everything that’s occurred along the data pipeline. Search for previous relevant projects using Dataiku’s global search functionality to ensure the reuse of best practices. 
