How AutoML Accelerates Model Creation

Dataiku Product, Scaling AI | Dairenkon Majime

This is a guest article by Dairenkon Majime. Majime is a data scientist intern, a data facilitator and mentor for bootcamps at Tera, and a dedicated writer who is always looking for new challenges. He writes about MLOps, data science, and machine learning.

Automated machine learning, or AutoML, refers to tools and services that abstract away much of the expertise required for machine learning (ML) by automating the operations needed to build ML models. These procedures usually include data preparation (such as standardization) and feature engineering, training models with various combinations of algorithms and hyperparameters, and assessing and comparing the results. According to the research paper “AutoML to Date and Beyond: Challenges and Opportunities,” AutoML is essentially a paradigm for automating the application of ML to real-world problems. Although automation and efficiency are among AutoML’s main selling points, the process still requires human involvement.
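To make that automated loop concrete, here is a minimal, framework-free Python sketch of the kind of work an AutoML tool performs: standardize features, try several algorithm and hyperparameter combinations, and rank them on a validation set. It is an illustrative assumption, not Dataiku’s implementation. The toy data, the two model families (k-nearest neighbors and nearest centroid), and helper names such as `auto_search` are hypothetical.

```python
import math
import statistics


def standardize(train_rows, other_rows):
    """Z-score each column using means and standard deviations from the training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]  # guard against constant columns

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]

    return scale(train_rows), scale(other_rows)


def knn_predict(train_x, train_y, row, k):
    """k-nearest neighbors: majority vote among the k closest training rows."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], row))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def centroid_predict(train_x, train_y, row, _unused=None):
    """Nearest centroid: predict the label whose mean training row is closest."""
    labels = sorted(set(train_y))
    centroids = {
        label: [statistics.mean(col) for col in zip(*[x for x, y in zip(train_x, train_y) if y == label])]
        for label in labels
    }
    return min(labels, key=lambda label: math.dist(centroids[label], row))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auto_search(train_x, train_y, valid_x, valid_y):
    """Try every (algorithm, hyperparameter) pair and rank them by validation accuracy."""
    train_scaled, valid_scaled = standardize(train_x, valid_x)
    search_space = [("knn", knn_predict, k) for k in (1, 3, 5)]
    search_space.append(("centroid", centroid_predict, None))
    leaderboard = []
    for name, predict, param in search_space:
        preds = [predict(train_scaled, train_y, row, param) for row in valid_scaled]
        leaderboard.append((accuracy(valid_y, preds), name, param))
    # Stable sort: candidates with equal accuracy keep their search-space order.
    leaderboard.sort(key=lambda entry: entry[0], reverse=True)
    return leaderboard


# Hypothetical toy data: the second feature is on a much larger scale than the first.
train_x = [[0.10, 100], [0.20, 150], [0.15, 120], [0.90, 900], [0.80, 850], [0.85, 950]]
train_y = [0, 0, 0, 1, 1, 1]
valid_x = [[0.12, 130], [0.88, 880]]
valid_y = [0, 1]

board = auto_search(train_x, train_y, valid_x, valid_y)
for acc, name, param in board:
    print(f"{name:<9} param={param}  validation accuracy={acc:.2f}")
```

Fitting the scaling statistics on the training rows only, then reusing them for validation rows, keeps validation data from leaking into preprocessing; production AutoML tools apply the same idea across far larger search spaces.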

AutoML aims to broaden access to analytical tools for data professionals by providing tools that require little to no coding, while also helping advanced data scientists get answers faster in some scenarios.

Whether you are a data scientist or a nonspecialist, the corporate world values speed. Building a model the traditional way takes time and resources, and much of that expense goes to routine tasks such as feature handling and experiment tracking. Dataiku’s AutoML capabilities streamline those tasks so you can focus on higher-value work, such as evaluating and comparing model performance, identifying and correcting potential model bias, and interpreting results for the business.

AutoML Versus Traditional Methods

Dataiku’s AutoML accelerates every stage of developing an ML model. When coding models manually, there are many steps you should include in your code to follow sound methodology, including: 

  • Feature selection and handling
  • Feature generation or reduction
  • Train-test and validation sampling definition
  • Optimization metric selection
  • Algorithm selection
  • Runtime engine and compute infrastructure setup
  • Model training
  • Model validation (for example, cross-validation and evaluation on a held-out test set)
  • Hyperparameter tuning and optimization
  • Performance metrics analysis
  • Model interpretability and assertions analysis
  • Model comparisons
  • Reporting and documentation
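Several of the manual steps listed above (train/validation sampling, metric selection, model validation, and hyperparameter tuning) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: a one-feature k-nearest-neighbors model, accuracy as the optimization metric, and strided k-fold splits without shuffling. The function names and toy data are hypothetical and are not part of any Dataiku API.

```python
import statistics


def knn_1d(train_x, train_y, x, k):
    """One-feature k-nearest neighbors; distance ties go to the earlier training row."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cv_accuracy(xs, ys, k, n_folds=3):
    """Mean validation accuracy of knn_1d over strided k-fold splits (no shuffling)."""
    fold_scores = []
    for fold in range(n_folds):
        valid_idx = list(range(fold, len(xs), n_folds))  # every n_folds-th row is held out
        held_out = set(valid_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(knn_1d(train_x, train_y, xs[i], k) == ys[i] for i in valid_idx)
        fold_scores.append(correct / len(valid_idx))
    return statistics.mean(fold_scores)


def tune_k(xs, ys, grid=(1, 3, 5), n_folds=3):
    """Grid search: score each candidate k by cross-validation and keep the best one."""
    results = {k: cv_accuracy(xs, ys, k, n_folds) for k in grid}
    best_k = max(results, key=results.get)  # ties resolve to the first k in the grid
    return best_k, results


# Hypothetical toy data: the label is 1 when the feature is above 6.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

best_k, results = tune_k(xs, ys)
print("cross-validated accuracy per k:", results)
print("selected k:", best_k)
```

Even this toy version has to be rewritten for every new model family, metric, or splitting strategy, which is the repetition the next paragraph describes.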

As you can see, coding and performing each of these steps by hand requires a lot of time, concentration, and organization, and many of the steps must be repeated each time you consider a new model. When a data scientist begins a new project, it’s essential to assess its viability quickly, because the expert’s primary purpose in a business is to deliver value. Building rapid prototypes with automated ML at this early stage helps estimate the project’s potential cost, risks, and future value.

Models created this way can serve as a baseline: a reference level of performance that helps an organization decide whether the overall concept is viable and whether it’s prudent to devote additional time and resources to improving on this preliminary result.
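A baseline is only useful when something is compared against it. One hedged way to frame the viability question in code is to check how far a quick AutoML model’s score sits above a trivial predictor that always outputs the most common label. The majority-class comparison, the 5-point lift threshold, and the function names below are illustrative assumptions, not a Dataiku feature.

```python
from collections import Counter


def majority_baseline_accuracy(train_y, test_y):
    """Accuracy of a trivial model that always predicts the most common training label."""
    majority_label = Counter(train_y).most_common(1)[0][0]
    return sum(y == majority_label for y in test_y) / len(test_y)


def worth_further_investment(model_accuracy, baseline_accuracy, min_lift=0.05):
    """Hypothetical go/no-go rule: the quick model must beat the trivial baseline by min_lift."""
    return model_accuracy - baseline_accuracy >= min_lift


# Hypothetical labels and model score, for illustration only.
train_y = [0, 0, 0, 1]
test_y = [0, 0, 1, 1]
baseline = majority_baseline_accuracy(train_y, test_y)
print("majority-class baseline accuracy:", baseline)
print("continue the project?", worth_further_investment(0.75, baseline))
```

In practice, the threshold would come from the business case (for example, the lift needed to justify the project’s cost) rather than a fixed number.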

At this early stage, AutoML also helps narrow the universe of potential options so you can quickly identify the most promising paths for future iterations. For instance, suppose initial results indicate that the best performance comes from a particular family of models or an ensemble approach. If the project continues, data scientists then know where to focus their attention in future experiments.

AutoML also pays off as a company matures in its AI journey and transitions to a data-driven business. For low-risk or exploratory use cases, training an in-house analyst or citizen data scientist to conduct this preliminary research with AutoML is faster, and may make more sense, than hiring an advanced (and very expensive) data scientist with specialized expertise to build a custom solution.

See Dataiku’s visual ML at work in the video below from the Gartner Data Science and ML Bake-Off, or go further on your own with this hands-on course.

Move Away From Manual Tuning and Embrace Automation

Delivering value quickly and simply within a set time frame builds team trust and momentum, which makes further improvements possible. Spending weeks manually developing and testing different model designs and architectures is arduous work that gets tallied in the costs column, not the benefits column. Letting AutoML handle this repetitive work makes all the difference, freeing up time for essential analyses such as checking for model fairness and bias, running stress tests, and exploring what-if scenarios to assess model robustness.
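As one example of the fairness checks that freed-up time can go toward, the sketch below computes a demographic parity gap: the difference in positive-prediction rates between groups. This is only one of many fairness metrics and is not necessarily what Dataiku’s own fairness reports compute; the function names, predictions, and group labels are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rates between any two groups (0 means parity)."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical predictions for six people in two groups, "a" and "b".
predictions = [1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
print(positive_rate_by_group(predictions, groups))
print(f"demographic parity gap: {demographic_parity_gap(predictions, groups):.2f}")
```

A large gap does not prove a model is unfair on its own, but it flags where a human reviewer should look more closely.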

Dataiku provides the agility and flexibility your projects demand. It’s not only an accelerator for advanced data scientists but also a solution for business professionals, data analysts, and citizen data scientists who don’t have specialized coding expertise in ML development.
