Automated machine learning (AutoML) shows great promise in providing more efficient, explainable, and reproducible AI solutions. Organizations might wonder, however: are all AutoML tools created equal? In other words, do the AutoML capabilities offered in AI platforms and other technologies all do the same thing? The short answer is no, not all AutoML is created equal.
AutoML automates the process of applying machine learning (ML), including the time-consuming, iterative tasks involved in developing ML models. This automation makes it possible to build and analyze more ML models, more quickly and efficiently. Among other things, AutoML therefore broadens access to AI and ML and helps the business produce reliable results faster and at greater scale.
Different Degrees of Automation
It’s important to note that automation can be present in varying degrees. The vision for the future of AutoML is one of complete (or nearly complete) automation, but this is still just a vision. The reality of most AutoML tools and systems today is that they are not completely automatic — yet.
Organizations can evaluate systems that offer AutoML capabilities by how well the level of automation in each tool matches their expectations and requirements. These levels can be broken down as follows:
- Manual: The system does not help you to do it — code your way in!
- Tooled: The system provides tools or components that can be combined to perform the task.
- Assisted: The system helps or guides you along the way in a simple fashion, but some important choices are still up to you.
- Auto: The system does everything, end to end.
Automation of the Data Science Pipeline
The development of AutoML has spurred the application of automation to the whole data-to-insights pipeline: from data cleaning, through feature selection and feature creation, to algorithm tuning and even operationalization. Some of the data science pipeline steps that AutoML can automate to speed up the process include:
- Automated Data Preparation: Automating data preparation means defining a series of steps or actions that run each time a defined trigger fires. One useful cleaning step, for example, is parsing dates automatically. Dataiku offers a parse to standard date format processor that recognizes the true, unambiguous meaning of a date, so a column whose values appear to be dates is recognized as a date column. Steps like this also surface any non-conforming data that needs further manual inspection.
- Automated Feature Engineering: Feature engineering refers to the process of deriving new information from existing data. Automated feature engineering explores the dataset for possible combinations of features automatically rather than manually. Feature engineering might be necessary, for example, when there are large absolute differences between values; in this case, we might want to apply a rescaling technique. Feature scaling is a method used to normalize the range of values of numerical features. Why? Because in many algorithms (distance-based or regularized models, for instance), variables measured on different scales do not contribute equally to model fitting and can end up introducing a bias.
- Algorithms Comparison and Parameter Optimization: AutoML can quickly home in on the most promising regions of the hyperparameter search space, finding better models within the limited time available to train and test them. It automatically compares algorithms and hyperparameters, preselecting only those that make sense given the data, and selects the best-performing one(s).
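The data preparation and feature engineering steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Dataiku's processor: the list of candidate date formats, the "ambiguous"/"unparseable" labels, and the helper names (`parse_date`, `prepare_date_column`, `min_max_scale`, `pairwise_products`) are all hypothetical. A value is accepted as a date only when every format that matches it agrees on a single calendar date, echoing the idea of recognizing an unambiguous meaning; everything else is flagged for manual inspection. Rescaling uses min-max normalization, one of several common techniques, and automated feature creation is reduced to naive pairwise products.

```python
from datetime import datetime
from itertools import combinations

# Hypothetical list of accepted input layouts; a real tool would infer these from the data.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %b %Y"]


def parse_date(raw):
    """Return (ISO date string, None) when exactly one reading exists, else (None, reason)."""
    readings = set()
    for fmt in CANDIDATE_FORMATS:
        try:
            readings.add(datetime.strptime(raw.strip(), fmt).date())
        except ValueError:
            continue  # this layout does not fit the value
    if len(readings) == 1:
        return readings.pop().isoformat(), None
    return None, "ambiguous" if readings else "unparseable"


def prepare_date_column(values):
    """Parse every value; collect non-conforming ones for manual inspection."""
    cleaned, flagged = [], []
    for i, raw in enumerate(values):
        iso, problem = parse_date(raw)
        cleaned.append(iso)
        if problem:
            flagged.append((i, raw, problem))
    return cleaned, flagged


def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pairwise_products(columns):
    """Naive automated feature creation: one product feature per pair of columns."""
    return {
        f"{a}*{b}": [x * y for x, y in zip(columns[a], columns[b])]
        for a, b in combinations(sorted(columns), 2)
    }


dates, issues = prepare_date_column(["2021-03-05", "25/03/2021", "05/03/2021", "n/a"])
print(dates)   # ['2021-03-05', '2021-03-25', None, None]
print(issues)  # [(2, '05/03/2021', 'ambiguous'), (3, 'n/a', 'unparseable')]
print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

Note that "05/03/2021" is flagged rather than guessed: it reads as 5 March under day-first formats and 3 May under month-first formats, which is exactly the kind of value that deserves a human look.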
How It Works in Dataiku
Dataiku, the world’s leading AI and machine learning platform, supports agility in organizations’ data efforts via collaborative, elastic, and responsible AI, all at enterprise scale. It contains a powerful AutoML engine that allows you to get highly optimized models with minimal intervention.
In Dataiku, you can select between:
- The Expert Mode: Full control over all training settings, algorithm settings, and the optimization process, including writing your own custom models and using advanced deep learning models
- The AutoML Mode: Dataiku’s powerful automatic ML engine builds models for you with minimal effort
Dataiku’s AutoML engine analyzes your dataset and, depending on your preferences, selects the best feature handling, algorithms, and hyperparameters. Note, however, that in AutoML mode you can still define which types of algorithms Dataiku will train. This lets you choose between fast prototypes, interpretable models, or high-performing models with less interpretability.
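The idea of automatically comparing algorithms and hyperparameters, while letting the user restrict which types of algorithms are trained, can be illustrated with a toy example. This is a hedged sketch, not how Dataiku's engine works: it uses plain random search (a simpler stand-in for strategies that home in on promising regions of the search space), a single holdout split, two hand-written one-dimensional regressors, and hypothetical family tags (`"interpretable"`, `"high_performance"`) and function names (`auto_select`, `SEARCH_SPACE`).

```python
import random
from statistics import mean


def fit_linear(points, lam):
    """1D ridge-style regression: the penalty lam shrinks the slope toward zero."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / (sxx + lam)
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_knn(points, k):
    """k-nearest-neighbours regression: average target of the k closest training x."""
    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


# Hypothetical search space: algorithm -> (family tag, fit function, hyperparameter sampler).
SEARCH_SPACE = {
    "linear": ("interpretable", fit_linear, lambda rng: rng.uniform(0.01, 10.0)),
    "knn": ("high_performance", fit_knn, lambda rng: rng.randint(1, 8)),
}


def mse(model, points):
    """Mean squared error of a model on (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in points)


def auto_select(train, valid, families, trials=20, seed=0):
    """Random search over every allowed algorithm; return (best trial, all trials)."""
    rng = random.Random(seed)
    results = []
    for name, (family, fit, sample) in SEARCH_SPACE.items():
        if family not in families:
            continue  # the user excluded this type of algorithm
        for _ in range(trials):
            param = sample(rng)
            model = fit(train, param)
            results.append((mse(model, valid), name, param))
    return min(results), results


# Synthetic data: y = 3x + 2 with alternating +/-1 noise; every 4th point is held out.
data = [(float(i), 3.0 * i + 2.0 + (1.0 if i % 2 else -1.0)) for i in range(20)]
train = [p for i, p in enumerate(data) if i % 4]
valid = [p for i, p in enumerate(data) if i % 4 == 0]

best, _ = auto_select(train, valid, families={"interpretable", "high_performance"})
print("best algorithm and hyperparameter:", best[1], best[2])
```

Restricting `families` to `{"interpretable"}` trades potential accuracy for a model whose slope and intercept can be read directly, which mirrors the prototype/interpretable/high-performing choice described above.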
Dataiku also offers features that go beyond AutoML and toward automating the entire data-to-insights pipeline. You can automate actions and workflows in Dataiku using its powerful scheduling capabilities. Preparing data, for example, involves repetitive tasks such as flagging invalid rows, parsing dates into a standard format, and converting currencies. In Dataiku, scenarios and triggers automate such repetitive processes, either on a periodic schedule or when specified conditions are met.
A scenario has two required components:
- Triggers that activate a scenario and cause it to run
- Steps, or actions, that a scenario takes when it runs
There are many predefined triggers and steps, which makes automating Flow updates flexible and easy. For greater customization, you can create your own Python triggers and steps.
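The trigger-plus-steps structure of a scenario can be illustrated with a small, self-contained Python model. This is not Dataiku's scenario API: the `Scenario` class, the `every_n_hours` and `on_row_count_change` trigger factories, the `rebuild_dataset` step, and the `state` dictionary are hypothetical, chosen only to show how a time-based trigger and a condition-based trigger can both launch the same ordered list of steps.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Scenario:
    """Hypothetical scenario: runs its steps in order when any trigger fires."""
    name: str
    triggers: List[Callable[[State], bool]] = field(default_factory=list)
    steps: List[Callable[[State], None]] = field(default_factory=list)

    def tick(self, state: State) -> bool:
        """Evaluate the triggers once; run every step if at least one fired."""
        if not any(trigger(state) for trigger in self.triggers):
            return False
        for step in self.steps:
            step(state)
        return True


def every_n_hours(n: int) -> Callable[[State], bool]:
    """Time-based trigger: fires when the current hour is a multiple of n."""
    return lambda state: state["hour"] % n == 0


def on_row_count_change() -> Callable[[State], bool]:
    """Condition-based trigger: fires when the input dataset's row count changed."""
    return lambda state: state["row_count"] != state["last_row_count"]


def rebuild_dataset(state: State) -> None:
    """Step: stand-in for rebuilding a dataset; records the run and the new row count."""
    state["log"].append(f"rebuilt at hour {state['hour']}")
    state["last_row_count"] = state["row_count"]


scenario = Scenario(
    name="rebuild_sales",
    triggers=[every_n_hours(24), on_row_count_change()],
    steps=[rebuild_dataset],
)

state = {"hour": 1, "row_count": 100, "last_row_count": 100, "log": []}
print(scenario.tick(state))  # False: not a scheduled hour and no new rows
state["row_count"] = 120
print(scenario.tick(state))  # True: the row count changed
```

Because `any` short-circuits, later triggers are not evaluated once an earlier one has fired; a production scheduler would typically also record which trigger caused each run.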
With automation in place and a strong team of data scientists and citizen data scientists, organizations can manage more projects and scale AI across the enterprise.