Inspired by the latest CDiscount tech post giving an overview of some MetaML solutions, we thought this would be the perfect time to also talk about MetaML. Because what could be more meta than to write a blogpost about another blogpost?
Now that my “meta” joke is over, we are going to refer to MetaML as AutoML from now on, because that's what most people nowadays are calling it.
As its name suggests, MetaML (I mean AutoML) is about using machine learning techniques to, well... automatically do machine learning. In other words, it automates the process of applying machine learning. AutoML has big implications for how data teams might work in the long term, but even today it can make parts of the process more efficient.
AutoML and AI
People often focus only on the automatic tuning or selection of the best-performing algorithm for a given task. However, AutoML has a broader scope: it can be applied to the whole machine learning pipeline, from cleaning the data, through feature selection and feature creation, to tuning algorithms.
The AI dream would be to feed a dataset and a target to this automated pipeline and get back cleaned data with engineered features, together with the best-performing model on top. That might put data scientists out of a job, but wouldn’t that be nice?
AutoML and Dataiku
At Dataiku, we’ve been working on exploring those different spaces. We’ve lately been focusing on the feature space: how can we automatically generate meaningful features for a given prediction task (like this)? But this is for another time and another post, so stay tuned for more information on that later!
Since the early days of Dataiku, we have offered a visual machine learning suite that guides the user through all of the machine learning steps (train-test split, feature handling, metrics to optimize, different templates of pre-set algorithms). The interface offers a one-button option, simply called "Train". Clicking it will automatically infer the feature handling, pre-select a collection of algorithms, and return the best-performing one. Of course, it's still up to the user to tune those parameters and select the best possible settings based on their experience.
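To make the "pre-select a collection of algorithms and return the best-performing one" idea concrete, here is a minimal, standard-library-only sketch of that selection loop. It is not how Dataiku implements its "Train" button: the candidate models, the `auto_train` and `make_toy_data` helpers, the toy dataset, and the choice of accuracy as the metric are all assumptions made for illustration, and feature handling is left out entirely.

```python
import random
from statistics import mean


def train_test_split(X, y, test_ratio=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for evaluation."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * (1 - test_ratio))
    train, test = indices[:cut], indices[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


class MajorityBaseline:
    """Always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label = max(sorted(set(y)), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Predicts the label whose mean training point is closest."""
    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            points = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*points)]
        return self

    def predict(self, X):
        return [min(self.centroids,
                    key=lambda c: squared_distance(x, self.centroids[c]))
                for x in X]


class OneNearestNeighbor:
    """Predicts the label of the single closest training point."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            nearest = min(range(len(self.X)),
                          key=lambda i: squared_distance(x, self.X[i]))
            predictions.append(self.y[nearest])
        return predictions


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auto_train(X, y, candidates, metric=accuracy, seed=0):
    """Fit every candidate on the train split, score it on the test split,
    and return the name of the best one together with all scores."""
    X_train, y_train, X_test, y_test = train_test_split(X, y, seed=seed)
    scores = {}
    for name, factory in candidates.items():
        model = factory().fit(X_train, y_train)
        scores[name] = metric(y_test, model.predict(X_test))
    best_name = max(scores, key=scores.get)  # first candidate wins ties
    return best_name, scores


def make_toy_data(n_per_class=40, seed=1):
    """Two well-separated 2D Gaussian blobs (purely synthetic data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 8.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


# Hypothetical "pre-selected collection of algorithms".
CANDIDATES = {
    "majority_baseline": MajorityBaseline,
    "nearest_centroid": NearestCentroid,
    "one_nearest_neighbor": OneNearestNeighbor,
}

X, y = make_toy_data()
best_name, scores = auto_train(X, y, CANDIDATES)
print("best model:", best_name)
print("scores:", scores)
```

A real AutoML tool would add cross-validation, hyperparameter search, and automatic feature preprocessing on top of this loop, which is exactly the part a user would still want to tune by hand.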
We took the CDiscount test to see how well this visual AutoML alone performs on the three Kaggle challenges, and how it compares to humans completing the tasks from scratch. Here are the results:
So there you have it: with just a few clicks, from importing your dataset to submitting your predictions to Kaggle, you can already leverage the power of AutoML and score high in the CDiscount benchmark! All of those features are already available in our community edition, so why not take it for a spin here and tune those models for an even better result?