Exploring CI/CD in a Machine Learning Project With Dataiku

François Sergot, Senior Product Manager at Dataiku, shared insights into Dataiku's CI/CD features during the 2021 Dataiku Product Days. This blog post highlights parts of his session. It covers the basics of CI/CD and presents a demo that François led in Dataiku.

What Is CI/CD in a Machine Learning Project?

It all starts with the notion of operationalization, which defines how you serve your machine learning (ML) project to your business users. Technically, you can operationalize a project using methods that range from fully manual to fully automated. This post focuses on the fully automated approach. However appealing full automation seems, setting a goal of automating everything would be unreasonable. Instead, evaluate each project according to its criticality and the resources you have, and use that to decide whether it should be automated.

Full automation in this context means CI/CD, the combined practices of continuous integration and continuous deployment. Continuous integration means merging shared work into a shippable product as often as possible. Continuous deployment means deploying that shippable product as often as possible. Both happen through an automated process.
For a more complete introduction to CI/CD, see our previous blog post.

Machine learning projects can benefit from CI/CD at many levels, but there are some specificities to consider:

Models decay and need to be renewed: Machine learning inherently deals with models that decay over time and need to be monitored and retrained. Keeping them current requires frequent, repeatable updates.
Complexity of dependencies for model deployment: Models depend heavily on data preparation, infrastructure, and the data you are manipulating. This makes moving models to production a complex operation that can greatly benefit from automation.
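To make the first point concrete, here is a minimal Python sketch of the kind of decay check an automated pipeline could run on a schedule to decide whether a model should be retrained. It assumes a single monitored metric (AUC here) compared against its value at deployment time. The `DecayPolicy` class, the `needs_retraining` function, the 0.05 tolerance, and the weekly scores are hypothetical illustrations, not part of Dataiku or of François's demo.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DecayPolicy:
    # Hypothetical policy: retrain when the monitored metric falls more than
    # `max_drop` below the value measured when the model was deployed.
    baseline: float
    max_drop: float = 0.05


def needs_retraining(policy: DecayPolicy, recent_scores: Sequence[float]) -> bool:
    """Return True if the most recent scored window has decayed past the tolerance."""
    if not recent_scores:
        # No fresh ground truth yet, so there is nothing to compare against.
        return False
    latest = recent_scores[-1]
    return policy.baseline - latest > policy.max_drop


if __name__ == "__main__":
    policy = DecayPolicy(baseline=0.82)
    weekly_auc = [0.81, 0.80, 0.78, 0.75]  # illustrative values, not real results
    print(needs_retraining(policy, weekly_auc))
```

Comparing only the latest window keeps the sketch short; a production check would more likely smooth over several windows so that one noisy week does not trigger a retrain.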

CI/CD in Action

In this example, François takes a churn prediction project from his design node and shows how it can be pushed to production using a fully automated Jenkins pipeline. The pipeline follows these steps:

Validating and packaging the project
Pushing the project to the test Automation node
Running tests on the project
Moving the project to the production Automation node
Running a smoke test and rolling back if necessary
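The control flow of these five steps can be sketched in plain Python. This is a minimal simulation under stated assumptions: `Node`, `run_pipeline`, and the `bundle-<version>` naming are hypothetical, the Automation nodes are modeled as in-memory objects, and nothing here calls Jenkins or Dataiku. In a real pipeline, each step would typically be a Jenkins stage wrapping calls to Dataiku's APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical in-memory stand-in for an Automation node."""
    name: str
    bundles: List[str] = field(default_factory=list)
    active: Optional[str] = None

    def deploy(self, bundle_id: str) -> Optional[str]:
        """Import (if needed) and activate a bundle; return the bundle it replaced."""
        previous = self.active
        if bundle_id not in self.bundles:
            self.bundles.append(bundle_id)
        self.active = bundle_id
        return previous


def run_pipeline(
    version: str,
    validate: Callable[[], bool],
    run_tests: Callable[[Node], bool],
    smoke_test: Callable[[Node], bool],
    test_node: Node,
    prod_node: Node,
) -> str:
    # Step 1: validate the project, then package it as a versioned bundle.
    if not validate():
        return "rejected: validation failed"
    bundle_id = f"bundle-{version}"

    # Step 2: push the bundle to the test Automation node.
    test_node.deploy(bundle_id)

    # Step 3: run the test suite against the test node; stop on failure.
    if not run_tests(test_node):
        return "rejected: tests failed"

    # Step 4: promote the same bundle to production, remembering what was live.
    previous = prod_node.deploy(bundle_id)

    # Step 5: smoke test production and roll back if it fails.
    if not smoke_test(prod_node):
        prod_node.active = previous
        if previous is None:
            return "rolled back: no previous bundle to restore"
        return f"rolled back to {previous}"
    return f"deployed {bundle_id}"


if __name__ == "__main__":
    test_node = Node("automation-test")
    prod_node = Node("automation-prod", bundles=["bundle-v1"], active="bundle-v1")
    # Simulate a release whose smoke test fails in production.
    result = run_pipeline(
        "v2",
        validate=lambda: True,
        run_tests=lambda node: True,
        smoke_test=lambda node: False,
        test_node=test_node,
        prod_node=prod_node,
    )
    print(result)  # rolled back to bundle-v1
```

In this sketch, recording the previously active bundle before promotion is what keeps the rollback cheap: restoring production only means reactivating a bundle that is already on the node.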

The session also offers additional thoughts and ideas to help you start such a project.

The step-by-step explanation with code samples used in this video can be accessed here.
