Scaling AI With Operational Controls and Automation: DataOps, MLOps, and ITOps

Dataiku Product, Scaling AI | Timothy Law

Focusing on quality control and process efficiency has helped manufacturers cost-effectively scale their operations. Scalable operations withstand stressors, handle fluctuations in volume, anticipate disruptions, and use automation to keep the business running smoothly and efficiently.

Like manufacturing, ML models and data pipelines are subject to operational stressors, constraints, and disruptions. Dataiku, the platform for Everyday AI, applies operational controls and automation through DataOps, MLOps, and ITOps to help enterprises scale data science operations.

DataOps and Data Pipelines

In many ways, ML pipelines are similar to physical pipelines in a process manufacturing plant. These pipelines have to feed a manufacturing process continuously. They must operate within specific parameters, can’t be leaky, and must be monitored so that operators are alerted when the inputs change.

DataOps is very similar, except that data moves through the pipeline instead of physical mixtures. DataOps automates the continuous feeding of data to live production models, repeating each preparation and transformation step used to build and train the models. It also ensures that timely, accurate data is available to populate analytics products (such as reports and dashboards), analytics applications, and production AI and ML models.
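
To make this concrete, here is a minimal sketch in plain Python, not Dataiku's actual implementation; the file and column names are hypothetical. The point is that every new batch flowing to a production model passes through the exact preparation steps used at training time:

    import pandas as pd

    def prepare(batch: pd.DataFrame) -> pd.DataFrame:
        # Repeat the exact preparation steps used to build and train the model
        batch = batch.dropna(subset=["customer_id"])               # drop incomplete rows
        batch["amount"] = batch["amount"].clip(lower=0)            # enforce valid ranges
        batch["order_date"] = pd.to_datetime(batch["order_date"])  # normalize types
        return batch

    # Each new batch feeding the live model gets the same treatment
    scored_input = prepare(pd.read_csv("daily_orders.csv"))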

An essential step in the DataOps methodology is organizing the data pipeline. In Dataiku, the data pipeline is captured in a well-organized visual representation called the Flow. Everything in Dataiku is organized by project, with built-in collaboration, a core pillar of any DataOps methodology.

The data pipeline as represented in Dataiku's Flow

ML data pipelines must perform as flawlessly as industrial pipelines, and the processes and technology that support them must be robust. For enterprises to scale AI and ML, the entire DataOps lifecycle, including data documentation, must be managed, monitored, and governed.

An AI platform must include capabilities for maintaining data quality, monitoring the pipeline for data and schema changes, and rebuilding automatically. Automated data integrity checks are critical to keeping the pipeline trustworthy. Dataiku automates the steps of building and rebuilding pipelines with checks, metrics, and scenarios.

Metrics, checks, and actions in Dataiku

Data quality checks in Dataiku automatically assess pipeline elements against specified or previous values, ensuring that automated flows run within expected timeframes and produce expected results. When a pipeline item fails a check, Dataiku returns an error, prompting investigation and quick resolution.
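
Conceptually, a check works like the following sketch (generic Python rather than Dataiku's check configuration; the row counts and tolerance are hypothetical): compute a metric, compare it with a specified or previous value, and raise an error when it falls outside bounds:

    def check_row_count(current: int, previous: int, tolerance: float = 0.10) -> None:
        """Fail if the row count drifts more than `tolerance` from the last build."""
        if previous and abs(current - previous) / previous > tolerance:
            raise ValueError(
                f"Check failed: row count {current} deviates more than "
                f"{tolerance:.0%} from previous value {previous}"
            )

    check_row_count(current=9_400, previous=10_000)  # passes: within 10% of previous
    check_row_count(current=6_000, previous=10_000)  # raises: prompts investigation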

Data pipelines involve repetitive tasks like loading and processing data. With Dataiku, scenarios automate these repetitive processes, running on a periodic schedule or firing on condition-based triggers, as sketched below. With automation, production teams can manage more projects and scale confidently to deliver more AI projects.
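
For instance, a custom Python step inside a Dataiku scenario can rebuild the dataset that feeds a production model. A minimal sketch (the dataset name is hypothetical; see the Dataiku documentation for the full scenario API):

    from dataiku.scenario import Scenario

    # Runs inside a scenario's custom Python step. The scenario itself is
    # launched by a time-based or condition-based trigger configured in Dataiku.
    scenario = Scenario()
    scenario.build_dataset("daily_orders_prepared")  # hypothetical dataset name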

Once built, these data pipelines must be operationalized along with the ML model trained on the data, which brings us to ML operations, better known as MLOps.

Integrated DataOps and MLOps to Manage Cost and Risk

There has been a lot of recent focus on MLOps, and enterprises have indeed struggled to get models into production. Unfortunately, getting models into production is often wrongly viewed as an isolated problem requiring point solutions. The best approach to scaling AI operationally is to use a platform with integrated MLOps. The two critical operational principles here are automation and centralized observability. Together, they remove two obstacles to scaling AI: cost and risk.

Automation is necessary to manage the costs of scaling to hundreds or thousands of models in production. Enterprises look to automation to help their operations scale and make operations teams more productive. Automated triggers let operators monitor production models in real time and receive alerts when a model is not performing within established metrics. Integrated MLOps enables greater automation and efficiency.
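
The underlying pattern looks something like this sketch (plain Python, not Dataiku's monitoring API; the metric and thresholds are hypothetical): periodically evaluate a production metric and alert operators when it falls outside the established bound:

    def evaluate_model_health(live_accuracy: float, baseline: float,
                              max_drop: float = 0.05) -> str:
        """Alert when live accuracy falls more than `max_drop` below the baseline."""
        if baseline - live_accuracy > max_drop:
            return f"ALERT: accuracy {live_accuracy:.3f} is below baseline {baseline:.3f}"
        return "OK"

    print(evaluate_model_health(live_accuracy=0.81, baseline=0.88))  # -> ALERT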

For example, when a model begins to fail (e.g., due to model drift), operators must have tools to diagnose and fix the problem quickly. Dataiku includes a model registry where operators can review the status and performance metrics of all models across multiple Dataiku instances and projects.

One of the biggest inhibitors to scaling AI is actual and perceived risk. Management teams need assurance that models are safe before deployment and that controls are in place to ensure each model’s performance and safety. They also want their operations teams to have tools to identify and mitigate risks and manage models cost-effectively. Dataiku features a host of tools for achieving responsible and governed AI. Dataiku Govern helps identify and mitigate project risk, giving operators a robust framework for determining the level of controls to apply to each model.

Dataiku Govern provides controls to help scale AI

An industry best practice is to simulate potential production issues in the pre-production environments. Dataiku takes an innovative approach to stress testing AI, enabling simulations of production scenarios — such as data leakage and flaws in the data pipeline — to test model robustness before deployment. 

Stress testing gives operators a preview of potential stressors and their impact on live production models. It also helps inform the type and level of operational controls and governance required in the production environment.
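
A hand-rolled version of the idea, using scikit-learn on synthetic data (Dataiku's built-in stress tests are configured in the product rather than coded this way), shifts a feature's distribution in the test set and measures how much accuracy degrades:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Train a model on synthetic data and record its baseline test accuracy
    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    baseline = model.score(X_test, y_test)

    # Simulate a production stressor: shift one feature's distribution
    X_shifted = X_test.copy()
    X_shifted[:, 0] += 2.0  # hypothetical drift applied to feature 0

    stressed = model.score(X_shifted, y_test)
    print(f"Accuracy baseline: {baseline:.3f}, under shift: {stressed:.3f}")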

Target distribution shift: stress testing of models informs the type and level of required monitoring

The architecture of Dataiku enables operational scale, reducing cost and risk through an integrated workflow and automation, an AI Governance framework, and controls for AI initiatives.

Dataiku features a complete governance and operational control framework

ITOps Is a Critical Component

None of the steps above matter if your physical or virtual infrastructure fails. So, when Dataiku says our platform is infrastructure-agnostic, it doesn’t mean we don’t care. We care a lot! Dataiku has invested in ensuring our AI platform runs seamlessly and securely on the most scalable, available, and performant infrastructures, such as AWS, Azure, GCP, and Snowflake.

Dataiku provides IT operations with many tools to ensure the infrastructure's smooth operation. For example, Dataiku employs an infrastructure-as-code approach to provision instances on your infrastructure in hours, not days. This approach makes it easy for ITOps to manage, maintain, and upgrade Dataiku instances. 
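
As a generic illustration of the infrastructure-as-code idea (this is not Dataiku's actual provisioning tooling; the AMI ID and instance type are placeholders), provisioning is expressed as repeatable code rather than manual steps, here with the AWS SDK:

    import boto3  # AWS SDK for Python; credentials assumed to be configured

    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="m5.2xlarge",        # placeholder instance type
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": "dataiku-node"}],
        }],
    )
    print(response["Instances"][0]["InstanceId"])  # track the new instance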

Dataiku also monitors the availability of physical and virtual infrastructure, along with logs and resource consumption, so that operations teams always have a complete view of platform health.

Scale AI Operationally

Operational controls and automation are the keys to scaling AI and confidently putting more models into production. Dataiku integrates deep DataOps, MLOps, and ITOps capabilities to ensure models and data pipelines can withstand stressors, remain robust, and keep running smoothly in production.

The platform provides a complete set of tools for operators to test models, observe their production environment, diagnose and fix issues quickly, and automate operations for cost efficiency. Now that you understand how Dataiku scales AI operationally, you can learn more about how it helps enterprises scale AI technically and organizationally.
