Ensuring Smooth MLOps With Unified Monitoring

Chad Kwiwon Covin

An MLOps workflow should be as smooth as butter. Monitoring models and data products should be seamless, combining automation, metrics, and checks so that people can step in swiftly when issues arise. This approach allows organizations to continue their AI initiatives without disruptions. In practice, however, the smoothness of many MLOps strategies falls somewhere between gravel and sandpaper.

As organizations create more data products and machine learning (ML) models, finding a standardized way to track them and intervene when issues arise becomes challenging. This is particularly tough if multiple ML platforms are used to deploy models and track their post-deployment health. These issues lead to the last three words an executive wants to hear: lack of oversight.

Even for mature organizations, oversight and governance over the entire MLOps process can be tricky. Enterprise-level organizations often employ multiple platforms for model development and operationalization. However, siloed platforms require operations engineers to access several distinct systems to oversee the activity and health of AI models and pipelines. This reduces efficiency and makes it difficult to maintain standardized governance and rulesets across platforms. Because this is such a common issue, Dataiku has made it a priority to centralize this process by bridging your ecosystem.

If you haven’t noticed, Dataiku has been evolving into the comprehensive solution for managing all models and projects in production for your organization. Initially, we introduced support for External Models, enabling the use and tracking of already deployed third-party models in your Dataiku Flow. Next, we launched the Deploy Anywhere capability, allowing models developed with Dataiku to be deployed on any cloud ML platform. Now, the most important piece of the puzzle has arrived: Unified Monitoring.

Unified Monitoring: Bringing Unified Oversight to MLOps

Unified Monitoring is a one-stop hub for visibility and oversight into your MLOps. By using the External Models and Deploy Anywhere capabilities to extend coverage, this central watchtower enables operators to oversee and monitor pipelines and models developed and deployed across diverse platforms. Consolidated monitoring covers projects and APIs deployed via the Dataiku Deployer, as well as cloud endpoints from AWS SageMaker, Azure Machine Learning, and Google Vertex AI.

Most importantly, you can merge various monitoring statuses into a single location, including activity, deployment, execution, and modeling statuses. This allows IT operators and ML engineers to discern which deployments are functional, which are not, and how to swiftly identify and rectify issues. Let’s dive into how it works.

Monitoring in Practice

Located within your Dataiku Deployer, Unified Monitoring features three distinct screens: Overview, Dataiku Projects, and API Endpoints.

Overview

We'll start with the primary overview screen. It lists and monitors Dataiku projects for batch scoring and API endpoints for real-time scoring, as well as API endpoints on Azure Machine Learning, AWS SageMaker, or Google Vertex AI. The overview screen acts as a triage dashboard, surfacing any deployments with errors and warnings towards the top, so that operators can quickly identify issues requiring their attention and resolve them. Users can click on objects to show a more comprehensive view, listing all deployment details.
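To make the triage idea concrete, here is a minimal Python sketch of how a list of deployments could be ordered so that errors and warnings rise to the top. It is an illustration only, not Dataiku's implementation: the `Deployment` class, the status names, and the `SEVERITY` ranking are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity ranking (an assumption): lower values sort first.
SEVERITY = {"ERROR": 0, "WARNING": 1, "HEALTHY": 2}


@dataclass
class Deployment:
    """Hypothetical record for one monitored deployment."""
    name: str
    infrastructure: str
    status: str


def triage(deployments):
    """Return deployments ordered worst-first, then alphabetically by name.

    Statuses missing from SEVERITY are placed after all known statuses.
    """
    return sorted(
        deployments,
        key=lambda d: (SEVERITY.get(d.status, len(SEVERITY)), d.name),
    )


if __name__ == "__main__":
    sample = [
        Deployment("churn-api", "Dataiku API node", "HEALTHY"),
        Deployment("fraud-batch", "Dataiku automation node", "ERROR"),
        Deployment("pricing-endpoint", "AWS SageMaker", "WARNING"),
    ]
    for d in triage(sample):
        print(f"{d.status:<8} {d.name} ({d.infrastructure})")
```

Sorting on a (severity, name) pair keeps the ordering stable and predictable, so the same deployment appears in the same place between refreshes unless its status changes.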

The overview in Unified Monitoring shows a count of how many deployments are on each infrastructure.

Dataiku Projects

The Dataiku Projects dashboard provides an overview of all batch deployment projects on your Dataiku automation node, featuring four critical statuses: global status, deployment status, model status, and execution status. These indicators facilitate rapid identification of potential issues within projects.

For instance, if the model status presents a warning, an ML engineer can revisit the project to examine the model evaluation store. Similarly, an error in the execution status might indicate a problematic automation scenario. These statuses offer you a swift, efficient, and standardized method for pinpointing concerns at a glance.
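One way to picture how the four statuses relate is to treat the global status as the most severe of the other three. The source does not say how Dataiku actually derives it, so the Python sketch below rests on that assumption, and the status names and the `global_status` function are hypothetical.

```python
# Hypothetical statuses, ordered from least to most severe (an assumption).
ORDER = ["HEALTHY", "WARNING", "ERROR"]


def global_status(deployment, model, execution):
    """Return the most severe of the three component statuses.

    Assumes a worst-of rollup; raises ValueError for a status not in ORDER.
    """
    return max((deployment, model, execution), key=ORDER.index)


if __name__ == "__main__":
    # A model warning alone is enough to flag the project for review.
    print(global_status("HEALTHY", "WARNING", "HEALTHY"))
```

Under this rollup, a single failing component is never hidden behind healthy ones, which matches the goal of pinpointing concerns at a glance.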

Filter dashboards by different stages and statuses to see only the information that is important to you.

API Endpoints

The final screen is the API Endpoints dashboard. Each row on this screen signifies an individual endpoint, either from the Dataiku API node or a cloud-based one. Endpoints possess three status types similar to those for Dataiku projects but also have details specific to real-time scoring. 

Hovering over the activity plots will show the volume of an endpoint at a specific time over the past 24 hours.

Critical health details like response time, volume, and activity can be viewed at a glance, enabling IT operators and ML engineers to assess API performance and reliability. Instant visibility helps teams proactively tackle issues, optimize resource allocation for real-time use cases, and improve user experiences by ensuring APIs are performing well.
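As a rough illustration of how response time and volume might be rolled up into an at-a-glance health signal, consider the Python sketch below. The input shape, the `summarize` function, the 500 ms threshold, and the status labels are all assumptions made for this example and do not describe Dataiku's internals.

```python
def summarize(samples, slow_ms=500.0):
    """Summarize hourly endpoint samples into a simple health record.

    samples: list of (hour, request_count, avg_response_ms) tuples,
             a hypothetical input shape chosen for this sketch.
    slow_ms: hypothetical threshold above which the endpoint is flagged.
    """
    total = sum(count for _, count, _ in samples)
    if total == 0:
        # No traffic in the window, so there is no response time to report.
        return {"volume": 0, "mean_response_ms": None, "status": "NO_ACTIVITY"}

    # Weight each hourly average by its request count so that busy hours
    # count for more than quiet ones.
    weighted_ms = sum(count * ms for _, count, ms in samples) / total
    status = "WARNING" if weighted_ms > slow_ms else "HEALTHY"
    return {"volume": total, "mean_response_ms": weighted_ms, "status": status}


if __name__ == "__main__":
    last_hours = [(0, 100, 200.0), (1, 300, 400.0)]
    print(summarize(last_hours))
```

Weighting by request count is a design choice: a plain average of hourly averages would let a nearly idle hour skew the picture of what most callers experienced.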

Advanced Settings

Administrators can choose which project and API infrastructures to monitor, and they have easy access to full monitoring logs for troubleshooting. The standout functionality in the advanced settings, however, is monitoring scopes. This capability allows status details about remote cloud API deployments to be passed through to Dataiku’s Unified Monitoring dashboards, even without an explicit integration with a Dataiku project.

Multiple scopes can be added to Unified Monitoring, even from disparate cloud environments.

The impact? From this central cockpit, IT operators now have a clear, consolidated view of deployment health across all ML platforms and a single place to monitor myriad MLOps activities.

Unified Monitoring Is Necessary to Move Forward

As more and more data and AI products become operational in an organization, the need for effective monitoring grows larger. Comprehensive oversight and solid governance empower organizations to accelerate progress and generate more models. Establishing an automated system for managing and supervising deployments is vital to thriving in this landscape.

This is where Unified Monitoring becomes indispensable. A single, comprehensive view of your organization's deployments, irrespective of location or infrastructure, has evolved from a luxury to a necessity.  

Let Dataiku streamline MLOps with Unified Monitoring and automation, quality checks for models and deployments, endpoint responsiveness details, and so much more. Through global oversight of your entire production terrain, Dataiku is the platform that makes MLOps as smooth as you dreamed it to be.
