According to the O’Reilly book “Machine Learning Logistics” by Ted Dunning and Ellen Friedman, “90% of the effort in successful machine learning is not about the algorithm or the model or the learning. It’s about logistics.” Many of these logistics fall within the confines of machine learning model management, and without a crystal-clear process in place for it, a project is bound to run into errors (or worse, failures).
Okay, so let’s take a step back. On the surface, the AI project lifecycle seems relatively straightforward: identify a business objective, collect data, build a model, deploy, and iterate. However, a myriad of challenges can make machine learning model management (in other words, MLOps) tremendously difficult:
- The combination of shifting business needs and constantly changing data can make it challenging to ensure that the reality of the model aligns with expectations and still addresses the original problem.
- Often, business, data science, and IT teams are all involved in some step of the AI lifecycle, but they either lack a centralized place to align on data needs or don’t communicate about everyone’s precise role until it’s too late (or both).
- A lack of data and AI governance policies can lead to “shadow IT,” or the deployment of policies or systems outside the control of one central team. This is typically a sign of deeper-rooted issues with the organizational structure around the AI project lifecycle.
- Without a culture of reuse around data and AI projects, teams can waste significant time and manpower (e.g., when two people working in different parts of the company spend time creating the same solution or replicating a workflow that already exists).
Machine Learning Model Management Is Continuous
A common misconception is that deploying machine learning models in production is the last step (or one of the last steps) in the AI project lifecycle. Deployment, however, is just the beginning when it comes to monitoring model performance and ensuring that models behave as anticipated. Machine learning model management is an underlying process that encompasses and informs all of the steps in the AI project lifecycle, helping the organization accomplish tasks such as mitigating risk and building Responsible AI systems. It’s meant to make working with models easier, more transparent, and more efficient.
In addition to being an ongoing process, machine learning model management requires different types of profiles with diverse skills in order to be successful. Looking at the steps that AI projects involve at an individual organization will help teams identify who needs to be involved and ensure they have the right tools and processes in place to develop, monitor, and govern models that will not put the business at risk.