Building a Modern AI Platform Strategy

Scaling AI, Featured Lynn Heidmann

When it comes to Enterprise AI strategy, the big question today has moved away from pure build vs. buy for AI platforms. Instead, the conversation has shifted to: should I buy one end-to-end platform for AI, or should I go for best-of-breed tools in each area and build the connections between those?

100% Build vs. Buy Is a Thing of the Past

Before getting into the core of the end-to-end vs. best-of-breed challenge, let’s take a step back: what happened to build vs. buy? The truth is that most organizations today won’t consider fully building an AI platform solution from the ground up for many reasons, one of which is the hidden technical debt in machine learning systems identified by Google.

There is so much “glue” — so many features that are outside the core functionality of simply building a machine learning model — that building all of them from scratch to have an AI platform that truly allows for the scaling of AI efforts is prohibitively challenging (see Figure 1).

Figure 1 (credit: Google): Only a small fraction of real-world machine learning systems is composed of machine learning code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.

This reality has spurred the fundamental realization that building an AI platform from scratch isn’t an option, but it has also raised the question: what is the alternative?

The New Build vs. Buy Discussion

Building a modern AI platform for most organizations today boils down to two options:

  1. Buying one end-to-end platform for data science, machine learning, and AI that covers the entire lifecycle (Figure 2), from the ingestion of raw data to ETL, building models to operationalization of those models and AI systems, plus the monitoring and governance of those systems.
  2. Buying best-of-breed tools for each of the steps or parts of the lifecycle and stitching together these tools to build the overall platform that is more customized for the organization and its needs. 

Note that in many cases, the second option is situational, meaning it’s dictated by existing investments (i.e., we already have tools for x, y, and z, what can we add to complete the stack and how can we tie it all together?) rather than driven by explicit choice in making new investments that are the best fit for the organization’s needs.

Figure 2: A representation of the data science, machine learning, and AI lifecycle from raw data to AI product.

Providing the very best tool for ETL, the very best for AutoML, for data cataloguing, for model management, etc. (see Figure 3), will allow each team to choose the technology they want to work with, which is a tempting prospect when attempting to keep everyone happy — getting consensus across an organization is, admittedly, no easy task. However, the “glue” between these components, while not as complex as building everything from scratch, remains a huge challenge.

Figure 3: When looking for best-of-breed tools, there are multiple pieces of the puzzle across different areas of the data science, machine learning, and AI lifecycle. Gluing even just a few of these together can become complex quickly.

Besides the glue problem, there are also important components of the end-to-end lifecycle that are lost when moving from tool to tool. For example:

  • Data lineage is difficult to track across tools. This is problematic for all organizations across industries, as visibility and explainability in AI processes are crucial to building trust both internally and externally in these systems (and for some highly regulated industries like financial services or pharmaceuticals, it’s required by law). With option two as outlined above, it will be difficult if not impossible to see at a glance which data is being used in what models, how that data is being treated, and which of those models using the data are in production vs. being used internally.
  • Stitching together best-of-breed tools can also complicate the handoff between teams (for example, between analysts and data scientists following data cleansing, or between data scientists and IT or software engineers for deployment to production). Moving projects from tool to tool means some critical information might be lost, and the handoff itself can take longer, slowing down the entire data-to-insights process.
  • As a follow-up to team handoffs and collaboration between data practitioners, another challenge is the pain of managing approval chains between tools. How can the business reduce risk by ensuring that there are checks and sign-offs when AI projects pass from one stage to the next, looking for issues with model bias, fairness, data privacy, etc.?
  • Option two also means missed opportunities for automation between steps in the lifecycle, like triggering automated actions when the underlying data of a model or AI system in production has fundamentally changed. 
  • In the same vein, how do teams audit and version the various artifacts between all these tools? For instance, how does one know which version of the data pipeline in tool A matches with which model version in tool B for the whole system to work as expected? 
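To make the versioning and lineage questions above concrete, here is a minimal sketch of the kind of cross-tool record a team would have to build and maintain by hand when stitching tools together. Everything in it is a hypothetical illustration (the `Release` and `Manifest` classes, the version strings, and the dataset names are assumptions, not part of any real product): it simply records which pipeline version was tested with which model version, and which datasets feed each model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Release:
    """One tested combination of artifacts (all names are hypothetical)."""
    pipeline_version: str        # version of the data pipeline in "tool A"
    model_version: str           # version of the model in "tool B"
    datasets: tuple[str, ...]    # datasets the model depends on
    in_production: bool = False  # False means internal use only


@dataclass
class Manifest:
    """Hand-maintained registry linking artifacts that live in different tools."""
    releases: list[Release] = field(default_factory=list)

    def register(self, release: Release) -> None:
        self.releases.append(release)

    def is_compatible(self, pipeline_version: str, model_version: str) -> bool:
        """True only if this exact pair was registered as tested together."""
        return any(
            r.pipeline_version == pipeline_version
            and r.model_version == model_version
            for r in self.releases
        )

    def models_using(self, dataset: str, production_only: bool = False) -> list[str]:
        """Lineage lookup: which model versions depend on a given dataset."""
        return sorted({
            r.model_version
            for r in self.releases
            if dataset in r.datasets and (r.in_production or not production_only)
        })


manifest = Manifest()
manifest.register(
    Release("pipeline-1.2", "churn-3.0", ("customers", "orders"), in_production=True)
)
manifest.register(Release("pipeline-1.3", "churn-3.1", ("customers",)))

# This pairing was never registered as tested together.
print(manifest.is_compatible("pipeline-1.2", "churn-3.1"))
# Which production models would be affected by a change to "customers"?
print(manifest.models_using("customers", production_only=True))
```

Even this toy registry only stays accurate if every tool in the chain reports into it, which is exactly the glue work described above.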

The End-to-End Advantage

Given the aforementioned challenges, the energy organizations put into building a modern AI platform shouldn’t be spent cobbling together tools across the lifecycle, which ultimately results in losing the larger picture of the full data pipeline (not to mention adding technical debt). Instead, investing in an end-to-end platform for AI provides:

1. Cost Savings via Reuse

Seeing AI pipelines from end to end in one place contributes to the reuse and capitalization of data artifacts across the organization. For example, data that has already been cleaned and prepared by analysts can be used by data scientists in other business units, avoiding repetitive work and ultimately bringing more return on investment from AI at scale.

 

Figure 4: What capitalization and reuse across the organization can look like, leveraging parts of big, cornerstone use cases to fuel hundreds of smaller use cases with little additional marginal cost.


2. Focus on Implementing High-Impact Technologies 

End-to-end AI platforms like Dataiku serve as a centralized abstraction layer that allows IT and architecture teams to focus on the constant, breakneck-pace evolution of underlying technologies (see Figure 5) to benefit the entire organization, instead of maintaining the interplay between dozens of different tools for working with data across business units.

Figure 5: A look, based on Google Trends, at the pace of evolution in the world of data infrastructure, which can be mitigated with an end-to-end AI platform that serves as an abstraction layer on top of these technologies.

3. Smooth Governance and Monitoring

For most organizations, the concept of governance is much wider than simply data governance: it covers all the controls and associated processes that a business must put in place to mitigate risk in operations and for regulatory reasons. Having one tool with which everyone at the organization interacts greatly simplifies efforts both to mitigate the growing AI risks that come with democratization and to adhere to mounting data privacy regulations.

The story is similar for monitoring, largely done through MLOps systems. MLOps needs to be integrated into the larger DevOps strategy of the enterprise, bridging the gap between traditional CI/CD and modern machine learning. That means systems that are fundamentally complementary and that allow DevOps teams to automate tests for machine learning just as they can automate tests for traditional software. Achieving this level of automation is possible (and simple) with one end-to-end platform, like Dataiku. It can become messy quickly when working with multiple tools across the lifecycle.
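As a rough illustration of what automating tests for machine learning can mean in a CI/CD setting, the sketch below defines a release gate that a pipeline could run before promoting a model. It is a simplified example under stated assumptions, not a description of how Dataiku or any other platform implements this: the function names, the 0.9 accuracy floor, and the three-standard-deviation drift limit are all hypothetical choices, and a shift in the mean is only one crude signal that the underlying data has changed.

```python
from statistics import mean, stdev


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mean_shift(reference, live):
    """Shift of the live mean, in units of the reference standard deviation.

    Needs at least two reference values and one live value.
    """
    spread = stdev(reference)
    gap = abs(mean(live) - mean(reference))
    if spread == 0:
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def release_gate(predictions, labels, reference, live,
                 min_accuracy=0.9, max_shift=3.0):
    """Return a list of failure messages; an empty list means the gate passes.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    failures = []
    acc = accuracy(predictions, labels)
    if acc < min_accuracy:
        failures.append(f"accuracy {acc:.2f} is below {min_accuracy:.2f}")
    shift = mean_shift(reference, live)
    if shift > max_shift:
        failures.append(f"feature mean shifted by {shift:.1f} standard deviations")
    return failures


# A CI job could fail the build whenever the returned list is non-empty.
problems = release_gate(
    predictions=[1, 0, 1, 1],
    labels=[1, 0, 1, 0],
    reference=[10, 11, 9, 10, 10],
    live=[20, 21, 19],
)
for message in problems:
    print(message)
```

Returning messages rather than raising on the first failure lets a single run report every problem at once, which tends to suit automated pipelines.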


The End-to-End Risk

Of course, the fear that comes with investing in one end-to-end platform is that the organization becomes tied to a single vendor. This isn’t a small risk and is not to be overlooked: lock-in is a real consideration, as the company becomes dependent on that vendor’s roadmap, decisions, and more.

To that end, it’s important to invest in end-to-end technology that is open and extensible, allowing organizations to leverage existing underlying data architecture as well as invest in best-of-breed technologies in terms of storage, compute, algorithms, languages, frameworks, etc. 

When looking at AI tools, ask questions not only about the ability of the potential platform to be integrated with all current technologies (programming languages, machine learning model libraries that data scientists like to use, and data storage systems), but also about the vision of the vendor. That vision should be wide enough that any new technologies the organization may want to invest in later can be easily integrated with the platform, thanks to the vendor’s interest in staying open and cutting-edge.
