With Generative AI on the scene, the build vs. buy conversation for AI platforms has evolved yet again. Previously, it was pure build vs. buy. It then shifted to buying one end-to-end platform for AI (including Generative AI) vs. building the connections between best-of-breed tools in each area of analytics and AI.
Now, with dedicated, AI-powered point solutions for Generative AI available to augment individual operations and processes, organizations are once again figuring out the best path forward. This article goes deeper into the latest evolution of the build vs. buy discussion, the benefits and drawbacks of each option, and some important considerations now that Generative AI is so prominent in the analytics and AI landscape.
Generative AI Brings the Build vs. Buy Discussion Back Into Purview
First, let’s explore the build route. The truth is that most organizations today won’t consider fully building a platform solution from the ground up for many reasons, one of which is the hidden technical debt in machine learning systems identified by Google. That was true before and even more true today with Generative AI.
There is so much “glue,” meaning so many features that are outside the core functionality of simply building a machine learning model. Therefore, building all of them from scratch to have an AI platform that truly allows for the scaling of AI efforts (from moonshot use cases to integrating Generative AI) is prohibitively challenging (see Figure 1).
On top of that, in the midst of the current release battle between major AI players, it's rarely certain that you're choosing the absolute best model for your use case, with the next great thing typically right around the corner. Spending the time and resources to build your own AI platform that works with just one particular Generative AI model could be costly and detrimental if, come tomorrow, you decide that the newest release from another player is actually better suited to your needs. The resources and effort spent building for one model become sunk costs as your team scrambles to build for another option, which could itself soon be replaced.
This reality has spurred the fundamental realization that building an AI platform from scratch isn’t really an advantageous option, but it has also raised the question: what is the alternative?
Let's Discuss Buying and the Caveats
There are a couple of different pathways for buying, but it’s never 100% buy, because some building is always involved:
1. Point Solutions (Especially for Generative AI)
With Generative AI taking center stage, more and more companies have developed AI-powered point solutions that solve specific problems in the organization (e.g., AI-powered email generation for sales development representatives or AI-powered contract review for purchasing). Indeed, the models developed can offer very high performance in one specific area and rapid time to value, as they can be used nearly off the shelf.
However, they are not scalable. They augment one process, but do not provide any benefit to adjacent processes. Further, buying off-the-shelf solutions does not help develop the core skills necessary to develop Generative AI-powered capabilities throughout the organization. Technical debt accumulates as more point solutions become part of core business processes, creating dependencies on external vendors.
Finally, because these solutions are commoditized, they provide the same quality performance to an organization’s competitors. While that may be acceptable for some back office tasks, it doesn’t allow companies to differentiate their core offering (so one option that we’ll cover below includes building their own AI-powered apps that offer differentiated performance for core business functions).
2. Buying Best-of-Breed Tools for Each Lifecycle Stage
The other buy option is buying tools for each of the steps or parts of the analytics project lifecycle and stitching together these tools to build the overall platform that is more customized for the organization and its needs.
Note that in many cases, this option is situational, meaning it’s dictated by existing investments (e.g., we already have tools for x, y, and z, so what can we add to complete the stack and how can we tie it all together?) rather than driven by an explicit choice to make new investments that are the best fit for the organization’s needs.
Figure 2: A representation of the data science, machine learning, and AI lifecycle from raw data to AI product
Providing the very best tool for ETL, the very best for AutoML, for data cataloging, for model management, etc. (see Figure 3), will allow each team to choose the technology they want to work with, which is a tempting prospect when attempting to keep everyone happy — getting consensus across an organization is, admittedly, no easy task. However, the “glue” between these components, while not as complex as building everything from scratch, remains a huge challenge.
Figure 3: When looking for best-of-breed tools, there are multiple pieces of the puzzle across different areas of the data science, machine learning, and AI lifecycle — gluing even just a few of these together can become complex quickly.
Besides the glue problem, there are also important components of the end-to-end lifecycle that are lost when moving from tool to tool. For example:
- Data lineage is difficult to track across tools. This is problematic for all organizations across industries, as visibility and explainability in AI processes are crucial to building trust both internally and externally in these systems (and for some highly regulated industries like financial services or pharmaceuticals, it’s required by law). With option two as outlined above, it will be difficult if not impossible to see at a glance which data is being used in what models, how that data is being treated, and which of those models using the data are in production vs. being used internally.
- Stitching together best-of-breed tools can also complicate the handoff between teams (for example, between analysts and data scientists following data cleansing, or between data scientists and IT or software engineers for deployment to production). Moving projects from tool to tool means some critical information might be lost, not to mention the handoff can take longer, slowing down the entire data-to-insights process.
- As a follow-up to team handoffs and collaboration between data practitioners, another challenge is the pain of managing approval chains between tools. How can the business reduce risk by ensuring that there are checks and sign-offs when AI projects pass from one stage to the next, looking for issues with model bias, fairness, data privacy, etc.?
- Option two also means missed opportunities for automation between steps in the lifecycle, like triggering automated actions when the underlying data of a model or AI system in production has fundamentally changed.
- In the same vein, how do teams audit and version the various artifacts between all these tools? For instance, how does one know which version of the data pipeline in tool A matches with which model version in tool B for the whole system to work as expected?
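To make the lineage and versioning questions above concrete, here is a minimal sketch of the kind of record that has to survive every handoff between tools. It is an illustration only: the class names, fields, and version labels (ModelRecord, LineageRegistry, "etl-7", and so on) are hypothetical and do not correspond to any particular product's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ModelRecord:
    """One model version plus the exact inputs that produced it (hypothetical schema)."""
    model: str
    model_version: str
    dataset: str
    dataset_version: str
    pipeline_version: str
    stage: str  # assumed values: "production" or "internal"


@dataclass
class LineageRegistry:
    """In-memory stand-in for a shared lineage store."""
    records: List[ModelRecord] = field(default_factory=list)

    def register(self, record: ModelRecord) -> None:
        self.records.append(record)

    def models_using(self, dataset: str, stage: Optional[str] = None) -> List[ModelRecord]:
        """Which models consume this dataset, optionally filtered by stage."""
        return [
            r for r in self.records
            if r.dataset == dataset and (stage is None or r.stage == stage)
        ]

    def pipeline_for(self, model: str, model_version: str) -> str:
        """Which pipeline version a given model version was built with."""
        for r in self.records:
            if r.model == model and r.model_version == model_version:
                return r.pipeline_version
        raise KeyError(f"no lineage recorded for {model} {model_version}")


registry = LineageRegistry()
registry.register(ModelRecord("churn", "1.0", "customers", "2024-01", "etl-7", "internal"))
registry.register(ModelRecord("churn", "1.1", "customers", "2024-03", "etl-9", "production"))
registry.register(ModelRecord("pricing", "0.4", "orders", "2024-02", "etl-8", "production"))

# "Which production models depend on the customers data?"
in_production = registry.models_using("customers", stage="production")
```

When each tool keeps its own notion of versions, every field in this record has to be reconciled by hand across tools. When the whole lifecycle lives in one place, the same questions become simple lookups.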
The End-to-End Platform Advantage
Given the aforementioned challenges, the energy organizations put into building a modern AI platform shouldn’t be spent cobbling together tools across the lifecycle, which ultimately results in losing the larger picture of the full data pipeline (not to mention adds technical debt). Plus, as mentioned, the other route of point solutions simply isn’t scalable for the enterprise.
Instead, investing in an end-to-end platform for AI — including Generative AI — that covers the entire lifecycle, from the ingestion of raw data to ETL, building models to operationalization of those models and AI systems, plus the monitoring and governance of those systems, provides:
1. Cost Savings via Reuse
Seeing AI pipelines from end to end in one place contributes to the reuse and capitalization of data artifacts across the organization. For example, data that has already been cleaned and prepared by analysts can be used by data scientists in other business units, avoiding repetitive work and ultimately bringing more return on investment from AI at scale. Ingraining the concepts of reuse and capitalization into its very fabric is critical for any organization aiming to scale its AI strategy.
Figure 4: What capitalization and reuse across the organization can look like, leveraging parts of big, cornerstone use cases to fuel hundreds of smaller use cases with little additional marginal cost.
2. Focus on Implementing High-Impact Technologies
End-to-end AI platforms like Dataiku serve as a centralized abstraction layer that allows IT and architecture teams to focus on the constant, breakneck-pace evolution of underlying technologies to benefit the entire organization instead of focusing on maintaining the interplay between tens of different tools for working with data across business units.
Further, Dataiku’s vision was always to provide the platform that would allow organizations to quickly integrate new innovations from the fields of machine learning and AI into their enterprise technology stack and their business processes. The arrival of modern Generative AI and LLMs is perfectly in line with that original vision — Dataiku includes integrations to leading Generative AI providers like OpenAI, Azure, AWS, and Hugging Face. With Dataiku’s model and provider-agnostic approach, teams can leverage the latest and greatest Generative AI technologies.
3. Smooth Governance and Monitoring
For most organizations, the concept of governance is much wider than simply data governance — it covers all the controls and associated processes that a business must put in place to mitigate risk in operations and for regulatory reasons. Having one centralized tool simplifies efforts to mitigate AI risks that come with democratization.
Additionally, a centralized platform with the flexibility to customize governance processes facilitates the systematic implementation of operational workflows for AI projects. By providing essential foundations and streamlined, transparent processes that allow you to pivot strategies quickly with oversight, such a platform accelerates preparation for the EU AI Act and other regulatory changes.
The story is similar for monitoring, largely done through MLOps systems. MLOps needs to be integrated into the larger DevOps strategy of the enterprise, bridging the gap between traditional CI/CD and modern machine learning. That means systems that are fundamentally complementary and that allow DevOps teams to automate tests for machine learning just as they can automate tests for traditional software. As LLMOps emerges to manage the complexities of LLMs, it introduces unique challenges, such as specialized metrics to measure LLM response quality and new processes for dynamically swapping out the AI model services and technologies powering production applications as better ones emerge. These complexities make it critical to integrate LLMOps with existing MLOps practices to streamline deployment, health monitoring, and model management. Achieving this level of automation is possible (and simple) with one end-to-end platform, like Dataiku. It can become messy quickly when working with multiple tools across the lifecycle.
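As one illustration of what an automated test for machine learning can look like, the sketch below computes a Population Stability Index (PSI) between a reference sample and fresh data and flags drift when it crosses a threshold. This is a generic technique, not a description of how any specific platform implements monitoring. The function names, the equal-width binning, and the 0.2 threshold (a commonly quoted rule of thumb) are assumptions made for the example.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index using equal-width bins over the reference range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_detected(reference: Sequence[float], current: Sequence[float],
                   threshold: float = 0.2) -> bool:
    """True when the PSI exceeds the (assumed) alerting threshold."""
    return psi(reference, current) > threshold


reference = [i / 100 for i in range(100)]      # training-time feature values
shifted = [0.5 + i / 100 for i in range(100)]  # the same feature after a shift

stable = drift_detected(reference, reference)
drifted = drift_detected(reference, shifted)
```

A check like this can run in the same CI/CD pipeline as ordinary unit tests, and a positive result can trigger retraining or an alert instead of relying on someone to notice the change manually.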
4. Combine the Best Elements of the Other Options
For nearly all organizations, investing in an AI platform will be the best choice for the simple reason that it allows them to take the best aspects of the other possible paths while also addressing their shortcomings.
With a platform:
- Service providers can still provide staff augmentation, working in the platform to ensure that their deliverables are leveraged and maintained after the end of their engagement.
- Point solutions can be integrated when they provide an incremental benefit for a particular application, while maintaining an overall governance structure across all AI initiatives.
- The highly technical builders within the organization will see their work accelerated, while allowing them to collaborate more closely with their business beneficiaries.
The End-to-End Risk
Of course, the fear that comes with investing in one end-to-end platform is that the organization becomes tied to a single vendor. This isn’t a small risk and is not to be overlooked: lock-in is a real consideration, as the company becomes dependent on that vendor’s roadmap, decisions, and more.
To that end, it’s important to invest in end-to-end technology that is open and extensible, allowing organizations to leverage existing underlying data architecture as well as invest in best-of-breed technologies in terms of storage, compute, algorithms, languages, frameworks, etc. With Dataiku, for example, organizations can choose the right Generative AI model for a given application via the LLM Mesh, a common backbone that provides the components companies need to build safe applications using LLMs at scale. That can mean, for example, choosing between a public model provided as a service and running an open-source model on their own private infrastructure.
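The general idea of keeping applications independent of any one model provider can be sketched in a few lines. The code below is a hypothetical illustration of that pattern only. It is not Dataiku's LLM Mesh API, and the names (LLMRouter, hosted_model, self_hosted_model) and the stand-in backends are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# In this sketch a backend is simply "prompt in, text out".
CompletionFn = Callable[[str], str]


@dataclass
class LLMRouter:
    """Maps each application to whichever backend is currently configured for it."""
    backends: Dict[str, CompletionFn]
    routes: Dict[str, str]

    def complete(self, application: str, prompt: str) -> str:
        backend_name = self.routes[application]
        return self.backends[backend_name](prompt)

    def switch(self, application: str, backend_name: str) -> None:
        """Point an application at a different backend without touching its code."""
        if backend_name not in self.backends:
            raise KeyError(f"unknown backend: {backend_name}")
        self.routes[application] = backend_name


# Stand-in backends: real ones would call a hosted API or a self-hosted model.
def hosted_model(prompt: str) -> str:
    return f"[hosted] {prompt}"


def self_hosted_model(prompt: str) -> str:
    return f"[self-hosted] {prompt}"


router = LLMRouter(
    backends={"hosted": hosted_model, "self_hosted": self_hosted_model},
    routes={"contract_review": "hosted"},
)

first = router.complete("contract_review", "Summarize clause 4")
router.switch("contract_review", "self_hosted")
second = router.complete("contract_review", "Summarize clause 4")
```

Because the application only ever calls the router, moving from a public model to a privately hosted one becomes a configuration change rather than a rebuild. That is the property to look for when evaluating how open a platform really is.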
When looking at AI tools, ask questions not only about the ability of the potential platform to be integrated with all current technologies (programming languages, machine learning model libraries that data scientists like to use, and data storage systems), but also about the vision of the vendor and its overall AI strategy. That vision should be broad enough that any new technologies the company may want to invest in later can be easily integrated with the platform, thanks to the vendor’s interest in staying open and cutting-edge.