How to Make Your AI Projects Successful: Insights From NVIDIA

Scaling AI | Catie Grasso

In a recent fireside chat on understanding analytics and AI project successes and challenges, we were joined by Will Benton, principal product architect at NVIDIA, to dig into the most common AI project failure points, where projects go off track, and how to prevent those failures in the future.


Benton works to make data scientists more productive by helping to make the benefits of accelerated computing accessible not just to research-focused data scientists, but to working data scientists in mainstream enterprises. This blog post is a transcription of some of our favorite responses from the discussion.

→ Watch the Full Fireside Chat Here

Reframing Project “Failure” 

Catie Grasso: So I would love to learn what you hear most from leaders that you talk to as some of the top reasons that their AI projects fail. 

Will Benton: Yeah, that's a great question, and I think it's a great question for two reasons. One reason is that I think we as an industry should be really careful about how we talk about data science and AI projects failing.

Failure is a necessary part of exploring the unknown. In general, it's absolutely part of doing good science, doing advanced development, and I feel like a lot of times in the industry, we'll say, well, 85% of data science projects never make it to production and present that like it's a fundamental flaw in the way we're doing data science.

Now when we do that, I feel like we can create the wrong expectations for practitioners and almost take away that psychological safety that people need to explore really bold new things that are gonna lead to greater business value and greater advancements. By analogy, I don't think we'd say something's wrong with drug discovery because 99.99% of compounds never make it to a clinical trial. We just say that we learned a lot of things, and ideally, we're benefiting from that. So I think it's important to focus on how to help practitioners iterate as quickly as possible so they can get to that point of failure or potential success quickly.

And then if a project has failed, what did we learn from it? Did we learn something? Did we do this in a disciplined enough way that we actually learned something about the world or about this modeling technique or whatever? And then when we're getting to production, which I suspect is the question you really wanted to ask — can we get there without any speed bumps?


What’s Preventing Teams From Getting to Production?

Will Benton: When it comes to the kinds of failures that keep us from getting from a successful experiment to a successful production system, I think a really big thing that I'm sure you've seen as well with your customers is that this development-to-production process is often a lot of manual work.

It's a data science team throwing something over the fence to another team who has to understand it and reimplement it, often in a different language or framework, often under wildly different constraints. The data science team is relatively unconstrained.

They have flexibility to choose the best tool for the job. The enterprise developer team who puts this into a production system has to deal with it. They have to deal with audits, they have to deal with security, and they have to deal with things like single sign-on. So I think this mismatch between the way that data scientists work and the way that enterprise developers work is really a big part of the problem. It's one of the social causes of this problem.

Data scientists are focused on their research process. They're used to that flexibility. These ML engineers or enterprise developers are focused on doing something that's hard to get wrong. They're focused on an environment where you don't want to fail, where failure isn't learning, where failure means you did something wrong.

So a really extreme example of this that I saw while helping a customer in a previous role was someone trying to get an ML pipeline into production. This data scientist just ran every project they worked on out of a single Python virtual environment, and over the years, this environment just sort of collected packages like barnacles on a ship.

So there were around 700 Python packages in this environment. And, you know, to put the app into production, we had to ask not only how we could reimplement this in a sensible way, but how we could get it through an IT security audit. That meant going through and asking, “Which of these packages are really necessary?”

And it's not just a case of asking, “What do we import?” because sometimes Python packages will behave differently depending on whether something else is installed, even if they don't strictly require it. It looked more like archeology than engineering or research at that point. There are a lot of challenges around that kind of workflow, where the way the data scientist wants to work (and I'm not saying that's the right way to work) is incompatible with what we can actually put into production.
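To give a rough feel for what that “archaeology” looks like, here is a minimal sketch (our illustration, not the actual audit from that engagement) that compares the distributions installed in the current Python environment against the top-level modules a project's source files actually import. As Benton notes, this kind of static check can't catch packages that change behavior based on what else is installed, which is exactly why the human review is still needed.

```python
# audit_env.py: illustrative sketch comparing installed distributions with the
# top-level modules a project's .py files actually import. Not a substitute
# for a real security audit.
import ast
import sys
from importlib import metadata
from pathlib import Path


def imported_modules(project_dir: str) -> set[str]:
    """Collect top-level module names imported anywhere in the project."""
    names: set[str] = set()
    for path in Path(project_dir).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError:
            continue  # skip files that don't parse; a real audit would flag them
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names.add(node.module.split(".")[0])
    return names


def unused_candidates(project_dir: str) -> set[str]:
    """Installed distributions that no project file imports directly.

    Deliberately over-reports: it misses optional runtime dependencies,
    plugins, and dynamic imports, so results are a starting point for review.
    """
    imports = imported_modules(project_dir)
    # importlib.metadata.packages_distributions() (Python 3.10+) maps
    # top-level module names to the distributions that provide them.
    module_to_dists = metadata.packages_distributions()
    used = {
        dist
        for module, dists in module_to_dists.items()
        if module in imports
        for dist in dists
    }
    installed = {dist.metadata["Name"] for dist in metadata.distributions()}
    return installed - used


if __name__ == "__main__":
    for name in sorted(unused_candidates(sys.argv[1] if len(sys.argv) > 1 else ".")):
        print(name)
```

Running it against a project directory prints the installed packages that nothing in the code imports, which is a reasonable shortlist to question first in a “which of these are really necessary?” review.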

I think that for a long time it's been popular to suggest that data scientists just need more engineering discipline, that they just need to be more like enterprise developers or software engineers. And I think a lot of energy from the industry and from open source communities has gone into making tools that assume the thing missing from a data scientist's life is editing YAML files, building containers, or doing some other kind of DevOps work to make their work more reproducible. And I don't think a lot of data scientists want to care about that. There are places where more engineering discipline will make your life easier as a data scientist, things like using source control or writing tests.

But you don't want to be a release engineer. You don't want to be thinking about Kubernetes or containers or any of these other things. That's someone else's job. And it's just getting in the way of what you wanna do as a data scientist. So I think we need to have tools for data scientists that meet them where they are and make it possible to do good reproducible work without imposing a new style of working on them.

What About the Business Side of the House?

Catie Grasso: Is there something that you observe that is the most frequent failure point for the business elements of a project?

Will Benton: I think it’s not really a question of the project failing so much as the overall program failing in a lot of cases. One challenge has been that we set out to solve for a certain set of metrics and we solved for those metrics, and it turned out they weren't the right metrics. And sometimes you have this sort of communication challenge between stakeholders and data scientists.

Now, sometimes that takes the form of data scientists saying, “Well, I improved my AUC, therefore I did my job properly,” rather than asking whether they drove the right business outcome. I think even for senior data scientists who understand the business, this is often a challenge, and often what stakeholders want changes over time.

And it's not always easy to sort of say, hey, take what you've learned from solving this problem and use it to solve a related problem. Sometimes it works, but sometimes it doesn't. So I think this question of setting the right metrics and really identifying realistic expectations for what you can do and what you want to do with AI is super important for a successful project.
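To make the metric mismatch concrete, here is a small synthetic sketch (made-up data, made-up error costs, not from any real engagement): two hypothetical models are compared on ROC AUC and on a crude expected-cost measure at a fixed decision threshold. The model with the higher AUC is not the cheaper one to operate, because its scores sit below the threshold the downstream system actually uses, which is the kind of gap between a modeling metric and a business outcome Benton describes.

```python
# Synthetic illustration: a higher AUC does not automatically mean a better
# business outcome at the decision threshold used in production.
import numpy as np
from sklearn.metrics import roc_auc_score

COST_FALSE_NEGATIVE = 500.0  # assumed cost of missing a positive case
COST_FALSE_POSITIVE = 5.0    # assumed cost of a false alarm


def expected_cost(y_true, scores, threshold):
    """Average per-case cost when acting on scores at a fixed threshold."""
    preds = (scores >= threshold).astype(int)
    fn = np.sum((preds == 0) & (y_true == 1))
    fp = np.sum((preds == 1) & (y_true == 0))
    return (fn * COST_FALSE_NEGATIVE + fp * COST_FALSE_POSITIVE) / len(y_true)


rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.02).astype(int)  # rare positive class

# Model A ranks cases almost perfectly, but its scores cluster below the
# deployed 0.5 threshold; model B ranks slightly worse but is usable as-is.
scores_a = np.where(y == 1, rng.normal(0.45, 0.04, y.size), rng.normal(0.28, 0.04, y.size))
scores_b = np.where(y == 1, rng.normal(0.70, 0.18, y.size), rng.normal(0.30, 0.18, y.size))

for name, s in [("model A", scores_a), ("model B", scores_b)]:
    print(name,
          "AUC:", round(roc_auc_score(y, s), 3),
          "avg cost at threshold 0.5:", round(expected_cost(y, s, 0.5), 2))
```

The point is not the specific numbers (they are invented), but that the metric the data scientist optimizes and the metric the business pays for have to be agreed on explicitly, including the operating point at which the model will actually be used.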

The Role of IT in Successful AI

Catie Grasso: What role do IT leaders play in the success of an analytics and AI project? They often don't get as much credit for the active role that they might play, even if it's a little bit behind the scenes. 

Will Benton: It’s so very important to think about the sort of interplay between IT and data science teams, because both are coming at these problems from different perspectives. Both have different incentives and goals, and I think we only get the best outcomes when we think about both of those participants.

So with IT, their main concern is control. Their main concern is making sure that their pager doesn't go off in the middle of the night. Making sure that no one loses data, making sure that no one gets sued. With data science teams, their focus is more innovation. It's flexibility. It’s asking, “Can I try this latest technique, can I use this latest library? Can I work in the way that I'm most comfortable?” 

And I think where these teams meet is that there's a lot of value in the governance aspect of it. If you think about the kinds of processes you have to go through with machine learning systems, especially those handling personally identifiable information, a security exploit can have really disastrous results. This is an obvious benefit of having oversight over machine learning systems: you're less likely to suffer those security bugs than if you're just taking whatever a data scientist threw over the fence.

Another advantage though is around datasets, which I think is actually an area that not enough people are spending enough time thinking about. A lot of times data scientists, if they need to augment the data they have, will pull down a publicly available dataset. But a lot of times those datasets come with usage restrictions and you really need legal and IT support to look at this and say, “Can we build a product based on this? Can we put something we learned from this dataset into production, given the terms that it's distributed under?”

And a lot of times, something you can download for free on the internet is not actually free to use as you see fit. So I think that kind of oversight on datasets is very important.

How Does MLOps Fit Into All of This?

Will Benton: A lot of people have tried to define MLOps in a lot of different ways. The definition I keep coming back to is that it's the processes, the culture, and the tools that make it possible to do responsible, repeatable work building machine learning systems. I feel like as an industry, we've sort of gotten distracted by the machine learning models.

A lot of people have talked about this, but models are exciting. You know, models have names, models are celebrities. But the system that goes around the model is just as complicated as any other software system, except you have this opaque, complicated thing in the middle that's making decisions without explaining them. So I think you can’t force people to do the right thing or to do the most responsible thing when they're doing their jobs as creative professionals. But what you can do is you can create a culture of processes that are easy to follow and likely to produce better results. And you can create tools that make it easier to do the right thing than to do the wrong thing. 

And I think that's where MLOps comes in. You can't prevent every failure or every mistake, but you can use practices that are likely to lead to a robust system.
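As one small, hypothetical example of making the right thing the easy thing, here is a sketch of a low-friction run log: a single helper a training script can call to record what was run, on which data, with which parameters and results. This is not any particular MLOps tool's API, just an illustration of the kind of lightweight habit that makes work repeatable without imposing a new style of working.

```python
# Hypothetical sketch of a low-friction run log: capture enough context to
# reproduce (or at least explain) a training run later.
import hashlib
import json
import platform
import sys
import time
from pathlib import Path


def log_run(params: dict, metrics: dict, data_path: str, log_dir: str = "runs") -> Path:
    """Write a JSON record describing this run; returns the file written."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "data_file": data_path,
        # Hash the training data file so we can tell later whether it changed.
        "data_sha256": hashlib.sha256(Path(data_path).read_bytes()).hexdigest(),
        "params": params,
        "metrics": metrics,
    }
    out_dir = Path(log_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"run-{int(time.time())}.json"
    out_file.write_text(json.dumps(record, indent=2))
    return out_file


# Example usage after a training run (values are placeholders):
# log_run({"max_depth": 6, "lr": 0.1}, {"auc": 0.91}, "data/train.csv")
```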

The Value of an Adaptive, Flexible, Future-Proof Platform for AI

Catie Grasso: How can having an adaptive and flexible platform help organizations operationalize these projects to drive true business impact? What value do you see in Dataiku for this? 

Will Benton: Some of the challenges we've been talking about are about making it easier for practitioners to do their best work, to do responsible and repeatable work, and to share that work, whether that's with development teams or other data scientists, and having all of this in a way that an IT department is going to be willing to deal with. So I think the huge advantage of the Dataiku platform is that it really does provide data science teams with that flexibility, along with that sort of easy management for IT organizations, so you have this combination of flexibility and power.

As far as benefits for practitioners, some of the things that impress me the most about Dataiku are the project and environment management. It's easier to say, “I'm working on this project now; I have these libraries and this data in one place.” I also really like the quick overview of a dataset, so we can look and see if there are outliers, or understand a dataset that's messy or unlabeled, which is obviously a huge problem in the early stages of any project.
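For readers who want a feel for what that kind of quick first look involves outside of any particular platform, here is a minimal pandas sketch: summary statistics, missing-value counts, and a rough IQR-based outlier count per numeric column. This is our illustration, not how Dataiku implements its dataset overview.

```python
# Minimal pandas sketch of a "quick overview" of a new dataset:
# summary stats, missing values, and a rough IQR-based outlier count.
import pandas as pd


def quick_overview(df: pd.DataFrame) -> None:
    print(f"{len(df)} rows, {df.shape[1]} columns")

    print("\nMissing values per column:")
    print(df.isna().sum().sort_values(ascending=False))

    numeric = df.select_dtypes(include="number")
    print("\nSummary statistics (numeric columns):")
    print(numeric.describe().T[["mean", "std", "min", "max"]])

    print("\nRough outlier counts (outside 1.5 * IQR):")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outliers = ((numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)).sum()
    print(outliers.sort_values(ascending=False))


# Example: quick_overview(pd.read_csv("my_dataset.csv"))
```

Pointing a check like this at any tabular file gives the kind of first look Benton describes: a quick sense of how messy the data is before anyone invests serious modeling effort in it.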
