In the fast-moving world of AI, it’s easy to be dazzled by the raw power of large language models (LLMs). Whether it’s GPT-4o, Claude 3.5 Haiku, or any of the latest cutting-edge models, the language model itself (the engine, if you will) is often the star of the show. But as Christopher S. Penn, co-founder and chief data scientist at Trust Insights, and Kurt Muehmel, head of AI strategy at Dataiku, recently discussed, an engine alone won’t get you very far.
To actually reach your destination and achieve meaningful outcomes with AI, you need the rest of the car. Did you miss the first article in this series, where Kurt and Christopher talked about what it looks like to think differently about AI? Be sure to check it out here.
Image: ChatGPT-generated illustration of the “rest of the car” analogy
The Engine Gets All the Hype — but It's Just the Start
When most people think about adopting AI, especially generative AI, they focus first (and sometimes only) on the model. As Kurt pointed out, many first-time enterprise users understand the model’s capabilities because of their experiences with consumer tools. They know how to have a chat, ask questions, or get content generated.
But the model is just the engine. It’s the shiny, powerful piece that makes a lot of noise — but without a chassis (the load-bearing framework), wheels, steering, and fuel systems, it can’t go anywhere.
What Is the Rest of the Car?
Drawing on his past experience with the Dataiku platform, Christopher breaks down what that means for LLMs and generative AI projects today. Here’s what the full car looks like:
1. Data Sources and Collection
Where is your data coming from? Whether it’s internal databases, CRM systems, the open web, or other proprietary data lakes, you need a reliable, structured pipeline to gather the right data. With Dataiku, teams can connect to dozens of on-premises and cloud data sources — like Amazon S3, Azure Blob Storage, Databricks Lakehouse, Google Cloud Storage, Snowflake, and much more — to centralize access to data of any size or format.
2. Data Cleaning and Preparation
Raw data is rarely ready for use. It’s messy, incomplete, and inconsistent. Data must be cleaned, formatted, and often enriched before it’s ready for a language model to consume. With Dataiku, teams can connect, cleanse, and prepare data for analytics, ML, agentic, and generative AI — and do it fast.
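Dataiku handles this step visually, but the underlying idea is easy to see in plain code. The following is a minimal Python sketch, not Dataiku’s implementation. It assumes the raw data is scraped HTML: it strips script, style, and navigation blocks, normalizes whitespace, and drops empty or duplicate records. The names `clean_document` and `prepare` are hypothetical.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script>, <style>, and <nav> blocks."""
    SKIP = {"script", "style", "nav"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._depth = 0     # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth == 0:
            self.parts.append(data)

def clean_document(raw_html: str) -> str:
    """Turn one raw HTML document into a single line of readable text."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any buffered text
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()

def prepare(records):
    """Clean every record, then drop empty results and exact duplicates."""
    seen, prepared = set(), []
    for raw in records:
        text = clean_document(raw)
        if text and text not in seen:
            seen.add(text)
            prepared.append(text)
    return prepared

if __name__ == "__main__":
    page = "<nav>Home | About</nav><p>Snickers &amp; Twix</p><script>var x=1;</script>"
    print(clean_document(page))
```

Real pipelines usually add enrichment and schema checks on top, but even this much removes the clutter a language model should never see.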
3. Data Pipeline and Orchestration
How does the data flow to the model? Do you need real-time streaming or batch processing? Are there rules for how different data sets interact? This orchestration is critical for efficient, scalable AI operations. With the Dataiku Flow — a visual representation of your data pipelines — teams can view and analyze data, join and transform data, build predictive models, work with GenAI, and more.
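To make the orchestration question concrete, here is a minimal sketch of batch ordering using Python’s standard-library `graphlib` (Python 3.9+). The step names and the `run_step` callback are hypothetical placeholders, not Dataiku Flow objects; the point is only that each step runs after everything it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical batch pipeline: each step lists the steps it depends on.
PIPELINE = {
    "ingest_crm": set(),
    "ingest_web": set(),
    "clean": {"ingest_crm", "ingest_web"},
    "join": {"clean"},
    "score_with_llm": {"join"},
    "publish_dashboard": {"score_with_llm"},
}

def run_batch(pipeline, run_step):
    """Run every step once, in an order where dependencies always come first."""
    order = list(TopologicalSorter(pipeline).static_order())
    results = {}
    for name in order:
        # Hand each step the outputs of its upstream steps only.
        upstream = {dep: results[dep] for dep in pipeline.get(name, ())}
        results[name] = run_step(name, upstream)
    return order, results

if __name__ == "__main__":
    order, _ = run_batch(PIPELINE, lambda name, upstream: f"{name} done")
    print(" -> ".join(order))
```

If the dependency rules ever form a loop, `static_order()` raises `graphlib.CycleError`, which is exactly the kind of problem you want surfaced before any data moves.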
4. Prompt Engineering and Context Structuring
Models don’t just magically “know” what you want. Inputs often need to be structured carefully, sometimes using techniques like retrieval-augmented generation (RAG), where models can search for relevant facts dynamically before answering. With Dataiku Answers, teams can connect to approved LLMs and vector stores, apply RAG techniques from trusted knowledge bases and datasets, and have a ready-to-use chat application — no code or front-end development resources required.
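Retrieval-augmented generation follows a simple shape: find the passages most relevant to a question, then place them in the prompt alongside it. The sketch below is a toy illustration of that shape, not how Dataiku Answers works. It ranks passages by shared words, where a production system would typically use embeddings and a vector store, and it stops at building the prompt string rather than calling any particular LLM API.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a toy substitute for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, passages, k=2):
    """Return up to k passages sharing the most words with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(p)), p) for p in passages]
    scored = [pair for pair in scored if pair[0] > 0]    # drop irrelevant passages
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [p for _, p in scored[:k]]

def build_prompt(query, passages, k=2):
    """Assemble a grounded prompt: instructions, retrieved context, then the question."""
    context = retrieve(query, passages, k)
    lines = [
        "Answer using only the context below. If the answer is not there, say so.",
        "",
        "Context:",
    ]
    lines += [f"- {p}" for p in context] or ["- (no relevant context found)"]
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)

if __name__ == "__main__":
    knowledge_base = [
        "Refunds are processed within 14 days.",
        "Our office is in Boston.",
        "Refund requests need an order number.",
    ]
    print(build_prompt("How long do refunds take?", knowledge_base))
```

Note that exact word matching misses “refund” versus “refunds”; that brittleness is a large part of why real systems retrieve by semantic similarity instead.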
5. Fact-Checking and Mathematical Reasoning
As Christopher pointed out, transformers — the architecture behind most language models — can’t do deterministic math:
No matter how smart a model gets, it will still not be able to do math in the same way that you would do math deterministically with Python code.
-Christopher S. Penn
If you need precise computations or fact validation, those capabilities need to be built around the model. This is often where predictive machine learning models come into play — math-based intelligence designed to produce reliable, deterministic results. While language models are great for generating and interpreting language, predictive models excel at quantitative tasks like forecasting, scoring, and optimization, complementing the LLM’s capabilities.
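As a minimal illustration of building math around the model, the sketch below assumes a language model has already extracted figures from documents as strings. Plain Python then does the arithmetic exactly with `Decimal`, and a small ordinary-least-squares trend line stands in for a predictive model. Both functions are hypothetical examples, not a specific product feature.

```python
from decimal import Decimal, ROUND_HALF_UP

def total_and_growth(amounts, previous_total):
    """Sum figures exactly and compute percent growth, rounded to 2 decimals."""
    total = sum((Decimal(a) for a in amounts), Decimal("0"))
    previous = Decimal(previous_total)
    growth = (total - previous) / previous * 100
    return total, growth.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def forecast_next(values):
    """Fit y = a + b*x by ordinary least squares over x = 0..n-1; predict x = n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * n

if __name__ == "__main__":
    # Figures a language model might have extracted as text (hypothetical).
    total, growth = total_and_growth(["1200.50", "349.99", "0.01"], "1000")
    print(total, growth)
    print(forecast_next([10, 12, 14, 16]))
```

The division of labor is the point: the model reads and extracts, while code computes, so the same inputs always produce the same numbers.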
6. Output Handling and Action
What happens to the model’s output? Does it populate a report? Trigger a workflow? Update a dashboard? The final leg of the journey requires clear output handling that ties the AI’s output to business outcomes. With Dataiku LLM Guard Services, teams can control costs, maintain quality, and reduce operational risks — all from one place — keeping IT happy and ensuring guardrails for AI at scale.
Further, with Dataiku, teams can build AI agents powered by their larger data ecosystem for maximum business impact. Agents can use tools like dataset lookup, web search, and email notifications to take action and deliver results — fast. Dataiku's agentic AI capabilities can help teams automate tasks, streamline processes, or create custom solutions.
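Guardrails on the output side often begin with something simple: treat the model’s reply as untrusted input. The sketch below assumes the model was asked to reply with a JSON object naming an action and its arguments. It parses that reply, rejects anything outside a hypothetical allow-list, and only then calls the matching handler. The action names and handlers are placeholders; this is not the LLM Guard Services API.

```python
import json

# Hypothetical allow-list of actions the model may request.
ALLOWED_ACTIONS = {"update_dashboard", "send_email", "create_ticket"}

def handle_model_output(raw_output, handlers):
    """Validate a model's JSON reply and dispatch it to an allowed handler."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return {"status": "rejected", "reason": "output was not valid JSON"}
    if not isinstance(payload, dict):
        return {"status": "rejected", "reason": "expected a JSON object"}
    action = payload.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS or action not in handlers:
        return {"status": "rejected", "reason": f"action {action!r} is not allowed"}
    args = payload.get("args", {})
    if not isinstance(args, dict):
        return {"status": "rejected", "reason": "args must be a JSON object"}
    try:
        result = handlers[action](**args)
    except TypeError as exc:  # e.g. wrong or missing arguments for this handler
        return {"status": "rejected", "reason": str(exc)}
    return {"status": "ok", "action": action, "result": result}

if __name__ == "__main__":
    handlers = {"create_ticket": lambda title, priority="low": f"ticket [{priority}] {title}"}
    reply = '{"action": "create_ticket", "args": {"title": "Refund delay", "priority": "high"}}'
    print(handle_model_output(reply, handlers))
    print(handle_model_output('{"action": "drop_table"}', handlers))
```

Returning a structured rejection instead of raising lets the surrounding workflow log the failure, retry with a corrected prompt, or route the case to a human.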
A Real-World Example: Counting Halloween Candy
To illustrate how all these pieces come together, Christopher shared a fun (and surprisingly complex) project. He and his team analyzed 7,000 Halloween articles to determine the most commonly mentioned candy. The language model did some heavy lifting — but it was only one component. They had to:
- Scrape the web for articles.
- Clean the content (removing ads, navigation clutter, etc.).
- Store the data in a database.
- Pass articles into the language model one by one.
- Save the model’s output back to the database.
- Aggregate the results and visualize them in a chart.
The outcome? Snickers emerged as the most popular candy.
But without the rest of the car (the scraper, cleaner, database, connector, and visualization scripts), this fun insight would have been impossible. It’s a consumer-flavored example, but the same pattern applies just as readily in an enterprise context.
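The six steps above map onto a surprisingly small amount of glue code. Below is a minimal, self-contained Python sketch of the same shape, run on three made-up toy articles rather than the project’s data. Assumptions: the scrape step is replaced by an in-memory list, the language model call is replaced by a hypothetical keyword matcher (`extract_candies`), SQLite stands in for the database, and a text bar chart stands in for the visualization.

```python
import re
import sqlite3

# Toy stand-in for the scrape step; a real run would fetch pages from the web.
RAW_ARTICLES = [
    "<p>Top treats: Snickers and Reese's.</p><nav>Home | Ads</nav>",
    "<p>Kids love Snickers.</p>",
    "<p>Candy corn divides everyone; Twix never does.</p>",
]
KNOWN_CANDIES = ["Snickers", "Reese's", "Candy corn", "Twix"]

def clean(raw_html):
    """Drop navigation clutter and tags, then normalize whitespace."""
    text = re.sub(r"<nav>.*?</nav>", " ", raw_html, flags=re.S)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def extract_candies(article_text):
    """Hypothetical stand-in for the LLM call: candies mentioned in the text."""
    lowered = article_text.lower()
    return [c for c in KNOWN_CANDIES if c.lower() in lowered]

def run_pipeline(raw_articles):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")
    db.execute("CREATE TABLE mentions (article_id INTEGER, candy TEXT)")
    # Clean, then store.
    db.executemany("INSERT INTO articles (body) VALUES (?)",
                   [(clean(a),) for a in raw_articles])
    # Pass articles to the "model" one by one and save each result back.
    for article_id, body in db.execute("SELECT id, body FROM articles").fetchall():
        db.executemany("INSERT INTO mentions VALUES (?, ?)",
                       [(article_id, candy) for candy in extract_candies(body)])
    # Aggregate: number of articles mentioning each candy.
    rows = db.execute(
        "SELECT candy, COUNT(DISTINCT article_id) AS n FROM mentions "
        "GROUP BY candy ORDER BY n DESC, candy"
    ).fetchall()
    db.close()
    return rows

def text_chart(rows):
    """Minimal visualization: one bar of '#' per candy."""
    return "\n".join(f"{candy:<12} {'#' * n} {n}" for candy, n in rows)

if __name__ == "__main__":
    print(text_chart(run_pipeline(RAW_ARTICLES)))
```

Swapping `extract_candies` for a real model call is the only change the engine needs; everything else in the script is the rest of the car.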
From Consumer Use to Enterprise AI
Most consumers interact with AI like renting a car for a short trip. They hop into ChatGPT or another app, take a spin, and get out. The car’s infrastructure — the data sources, cleaning, and orchestration — is hidden. In the enterprise world, though, companies need to own the whole car. They have to design, build, and maintain the entire vehicle because their data, processes, and outcomes are unique.
As Christopher put it:
For business use, for enterprise use, you have to think about what is the infrastructure. Where do my databases live? How do you clean it on the way in? How do you prepare into a prompt structure? How do you do retrieval augmented generation? And then where does the output go?
At the end of the day, the model is just a tool. The true value comes from how well an organization can connect the engine to the rest of the car and chart a clear course to their destination — whether that’s improving efficiency, gaining insights, personalizing customer experiences, or developing entirely new products.
So, the next time someone marvels at the latest, greatest AI model, remember: It’s not about the engine alone. To actually get somewhere, you need the whole car.