Transforming Data Analysis and Workflows With GenAI and Dataiku

Dataiku Product, Scaling AI Renata Halim

At Dataiku, we're revolutionizing how data experts and domain experts handle data with the transformative power of GenAI. This blog recaps highlights from our product keynote at Everyday AI Chicago earlier this year. Explore the key takeaways and expert advice shared during this insightful session, and discover how Dataiku's GenAI capabilities can transform complex data into actionable insights and drive peak efficiency for teams.

GenAI in Dataiku and GenAI on Dataiku

In the opening minutes of his presentation, Jed Dougherty, VP of Platform Strategy at Dataiku, shared his views on leveraging GenAI within Dataiku. He described two key approaches:

GenAI in Dataiku involves embedded applications within our platform that allow users to leverage LLMs to analyze and derive insights from datasets. Key features include:

  • Prompt Studio: Apply tailored prompts to datasets to generate specific outputs.
  • NLP Recipes: Update NLP pipelines with no-code text recipes powered by pre-trained Hugging Face models and LLMs for tasks like summarization and sentiment analysis.
  • Retrieval Augmented Generation (RAG): Enhance LLMs with your knowledge base using RAG and semantic search, so chatbots return the most relevant, accurate, and trustworthy answers.
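To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern. It is not Dataiku's implementation: the function names and the sample knowledge base are hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding-based semantic search a real system would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Assemble a prompt that grounds the LLM in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "Refunds are processed within five business days.",
    "Support is available by email on weekdays.",
    "Invoices are issued on the first day of each month.",
]

if __name__ == "__main__":
    print(build_prompt("When are invoices issued?", KNOWLEDGE_BASE))
```

The prompt produced by `build_prompt` would then be sent to whichever LLM is in use; grounding the answer in retrieved text is what makes the response traceable to the knowledge base.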

GenAI on Dataiku acts as a co-pilot, enhancing user interactions with the platform. It augments users' ability to work within Dataiku through features such as:

  • AI Prepare for typing plain English commands to perform tasks like parsing dates and merging columns, with steps generated automatically.
  • AI Explain for explaining or documenting workflows or code. 
  • AI Code Assistant for faster development and support while writing and testing code.
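As a rough illustration of the kind of preparation steps a plain English request such as "parse the signup date and merge the name columns" might translate into, consider the sketch below. It is plain Python with hypothetical column names and helper functions, not the output of AI Prepare or any Dataiku API.

```python
from datetime import datetime


def parse_date(value, fmt="%m/%d/%Y"):
    """Parse a date string in the given format and return ISO 8601 text."""
    return datetime.strptime(value, fmt).date().isoformat()


def merge_columns(row, sources, target, sep=" "):
    """Return a copy of the row with the source columns joined into target."""
    merged = dict(row)
    merged[target] = sep.join(str(row[c]) for c in sources)
    return merged


def prepare(rows):
    """Apply both steps, in order, to every row."""
    prepared = []
    for row in rows:
        with_date = dict(row, signup_date=parse_date(row["signup_date"]))
        prepared.append(
            merge_columns(with_date, ["first_name", "last_name"], "full_name")
        )
    return prepared


# Hypothetical input data, for illustration only.
ROWS = [
    {"first_name": "Ada", "last_name": "Lovelace", "signup_date": "12/10/2023"},
]

if __name__ == "__main__":
    print(prepare(ROWS))
```

The point of the sketch is that each natural-language instruction maps to a small, inspectable step, which is what lets a user review the generated steps before applying them.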

These integrations make advanced data manipulation accessible and efficient, democratizing high-level tasks for a broader audience. Together, GenAI in Dataiku enhances effectiveness and speed, while GenAI on Dataiku simplifies tasks and user interactions, creating a powerful synergy for better problem-solving and workflow efficiency.

Bridging the 2 Strategies With Agents

According to Dougherty, the concepts of GenAI in Dataiku and GenAI on Dataiku converge in the form of agents. An AI agent is both an augmentation of analysts’ work and an application of GenAI to the problem they're trying to solve, wrapped up in one package. This approach simplifies tedious tasks without replacing analysts, data scientists, or engineers. By automating routine tasks, agents leave the creative aspects of these roles to people and improve workflow efficiency.

The promise of GenAI is not to take away the most inventive, intuitive, interesting parts of your job, but to take away the grunge and to always be able to provide a framework automatically built for the problems you're trying to solve.

-Jed Dougherty, VP, Platform Strategy, Dataiku

Figure: What does an analytics AI agent look like?

Transforming Analytics With 3 Core Concepts

Accelerating GenAI Use Cases

The LLM Mesh — a common backbone for Generative AI applications — promises to reshape how analytics and IT teams securely access Generative AI models and services. This innovation allows organizations to efficiently build enterprise-grade applications while addressing concerns related to cost management, compliance, and technological dependencies. It also enables choice and flexibility among the growing number of models and providers.

In the same way that Dataiku lets teams transition from legacy systems like Hadoop to modern environments such as Snowflake and Databricks, the LLM Mesh keeps the choice of model flexible: you select from a variety of LLMs via a simple dropdown menu to suit diverse analytical needs.
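One way to picture a common backbone with interchangeable models is a gateway that applications call by model name, so that switching providers is a one-line change. The sketch below is an assumption-laden illustration: `ModelGateway`, the model names, and the stub handlers are all hypothetical and do not reflect the LLM Mesh's actual API.

```python
from typing import Callable, Dict, List, Tuple


class ModelGateway:
    """Routes completion requests to registered models and logs each call."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[str], str]] = {}
        self.audit_log: List[Tuple[str, int]] = []

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        """Make a model available under the given name."""
        self._models[name] = handler

    def available(self) -> List[str]:
        """List model names, as a dropdown menu might display them."""
        return sorted(self._models)

    def complete(self, model: str, prompt: str) -> str:
        """Send the prompt to the chosen model and record the request."""
        if model not in self._models:
            raise KeyError(f"Model not registered: {model}")
        self.audit_log.append((model, len(prompt)))
        return self._models[model](prompt)


# Stub handlers stand in for real provider clients (hypothetical names).
gateway = ModelGateway()
gateway.register("provider-a-small", lambda p: f"[a] {p.upper()}")
gateway.register("provider-b-large", lambda p: f"[b] {p[::-1]}")

if __name__ == "__main__":
    for name in gateway.available():
        print(name, "->", gateway.complete(name, "hello"))
```

Because every request passes through one place, the same layer is a natural spot for the cost tracking and compliance controls mentioned above.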

This flexibility allows for the rapid deployment of chatbots and other GenAI applications using visual recipes. Users can ingest large document sets, apply LLMs, and develop web applications without writing code. For example, Dataiku Answers leverages this capability to democratize enterprise-ready LLM chat and RAG across business processes and teams. As Dougherty notes, "Chatbots are just a small fraction of the cool things you can do with GenAI and LLMs... You can do that entire process at Dataiku very rapidly without even writing any code if you don't want to."

At Dataiku, our value proposition has always been to sit on top of complicated infrastructure and make it accessible and usable for everyone. Whether the technology is Hadoop, Spark, or now GenAI, this remains our core benefit: empowering organizations to harness the latest technologies without the complexity.

LLM Cost Guard and Governance

To provide organizations with better financial oversight, we implemented the LLM Cost Guard, which offers a transparent view of GenAI-related expenditures. The dashboard tracks all costs and provides in-depth usage monitoring, identifying which teams use specific components. This includes tracking expenses for data services like Snowflake, ensuring comprehensive financial management.
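The per-team view described above boils down to aggregating usage records against a price list. The following sketch shows that arithmetic under stated assumptions: the record fields, model names, and prices are made up for illustration and are not taken from LLM Cost Guard.

```python
from collections import defaultdict


def cost_by_team(usage_records, price_per_1k_tokens):
    """Sum the estimated spend per team from token-level usage records."""
    totals = defaultdict(float)
    for record in usage_records:
        rate = price_per_1k_tokens[record["model"]]
        totals[record["team"]] += record["tokens"] / 1000 * rate
    return {team: round(total, 4) for team, total in totals.items()}


# Hypothetical prices (currency units per 1,000 tokens) and usage records.
PRICES = {"model-x": 0.50, "model-y": 2.00}
USAGE = [
    {"team": "marketing", "model": "model-x", "tokens": 4000},
    {"team": "finance", "model": "model-y", "tokens": 1500},
    {"team": "marketing", "model": "model-y", "tokens": 500},
]

if __name__ == "__main__":
    for team, total in sorted(cost_by_team(USAGE, PRICES).items()):
        print(f"{team}: {total:.2f}")
```

Grouping by a different key, such as project or model, would give the other breakdowns a cost dashboard typically offers.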

We also enhanced Dataiku Govern with advanced LLM guardrails. Dataiku Govern now features a one-stop GenAI portfolio, giving executives a centralized view of all GenAI projects and tracking their status, deployment, and level of risk. Additionally, an LLM registry within Dataiku records details about all approved LLMs and their suitability for various tasks, and ensures usage is aligned to project qualification. To help organizations comply with the upcoming EU AI Act, we have released a readiness solution that identifies the components of their workflows affected by the regulation and provides actionable recommendations for addressing them.

Enhancing Efficiency With XOps

Our XOps framework extends beyond GenAI, offering comprehensive insights into all machine learning model deployments and pipeline operations. This holistic approach ensures precise resource management and operational efficiency, which are crucial for maintaining effective data-driven initiatives. Additionally, the XOps framework offers centralized monitoring and deployment flexibility across platforms like SageMaker, Azure, GCP, Databricks, and Snowflake. Dataiku not only streamlines production workflows but also provides a unified monitoring dashboard, enhancing the robustness and reliability of data-driven operations.

As an example, Dougherty described one major EU bank that had previously decided Amazon SageMaker would be the platform it used for all model deployments. However, this bank also wanted to incorporate Dataiku into its ecosystem because of the ease of model prototyping and the additional benefits it delivered to the team. Because Dataiku can not only work with external models developed in other tools but also deploy models to other platforms such as SageMaker, the bank was able to leverage Dataiku’s unified monitoring capabilities to oversee all of its models, even those built and deployed elsewhere. This flexibility makes Dataiku a universal tool for overseeing all models in production and solidifies its role as a robust, production-ready solution within the broader ecosystem.

Advanced Metrics and Enhanced Quality Checks

Dataiku empowers users with advanced data quality management tools. Our platform enables data engineers and analysts to track, verify, and address data quality issues, ensuring insights are built on a solid foundation. With visual dashboards, data quality rules, and exploratory data analysis, users can effectively manage data quality across projects, enhancing the reliability and accuracy of their analytics. Additionally, our improved metrics and checks provide fine-grained yet easy control over data flowing in and out of any Dataiku pipeline, ensuring comprehensive data quality monitoring throughout the workflow.
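Data quality rules of this kind can be thought of as small predicates evaluated over a dataset, each reporting whether it passed and how many rows failed. The sketch below illustrates the idea in plain Python; the rule functions, column names, and sample rows are hypothetical and are not Dataiku's metrics and checks API.

```python
def check_not_null(rows, column):
    """Fail if any row has a missing or empty value in the column."""
    failures = sum(1 for r in rows if r.get(column) in (None, ""))
    return {"rule": f"{column} is not null", "passed": failures == 0, "failures": failures}


def check_range(rows, column, low, high):
    """Fail if any non-missing value falls outside [low, high]."""
    failures = sum(
        1
        for r in rows
        if r.get(column) is not None and not (low <= r[column] <= high)
    )
    return {
        "rule": f"{column} in [{low}, {high}]",
        "passed": failures == 0,
        "failures": failures,
    }


def run_checks(rows):
    """Evaluate every rule and return the individual results."""
    return [
        check_not_null(rows, "order_id"),
        check_range(rows, "amount", 0, 1000),
    ]


# Hypothetical dataset with one missing ID and one negative amount.
ORDERS = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": None, "amount": 40.0},
]

if __name__ == "__main__":
    for result in run_checks(ORDERS):
        status = "OK" if result["passed"] else "FAILED"
        print(f"{status}: {result['rule']} ({result['failures']} failing rows)")
```

Running such rules at the points where data enters and leaves a pipeline is what allows a failure to block downstream steps before bad data reaches a model or report.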

In the era of GenAI, while the technology is transformative, the essential components of an analytics workflow — quality data, thorough preparation, and robust engineering — remain crucial. Dataiku supports the entire workflow, seamlessly integrating LLMs where necessary while maintaining the integrity of the complete pipeline.
