💌 Love and Code: Dataiku's Top 5 Features for Data Scientists From 2024

Dataiku Product, Scaling AI, Featured Nanette George

It’s almost Valentine’s Day and, in the spirit of love, I’d like to share this article for data practitioners. Across my career in journalism and technology, I’ve met many data experts who encouraged me and helped educate me about data science and what it looks like when it is done well. They’ve shown me how great data science lays the critical and solid foundation for building great AI applications.

Today, I’m at Dataiku, where I have the good fortune of working with data scientists and other technical experts, educating the marketplace about Dataiku’s capabilities that empower data practitioners to do their best work. This is my love letter to you, the people who work with data every day, about what we built at Dataiku over the last year to support your efforts.

2024 Was a Big Year for Data and AI at Dataiku

In 2024, powerful partnerships were forged in AI and analytics, and Dataiku's latest features played matchmaker. From seamless cloud integrations to sophisticated model monitoring, these new capabilities were designed to spark joy in the hearts of data scientists, data engineers, AI engineers, and developers alike.

Whether you're crafting custom code or orchestrating enterprise-wide deployments, these five features were among the most notable over the last year. They demonstrate Dataiku's commitment to building lasting relationships between teams, tools, and technologies. Let's explore these love stories that transformed how organizations work with data.

💞 The Perfect Match: Enhanced Integrations With Databricks 

For those who love their custom code, this integration preserves your freedom to code while adding enterprise-grade orchestration. Deploy your Python and SQL code seamlessly to Databricks, and unified monitoring provides programmatic access to deployment metrics and health checks. It's infrastructure-as-code friendly, making it a perfect match for smart DevOps practices. Dataiku even earned the distinction of being named the 2024 Databricks Innovation Partner of the Year!

The image is a screenshot of a flow diagram in Dataiku. The flow is a visual representation of how data, recipes, and models work together to move data through an analytical pipeline. In this screenshot, the user has connected various data sources, recipes, and models — all of which are represented by icons. Data sources, recipes, and models that include Databricks are highlighted in the diagram.

In Dataiku, you can seamlessly deploy Python and SQL code to Databricks.

💡Learn More About the Dataiku Partner Ecosystem

In Dataiku AutoML, you can incorporate diverse data modalities into a model. Under the hood, this feature leverages state-of-the-art embedding models through the Dataiku LLM Mesh API, giving programmatic control over feature extraction. For developers who prefer fine-grained control, you can customize the embedding process while maintaining the governance benefits of the LLM Mesh architecture.

🤘 Love Speaks All Languages: Multimodal AutoML

In Dataiku AutoML, you can incorporate diverse data modalities into a model. Under the hood, this feature leverages state-of-the-art embedding models through the Dataiku LLM Mesh API, giving programmatic control over feature extraction. For developers who prefer fine-grained control, you can customize the embedding process while maintaining the governance benefits of the LLM Mesh architecture.

The image is a screenshot of a user’s view in Dataiku, where the user is working on modeling gas on sensors. Highlighted in the screenshot are the variable type, where the user has selected “image” and these fields: image handling, where the user has selected “image embedding;” image location, where the user has selected “thermal camera;” and pre-trained model, where the user has selected “Hugging Face local models — EfficientNet B4 ImageNet- 1k (connection: hf_co).”

In Dataiku, you can enable text or image features and then choose an embedding model from the LLM Mesh.

💡Discover the Art of Multimodal AutoML

💌 Love Notes in the Margins: Free-Text Annotation

For those who believe relationships involve more nuance than just "it's complicated," free-text annotation brings unrestricted expression to data labeling. This feature liberates data scientists and developers from the constraints of predefined labels, allowing annotators to add detailed, contextual notes that capture the full complexity of their data. 

For developers, it integrates seamlessly with existing labeling workflows, providing programmatic access to annotations and a robust review system that ensures quality control. Whether you're training models on SOAP notes or fine-tuning language models, it's the difference between passing notes in class and writing love letters — every nuance matters.

The image is a screenshot of a user’s notes to validate data annotations. On the left side, raw data is shown — in this case, the transcription of a patient’s conversation with a doctor. On the right side of the image is another dialog box, where the user is manually typing additional notes for the model to consider in its analysis.

In Dataiku, free-text annotations can be added on the right-hand side of the labeling interface. Users can save their notes to prepare for review. In this example, a healthcare practitioner uses free-text labeling to annotate SOAP notes. The reviewer adds an assessment note with a potential secondary diagnosis to ensure the model captures crucial medical details. This aligns the dataset better with clinical expectations and can improve the model’s accuracy in future iterations.

💡Learn More About Custom Labeling and Quality Control With Free-Text Annotation

💓 Speaking Your Love Language: LLM Fine-Tuning

When your use case demands models that truly understand your domain, whether that's processing industry-specific documents or generating specialized content, fine-tuning is essential. Access the full power of Hugging Face's transformers library, write custom training loops, and integrate with open-source fine-tuning techniques to adapt pre-trained models for your specific tasks and domain requirements — all while maintaining enterprise governance. It's like having a high-level API and low-level control when you need it. While this capability offers a no-code approach, it doesn't sacrifice developer flexibility.

user’s settings in the “Fine-tuning preparation” dialog box. The user has selected OpenAI’s GPT 3.5 Turbo model, set the prompt column to “input,” set the competition column to “output,” and set the system message mode to “no system message.” In the hyperparameters dialog box, the “auto” setting is selected. In a highlighted area of that dialog box, this text appears: “OpenAI will automatically select hyperparameters for you. Depending on the performance of your model, you might want to update them using the advanced mode following the official documentation.”

In Dataiku’s recipe interface, the default setting for the hyperparameters field is “Auto” but can be turned to "Explicit" for manual adjustments.

💡Explore How You Can Fine-Tune Your GenAI Models in Dataiku

🫶 Tracing Love's Journey: Column-Level Data Lineage

Beyond visual tracking, this feature provides programmatic access to lineage information. Perfect for teams who want to automate impact analysis, generate documentation, or integrate lineage data into their existing tools. The manual editing capability means your custom transformations won't break the chain.

image1-Feb-07-2025-09-03-05-7080-PM

You can conduct root-cause analysis and impact analysis with data lineage between columns and across all projects in Dataiku.

💡Learn More About Data Lineage With Dataiku

Wrapping It Up

These features represent more than just technical improvements. They embody Dataiku's understanding of what makes data scientists and developers tick. By bridging the gap between high-level automation and low-level control, these capabilities create harmonious workflows that respect both efficiency and customization needs.

As we move further into 2025, these features lay the groundwork for even deeper integrations and more sophisticated capabilities. Whether you're a Python purist or an AutoML enthusiast, Dataiku's latest additions prove that with the right tools, you can have both power and simplicity. Now that's a relationship worth celebrating.

You May Also Like

AI & Human Connection: Empowering Businesses, Elevating People

Read More

The End of Static Presentations: How We Share Insights Is Changing

Read More

Upskilling in Data and AI Made Simple With The Dataiku Academy

Read More

GenAI Alone Won’t Give You an Edge in 2025 — But These Trends Will

Read More