It’s almost Valentine’s Day and, in the spirit of love, I’d like to share this article for data practitioners. Across my career in journalism and technology, I’ve met many data experts who encouraged me and helped educate me about data science and what it looks like when it is done well. They’ve shown me how great data science lays the critical and solid foundation for building great AI applications.
Today, I’m at Dataiku, where I have the good fortune of working with data scientists and other technical experts, educating the marketplace about Dataiku’s capabilities that empower data practitioners to do their best work. This is my love letter to you, the people who work with data every day, about what we built at Dataiku over the last year to support your efforts.
2024 Was a Big Year for Data and AI at Dataiku
In 2024, powerful partnerships were forged in AI and analytics, and Dataiku's latest features played matchmaker. From seamless cloud integrations to sophisticated model monitoring, these new capabilities were designed to spark joy in the hearts of data scientists, data engineers, AI engineers, and developers alike.
Whether you're crafting custom code or orchestrating enterprise-wide deployments, these five features were among the most notable over the last year. They demonstrate Dataiku's commitment to building lasting relationships between teams, tools, and technologies. Let's explore these love stories that transformed how organizations work with data.
💞 The Perfect Match: Enhanced Integrations With Databricks
If you love your custom code, this integration preserves your freedom to code while adding enterprise-grade orchestration. Deploy your Python and SQL code to Databricks, then use unified monitoring for programmatic access to deployment metrics and health checks. It's infrastructure-as-code friendly, making it a perfect match for smart DevOps practices. Dataiku even earned the distinction of being named the 2024 Databricks Innovation Partner of the Year!
In Dataiku, you can seamlessly deploy Python and SQL code to Databricks.
💡Learn More About the Dataiku Partner Ecosystem
🤘 Love Speaks All Languages: Multimodal AutoML
In Dataiku AutoML, you can incorporate diverse data modalities into a model. Under the hood, this feature leverages state-of-the-art embedding models through the Dataiku LLM Mesh API, giving programmatic control over feature extraction. For developers who prefer fine-grained control, you can customize the embedding process while maintaining the governance benefits of the LLM Mesh architecture.
In Dataiku, you can enable text or image features and then choose an embedding model from the LLM Mesh.
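To make the idea concrete, here is a minimal sketch of what "incorporating a text modality" can look like: the text is turned into a fixed-length vector and joined with the numeric columns to form one feature row. It does not use the Dataiku LLM Mesh API. The `toy_embed` and `build_feature_row` helpers are hypothetical, and a simple hashing trick stands in for a real embedding model purely for illustration.

```python
import math
import zlib

EMBEDDING_DIM = 8  # hypothetical size; real embedding models use far more dimensions


def toy_embed(text, dim=EMBEDDING_DIM):
    """Stand-in embedder (NOT a real model): hash tokens into buckets, then L2-normalize."""
    vector = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash() for strings
        bucket = zlib.crc32(token.encode("utf-8")) % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def build_feature_row(numeric_features, text):
    """Concatenate tabular values with the text vector into a single feature row."""
    return list(numeric_features) + toy_embed(text)


row = build_feature_row([42.0, 3.5], "patient reports mild chest pain")
print(len(row))  # 2 numeric values + 8 embedding dimensions = 10
```

In a real project, the embedding step would be handled by whichever model you select from the LLM Mesh; the concatenation step is the part this sketch is meant to show.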
💡Discover the Art of Multimodal AutoML
💌 Love Notes in the Margins: Free-Text Annotation
For those who believe relationships involve more nuance than just "it's complicated," free-text annotation brings unrestricted expression to data labeling. This feature liberates data scientists and developers from the constraints of predefined labels, allowing annotators to add detailed, contextual notes that capture the full complexity of their data.
For developers, it integrates seamlessly with existing labeling workflows, providing programmatic access to annotations and a robust review system that ensures quality control. Whether you're training models on SOAP notes or fine-tuning language models, it's the difference between passing notes in class and writing love letters — every nuance matters.
In Dataiku, free-text annotations can be added on the right-hand side of the labeling interface. Users can save their notes to prepare for review. In this example, a healthcare practitioner uses free-text labeling to annotate SOAP notes. The reviewer adds an assessment note with a potential secondary diagnosis to ensure the model captures crucial medical details. This aligns the dataset better with clinical expectations and can improve the model’s accuracy in future iterations.
💡Learn More About Custom Labeling and Quality Control With Free-Text Annotation
💓 Speaking Your Love Language: LLM Fine-Tuning
When your use case demands models that truly understand your domain, whether that's processing industry-specific documents or generating specialized content, fine-tuning is essential. This capability offers a no-code approach without sacrificing developer flexibility: you can access the full power of Hugging Face's transformers library, write custom training loops, and integrate open-source fine-tuning techniques to adapt pre-trained models to your specific tasks and domain requirements, all while maintaining enterprise governance. It's like having a high-level API with low-level control when you need it.
In Dataiku’s recipe interface, the hyperparameters field defaults to “Auto” but can be switched to “Explicit” for manual adjustments.
💡Explore How You Can Fine-Tune Your GenAI Models in Dataiku
🫶 Tracing Love's Journey: Column-Level Data Lineage
Beyond visual tracking, this feature provides programmatic access to lineage information. It's perfect for teams that want to automate impact analysis, generate documentation, or integrate lineage data into their existing tools. The manual editing capability means your custom transformations won't break the chain.
You can conduct root-cause analysis and impact analysis with data lineage between columns and across all projects in Dataiku.
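For readers who like to see the mechanics, here is a rough sketch of how impact analysis and root-cause analysis work once column-level lineage is available as data. It does not call Dataiku's lineage API. The edge list, column names, and helper functions are all hypothetical, and the traversal is a plain breadth-first search over a small graph.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream column, downstream column)
EDGES = [
    ("raw_orders.amount", "clean_orders.amount_usd"),
    ("raw_orders.currency", "clean_orders.amount_usd"),
    ("clean_orders.amount_usd", "revenue_report.total_revenue"),
    ("clean_orders.customer_id", "revenue_report.customer_id"),
]


def build_graph(edges, reverse=False):
    """Adjacency map; reverse=True points from downstream back to upstream."""
    graph = defaultdict(set)
    for upstream, downstream in edges:
        if reverse:
            graph[downstream].add(upstream)
        else:
            graph[upstream].add(downstream)
    return graph


def reachable(graph, start):
    """Breadth-first search returning every column reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        for neighbor in graph[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def impact_analysis(edges, column):
    """Columns that would be affected by a change to `column`."""
    return reachable(build_graph(edges), column)


def root_cause_analysis(edges, column):
    """Columns that feed into `column`, directly or indirectly."""
    return reachable(build_graph(edges, reverse=True), column)


print(sorted(impact_analysis(EDGES, "raw_orders.amount")))
# ['clean_orders.amount_usd', 'revenue_report.total_revenue']
print(sorted(root_cause_analysis(EDGES, "revenue_report.total_revenue")))
# ['clean_orders.amount_usd', 'raw_orders.amount', 'raw_orders.currency']
```

The same two traversals cover both questions: walk downstream to see what a change could break, and walk upstream to find where a bad value could have come from.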
💡Learn More About Data Lineage With Dataiku
Wrapping It Up
These features represent more than just technical improvements. They embody Dataiku's understanding of what makes data scientists and developers tick. By bridging the gap between high-level automation and low-level control, these capabilities create harmonious workflows that respect both efficiency and customization needs.
As we move further into 2025, these features lay the groundwork for even deeper integrations and more sophisticated capabilities. Whether you're a Python purist or an AutoML enthusiast, Dataiku's latest additions prove that with the right tools, you can have both power and simplicity. Now that's a relationship worth celebrating.