Dataiku 6 is here! This release adds a suite of new features - including managed Kubernetes cluster capabilities - to better execute on data projects that are elastic, inclusive, and responsible. Ultimately, Dataiku 6 empowers organizations to build AI systems that are fit for the future.
Heads Up!
This blog post is about an older version of Dataiku. See the release notes for the latest version.
Dataiku 6 Highlights
- Self-service spin-up and auto-scaling of Kubernetes clusters for both Spark and in-memory compute on AWS, Azure, and GCP.
- Dive into coding or debugging Dataiku recipes and notebooks in your favorite IDE with improved external IDE integration (RStudio, VS Code, Sublime Text and PyCharm).
- SQL pipelines for faster and lighter execution of SQL recipes.
- New Plugins store, now visible to everyone, making it easier to install, create, and manage plugins.
- White box machine learning models with improved subpopulation analysis and partial dependence plots.
- Support for time-series preparation (resampling, aggregations, segmenting, etc.).
- Track model performance and degradation over time with model drift monitoring.
- Key usability improvements including global search, beautiful custom charts, improved right panel, and more.
Elasticity
Dataiku 6 delivers a fully managed Kubernetes cluster capability that lets users spin up and manage Kubernetes clusters (on AWS, Azure, or GCP) from inside the Dataiku platform. All users can now quickly spin up clusters for optimized execution of Spark or in-memory jobs, while admins can isolate and manage compute power so that every team gets what it needs to run its analyses and deploy AI.
Dataiku 6 also gives Snowflake users faster runtimes in Dataiku, thanks to a new optimized sync with WASB (Azure Blob Storage) and native execution of Spark jobs in Snowflake. The new SQL pipelines speed up the execution of long, multi-step SQL data pipelines, making better use of compute and storage when working with SQL data.
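This post does not describe how SQL pipelines work internally. One plausible reading, assumed here, is that consecutive SQL steps run as a single statement so that intermediate results are never written out as tables. The sketch below illustrates that general idea with Python's built-in `sqlite3` module. The `orders` table, its columns, and all function names are hypothetical, and this is not Dataiku's implementation.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10.0), ("east", 5.0), ("west", 7.0), ("west", -2.0)],
    )
    return conn


def run_materialized(conn):
    """Run each step separately, writing every intermediate result to a table."""
    conn.execute(
        "CREATE TABLE step1 AS "
        "SELECT region, amount FROM orders WHERE amount > 0"
    )
    conn.execute(
        "CREATE TABLE step2 AS "
        "SELECT region, SUM(amount) AS total FROM step1 GROUP BY region"
    )
    return conn.execute(
        "SELECT region, total FROM step2 ORDER BY region"
    ).fetchall()


def run_pipelined(conn):
    """Run the same steps as one statement; intermediates exist only as CTEs."""
    return conn.execute(
        """
        WITH step1 AS (
            SELECT region, amount FROM orders WHERE amount > 0
        ),
        step2 AS (
            SELECT region, SUM(amount) AS total FROM step1 GROUP BY region
        )
        SELECT region, total FROM step2 ORDER BY region
        """
    ).fetchall()


if __name__ == "__main__":
    print(run_materialized(make_db()))
    print(run_pipelined(make_db()))
```

Both functions return the same rows. The pipelined version skips the intermediate writes and reads, which is where the time and storage savings would come from on a real database.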
White Box ML
Dataiku 6 adds two new visual machine learning capabilities, partial dependence plots and subpopulation analysis, which let users examine key aspects of model behavior and help teams avoid undesirable model biases. With subpopulation analysis, users can spot and weed out unintended model biases, making AI deployments more transparent and fair. Partial dependence plots help model creators understand complex models visually by surfacing the relationship between a feature and the target.
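To make these two ideas concrete, here is a minimal sketch in plain Python. It shows the general techniques, not how Dataiku computes them. The toy model, the feature layout, the group labels, and all function names are hypothetical. Partial dependence fixes one feature at a given value for every row and averages the model's predictions. Subpopulation analysis is reduced here to computing accuracy separately for each subgroup.

```python
from collections import defaultdict
from statistics import mean


def partial_dependence(predict, rows, feature_index, grid):
    """For each grid value, set the feature to that value in every row
    and average the model's predictions over all rows."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            predictions.append(predict(modified))
        curve.append((value, mean(predictions)))
    return curve


def accuracy_by_group(records):
    """Compute accuracy per subgroup from (group, y_true, y_pred) records."""
    hits = defaultdict(list)
    for group, y_true, y_pred in records:
        hits[group].append(1 if y_true == y_pred else 0)
    return {group: mean(values) for group, values in hits.items()}


def toy_model(row):
    """Hypothetical model: a fixed linear function of (age, income)."""
    age, income = row
    return 2.0 * age + 0.5 * income


# Hypothetical data: (age, income) rows, then (group, y_true, y_pred) records.
rows = [(20, 10), (30, 20), (40, 30)]
curve = partial_dependence(toy_model, rows, feature_index=0, grid=[20, 40])

records = [("a", 1, 1), ("a", 0, 1), ("b", 1, 1), ("b", 0, 0)]
per_group = accuracy_by_group(records)

if __name__ == "__main__":
    print(curve)
    print(per_group)
```

A large gap between subgroup scores, as between groups `a` and `b` in this toy data, is the kind of signal that would prompt a closer look for bias.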
Collaboration and Efficiency
Improved IDE integrations (RStudio, VS Code, Sublime Text, PyCharm) enable coders to work in their environment of choice while fueling collaboration on the Dataiku platform. Better visualization is critical both for communicating AI deployments to business stakeholders and for helping data scientists understand and track the progress of AI projects. Dataiku 6 makes it seamless to work with external data visualization tools like Qlik and Tableau.
With the collaboration features added in this release, data analysts can use the revamped plugins store to reuse code created by data engineers and data scientists in their everyday workflows. With custom model plugins for Visual ML and custom charts, coders can now create custom ML models and beautiful custom charts and share them with clickers (users who work through the visual interface).