Dataiku has always worked to stay at the cutting edge of innovation for modern analytics. This means that it often attracts users from other self-service applications like Alteryx, and I’ve been writing a series of articles on moving from Alteryx to Dataiku to help ease the transition.
In this new add-on to the series, I’m excited to do a retrospective on some key features Dataiku has continued to release that not only make the switch from Alteryx easier, but also open brand-new ways of working. In particular, I’ll focus on features that tackle more advanced functionality in Alteryx and lead to some nice quality-of-life improvements. Then, we’ll close with a brand-new way of using GenAI in Dataiku.
Making the Switch Even Easier: Multi-Row Formula and Repeating Recipes
Dataiku has recently added two new features that make migrating even easier. First up is Dataiku’s new ability to reference previous rows in calculations in its prepare recipe. This functionality is very similar to what Alteryx accomplishes with its own Multi-Row formula tool.
In the past, a lot of this work could be done in a window recipe, but now there’s an entirely new way to do things like generate a unique row ID or grab a previous value if your current row is empty. Just create a formula in your prepare recipe that points directly at a previous row. If I have a column called Sales and I’d like to reference the value of the previous row, it’s as simple as entering numval("Sales", 1). And just like other logic in the prepare recipe, you can immediately see your results on your sample data without having to run your pipeline.
Easy creation of a Row ID field!
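For readers who think in code, the two multi-row patterns mentioned above (a running row ID and filling an empty value from the previous row) can be sketched in plain Python. This only illustrates the logic, not Dataiku's formula syntax; the sample rows and the helper name `add_row_id_and_fill` are made up for the example.

```python
from __future__ import annotations

from typing import Optional


def add_row_id_and_fill(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with a 1-based RowID and Sales filled from the previous row."""
    result = []
    previous_sales: Optional[float] = None
    for index, row in enumerate(rows, start=1):
        sales = row.get("Sales")
        if sales is None:
            # Current row is empty: reuse the value carried over from the previous row.
            sales = previous_sales
        result.append({**row, "RowID": index, "Sales": sales})
        previous_sales = sales
    return result


# Hypothetical sample data with one empty Sales value.
sample = [{"Sales": 120.0}, {"Sales": None}, {"Sales": 80.0}]
filled = add_row_id_and_fill(sample)
```

The point of the sketch is that each output row depends only on the current row and the one before it, which is exactly the kind of logic a multi-row formula expresses without a loop.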
Next up is the ability to repeat certain logic a number of times to do more advanced manipulations all without code. In Alteryx, this is commonly done by using a Batch or an Iterative macro. Think about trying to easily grab the latest file to import or dynamically run multiple SQL queries and stack together the results. This is now even easier in Dataiku with dynamic recipe repeat.
All it takes is some configuration in the Dataiku recipe itself, which lets me define how I’d like things to repeat. Similar to Alteryx’s ability to feed values into a control parameter, Dataiku lets you point at a “parameters” dataset in the recipe configuration. The world’s now your oyster for no-code looping!
Looping through SQL queries from a parameters dataset
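Conceptually, a repeating recipe driven by a parameters dataset behaves like the loop below: run the same query once per parameter row and stack the results. This is a rough sketch using Python's built-in `sqlite3` module, not how Dataiku implements the feature; the `orders` table, the `region` parameter, and the function `run_repeated_query` are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical "parameters" dataset: one row per repetition.
parameters = [{"region": "East"}, {"region": "West"}]


def run_repeated_query(conn, query, parameters):
    """Run the query once per parameter row and stack all result rows."""
    stacked = []
    for params in parameters:
        # Named placeholders (:region) are filled from the parameter row.
        stacked.extend(conn.execute(query, params).fetchall())
    return stacked


# Made-up source table held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("East", 10.0), ("West", 25.0), ("East", 5.0), ("North", 7.0)],
)

query = (
    "SELECT region, SUM(amount) FROM orders "
    "WHERE region = :region GROUP BY region"
)
results = run_repeated_query(conn, query, parameters)
```

Adding a row to the parameters list adds one more repetition without touching the query, which is the same separation of logic and inputs that a control parameter gives you in Alteryx.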
Know Your Data Flow: Quality, Lineage, and Flow Zones
No matter the use case, it’s always important to be able to track and understand your data, especially when things aren’t going as expected. An earlier article in the Alteryx to Dataiku blog series discussed data quality checks and how they can both set off alerts and cause different parts of your flow to run in response. Recently, there have been some great improvements to these checks to give even more flexibility out of the box. In particular, I love the option to compare two key metrics to see if one is greater than the other, even across datasets for easier reconciliations.
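As a rough picture of what a metric comparison does during a reconciliation, here is a small Python sketch. It is not Dataiku's rule engine; the `ledger` and `report` datasets, the `amount` column, and both function names are hypothetical.

```python
from __future__ import annotations


def total(rows: list[dict], column: str) -> float:
    """Compute a simple metric: the sum of one numeric column."""
    return sum(row[column] for row in rows)


def check_not_greater(metric_a: float, metric_b: float, tolerance: float = 0.0) -> str:
    """Return "OK" when metric_a does not exceed metric_b (plus tolerance), else "ERROR"."""
    return "OK" if metric_a <= metric_b + tolerance else "ERROR"


# Two made-up datasets that should reconcile: the report must not exceed the ledger.
ledger = [{"amount": 100.0}, {"amount": 250.0}]
report = [{"amount": 100.0}, {"amount": 200.0}]

status = check_not_greater(total(report, "amount"), total(ledger, "amount"))
```

The useful part is that each metric is computed on its own dataset and only the two numbers are compared, so the datasets never need to be joined just to run the check.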
In the same vein of understanding the impact of issues in your process, Dataiku has also added a new way to understand data lineage. Imagine seeing an error in your data quality and then instantly being able to trace where that field was created and what source dataset was at fault. This powerful new view is available to investigate from any dataset in Dataiku and even tracks changes across linked projects when sharing results downstream. Now you can automate with confidence!
Tracing the origin of the “revenue” field
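One way to think about column lineage is as a graph that maps each column to the upstream columns it was built from. The sketch below walks such a graph back to its sources. It is a toy model under stated assumptions, not Dataiku's lineage implementation: the dataset names, the column names, and the `trace_sources` function are invented, and the graph is assumed to contain no cycles.

```python
from __future__ import annotations

Column = tuple[str, str]  # (dataset name, column name)

# Hypothetical lineage: each column points at the upstream columns it derives from.
lineage: dict[Column, list[Column]] = {
    ("report", "revenue"): [("orders_prepared", "revenue")],
    ("orders_prepared", "revenue"): [
        ("orders_raw", "price"),
        ("orders_raw", "quantity"),
    ],
}


def trace_sources(column: Column, lineage: dict[Column, list[Column]]) -> set[Column]:
    """Follow upstream links until reaching columns that have no recorded parents."""
    parents = lineage.get(column, [])
    if not parents:
        # No upstream entry: treat this column as an original source.
        return {column}
    sources: set[Column] = set()
    for parent in parents:
        sources |= trace_sources(parent, lineage)
    return sources


origin = trace_sources(("report", "revenue"), lineage)
```

In this made-up example, a problem spotted in the report's revenue column traces back to price and quantity in the raw orders dataset.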
Finally, it’s important to have control over how your flow is displayed for easy interpretability. Dataiku does a great job of laying out recipes and datasets as you build but sometimes a little customization is needed. With Dataiku 13.3, you can now click and drag flow zones to get things looking exactly the way you want.
And if you realize you need to add or remove a recipe from your flow, you can now do it without a bunch of manual changes to recipe inputs and outputs. Just right-click and choose “Insert recipe after this dataset” to add, or right-click and choose “Delete and reconnect” to remove.
Can GenAI Help? Generate Recipes
Generative AI and LLMs are everywhere — and for good reason! It’s never been so easy to do things like get answers from your documents. But can GenAI help you speed things up in Dataiku? The answer is yes!
For several releases, Dataiku has let you type natural language in a prepare recipe to generate steps. This feature, AI Prepare, lets you type in something like “Flag rows with a 1 if Sales is over 100 and the Region is United States” and then generates a step for you to review and verify. Now, Dataiku has gone a step further and lets you create full standalone recipes from your text with the generate recipe feature!
Ask for some date help and Dataiku generates a prepare recipe!
This is a huge step forward in helping new users get started and accelerating how quickly experienced builders can work. And, importantly, you don’t have to trust what’s generated right away — just pop open the recipe to see the logic or give it a quick run and check the results.
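One way to check a generated step is to write out the logic you expect and compare it against a few rows. Here is a minimal Python sketch for the flag example from the AI Prepare prompt above. The function name `flag_row` and the sample rows are made up, and returning 0 for non-matching rows is an assumption, since the prompt only says what gets a 1.

```python
def flag_row(row: dict) -> int:
    """Return 1 when Sales is over 100 and Region is United States, else 0 (assumed)."""
    if row["Sales"] > 100 and row["Region"] == "United States":
        return 1
    return 0


# Hypothetical rows covering the match, the boundary, and a different region.
rows = [
    {"Sales": 150, "Region": "United States"},
    {"Sales": 100, "Region": "United States"},
    {"Sales": 150, "Region": "Canada"},
]
flags = [flag_row(row) for row in rows]
```

Including a boundary row (Sales of exactly 100) is a quick way to confirm that the generated step reads “over 100” the same way you do.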