Alteryx to Dataiku: Working With Datasets

Dataiku Product Chris Helmus

Welcome to part two of this Alteryx to Dataiku series! If you missed part one, the short version is that my goal is to help Alteryx users understand Dataiku and its take on modern analytics. I’ve spent quite a bit of time helping customers get the most out of both platforms, and I’m excited to show you how Dataiku is the best of both worlds for current Alteryx users: familiar, but also different in all the right ways.

My last article was a high-level look at how Dataiku’s visual flow compares to Alteryx’s workflow. Today, I’ll dive deeper into how both technologies treat the most fundamental analytical ingredient: data. In particular, I’ll highlight how Dataiku’s approach to working with data:

  • Avoids data movement
  • Emphasizes collaboration and reuse to remove silos
  • Makes data exploration a breeze

What's It Like in Alteryx: Inputs, Outputs, and Anchors

Alteryx generally thinks of data as inputs and outputs. An Input Data tool lets you point at files and databases and bring the data locally to Alteryx. As you connect new tools, anchors help verify what’s been done, and Browse or Chart tools are ready when you need a deeper dive. Fundamentally, all of this information is temporary (even when caching) until you use an Output Data tool to save results to a new location.

If your data is too big to move, you also have the option of a subset of In-Database tools. These provide another path to orchestrating data, as long as you know they’re needed upfront and can resort to a little SQL in some cases.

At the end of the day, this approach absolutely works, but let’s break down how Dataiku tackles things differently with datasets.

How Dataiku Does It Differently: All About Datasets

The easiest way to show Dataiku’s approach is to go step by step. We’ll start with connecting to data, move on to exploring and transforming it, and end with getting the answers to where they matter.

Inputs: Find, Understand, Analyze

How Dataiku connects to data looks a little different from what you see in Alteryx. It all starts with Connections. Users can always access the Connections they’re permitted to use, across files, databases, and more. These Connections are shareable, always up to date, and can even take each user’s own personal credentials. Everyone sees their own view of the world, setup is easy with nothing to install locally, and IT is happy.

Depending on how you connect with Alteryx, your experience with data connections could be similar to Dataiku or slightly different. But in Dataiku, once you successfully point at some data there’s definitely a shift. Instead of an Input Data tool with an anchor, you have a big blue square, a Dataset!

Choose from Connections based on your permissions

That Dataset acts as a window into wherever your data is stored. It isn’t moving the whole file or table to Dataiku. Instead, you’re able to see a Preview of your data right at the bottom of your screen. For a deeper look, a double click brings you to the Explore pane. This area, along with others in the top right, lets you do everything you need to understand your data — like a Browse tool and anchor in one.
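To make the "window" idea concrete, here is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in for a remote database. It is an illustration of the concept only, not how Dataiku implements its Preview; the `orders` table, its data, and the `preview` helper are all hypothetical. The point is that a preview asks the storage layer for a handful of rows instead of copying the whole table.

```python
import sqlite3


def preview(conn: sqlite3.Connection, table: str, limit: int = 5) -> list[tuple]:
    """Fetch only the first `limit` rows; the rest of the table stays where it is."""
    # Table names cannot be bound as SQL parameters, so validate the
    # identifier before interpolating it into the query text.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cursor = conn.execute(f"SELECT * FROM {table} LIMIT ?", (limit,))
    return cursor.fetchall()


# Stand-in for a large remote table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, i * 1.5) for i in range(10_000)],
)

rows = preview(conn, "orders", limit=3)
print(rows)  # three rows come back; the other 9,997 never leave the database
```

The same pattern applies whatever the storage is: the limit is pushed down to where the data lives, so the cost of a preview does not grow with the size of the table.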

As with a Browse tool, you can Analyze each column for data quality and statistical info, filter, sort, and more right in this view. If you’d like to look at more of your data, you have fine-grained control over the records in your Sample. For visual exploration, the Charts pane in the top right lets you flip to a drag-and-drop charting experience with 25 different chart types. The Statistics pane can even automatically suggest ways to surface deeper insights and correlations. Best of all, any of these visuals can be published to Dashboards and easily shared.
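For readers who think in code, the kind of summary a column analysis produces over a sample can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not Dataiku's Analyze feature: the `analyze_column` function, the `sample_size` parameter, and the records are hypothetical.

```python
from collections import Counter
from itertools import islice


def analyze_column(rows, column, sample_size=1000):
    """Summarize one column over the first `sample_size` rows only."""
    sample = list(islice(rows, sample_size))
    values = [row.get(column) for row in sample]
    # Treat None and empty strings as missing values.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows_sampled": len(sample),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }


# Hypothetical records; in practice these would stream from a file or table.
records = [
    {"region": "East", "amount": 120},
    {"region": "West", "amount": 80},
    {"region": "", "amount": 95},
    {"region": "East", "amount": None},
]

# Only the first three records are read, mirroring a head sample.
summary = analyze_column(iter(records), "region", sample_size=3)
print(summary)
```

Because the summary is computed on a sample, changing the sample size or strategy can change the numbers, which is why control over the Sample matters when you are judging data quality.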

Analyze columns for quality and outliers

25 chart types including pivots

Where things get more interesting is once you want to collaborate. Maybe you’d like to get a colleague’s help validating some numbers — Dataiku has comments attached directly to the Dataset itself. You can even add extra detail on columns’ metadata, write in some tags to help with future searching, or share a Dataset to another Dataiku Project for another teammate to pick up.

In the Middle: New and Improved

So what happens after you use a Dataiku Recipe to transform your data? The main difference here is that Alteryx tools commonly have output anchors. In Dataiku, we have intermediate Datasets. Instead of a temporary validation anchor, Dataiku creates a more permanent save point. 

Intermediate Datasets act as save points

I prefer this way of working as it’s a huge help when iteratively designing new processes. In Dataiku, you can always run any part of your flow at any time without worrying about caching — the caching is baked in! It also means that all of the benefits around exploration and sharing I mentioned come into play at every stage of your process.
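The save-point idea can be illustrated with a toy Python model. This is a conceptual sketch, not Dataiku's build engine: the `Flow` class, its methods, and the dataset names are all hypothetical. Each step writes its result to a named dataset, so rebuilding a downstream step reads the saved upstream rows rather than recomputing them.

```python
class Flow:
    """Toy flow: each step reads one saved dataset and writes another."""

    def __init__(self):
        self.datasets = {}  # dataset name -> saved rows (the "save points")
        self.steps = {}     # output name -> (input name, transform function)
        self.run_log = []   # names of the steps that actually executed

    def add_step(self, output, source, transform):
        self.steps[output] = (source, transform)

    def build(self, output):
        """Run only the step that produces `output`, reusing saved input."""
        source, transform = self.steps[output]
        if source not in self.datasets:
            self.build(source)  # upstream was never built, so build it first
        self.datasets[output] = transform(self.datasets[source])
        self.run_log.append(output)
        return self.datasets[output]


flow = Flow()
flow.datasets["raw"] = [3, -1, 4, -5, 9]
flow.add_step("cleaned", "raw", lambda rows: [r for r in rows if r >= 0])
flow.add_step("doubled", "cleaned", lambda rows: [r * 2 for r in rows])

flow.build("doubled")  # first run builds "cleaned", then "doubled"
flow.build("doubled")  # second run reuses the saved "cleaned" rows
print(flow.run_log)
```

In this sketch the second build runs a single step, because the intermediate result is already stored. That is the property that makes iterating on one part of a process cheap.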

Running just the highlighted part of the flow

Of course, once a project is finished and automated, Dataiku doesn’t force you to keep all of those intermediate save points. It’s easy to drop them after a job runs or even run multiple steps all as one big pipeline (similar to In-Database options with Alteryx, but with a much expanded set of operations that can be done without SQL).

Outputs: Keep It Flexible

For Dataiku, outputs can really be any Dataset! Just like with Alteryx, I can choose to save my results anywhere I have permission to write, regardless of where my inputs are. It’s also possible to export directly from a Dataset if you just need that quick flat file extract.

What’s more exciting is what you can do with that output Dataset once it’s created. I already mentioned adding comments, tags, and other context, but you can also seamlessly publish it to one of Dataiku’s Data Collections. Think of these as areas for golden data that your team, or another team, would reuse frequently. In fact, when I’m working in Dataiku, I commonly start in the Data Collection area before searching through the catalog or tables and files directly.

Data Collections help find trusted data

Just like you’d hope, once I find a Dataset it’s a quick click to use it directly as a read-only input for my own use case. Finding the right data isn’t always easy, but it’s easier with Dataiku!

Making the Switch

Just like Alteryx, Dataiku acts as a single access point to all the data you’ll ever need, coupled with the ability to store your results just about anywhere. But once you try Dataiku, I know you’ll come to love how easy it is to find trusted data, explore it and share it with your colleagues.
