Dataiku offers powerful data prep features to help you transform raw data into actionable insights, from seamless data ingestion to advanced feature engineering. So, when it comes to cool and underrated data preparation features in Dataiku that we think people should know about, we couldn’t stop at just one list.
In this blog, we’ve rounded up even more data preparation hidden gems for you to explore. Let’s dive into how these lesser-known but powerful capabilities can elevate your data projects, this time focusing on how Dataiku helps you validate data.
Previews: Explore Data From the Flow
Have you ever pulled up a project in Dataiku and wanted a quick way to view a dataset? You might not know that, as of this year, you can do it easily. With the preview option, you can view a sample of the output and quickly see what’s going on without ever leaving the flow.
Find preview in the bottom right-hand corner of your flow view to see the first 50 rows of data.
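If you want the same kind of quick look outside the Dataiku interface, the idea can be sketched in plain Python: read only the first 50 rows instead of loading the whole dataset. This is an illustration of the concept, not how Dataiku implements previews. The `preview_rows` helper and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice


def preview_rows(csv_file, limit=50):
    """Return the header and at most `limit` data rows from an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader, [])
    # islice stops after `limit` rows, so the rest of the file is never read
    rows = list(islice(reader, limit))
    return header, rows


# Hypothetical sample data: 200 rows of (id, value)
sample = io.StringIO("id,value\n" + "\n".join(f"{i},{i * i}" for i in range(200)))
header, rows = preview_rows(sample)
print(header, len(rows))
```

Capping the row count keeps the check fast even on large files, which is the main appeal of a preview.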
To go even further, pair this capability with newer features like data lineage, which lets you trace each column in a dataset back to its root source. Data lineage helps you track where your data comes from and understand the transformations applied along the way.
Compare Row Values: Identify Differences at a Glance
Going a level deeper, working with detailed datasets, such as those with LLM-generated outputs, often requires a clear understanding of how values differ from one row to the next. The compare column values feature in Dataiku provides a side-by-side comparison of a column’s values across rows, highlighting the differences between corresponding entries. This tool proves especially helpful for verifying data integrity or investigating discrepancies, allowing you to pinpoint the exact areas that require further analysis.
In the dataset view, you can now compare row cells side by side.
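To make the idea of highlighting differences between two cell values concrete, here is a minimal sketch using Python's standard `difflib` module. It is not Dataiku's implementation. The `highlight_differences` helper and the two sample values are hypothetical, and the word-level comparison is an assumption made for readability.

```python
import difflib


def highlight_differences(left, right):
    """Return the word-level differences between two cell values."""
    left_words, right_words = left.split(), right.split()
    matcher = difflib.SequenceMatcher(a=left_words, b=right_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the spans that differ (replace, delete, insert)
        if tag != "equal":
            changes.append(
                (tag, " ".join(left_words[i1:i2]), " ".join(right_words[j1:j2]))
            )
    return changes


# Hypothetical values from two rows of the same column
row_a = "The order shipped on Monday from Paris"
row_b = "The order shipped on Tuesday from Paris"
for change in highlight_differences(row_a, row_b):
    print(change)
```

For the two sample values, the only reported change is the replacement of "Monday" with "Tuesday". That is the kind of discrepancy a side-by-side view makes easy to spot in long text outputs.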
AI Explain: Quickly Summarize Flows
Flows can get complicated, especially in a project that you’re not familiar with, and it can be daunting to figure out exactly what’s going on. With AI Explain in Dataiku (one of several new AI assistants you should check out), you can get a quick summary of the flow with the help of Generative AI.
Create project descriptions with ease in a few simple clicks.
Statistics Recipe: Unleash the Power of EDA
Data preparation goes beyond cleaning and transforming — exploratory data analysis (EDA) is key to unlocking deeper insights. You may not know that Dataiku offers built-in statistical testing, making EDA faster and easier than ever. Access these tests as a recipe within the flow, and with just a few clicks, generate detailed analyses and visualizations like histograms and box plots.
Find the Generate statistics recipe in the visual recipe tab.
With dozens of tests to choose from, you can easily explore your data to identify potential issues or outliers before diving into model creation.
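As one example of the kind of check EDA supports, the sketch below flags outliers with the interquartile range rule commonly used to draw box plot whiskers. It uses only Python's standard library and is not taken from the Dataiku recipe. The `iqr_outliers` helper, the 1.5 multiplier, and the sample values are assumptions for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box plot rule."""
    # quantiles(n=4) returns the three quartile cut points
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical column of order totals with one suspicious entry
order_totals = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]
print(iqr_outliers(order_totals))
```

On this sample, only the value 95 falls outside the whiskers. A result like that is worth investigating before the column feeds a model.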
Leverage these hidden gems in Dataiku to streamline your data workflows, uncover valuable insights, and make more informed decisions. Embrace the power of previews, column comparisons, AI Explain, and the statistics recipe to elevate your data projects and stay ahead of the curve.