About two years ago, I decided it was time I learned how to use Dataiku. Having been part of Dataiku's marketing content team for a while, I was, of course, familiar with our software at a high level, but at that point my job didn't require me to use it for creating insights so much as simply consuming them. Still, I was curious and eager to learn, so when an opportunity presented itself in the form of a bunch of really messy video marketing data, I jumped at it. That is how my journey as a Dataiku user began.
Like most non-technical business users, I had previous experience with manipulating data that was pretty much limited to moderate use of Microsoft Excel. Back when I first started, I wrote about the joys and frustrations of my first steps with Dataiku. This time, as a more experienced user, I thought I would share (in the form of a blog series) some of the main challenges and a few exciting benefits I discovered during my transition from spreadsheets to Dataiku, and why the benefits completely outweigh the initial challenges.
Challenge: Why Can't I Write in This Cell?
The first thing I struggled with when I started using Dataiku (and I'm pretty sure I'm not alone in this) was getting used to the idea that datasets in Dataiku are not spreadsheets and cannot be edited like them.
In a spreadsheet, you transform or enrich data by entering, replacing, or computing values directly in the cells of the spreadsheet itself. In Dataiku, data transformations are achieved through recipes, which contain the transformation logic, or commands (also called "steps"), that act upon datasets.
I am a little ashamed to admit that for my first 30 minutes or so of using Dataiku, I kept trying to click and type into the cells of the dataset, which probably looked something like that viral video of a toddler trying to "swipe" on the page of a magazine as if it were a touch screen. However, what seemed like a difficulty at first eventually turned out to be a much-appreciated benefit.
Benefit: A World of Visual Data Prep Possibilities, None of the Data Loss Headache
As strange as it felt at first to create recipes, modify their settings, and add transformation steps to prepare my data, this approach let me work in far more advanced and efficient ways than I could have imagined in a spreadsheet. From the usual sorting, filtering, and computing of new columns to working with complex geospatial data or using regular expressions to extract patterns from text, nothing seemed impossible once I got the hang of Dataiku's visual data prep interface.
The best part? I never lost a single bit of data or work, because every transformation is logged in the recipe settings and the pre-transformation dataset remains intact. This is a key difference between Dataiku and unwieldy spreadsheets: when you apply transformations, the original (or input) dataset remains available in the workflow. The recipe produces a separate "output" dataset, so you can apply many different transformations without worrying about losing the original data, and you can easily go back to an earlier version.
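For readers curious about the idea behind this, here is a conceptual sketch in plain Python. It is not Dataiku's API: the names `run_recipe`, `filter_rows`, and `extract_pattern` are hypothetical, and the "dataset" is a made-up list of rows. It only illustrates the principle described above, assuming a recipe can be modeled as an ordered list of steps: running the steps produces a new output dataset while the input dataset stays untouched.

```python
import copy
import re
from collections.abc import Callable

# Hypothetical model: a dataset is a list of rows, and a step maps rows to new rows.
Row = dict[str, object]
Step = Callable[[list[Row]], list[Row]]


def filter_rows(predicate: Callable[[Row], bool]) -> Step:
    """Hypothetical step that keeps only the rows matching predicate."""
    def step(rows: list[Row]) -> list[Row]:
        return [row for row in rows if predicate(row)]
    return step


def extract_pattern(column: str, pattern: str, new_column: str) -> Step:
    """Hypothetical step that stores the first regex match from column in new_column (None if no match)."""
    regex = re.compile(pattern)

    def step(rows: list[Row]) -> list[Row]:
        out = []
        for row in rows:
            match = regex.search(str(row[column]))
            # Build a new row instead of mutating the existing one.
            out.append({**row, new_column: match.group(0) if match else None})
        return out
    return step


def run_recipe(input_rows: list[Row], steps: list[Step]) -> list[Row]:
    """Apply the steps in order to a deep copy, so input_rows is never modified."""
    rows = copy.deepcopy(input_rows)
    for step in steps:
        rows = step(rows)
    return rows


# Made-up input dataset for illustration.
videos = [
    {"title": "Demo video 2021-03", "views": 1200},
    {"title": "Webinar recap", "views": 90},
]
recipe = [
    filter_rows(lambda row: row["views"] >= 100),
    extract_pattern("title", r"\d{4}-\d{2}", "month"),
]
output = run_recipe(videos, recipe)
print(output)  # [{'title': 'Demo video 2021-03', 'views': 1200, 'month': '2021-03'}]
print(len(videos))  # 2: the input dataset is unchanged
```

Because the steps are stored separately from the data, changing or removing one step and re-running the recipe is enough to get a different output, which is the spreadsheet "undo" problem approached from the other direction.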
Gone are the days when I used to freak out over "messing up" my spreadsheet and desperately tried to go back to an earlier version while also trying to remember how I had gotten to that point. If you, like me, have reached a point in your professional (or personal) life where you feel it's time to move beyond spreadsheets, check out the new Excel to Dataiku Quick Start course on the Dataiku Academy. Until next time, when I'll share more intimate insights into my post-spreadsheet life.