If you’ve spent much time working with Excel, you know that things can get really frustrating when you are (or Excel thinks you are) working with dates. In its efforts to be helpful, it often causes huge headaches, and worse still, it doesn’t give you a clear way to see what it changed and how to change it back. Many, if not most, data projects involve a time dimension, and so as Excel-using data crunchers, we’re particularly sensitive to this frustration.
Let’s look at a couple examples where users complain that Excel comes up short with dates and timestamps.
Auto-Formatting Is Not Very Smart
A frequent complaint about Excel is that it auto-formats entries or important data incorrectly, but then it doesn’t provide a way to reverse the auto-formatting.
Quick example: let’s say you want to create some column headers with ranges, like 1-5, 6-10, 11-15, and 16-20. Well, first of all, Excel will automatically reformat the first three columns as dates - in our case, “01-May,” “06-Oct,” and “15-Nov.” Ah, but it’s just a formatting issue, right? Manually reformat the cells as “General,” and suddenly you have three brand new numbers: 42009, 42165, and 42323.
Uh… what?? How did that happen?
You see, Excel uses a date system where Day 1 is January 1, 1900. That means that January 1, 2000 -- one hundred years later -- is represented as 36526. Hours, minutes, and seconds are represented as fractions of days.
Now, that’s not really unique - all computer programs use some variation of this methodology. Where Excel goes wrong is that in its haste to figure out what you were trying to say, it applies the date without giving you an option or a way back.
A Single-Format Program for a Multi-Format World
Another area of frustration is that Excel can really only understand one date format at a time, and so if you’re working with data from, say, the United States and the United Kingdom, you’re going to have a real mess on your hands.
In Windows, Excel is actually tied to how you have your date configured in your operating system, meaning you have to change your Windows settings just to change your Excel settings. So if you have a UK computer and you upload a file with the dates 06/10/2016 and 06/30/2016, Excel will interpret the first one to mean October 6, 2016… and it won’t recognize the second one as a date at all.
It’s not an impossible problem to solve, but you wind up needing to write a bunch of formulas just to get your data formatted correctly, and then you always have that risk of missing something.
An Easier Way: Dataiku
In Dataiku, parsing dates is as easy as two clicks. Furthermore, Dataiku will intelligently interpret the format of dates, but it will always give you the final say on which format you want to use.