If you run a website, it’s so simple to use a service like Google Analytics that you probably never consider doing it yourself. But there are advantages to building your own tool to analyze your weblogs, including the ability to keep your data private and to drill down as deep as you’d like.
At the same time, building your own tool has potential downsides, namely the time and complexity involved in matching the features of an off-the-shelf product. But maybe it’s not as difficult as you think. As a little demonstration of Dataiku’s data preparation features, we decided to build a simple weblog analytics tool that could give you everything (or at least the most important things) you want with minimal effort on your part.
Best of all, you can dive right into this project in your browser and see Dataiku in action without downloading a thing.
Stop Wasting Your Time on Mindless Data Prep
One of the worst parts of data preparation is the mindlessness of it. Even if you code, there’s a pattern of trial and error to get the data in the exact shape you want, and it tends to be more annoying than challenging. As much as you know it’s absolutely vital to what you’re working on, it feels like a waste of time.
In this project, we take raw weblog data and put it into Dataiku, and then we build a visual preparation recipe to clean it and prepare it exactly the way we want. If you click on the link, you’ll see that all the preparation steps are listed on the left side of the screen.
Two of these are worth highlighting. The first is a geolocate processor with which we can get all sorts of geographical information, from country all the way down to latitude and longitude points, all from an IP address (just to be clear, the IP addresses in this dataset have been randomized).
The second is the URL-splitting processor, which extracts the path so that we know exactly which parts of our website our users are visiting.
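For readers who prefer to see the idea in code, here is a minimal sketch of what path extraction looks like in plain Python using the standard library's `urllib.parse`. It is not Dataiku's processor, only an illustration of the same kind of step; the `extract_path` helper and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def extract_path(url: str) -> str:
    """Return the path component of a URL, or "/" when the URL has no path.

    Hypothetical helper for illustration; not part of Dataiku.
    """
    path = urlsplit(url).path
    return path or "/"


# Made-up URLs standing in for the requested-URL column of a weblog.
urls = [
    "https://www.example.com/blog/weblog-analytics?utm_source=newsletter",
    "https://www.example.com/products/#pricing",
    "https://www.example.com",
]

# Query strings and fragments are dropped; only the path is kept.
paths = [extract_path(u) for u in urls]
```

Keeping only the path means that visits to the same page with different tracking parameters are counted together.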
Nobody Should Stop You from Writing Code
If you know how to write code, you probably don’t have a huge affinity for visual data tools, since for the most part you could do the same steps more quickly (and in a way more tailored to your needs) on your own.
We agree. You can see in this project that we use a standalone SQL recipe for some basic aggregation. In this case, it’s because pushing the calculation to SQL is much faster and more efficient, but in many other cases, you might have particular needs where coding would be necessary. In the broader flow, we’re agnostic as to whether you code or use the visual interface.
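To give a feel for the kind of basic aggregation that is worth pushing down to SQL, here is a small sketch that counts hits and distinct visitors per page. The project's actual query is not reproduced here, so the `weblogs` table, its `visitor_id` and `path` columns, and the `visits_per_path` helper are all assumptions, and an in-memory SQLite database stands in for whatever SQL engine you actually use.

```python
import sqlite3


def visits_per_path(rows):
    """Count hits and distinct visitors per path using a SQL GROUP BY.

    `rows` is an iterable of (visitor_id, path) tuples. The table and
    column names are hypothetical, chosen only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weblogs (visitor_id TEXT, path TEXT)")
    conn.executemany("INSERT INTO weblogs VALUES (?, ?)", rows)
    query = """
        SELECT path,
               COUNT(*) AS hits,
               COUNT(DISTINCT visitor_id) AS visitors
        FROM weblogs
        GROUP BY path
        ORDER BY hits DESC, path ASC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up sample data: two visitors, two pages.
sample = [("a", "/blog"), ("a", "/blog"), ("b", "/blog"), ("b", "/pricing")]
summary = visits_per_path(sample)
```

The database does the grouping and counting, so only the small summary table travels back to the caller rather than every raw log line.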
Dashboards Are Always Better with Maps
If you’re going to build your own Google Analytics, then you’d better have a nice dashboard, preferably with some maps. In our dashboard, we have four different charts, including a map that uses the geolocation data we discussed earlier. The charts can use data from any dataset in the workflow.
Just so you know, the latest version of Dataiku has much more powerful dashboards than what is shown here. Take a look at what we can do with our RShiny webapp! And if you like maps, take a look at our earlier post on mapping the streets of New York and Paris.