At Dataiku, we truly believe that businesses will need to be data-driven to survive, and our vision is to enable companies of any size to build and deliver their own data products more efficiently. We often partner with organizations worldwide who share this vision and who can help us bring collaborative data science to more companies.
In this post, one of our partners, Cynozure, talks about their experience working with data teams and how any business leader can improve the chances that data projects produce results.
Cynozure advises businesses on their data strategy, helping them find and deliver innovative solutions to strategic business problems.
Today, every business seems to have analytics and data science on their agenda. But whenever we speak to them, we hear the same story again and again: companies have a great vision for a new data product, and they have a team of data scientists, and yet they still don’t get the results they’re looking for.
This is, in part, because far too many businesses lack the fundamental tools necessary to give their data teams a fighting chance to produce that amazing data project that drives their business forward. While providing tools for other teams across the organization seems obvious (sales, IT, etc.), many companies don’t think to provide their data team with a tool that allows them to do their job efficiently, collaborate on projects, automate processes, and safeguard their work with proper version control.
When we talk about analytics, we normally talk about use cases, technology platform design, data quality and trust - the list goes on. Rarely do we turn our attention to the day-to-day frustrations of analysts and data scientists who are actually building the models that deliver the value. And if there's one group of people who need to be given an environment to succeed, it’s the ones who have the so-called sexiest job of the 21st Century! Put simply, data teams need a few things for everything else to fall into place.
Easy, Understandable Data
The first thing they need is data. It goes without saying that the data they are working with needs to be accurate and trustworthy, so I'm not going to talk about that here. What I want to focus on is how it's structured.
There are two types of tables that analysts and data scientists need. The first type, and perhaps the most important, is the big, wide, flattened table. These tables include all the most useful information about the subject they're serving, all pre-joined together. This format lends itself well to all sorts of analysis, cut many different ways, and is always popular with analyst teams.
The second type of table is one with a specific mission. For example, if you wanted to do some cross-channel analysis on how web visits translate to store transactions, you're going to be much better served by a deep, narrow table linking the various visits together. These kinds of tables are usually built in response to a specific request from an analyst, but help them publish it, and you'll be their friend forever. By getting our data right, we can get our analysts focusing on analytics and not data prep.
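To make the two table shapes concrete, here is a minimal pandas sketch. Everything in it is an assumption made for illustration: the table and column names, the sample rows, and the seven-day window used to link a web visit to a later store transaction. Treat it as one possible way to build these tables, not a recommended design.

```python
import pandas as pd

# Hypothetical source tables; names and values are invented for illustration.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["gold", "silver", "gold"],
})
web_visits = pd.DataFrame({
    "visit_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 3],
    "visit_date": pd.to_datetime(
        ["2024-01-02", "2024-01-05", "2024-01-03", "2024-01-09"]
    ),
})
store_transactions = pd.DataFrame({
    "transaction_id": [100, 101, 102],
    "customer_id": [1, 2, 2],
    "transaction_date": pd.to_datetime(
        ["2024-01-06", "2024-01-04", "2024-01-20"]
    ),
    "amount": [50.0, 20.0, 35.0],
})


def build_wide_table(customers, web_visits, store_transactions):
    """One row per customer, with the most useful facts pre-joined."""
    visits = (
        web_visits.groupby("customer_id")
        .agg(visit_count=("visit_id", "count"))
        .reset_index()
    )
    spend = (
        store_transactions.groupby("customer_id")
        .agg(
            transaction_count=("transaction_id", "count"),
            total_spend=("amount", "sum"),
        )
        .reset_index()
    )
    wide = customers.merge(visits, on="customer_id", how="left").merge(
        spend, on="customer_id", how="left"
    )
    # Customers with no visits or transactions get zeros instead of NaN.
    metric_cols = ["visit_count", "transaction_count", "total_spend"]
    wide[metric_cols] = wide[metric_cols].fillna(0)
    return wide


def build_visit_links(web_visits, store_transactions, window_days=7):
    """Deep, narrow table: one row per (visit, transaction) pair where the
    same customer bought in store within `window_days` after the visit."""
    pairs = web_visits.merge(store_transactions, on="customer_id", how="inner")
    pairs["days_to_purchase"] = (
        pairs["transaction_date"] - pairs["visit_date"]
    ).dt.days
    in_window = pairs["days_to_purchase"].between(0, window_days)
    columns = ["visit_id", "transaction_id", "days_to_purchase"]
    return pairs.loc[in_window, columns].reset_index(drop=True)


wide = build_wide_table(customers, web_visits, store_transactions)
links = build_visit_links(web_visits, store_transactions)
```

The wide table answers most everyday questions with a simple filter or group-by, while the narrow link table exists only to serve the cross-channel question.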
Simple, Functional Data Access
Once the data is in place, the next hurdle to address is that teams absolutely need a simple, functional way to access the data through a number of different languages (SQL, Python, R, etc.). And they shouldn’t have to use two or three different tools to manage all of their work for the different languages.
There are some excellent tools on the market that help individuals on the data team do all of those things in one place and, even better, that allow for the use of notebooks like Jupyter (increasingly popular with analyst teams). One of my favorite tools even provides a ton of code snippets for pandas (the library, not the animals…), making it easy for anyone to get started with Python.
And that's a really important point in and of itself. A lot of analysts are moving from what are often viewed as traditional tools (like SAS) into the open-source world. Some of the new data science platforms out there can make this transition a lot easier and help reduce the cost of ownership for doing analytics at the same time.
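As a small illustration of working across languages in one place, the sketch below runs a SQL query and carries on in pandas within the same script. It assumes an in-memory SQLite database as a stand-in for whatever warehouse your team actually uses, and the `orders` table and its columns are invented for the example.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for a real data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 80.0), (3, "north", 40.0)],
)
conn.commit()

# Step 1: let SQL do the aggregation close to the data.
query = """
    SELECT region, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region
    ORDER BY region
"""
by_region = pd.read_sql_query(query, conn)

# Step 2: continue in Python without switching tools.
by_region["share"] = by_region["total_amount"] / by_region["total_amount"].sum()

conn.close()
print(by_region)
```

The same pattern works from a notebook, which is part of why analysts moving over from traditional tools tend to find it approachable.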
Ability to Transform + Automate
These next two common tasks go hand-in-hand, and they are often some of the most poorly implemented processes we see.
First is the ability to transform data. No matter how great the data models you provide are, analysts will always need to be able to do their own transformations. This can be pretty painful when you have to write and manage a bunch of SQL scripts. And too often, IT departments are unwilling to provide access for analysts to the ETL and automation tools they use. It’s really important that analysts can get past the data prep part of their work and onto the value-generating parts as quickly as possible.
Once they've got those transformations sorted, they're going to want to be able to automate and schedule their code. Too often, we see teams that don’t have the ability to do this, and (understandably) it's a huge frustration for them. This is a no-brainer to get implemented, so what's stopping you!?
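Here is a rough sketch of what that can look like in practice: a transformation written as a plain Python function, wrapped in a small job that a scheduler can run unattended. The column names, the sample data, and the `sales_job.py` file name mentioned in the comment are all hypothetical, and a real job would read from and write to your own data store.

```python
import pandas as pd


def transform_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean raw sales rows and aggregate them by region."""
    # Drop rows with no amount, then normalize inconsistent region labels.
    cleaned = raw.dropna(subset=["amount"]).copy()
    cleaned["region"] = cleaned["region"].str.strip().str.lower()
    totals = cleaned.groupby("region", as_index=False)["amount"].sum()
    return totals.rename(columns={"amount": "total_amount"})


def run_job() -> pd.DataFrame:
    """Entry point for the scheduler. The inline data is a placeholder
    for reading from a real source (assumption)."""
    raw = pd.DataFrame({
        "region": [" North", "north", "South ", "south"],
        "amount": [10.0, 5.0, None, 7.0],
    })
    return transform_sales(raw)


if __name__ == "__main__":
    # Saved as, say, sales_job.py (hypothetical name), this could be run
    # every morning at 06:00 by a cron entry such as:
    #   0 6 * * * python sales_job.py
    print(run_job())
```

Keeping the transformation in its own function means the same code can be tested interactively and then scheduled without being rewritten.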
Place to Share Work and Collaborate
The final thing we need to do for data teams (analysts and data scientists alike) is to give them a centralized place to collaborate. There are very few things that can’t be improved by bringing together lots of smart people to tackle the same problem, and analytics and data science are no exception.
Despite this fact, I've worked with teams who have thousands of lines of code buried on a shared drive somewhere that they all use, yet nobody really understands. Just like any other team in an organization, data teams need tooling and processes in place to make it clear and easy for people to collaborate and track changes as projects evolve over time.
Centralizing data teams’ work also reaps rewards when it comes to governance and security. Without a proper place to do their work, employees will start spinning up environments on their laptops, taking data home, and moving it all over the place. It becomes a nightmare to keep track of all of this and protect your data. By putting everything in one place, it's much easier to look after.
What Are You Waiting For?
If you're already managing to do all of this and provide your data team with everything it needs to be effective, you should have some happy data scientists and analysts!
If you haven't, then starting to tackle some of the items listed above should help get you set on the right track. Solving these problems will help make your data team more productive, release better models more consistently, and perhaps most importantly, make your staff want to stick around. Let's finally give data scientists and analysts what they need. Let’s unleash the data team!
Go further: Download Dataiku's guide to nurturing a productive data team for more tips, or get started with Dataiku Data Science Studio (DSS).