When building or growing data science teams, companies often face a noisy world. While trying to identify the pains and challenges that shape how these teams work, I came across an article by John P. Kelly that offers an interesting perspective on establishing a successful data science capability in an organization.
I thought it would be interesting to connect with the article’s author to get further insight on how companies could address their analytics needs. In the first part of the interview, John Kelly answered questions relating to data science team members, how they relate to each other in an organization, and what major frustrations they face when using data science tools.
John P. Kelly is the Managing Director of Berkeley Research Group (BRG), a predictive analytics practice that leverages econometrics and data science to help drive actionable data-driven growth strategies and products. BRG empowers its clientele by applying data science to key strategy decisions being made in marketing, sales, and operations. Some examples include dynamic pricing optimization, loyalty program design, site location analysis, predicting consumer behavior, and reducing churn.
CM: Let’s start by talking about the different roles in a data science team. What are the expected qualifications of data science team members?
JK: The title of data scientist is tricky, as the term has been diluted. Companies are sometimes paying data scientists who are only able to validate data relationships, but not necessarily to find true data relationships. I would say there are three different types of team members:
- Data scientists are most likely computer scientists applying machine-learning methodology and statistical models while directing the creation of algorithms.
- Data analysts have a broader proximity with the business units. They report on, track, explain, and even visualize business metrics. They may have backgrounds in economics, engineering, or statistics.
- Data infrastructure engineers, also called database administrators, ensure that data is encoded properly and is of the right quality and validity.
CM: How do data science teams relate to other teams within an organization? Also, what are your thoughts on data science team deployment… centralized? Dispersed? Permanently embedded in individual business units? Could you share the advantages and risks of your recommended deployment option?
JK: Overall, I would recommend the centralized approach. The reason is that you have to build a data-forward culture, and you will build it more rapidly when the people working with the data collaborate closely. Make sure data scientists share the same P&L responsibility so that they have the same goals.
If you start dispersing them to various business units, each unit will have its own dedicated resources and you can start building silos. A business unit manager will assume that his/her data scientist is only accountable for the business unit's analytics projects and, consequently, won’t share his/her resources. A centralized approach allows you to aggregate a large amount of talent in a pool and allocate it to business units depending on a project’s priority. With this approach, you have visibility into your different analytics projects.
CM: How do you define the position and role of Chief Data Officer (CDO)?
JK: This job title may come and go as companies become more data centric. It signifies that the company is currently building a data-driven approach and that data has a strong seat at the executive table. Companies think that if they have a CDO, everything related to data should be his/her responsibility and other departments don’t need to get involved. As companies mature in their use of data, that assumption may become a problem. The reality is that every department needs to be invested in data: the Chief Marketing Officer, Chief Operating Officer, etc., should all be comfortable working with data.
CM: Do you have any advice on how to find the best data science tools?
JK: Although we do not sell tools at Berkeley Research Group, we can make recommendations. Because data scientists are highly connected on social media and truly invested in their work, they hear about great tools from their network. So one way to hear about data science tools is to create a Twitter account, follow the 100 most impactful data scientists, and regularly read your Twitter feed; you will see a number of data science tools promoted. There are also organic courses and technical training here in the United States, like General Assembly, where you can hear people sharing information about how they experimented with a product.
CM: What are the major frustrations for data science teams when using these tools?
JK: It seems there is no substitute for cleaning the data, integrating data streams, and validating them. There are tools that claim they can do it but, in fact, they help with only about 25% of the time spent. They are not a replacement, and data scientists are frustrated by that; they don’t find that particular work especially rewarding. Companies need to continuously re-emphasize how important the data preparation stage is.
Another frustration is that a lot of tools pretend they can do everything: they can cook and clean and even raise the kids. Obviously they can’t and, consequently, they poison the well for other tools and for data-driven solutions to problems in general. New tools are also not yet the standard, and other team members typically do not have them. When there is no critical mass for a product, it is not always worthwhile to use it.
CM: Do you have any advice for companies who would like to leverage a data science talent management model? How do you keep and empower data science talent?
JK: I don’t know of any companies that claim to manage talent differently for data scientists, but I do know what attracts data scientists:
- Highly impactful challenges
- The opportunity to do something differently
- Trust and empowerment: if data scientists come up with a fantastic conclusion and are only met with blank stares, you can bet that the good ones will be gone before you make real progress
This topic is really important because there is such a shortage of data scientists in today’s world. They can easily pick and choose their location and their platform. In the United States, there are three major concentrations:
- In the New York area you can find them among stock funds and hedge funds analyzing patterns in the stock market
- In Silicon Valley
- In major consultancies, such as BCG, IBM, and McKinsey, as well as firms like our own; data scientists enjoy this arena if they like the service model (i.e., tackling different problems) and want a chance to work directly with clients.
Did you enjoy this article? I will be back soon with some additional insight from John Kelly on the most common challenges in terms of organization and why big data investment has not yet impacted companies at scale.