Data Science at Scale: Make or Buy, In-House or Outsource?

organization | business | collaboration | Caroline Martre

In the world of big data, there is no shortage of open source and commercial tools; at the same time, there is, in some ways, a shortage of human capital: many companies struggle to hire or retain data teams.

Caroline Martre is an experienced business development specialist and subject matter expert who has worked with both large and small customers facing these exact predicaments for the past two years. She has written a series of blog posts on big data challenges she sees in businesses she works with, and the following is a combination of her insights from two of those posts: one on hiring in-house or outsourcing, and the second on making or buying a solution for data science at scale.

It’s no surprise that the age-old questions of make or buy and hire in-house or outsource have extended to the burgeoning world of (big) data:

[Image: Big Data Landscape 2016]

Learn from the Best

Before diving in and looking at the pros and cons of each approach, we’ll look at the best of the best: how have large-scale data innovators managed to stay ahead? Tech giants like Google, Facebook, Amazon, Uber, and Tesla (to name a few) have a strong data science presence in-house and have aligned their organizations (people, tools, and processes) around data to empower everyone at the company. This way, they constantly create value from massive data sets across the organization.

As a specific example, Kevin Novak, Head of Data Science Platform at Uber, talks about the role platforms can play in improving team efficiency by solving basic issues that frustrate data teams and lead to missed opportunities to generate insights. Issues that must be resolved, whatever your approach or staffing, include:

  • “Plumbing” issues: connecting all the technologies and solutions, from visualization to data management and machine learning.
  • Too much time spent cleaning data.
  • Production issues: reinventing the wheel when deploying models on a daily basis.
  • Duplicated work and a lack of best-practice sharing.


So, how do you establish the organizational support necessary to do data science at scale and stay in the game?

Data Science Solution: To Make or To Buy

When it comes to finding a data science solution that can address some of the issues above and also deliver efficient and effective data projects, custom development (either involving your own functional and IT teams or outsourced) will allow you to build a solution that fits your exact requirements and will give flexibility for future evolution.

However, custom development will be more expensive and more time- and resource-consuming than an off-the-shelf solution, which means it generally will not deliver business-impacting results as quickly. In contrast, buying a solution will be faster and less expensive, but potentially less flexible (depending on your specific needs), though with the number of solutions on the market today, it’s very likely you will find one that meets all your specifications.

Both options (make or buy) can make sense at some point in your company’s development. And yet, despite all the interest, effort, and investment in both approaches, many organizations are still struggling to figure out how to create value from the data they generate over the long term.

Data Teams: In-House or Outsource

While a platform (whether custom built or purchased) can facilitate good data science, it will not replace leadership and the human side of data science: a staff that can build a knowledge base and consistently deliver results from data projects.

So is it best to develop a data team in-house either with hired resources or those promoted from within, or is outsourcing to an external data team the way to go?

"We often find data scientists are not part of a larger team. They're sort of sole practitioners," observes Martin Fleming, VP, Chief Analytics Officer, and Chief Economist at IBM. "They either don't have the level of support they need, or they don't have the functionality that's necessary, so they struggle with effectiveness and career development."

The solitude of data scientists... and the even greater solitude of their data team manager

To solve this issue, a lot of companies tend to exclusively hire external data science teams to develop specific projects. But how many of these projects are truly driving the company’s business to operate better or are improving a product’s performance on a recurring basis? How many remain one-off?

On top of this issue, when outsourced teams leave, much of the knowledge that could have been transferred usually leaves with them. And at some point, if needs keep increasing, continuing to outsource can become very expensive. For all these reasons, hiring internal skills for your data science initiatives likely makes sense.

But companies need to be careful, as hiring specialists in-house who end up underused can get as expensive as outsourcing. Even the most committed staff (whether internal or outsourced) can have difficulty delivering impactful, positive results when the organization is afraid of change and unclear about its goals.

As Martin Fleming from IBM highlighted above, data scientists can struggle with effectiveness and career development. For this reason, companies (and especially leadership) need to understand their goals and their own ability to address them, and then put in place the level of organizational support (human resources, tools, and processes) required to meet those goals and prevent frustration among staff.

As an alternative to going entirely in-house, a mixed strategy can be a great way to put a good foundation in place: begin with outsourced partners before or while hiring external candidates or promoting from within the current talent pool. Then, as the number of projects grows, progressively increase internal capabilities through both internal and external hiring. This gives companies two advantages: it ensures knowledge transfer, and it makes sure projects get deployed, monitored, and create value in the long term.

The Bottom Line

Ultimately, using only human capital without tools or vice versa will hinder your business’s ability to develop an effective data science strategy. But choosing the right combination and right solution for each, combined with effective leadership, will allow data teams and data science projects to thrive and scale.
