Remote Data Science: How to Make it Work

Scaling AI Nancy Koleva

Remote work is the ultimate litmus test of a data organization’s robustness. In the office, many inefficiencies go unnoticed even though they ultimately cause significant loss of time or of project relevance, because informal discussions and water cooler interactions at least partially mitigate them. Remote work removes that cushion.

The Challenges of Remote Data Science

A data team set up to work efficiently when remote opens new opportunities for productivity, but the setup also brings challenges of its own:

  • Access to systems: Connecting to underlying data systems can be difficult in a work-from-home environment, whether the goal is to reach the various data sources or the computational resources.

  • Collaboration within teams: Data projects are rarely one-person jobs, and the easiest way to get teams working together is to have them sit together. Without this physical proximity, individuals often end up siloed in their execution of projects.

  • Collaboration across teams: Data projects are not only about data, but also require strong involvement from business teams to build experience, generate buy-in, and validate relevance. They also require data engineering and other teams to help with the operationalization steps. Because water-cooler discussions play a pivotal part in the (often limited) success of cross-team collaboration, a full work-from-home setup may prove highly disruptive to execution.

  • Reuse over time: Capitalizing on past projects is essential to maintain productivity. However, reuse within large code repositories often happens informally, and the lack of off-the-cuff discussions may significantly limit the ability to build on past work.


The Benefits of Remote-Ready Data Science

One of the most important trends in the past few years, and certainly in the tumultuous world of 2020, is enabling remote work. It makes business sense for three main reasons:

  1. It opens doors to data talent that is not based where the company is based. According to the 2019 State of Remote Work Survey, 99% of respondents said they would like to work remotely, at least some of the time, for the rest of their career. Not offering the possibility of remote work for data professionals eliminates a lot of talent. Being able to tap into that talent gives organizations a leg up not only over competitors in their industry, but also over other companies that are competing for the same data talent.
  2. Most data talent at global organizations is distributed anyway. That is, for a slew of reasons ranging from cultural to regulatory, data teams at companies are hardly ever all sitting in the same building, much less the same country. Enabling remote work not only helps those who are always fully remote; it also instills best practices that benefit distributed teams and unify data practices across the entire organization.
  3. It allows companies to be more agile, ensuring that if any unforeseen circumstances arise, they can adapt without interruption to data science and machine learning projects (and thus to AI strategy or progress overall). This is especially important as businesses become more mature in their AI journey and data projects underpin increasingly essential activities for the company, making any interruptions to those projects potentially devastating. For example, if a retailer has a machine learning-based pricing model in production, it needs to be constantly evaluated, and breaks in this process, especially during turbulent times, can result in significant lost business.

All of this sounds great, but practically, how can organizations ensure that everyone - from analyst to data leader or manager, data engineers, scientists, and everyone in between - can work together seamlessly from near and far?

How to Actually Make it Work... Remotely

Small teams can potentially sustain themselves up to a point by working on AI projects in an ad-hoc fashion: team members store their work locally rather than centrally, have no reproducible processes or workflows, and figure things out along the way. But with more than just a few team members and more than one project, or with anyone (or, given current events, everyone) remote, this quickly becomes unmanageable.

The right data science, machine learning, and AI platforms, such as Dataiku, enable remote work at their core by allowing people across the organization to access all data and work together on projects in a central location. This facilitates good data governance practices combined with widespread vertical (e.g., data scientist to data scientist) as well as horizontal (e.g., analyst to data scientist or business user) collaboration.