While folks who work on information technology (IT) and data science teams likely have very different job functions, one commonality is that both teams need to understand the broader business in order to collaborate effectively with key stakeholders, define success, and continuously improve. Why, then, don’t these teams always understand each other?
To help break down some of the core differences between IT and data science (and shed light on why they may not always see eye to eye), it’s important to keep in mind their unique challenges:
- IT teams deal with issues ranging from inefficient processes and time spent on tedious tasks that could be automated to data overload (and no unifying tool that can seamlessly combine siloed data) and organization-wide challenges related to security and compliance.
- Data teams frequently struggle with sharing best practices across team members and between teams (which leads to a lack of collaboration), quickly and safely putting machine learning models into production, making changes to models already in production, navigating a myriad of tools and platforms across the stages of the model life cycle, and a lack of workflow reusability, among other issues.
Operationalization: The Root of All Misunderstandings
Successfully building a data science or machine learning project and then operationalizing it is not an easy task — it becomes twice as hard when teams are isolated and playing by their own rules (which is unfortunately often the case with IT and data science teams).
More often than not, there is a disconnect between the worlds of development and production. Some teams may choose to recode everything in an entirely different language, while others may make changes to core elements such as testing procedures, backup plans, and programming languages. Operationalizing analytics products can become complicated as different opinions and methods vie for supremacy, resulting in projects that needlessly drag on for months beyond promised deadlines.
Strategies for Repairing the Data Science/IT Rift
To work together effectively, data science and IT teams need to get on the same page, starting with the following strategies at a minimum.
- Consistent packaging and release: In the process of operationalization, there are multiple workflows: some internal flows correspond to production, while external or referential flows relate to specific environments. Moreover, data science projects are composed not only of code but also of data. That’s why, to support reliable transport from one environment to the next, code and data need to be packaged together.
- Rollback strategy: Teams must agree on a rollback strategy to return to a previous model version after the latest version has been deployed. A successful rollback strategy must include all aspects of the data project, including transformation code, data, software dependencies, and data schemas.
- Robust data flow: Preparing for the worst is part of operating in production; for data science and machine learning projects, this means having a robust failover strategy. Formulating a failover strategy in a data science workflow presents some unique challenges, mostly due to the sheer volume of the data involved. It’s not feasible to take a “rebuild” approach, as there is simply too much information to do this efficiently.
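To make the first two strategies concrete, here is a minimal sketch in Python of what versioned packaging with a rollback path might look like. The function names (`package_release`, `set_current`, `rollback`) and the JSON registry file are illustrative assumptions, not a reference to any particular tool; the point is that code and data artifacts travel together in one versioned bundle, and that the deployment history is recorded so a bad release can be reverted.

```python
import json
import tarfile
from pathlib import Path


def package_release(version, artifacts, out_dir):
    """Bundle code, data, and schema files into one versioned archive.

    `artifacts` maps archive member names to local file paths, so the
    model code and its data snapshot ship as a single unit.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    bundle = out_dir / f"model-{version}.tar.gz"
    with tarfile.open(bundle, "w:gz") as tar:
        for name, path in artifacts.items():
            tar.add(path, arcname=name)
    return bundle


def set_current(version, registry):
    """Record which version is live; the history enables rollback."""
    registry = Path(registry)
    state = json.loads(registry.read_text()) if registry.exists() else {"history": []}
    state["history"].append(version)
    state["current"] = version
    registry.write_text(json.dumps(state))


def rollback(registry):
    """Return to the previously deployed version."""
    registry = Path(registry)
    state = json.loads(registry.read_text())
    state["history"].pop()  # drop the bad release
    state["current"] = state["history"][-1]
    registry.write_text(json.dumps(state))
    return state["current"]
```

Because the archive carries everything the bullet above lists (transformation code, data, dependency manifests, schemas), rolling back the registry pointer is enough to restore the full project state, not just the model weights.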
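The failover idea in the last bullet can also be sketched briefly: rather than rebuilding a large dataset when a source fails, a pipeline can retry transient errors and then fail over to a replica. The source list and `read_fn` callback here are hypothetical placeholders for whatever storage layer a team actually uses.

```python
import time


def read_with_failover(sources, read_fn, retries=3, backoff=1.0):
    """Try each data source in priority order.

    Transient failures are retried with exponential backoff; if a source
    keeps failing, we fail over to the next replica instead of attempting
    an expensive rebuild of the data.
    """
    last_error = None
    for source in sources:
        for attempt in range(retries):
            try:
                return read_fn(source)
            except IOError as err:
                last_error = err
                time.sleep(backoff * (2 ** attempt))  # back off before retrying
    raise RuntimeError(f"all sources failed: {last_error}")
```

Agreeing on a pattern like this up front gives IT a predictable behavior to monitor and gives data scientists a pipeline that degrades gracefully instead of halting.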
By taking the time to build a mutual understanding of each other’s key functions, IT and data science teams will be able to collaborate more effectively, reduce bottlenecks, and become more aware of the big picture, which will ultimately allow them to help each other complete tasks outside their comfort zones.