Three Paths to Updating Your Data Technology

Scaling AI Romain

Data science departments often rely on older technologies that were in place when they launched, while the new generation of data scientists uses newer ones such as R and Python. How can you resolve the tension between old and new technology?

Information Solutions Are Getting Old

Data science departments frequently rely on older technologies for statistical analysis, such as SAS and SPSS. These solutions were in place when established data science departments initially launched, and given their age and complexity, the learning curve for them is significant. Most new graduates hired as data scientists, however, have skill sets rooted in newer technologies like R, Python, Spark, Pig, and Apache Hive.

The end result is two sets of data scientists, each representing a different generation of statistical analysis methodologies. The challenge of old versus new technology has intensified in recent years as the data science industry has grown and the need to hire new talent has increased.

How to Design Your Data Science Technology

From a human resources standpoint, there are essentially three paths available, each with its own pros and cons:

Abandoning Old Technologies and Switching to New Technologies

In this situation, the data science department changes its approach to development by abandoning older technologies (e.g., SAS, SPSS) in favor of newer options. This enables the department to hire new data scientists who can onboard quickly and become productive with little downtime.

On the other hand, changing the core architecture of a data science lab has ramifications for both existing employees and the development process as a whole. By catering to newer technology, the department forces existing employees to update their skill sets.

Keeping Old Technologies and Training New Hires

With this option, the opposite approach is taken: older technologies are retained and new hires are trained to use them. As mentioned, these older options are both complex and robust. They have been around longer and, consequently, take considerable time and skill to master. The immediate benefit of this approach is that, unlike switching to new technologies, it does not disrupt the existing team's productivity. The downsides are the learning curve for new hires and the risk of becoming an antiquated data science laboratory over time.

Keeping Old Technologies and Pursuing New Ones (Hybrid Approach)

A third approach is a combination of the above options: keeping the old and using the new technologies in parallel. In this scenario, established employees are given the freedom to continue development using older technologies while new employees are allowed to develop using the new technologies. In other words, nothing is sacrificed, and both paths are pursued at the same time.

Define Your Hybrid Technology Approach

A data science department needs to hire people in order to grow, so this challenge cannot be avoided. Early in its evolution, a department has to prove the value of its existence; once it has delivered on its first project, demand for its work (and the need to hire new employees) will increase. The solution is to implement a tool that enables all parties, regardless of skill level and expertise, to work together. In a competitive market, a data science department can only survive if it can reliably deliver results. This means using a tool that is workflow-centric while supporting meaningful collaboration between all employees.
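To make "workflow-centric" a little more concrete, here is a minimal Python sketch of one way a shared workflow could treat steps from different teams uniformly. It is an illustration under assumptions, not a description of any particular product: the names `Step`, `run_workflow`, `legacy_clean`, and `modern_score` are hypothetical, the `technology` field is only a label, and both steps are plain Python stand-ins for work that might in practice be done in SAS, SPSS, R, or Python.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A table is a list of rows; each row maps a column name to a value.
# This shared format is the hand-off point between steps (an assumption).
Table = List[Dict[str, float]]


@dataclass
class Step:
    """One unit of work in the workflow (hypothetical structure)."""
    name: str
    technology: str  # label only, e.g. "SAS" or "Python"
    run: Callable[[Table], Table]


def run_workflow(steps: List[Step], data: Table) -> Table:
    """Run each step in order, passing the output of one to the next."""
    for step in steps:
        data = step.run(data)
    return data


def legacy_clean(rows: Table) -> Table:
    """Stand-in for a step maintained by the established team."""
    return [r for r in rows if r["revenue"] >= 0]


def modern_score(rows: Table) -> Table:
    """Stand-in for a step written by a new hire."""
    return [{**r, "score": r["revenue"] / 100.0} for r in rows]


workflow = [
    Step("clean", "SAS", legacy_clean),
    Step("score", "Python", modern_score),
]

result = run_workflow(workflow, [{"revenue": 250.0}, {"revenue": -10.0}])
print(result)
```

The point of the sketch is that the workflow only cares about each step's input and output, so a team member can contribute a step in the technology they know best as long as it reads and writes the shared format.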
