When Good Data Team Hiring & Open Source Aren’t Enough

Scaling AI Lynn Heidmann

Smart hiring (and retention) of data scientists combined with the power of open source can get companies started in their machine learning (ML) efforts. But the path to Enterprise AI means exponentially scaling efforts (i.e., operationalization) while keeping responsibility and governance top-of-mind, and that can’t be achieved through strategic hiring and cutting-edge ML tech alone.

According to MarketsandMarkets, the overall data science platform market is expected to grow to more than $101 billion by 2021 at a compound annual growth rate (CAGR) of 39%. Clearly, Enterprise AI (as well as the tools that help companies get there) is at peak hype. But what exactly do data science, ML, and AI platforms have to offer, and what needs do they fill that open source does not?

Read on for more, or go straight to the source with the white paper Why Enterprises Need Data Science, Machine Learning, and AI Platforms.

Don’t Do Away With Open Source...

There’s no question that open source technologies in data science and machine learning are state-of-the-art and that organizations have to adopt them to be dynamic and future-minded. In addition to being on the bleeding edge of technological developments, using open source makes it easier to hire and onboard a team.

But that’s not the whole story. Keeping up with the rapid pace of change in open source is difficult for enterprise-sized organizations. What’s more, open source tools are usually highly technical, so without some sort of packaging or abstraction layer that makes the innovations more accessible, it’s very difficult to keep everybody in the organization on board and working together. Governance can also be a huge challenge if everyone is working with open source tools on their local machines, with no way to make that work centrally accessible and auditable.

...Support Open Source

Data science and ML platforms allow for the scalability, flexibility, and control required to thrive in the era of Enterprise AI.

Governance quickly becomes problematic when people work alone on data (open source or not) instead of in a collaborative environment.

Beyond pure technology, these platforms provide a framework for the transformation of people and processes, including:

  • Collaboration: A way for additional staff working with data, many of whom will be non-coders, to contribute to data projects along with data scientists (or IT and data engineers).
  • Data governance: Clear workflows and a way for team leaders to monitor those workflows and data projects.
  • Efficiency: Finding small ways to save time throughout the data-to-insights process gets companies to business value faster.
  • Automation: A specific type of efficiency is the growing field of AutoML, which is expanding to automation throughout the data pipeline to alleviate inefficiencies and free up staff time.
  • Operationalization: Efficient means to deploy data projects into production quickly and safely.
  • Self-service analytics: A system by which non-data professionals from different lines of business can access and work with data in a controlled environment.

But Avoid Vendor Lock-In

Understandably, some enterprises are reluctant to commit to data science and ML platforms that might lock them in to a certain system or specific technologies - this is especially the case if they have been burned in the past by cumbersome, expensive systems that hindered the very data efforts they were supposed to accelerate.

The key to mitigating this fear is to choose a data science and ML platform that is not only built to handle all parts of the data-to-insights process (to avoid having to augment its abilities with further tools later on), but is also completely flexible, open, and innovative when it comes to technology integrations - especially open source.

One smaller detail to look for is whether models can be exported. That way the company doesn’t get locked in to technologies that hinder its overall growth in Enterprise AI, and should the business change direction later, the work is not lost.
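To make the idea of an exportable model concrete, here is a minimal sketch in Python. It is not how any particular platform exports models. The `LinearModel` class and the `export_model` and `import_model` helpers are hypothetical names invented for illustration, and the model is a deliberately tiny linear one. The point is only that a model written out in a plain, open format (JSON here) can be reloaded and used without the tool that produced it.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class LinearModel:
    """Hypothetical stand-in for a model trained inside a platform."""
    feature_names: List[str]
    weights: List[float]
    intercept: float

    def predict(self, row: dict) -> float:
        # Weighted sum of the named features plus the intercept.
        return self.intercept + sum(
            w * row[name] for name, w in zip(self.feature_names, self.weights)
        )


def export_model(model: LinearModel) -> str:
    """Serialize the model to plain JSON (an assumed, tool-neutral format)."""
    return json.dumps(asdict(model))


def import_model(payload: str) -> LinearModel:
    """Rebuild the model from its JSON export, with no platform required."""
    return LinearModel(**json.loads(payload))
```

When evaluating a real platform, the equivalent question is whether its export produces something another tool can load and score with, rather than a format only the vendor can read.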

But more broadly, ask questions not only about the potential platform’s ability to integrate with all current technologies (programming languages, ML model libraries that data scientists like to use, and data storage systems), but also about the vendor’s vision. That vision should be wide enough that any new technologies the company may want to invest in later can be easily integrated with the platform, thanks to the vendor’s interest in staying open and cutting-edge.
