We thought data scientists were unicorns, but it appears that citizen data scientists are much harder to define and find.
When we look at the world of advanced analytics and data science, it is easy, almost too easy, to sort the people who do the work into two buckets: data scientists and citizen data scientists. The former have Ph.D.s and experience and work with huge datasets (or have at least attended a bootcamp), but the latter, it turns out, are much harder to find and label.
To illustrate, I recently searched LinkedIn for citizen data scientists. The result was a resounding zero! Under jobs, I could find “senior,” “principal,” and “ML” data scientists, but no “citizens.” Companies apparently want to develop and enable citizen data scientists and to provide products for them, but no one wants to carry the label: no one I could find on LinkedIn identifies themself as a citizen data scientist.
In our search for these seemingly mythical figures, we recently met with several hundred smart people and analytics professionals from a leading global pharmaceutical company, and my friend Kelci Miclaus asked all of the attendees to raise their hands if they were citizen data scientists. Out of the 200+ people in attendance, not a single citizen data scientist!
So, what is it that makes citizen data scientists (we’ll call them CDS’s for the sake of space) so hard to find? We think we know one when we see one, and many of us want some of them, but it seems that no one IS one of them.
Digging Deeper: What Would AI Say?
So, of course, I turned to ChatGPT for assistance. One of my takeaways from comparing its definitions of data scientists and CDS’s is that CDS’s are proficient at working with data, solving problems, and delivering business insights, and that they are extremely valuable. However, they are also defined by their lack of programming skills in R and Python and by their use of simpler data analysis tools (e.g., spreadsheets or visual analytics tools rather than advanced statistical models or machine learning algorithms). According to ChatGPT, they also often work with smaller datasets than their more formally trained counterparts.
When asked for a direct comparison of the roles, ChatGPT said, “A citizen data scientist, on the other hand, is a non-expert who uses data analysis tools and techniques to gain insights from data.” That seems woefully understated and short-sighted.
While some individuals may not use the specific term “citizen data scientist” to describe themselves, they may still consider themselves data analysts or data-driven problem solvers who use data to make informed decisions. They may lack formal data science training, but they are far from “non-expert”: they bring experience, perspective, business-centric knowledge, problem-solving skills, and a general understanding of “how things work.”
Data-Driven Problem Solvers
The notion of “data-driven problem solvers” is an especially powerful concept. Rather than thinking of these folks as analysts with limited skill sets, let’s recognize them for who they really are. They are often graduates of top business schools, physics Ph.D.s who have turned their immense problem-solving skills to business and engineering challenges, and business professionals with 10–20 years of experience. They have the insight and perspective to drive tens of millions of dollars in business value through their ability to interpret data and derive business insights. They may or may not have coding skills, and they may or may not have formal data science training; it is their ability to solve difficult business challenges and create value that defines them.
Dataiku has moved away from the data scientist/citizen data scientist segmentation and now focuses on data experts and domain experts. I think this is a good change, but it still leaves room for finer segmentation of the non-data-scientist population.
With powerful tools and access to data, these users become empowered business analysts. They are less focused on building a specific model (or using a particular AI algorithm or tool) than on solving a specific problem where their depth of knowledge enables higher value realization. They aren’t replacing data scientists; rather, they are teaming with them to build collaborative powerhouses that solve problems and create efficiency.
In the hands of users like this, a flexible toolkit that enables data management, AND advanced analytics, AND transparency, AND experimentation, AND collaboration with others becomes a knowledge platform and not just another AI platform. It is with users like this that a company like GE Aviation can enable “physics-based engineering” and deliver millions of dollars in value. It is in the hands of users like this that a leading New York City bank can deliver over 5,000 projects into production in just a couple of years.
This is how a quality engineer at NXP used virtual metrology to detect manufacturing flaws and save the company millions in material and engineering costs. It is how a large healthcare payer was able to effectively manage its network of providers to reduce costs and improve patient outcomes. One retailer armed its analyst team to deploy new demand forecasting models within two weeks, and another improved 70% of its SKU-level demand models by enabling its analysts to experiment with and leverage new data sources. And as for the perception that these users are confined to simpler tools and smaller data volumes, Standard Chartered Bank has great examples of its empowered business analysts processing billions of rows of data at a time and leveraging state-of-the-art compute environments.
Business people may not aspire to be citizen data scientists and may never identify as CDS’s. But someone with those positive skills and traits is capable of delivering massive value and business transformation when recognized as an empowered business analyst and equipped with the tools they need to experiment, create knowledge, and deliver value.