Ever thought about data science as a food chain? No? Well, we’re here to change that (hint: there are more parallels than you realize). In this blog post, we will highlight key takeaways from my EGG On Air Episode on the topic. We’ll cover why it makes sense to view data science as a food chain, identify who is who in the food chain, and break down which changes in an organization can alter the food chain, often with unintended consequences.
First...Why is This Analogy Useful?
As with any new analogy, you should question why viewing data science as a food chain makes sense and is useful for organizations aiming to scale their data initiatives. Here are three core reasons why the analogy is valuable:
1. There are interwoven systems of mutual dependency.
To create permanent value by building durable AI systems, data science must be intimately connected to other disciplines. We know it’s inherently connected to other technical disciplines like engineering and infrastructure but, while these connections determine whether data science is possible in the first place, building them is not sufficient for success. A perfect model makes no impact if it never influences an action or decision, so connections outside of the narrow tech space are what really matter, regardless of an organization’s size or data science maturity. Data science depends on other teams and departments for its raw materials (both data and business needs) and, in turn, produces assets needed by others. Those links embody one of the characteristics of the food chain.
2. Parts of the system have different strengths.
The food chain analogy helps clarify that data science is just the name of the field and doesn’t have an exclusive claim to any set of techniques. After all, data engineers move and transform data and the same goes for data analysts — they just do it differently. What sets data science apart then? Its place in the complex web of connections between related technical and business fields and the way it interacts within that web. Data scientists aren’t the only organisms to breathe, eat, and move but they have a particular survival strategy adapted to their own ecosystem — as well as their own strengths and weaknesses.
3. There’s an intuitive way to increase responsibility.
The food chain analogy can be used as an organizing principle to understand what it means to be responsible in data science. When there is a high-profile case of “mutant algorithms gone wrong”, people often rush to blame a faceless machine and not its human designer. Isn’t that like blaming particular gases for climate change and not going one step further to ask where those gases come from? No organism in the food chain is inherently good or evil; it’s judged by its effect on others (pests in one place can be a national treasure in another).
Likewise, responsibility in data science can’t come down to which techniques are good or bad. Rather, it’s the duty of individuals and teams to consider the wider implications of what they build. It would be naive to think that algorithms that detect bias in other algorithms can solve the underlying problems, just as introducing one animal to an ecosystem to control another never goes quite to plan. The food chain analogy sheds some light on this complex issue of responsibility and can make it easier to explain to stakeholders.
Key Characters in the Data Science Food Chain
1. Unicorn
Back when the data scientist role was considered the sexiest job of the 21st century, countless articles were written about this organism’s superpowers. But because the unicorn is mythical, it exists outside of the food chain. These organisms have infinite time and they always win — they depend on nobody else so they can never be automated away. They don’t really exist, so hopefully your food chain doesn’t depend on them.
2. Apex Predator
This organism hunts others but is hunted by no other. In data science, the apex predators are quite rare outside of research institutions and fancy tech companies because they take a long time to mature — they develop libraries we rely on, discover new techniques, and make existing algorithms more efficient. The role doesn’t require a lot of talking, as this organism can sniff out inefficient code or unsolved problems quite easily. Their role in the food chain is to limit explosive growth in the organisms below them.
3. Plover Bird
These organisms accept the risk they might get eaten, which is a fair representation of how data science exists in many organizations. They are cohabiters or collaborators with no special status, but they can thrive if they help many different larger animals which, in data science terms, means adapting to solve many business problems. Making these adaptations is about understanding the needs of different potential customers and proving the value of what you do so they don’t eat you. This role is nowhere near as glamorous as the unicorn or apex predator.
4. Artisan
The artisan represents a popular vision of a data scientist: sitting in the dark corners of a random forest, handcrafting complex algorithms that may never see the light of day. They have chosen not to talk much, as others would never be able to understand the intricacies of building the perfect nest. They lay maybe one egg a year (read: deliver one immensely complex project) but, if there are just a few of these organisms, they can live happily in their own world. However, if too many congregate, the apex predators will swoop in to automate them away. Unlike plover birds, the artisans have no allies to protect them. A classic data science artisan could be the inventor, owner, and single maintainer of a critical forecasting model that everyone else is afraid to touch, written in a language only the artisan uses. This grants the artisan safety as long as this “nest” stays intact.
5. Dung Beetle
These organisms represent what most people are doing with data most of the day: recycling waste products of other organisms into something more useful. With the right conditions, they can help fertilize “insight” trees and lush business intelligence. They represent all the other people who do the hard data work in an organization but sit outside the data team, sometimes 20 times more numerous, and often producing the most influential data!
Apply This Analogy to Protect Your Data Science Food Chain
This analogy can be used to understand how data science and its ecosystem exist and vary across different contexts. Use the lessons below to protect the data science food chain at your organization:
- Recognize the different roles different people can play: Different skill sets lend themselves to different roles in the food chain, so what matters is finding the right balance for the needs of your organization. You may think “dung beetles” are nothing special, but how would your job change if they became extinct?
- Don’t get too comfortable — be ready to adapt to environmental changes: Pressure on the data science food chain is constantly shifting as organizations change priorities and mature (or don’t) in their use of data. Well-functioning teams of “plover birds” can find themselves under threat if business stakeholders find other ways to solve those problems. Whether you are a data scientist or their team leader, you need to be sensitive to these changes and have a strategy in place for adaptation.
- Responsibility is respecting your environment and anticipating your impact: What you do as a data scientist can impact the environment (read: the organization) both positively and negatively. With that power to change the delicate balance between organisms in the food chain comes a duty to act intentionally to avoid unintended consequences.
Remember, no organism in the food chain is inherently good or evil. It all comes down to the interplay and how well you respect and match the needs of the other organisms.