Supply Chain Transparency With Dataiku's Deforestation Tracker

Use Cases & Projects, Dataiku Product Valentine Reltien

Thanks in part to the Convention on Biological Diversity (CBD), an international treaty adopted in 1992 to conserve biodiversity, biodiversity is finally having its spotlight moment. Why, you may ask? It’s not, disappointingly, that polar bears are crushing it on TikTok. Rather, ever more granular and insightful data-driven climate models are demonstrating the colossal cost of biodiversity’s collapse to businesses and the economy.

This blog post will dive into how Dataiku can catalyze the mitigation of biodiversity loss by detecting commodity supply chains’ exposure to deforestation. We’ll start by explaining exactly what regulations ask of businesses today in terms of due diligence before walking through an example of how Dataiku can streamline part of this exercise. We’ll then review the ways in which Dataiku has been, and can be, used to more effectively tackle biodiversity loss and accelerate regulatory readiness.

Why Your Business’ Biodiversity Impact Matters 

First off: what even is biodiversity? The National Geographic Encyclopedia defines it as the “variety of living species on Earth, including plants, animals, bacteria, and fungi. It can be used more specifically to refer to all of the species in one region or ecosystem”. In turn, ecosystem services are the benefits people obtain from natural ecosystems. They range from the natural occurrence of resources for us to extract or harvest (provisioning) to beneficially stable environmental conditions enabled by an ecosystem’s healthy self-regulation (e.g., the water cycle and the nitrogen cycle).

While we do not pay for such ecosystem services, they are the foundation of our social and economic lives. This is starting to hit home today as the cost of infrastructure damage from extreme weather events adds up. In the US alone, extreme weather events cost $765 billion between 2017 and 2021. The Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) determined that over half of the world’s GDP, about $44 trillion, is moderately or highly reliant on these services (PwC).

What’s the Matter With Deforestation?

Deforestation is a longstanding issue and a poster child for ecosystem destruction, because it is so salient and sticky across the supply chains of commodity-containing consumer goods. After tackling deforestation was long left to the discretion of commodity supply chains and global private/public coalitions (e.g., Action for Sustainable Derivatives, Palm Oil Collaboration Group, Soy Coalition), the EU raised the bar significantly in December 2022 by passing two key pieces of legislation. The first prohibits importing, exporting, or placing on the market deforestation-exposed commodity-containing products (e.g., palm oil, cattle, coffee, cocoa). The second is a standard that identifies fourteen sectors, ranging from oil and gas to food and beverage, that are required to disclose their transition plans for meeting sustainability targets.

A first step in complying with both of these requirements is supply chain transparency: a granular, traceable, and reliable understanding of where one’s business operations presently damage ecosystems. In other words: where is one’s supply chain or business exposed to deforestation?

This question is at the heart of an illustrative use case demonstrating Dataiku’s capacity to streamline supply chain transparency. Despite many attempts, data for supply chain transparency often seems biased insofar as it only partially covers the matter. This could be a result of methodological choices, of the original data, and/or of these analyses being siloed from business processes. To tackle this, we assumed that combining sources with different origins and methodologies would help reduce blind spots in the analysis.

To capture a thorough and integrated view of a procurement product portfolio, we chose a publicly available product portfolio from Carrefour that already contains the French retailer’s Ecoscores (a score akin to a life cycle analysis). We combined it with an ESG controversy score (i.e., from an ESG rating provider) and key commodity deforestation exposure data from Trase, a partnership between Global Canopy and the Stockholm Environment Institute.

Moving Away From Deforestation With Dataiku 

Disclaimer: we’re now getting into the technical weeds for those of you seeking to grasp the nitty-gritty “how” of the project. Feel free to skip to the video if you’d rather have a simplified, visual experience.

How, then, can Dataiku accelerate your procurement team’s effort to become deforestation-free? I’ll answer this question in two steps:

  1. A review of the data flow’s logic (the “physics”)
  2. An explanation of Dataiku’s technical features enabling the data flow (the “chemistry”)

1. The Data Flow

The goal is to help procurement teams identify which of their commodity suppliers are at risk of deforestation, so they can prioritize engagement (or replacement) and comply with the stakeholder expectations and legal requirements explained above. But how do we go from three separate datasets to an integrated dashboard?

We start with three entry datasets: 

  1. A retailer’s own-brand product portfolio which crucially includes a product ID, an ingredient list, data about suppliers of each ingredient (including commodities), and their origin, as well as a product’s Ecoscore.
  2. A dataset about commodity exporters’ deforestation exposure from Trase, which crucially contains data about exporters, exporter groups, countries of origin, regions of production, total traded quantity (in volume and financial value), and the corresponding deforestation exposure.
  3. (Optionally) A dataset with ESG controversy scores of companies that coincide with the exporters in the Trase dataset, to integrate alternative information from exporters’ officially published data on their operations’ sustainability.

After preparing these datasets to enable their interoperability, the product portfolio (disaggregated to a one row per one commodity per product ID level) is joined with the deforestation exposure dataset. This is done thanks to a commodity-exporter-origin key generated in the preparation phase. What results is an enriched dataset of a disaggregated portfolio with each commodity having an associated level of deforestation exposure.
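The disaggregation and key-based join described above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not the project's actual Dataiku flow: every field name (`product_id`, `deforestation_ha`, and so on), the `|` key separator, the sample rows, and the choice of a left join that keeps unmatched portfolio rows are assumptions made for the example.

```python
# Hypothetical sample data; all field names and values are illustrative.
portfolio = [
    {"product_id": "P001", "ecoscore": "B", "ingredients": [
        {"commodity": "Palm Oil", "exporter": "Exporter A", "origin": "Indonesia"},
        {"commodity": "Cocoa", "exporter": "Exporter B", "origin": "Ghana "},
    ]},
    {"product_id": "P002", "ecoscore": "C", "ingredients": [
        {"commodity": "Soy", "exporter": "Exporter C", "origin": "Brazil"},
    ]},
]
exposure = [
    {"commodity": "palm oil", "exporter": "EXPORTER A", "origin": "indonesia",
     "deforestation_ha": 120.0},
    {"commodity": "cocoa", "exporter": "exporter b", "origin": "Ghana",
     "deforestation_ha": 15.5},
]


def make_key(commodity, exporter, origin):
    """Build a normalized commodity-exporter-origin key (trimmed, lowercase)."""
    return "|".join(part.strip().lower() for part in (commodity, exporter, origin))


def disaggregate(products):
    """Explode the portfolio into one row per commodity per product ID."""
    rows = []
    for product in products:
        for ing in product["ingredients"]:
            rows.append({
                "product_id": product["product_id"],
                "ecoscore": product["ecoscore"],
                "commodity": ing["commodity"],
                "exporter": ing["exporter"],
                "origin": ing["origin"],
                "key": make_key(ing["commodity"], ing["exporter"], ing["origin"]),
            })
    return rows


def join_exposure(rows, exposure_rows):
    """Left join on the key: unmatched rows get None instead of being dropped."""
    lookup = {
        make_key(e["commodity"], e["exporter"], e["origin"]): e["deforestation_ha"]
        for e in exposure_rows
    }
    return [{**row, "deforestation_ha": lookup.get(row["key"])} for row in rows]


enriched = join_exposure(disaggregate(portfolio), exposure)
```

Normalizing case and whitespace when building the key is what lets rows from differently formatted sources match. Keeping unmatched rows (with an empty exposure value) makes gaps in coverage visible rather than silently dropping suppliers.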

This enriched dataset is then expanded further by joining it with the ESG controversy score dataset. We are thus left with a supplier-level dataset in which the original product portfolio is disaggregated to the commodity level, the level most conducive to due diligence compliance. This data is visualized in a series of graphs that populate the resulting dashboard to accelerate supplier monitoring and engagement by both procurement and compliance teams.
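As a rough sketch of this second join, the snippet below attaches controversy data to each commodity row by matching the exporter name against the company name. The field names, the sample rows, and the name-based matching are assumptions for illustration. Because the ESG dataset is optional, the sketch keeps every row and leaves the controversy fields empty when no company matches.

```python
# Hypothetical commodity-level rows, already enriched with deforestation exposure.
enriched = [
    {"product_id": "P001", "commodity": "palm oil", "exporter": "Exporter A",
     "deforestation_ha": 120.0},
    {"product_id": "P001", "commodity": "cocoa", "exporter": "Exporter B",
     "deforestation_ha": 15.5},
    {"product_id": "P002", "commodity": "soy", "exporter": "Exporter C",
     "deforestation_ha": None},
]
# Hypothetical ESG controversy scores, one row per company.
controversies = [
    {"company": "EXPORTER A", "controversy_type": "land use", "controversy_score": 4},
    {"company": "Exporter B ", "controversy_type": "labor", "controversy_score": 2},
]


def normalize(name):
    """Trim and lowercase a company name so both datasets use the same spelling."""
    return name.strip().lower()


def add_controversies(rows, controversy_rows):
    """Left join on exporter name; rows without a match keep None values."""
    by_company = {normalize(c["company"]): c for c in controversy_rows}
    result = []
    for row in rows:
        match = by_company.get(normalize(row["exporter"]))
        result.append({
            **row,
            "controversy_type": match["controversy_type"] if match else None,
            "controversy_score": match["controversy_score"] if match else None,
        })
    return result


supplier_level = add_controversies(enriched, controversies)
```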

Finally, this three-fold dataset is reaggregated at the product level to return to its original portfolio format. In so doing, both commodity deforestation exposure and exporters’ controversies are summed respectively, their deforested proportions are averaged and all exporters, origins (including region and biome), controversy types, and their respective scores are concatenated in a way that keeps records of their content explicit (in the form of lists of strings, integers and digits). 
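The reaggregation step can be approximated with a simple group-by in plain Python. The sketch below sums exposure and controversy scores, averages the deforested proportion, and collects the remaining fields into lists so that the per-supplier detail stays explicit. All field names and sample values are hypothetical, and the sketch assumes every row has numeric values (missing values would need handling first).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical supplier-level rows (one row per commodity per product ID).
supplier_level = [
    {"product_id": "P001", "exporter": "Exporter A", "origin": "Indonesia",
     "deforestation_ha": 120.0, "deforested_share": 0.25,
     "controversy_type": "land use", "controversy_score": 4},
    {"product_id": "P001", "exporter": "Exporter B", "origin": "Ghana",
     "deforestation_ha": 30.0, "deforested_share": 0.75,
     "controversy_type": "labor", "controversy_score": 2},
    {"product_id": "P002", "exporter": "Exporter A", "origin": "Indonesia",
     "deforestation_ha": 10.0, "deforested_share": 0.5,
     "controversy_type": "land use", "controversy_score": 4},
]


def reaggregate(rows):
    """Group commodity rows back into one row per product ID."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["product_id"]].append(row)

    products = []
    for product_id, items in groups.items():
        products.append({
            "product_id": product_id,
            # Summed measures
            "total_deforestation_ha": sum(r["deforestation_ha"] for r in items),
            "total_controversy_score": sum(r["controversy_score"] for r in items),
            # Averaged proportion
            "avg_deforested_share": mean(r["deforested_share"] for r in items),
            # Concatenated as lists so each supplier's contribution stays visible
            "exporters": [r["exporter"] for r in items],
            "origins": [r["origin"] for r in items],
            "controversy_types": [r["controversy_type"] for r in items],
            "controversy_scores": [r["controversy_score"] for r in items],
        })
    return products


product_level = reaggregate(supplier_level)
```

Keeping the lists in the same order as the underlying rows means the first exporter, origin, and controversy score in a product row all refer to the same supplier.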

This ensures full data traceability and transparency, covering both procurement teams’ need for high-level information to prioritize engagement and compliance teams’ need for granular detail for disclosure purposes. Both formats are accessible on the dashboard, which presents the supplier and product levels on separate frames so that each of these users can go directly to the view they need.

2. Key Platform Features

The above workflow is streamlined by Dataiku’s platform features, which showcase three of Dataiku’s fortes:

  1. Ease, speed, and flexibility of alternative data aggregation and preparation
  2. Extract business insights from data and share results across business functions
  3. Traceability of insight process for audit trail and reporting readiness

While this demo solves for supply chain transparency, you will notice that these processes are transposable: they are building blocks underlying much of advanced analytics on alternative data. The demo follows the steps below. Watch the video for an interactive illustration of these features:

Ease, speed, and flexibility of alternative data aggregation & preparation

i. Data pulling
ii. Data Cleaning with the Prepare recipe
iii. Data Parsing with the Filter recipe

Extract business insights from data and share results across business functions

i. Data Wrangling with the Groupby recipe
ii. Data Visualization with the Sort recipe, charts, and dashboarding

Addressing Other Biodiversity Challenges

Beyond tracking commodities’ deforestation exposure, Dataiku can help on projects with various other biodiversity stakes. Check out these other use cases:

With ever more granular and insightful data-driven evidence of the colossal cost of biodiversity’s collapse to the economy, automating due diligence and reporting is becoming a strategic must for both compliance and business purposes. Dataiku’s platform features can provide ease, speed, and traceability in alternative data aggregation, preparation, and collaboration across multiple functions. This makes it easier to tackle supply chain transparency and to mitigate exposure to biodiversity-destroying activities, like deforestation.
