Cracking CSRD Readiness With Dataiku

Scaling AI, Featured Valentine Reltien

Following on from our previous blog’s introduction to the complexities of the Corporate Sustainability Reporting Directive (CSRD), we now turn to the practical implications of getting “CSRD ready”.

To frame the discussion that follows, there are in general two paths organizations can take:

  1. Dive headfirst into the use of a reporting tool. 
  2. Build the approach from scratch, either independently or with a partner. This second path leans on data scientists and a platform such as Dataiku.

Whichever path is taken, there is a shared foundational requirement: data preparedness. 

This blog post will share how the Dataiku platform can help analytics, sustainability compliance, and IT teams collaborate better with lines of business to access, prepare, and aggregate the data necessary for CSRD readiness.

How — and Why — Is Dataiku Positioning Itself in the Reporting Space? 

To Be Useful, Data Needs to Be Prepared & Complete 

Whether organizations are just starting their CSRD compliance journey or are well on their way, data preparedness is foundational to ensuring that outputs are of high quality and reliability. Fundamentally, the CSRD exercise hinges on aggregating data at scale, transparently and explainably, to produce the expected data points. This is not straightforward: in practice, CSRD compliance often involves disparate data strewn across organizational silos, where stakeholders have limited experience collaborating and different levels of data skills.

What is more, data is typically incomplete and unstandardized. Stakeholders often rely on a collection of external and internal spreadsheets with different data structures, maintenance schedules, and ownership, which makes foundational data preparedness harder to achieve.

The Challenge of Going Straight to SaaS Reporting Tools 

Sustainability teams often turn to Software-as-a-Service (SaaS) reporting platforms to help them with CSRD compliance. These are designed to provide valuable prepackaged resources to support reporting journeys, such as ready-made reporting blueprints, emission factor libraries, and benchmarks such as rates of decarbonization by subsector. However, unless input data is already cleaned, standardized, and centralized, their efficacy will be limited. Hence, we see Dataiku as complementary to reporting tools: it streamlines and facilitates data readiness upstream of their use.

CSRD Is a Deliberative and Collaborative Effort 

Furthermore, plugging data directly into a reporting SaaS platform without preparation might also sidestep an opportunity underlying the CSRD’s ambition: making lines of business responsible for their operations’ sustainability data, so that it is considered on par with their conventional data and performance indicators instead of being relegated to dedicated sustainability teams. It is in part via this data management channel that implementing the CSRD can change business as usual. Otherwise, keeping sustainability data siloed within compliance leads’ scope risks disincentivizing collaboration between sustainability teams, subject matter experts, and analytics teams, which is critical for meaningful, strategic, and effective decarbonization.

The Journey to Data-Driven Reporting

Whether an organization adopts a SaaS reporting tool or designs its own workflows for CSRD readiness, Dataiku’s key capabilities in data connectivity and preparation support what we recommend as the first step of both approaches: building a data foundation, that is, a pool of reliable, ready-to-use, and easily monitored data. Read on to understand why and how Dataiku can help you build it.

1. Centralize Your Data in an ESG Foundation

We assume a compliance or sustainability lead is tasked with ensuring that the CSRD data points defined by the European Sustainability Reporting Standards (ESRS) are built adequately and promptly. This usually requires the lead to work alongside analytics teams to access the necessary data strewn across the organization’s information systems, before transforming it into the expected output.

Relevant data could be distributed across spreadsheets that live in a company’s various data management systems. Take the example of a consumer packaged goods (CPG) company: while some of its supply chain data could be found in its Enterprise Resource Planning system, quantity of goods or packaging data might be retrieved from its Warehouse Management System, and distribution data from its Transport Management System. The organization will likely need to supplement this internal information with data from external providers, for instance to shed light on its physical risk exposure or gather more granular insight into its suppliers’ sustainability credentials.

Accessing all these different data sources can be a headache for IT teams in terms of iterative ad hoc permissioning, connection building, and legacy system maintenance. To avoid this, we recommend that once the relevant data has been collaboratively identified, located, and mapped by compliance leads and subject matter experts (SMEs), IT teams are asked just once to grant access to each data source. We then recommend that analytics teams build connective pipelines from these various sources to retrieve the relevant data and store it in a data foundation against which permissioning can be formalized. Dataiku’s architectural extensibility to a wide range of cloud, lake, and proprietary storage infrastructure systems can help relieve IT’s burden in this process.
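To make the idea of a single data foundation concrete, here is a minimal sketch in plain Python. It is not Dataiku's API: the three inline extracts, their column names, and the helper functions are hypothetical stand-ins for exports from an ERP, a warehouse system, and a transport system. The point it illustrates is that each source is read once and every record keeps a note of where it came from.

```python
import csv
import io

# Hypothetical extracts standing in for exports from three internal systems.
# Column names are illustrative assumptions, not real system schemas.
ERP_CSV = "supplier_id,material,spend_eur\nS1,cardboard,1200\nS2,plastic,800\n"
WMS_CSV = "sku,packaging_weight_kg\nA100,0.25\nA200,0.40\n"
TMS_CSV = "shipment_id,distance_km,mode\nT1,350,road\n"

SOURCES = {"erp": ERP_CSV, "wms": WMS_CSV, "tms": TMS_CSV}


def load_source(name, raw_text):
    """Parse one extract and stamp every row with its source system."""
    rows = []
    reader = csv.DictReader(io.StringIO(raw_text))
    for row_no, row in enumerate(reader, start=1):
        rows.append({"source": name, "source_row": row_no, "fields": dict(row)})
    return rows


def build_foundation(sources):
    """Read every source once and keep all records in one place."""
    foundation = []
    for name, raw_text in sources.items():
        foundation.extend(load_source(name, raw_text))
    return foundation


foundation = build_foundation(SOURCES)

# Count records per source as a simple completeness check.
rows_per_source = {}
for record in foundation:
    rows_per_source[record["source"]] = rows_per_source.get(record["source"], 0) + 1
print(rows_per_source)
```

Keeping the source name and row number on each record is a design choice made for this sketch: it lets a later reviewer trace any aggregated figure back to the system it came from.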

2. Streamline Data Preparation, Aggregation, & Consolidation

Next, we recommend supplementing these connection pipelines with data parsing and preparation steps, so that collected raw data is stored in a pre-processed, cleaned format. Here, Dataiku’s ability to industrialize data standardization, ETL, and EDA with explainable, visual pipelines will optimize analytics teams’ efficiency. Furthermore, whether full-code or low-code users are working with structured spreadsheets or unstructured data from PDF reports, Dataiku’s GenAI-powered assistant AI Prepare can further boost performance. To learn more about Dataiku’s ability to support data centralization, organization, and analytics, refer to Novartis’ experience streamlining their corporate analytics with Dataiku to facilitate data-driven decision-making.
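As an illustration of what such a preparation step might do, the sketch below standardizes weights reported in mixed units and flags rows that cannot be cleaned instead of silently dropping them. It is written in plain Python and does not represent Dataiku's visual recipes or AI Prepare. The field names, site names, and the decimal-comma convention are assumptions made for the example.

```python
# Conversion factors to kilograms. Units outside this table are flagged, not guessed.
TO_KG = {"kg": 1.0, "g": 0.001, "t": 1000.0}

# Hypothetical raw rows, as they might arrive from differently maintained spreadsheets.
RAW_ROWS = [
    {"site": "Lyon ", "weight": "1,5", "unit": "t"},
    {"site": "Gent", "weight": "250", "unit": "G"},
    {"site": "Porto", "weight": "", "unit": "kg"},
    {"site": "Turin", "weight": "12", "unit": "lb"},
]


def standardize_weight(record):
    """Return a cleaned copy with the weight in kg, or an issue describing why not."""
    raw_value = (record.get("weight") or "").strip()
    unit = (record.get("unit") or "").strip().lower()
    cleaned = {
        "site": (record.get("site") or "").strip(),
        "weight_kg": None,
        "issue": None,
    }
    if not raw_value:
        cleaned["issue"] = "missing weight"
    elif unit not in TO_KG:
        cleaned["issue"] = f"unknown unit: {unit or 'none'}"
    else:
        try:
            # Assumption: a comma is a decimal separator, never a thousands separator.
            cleaned["weight_kg"] = float(raw_value.replace(",", ".")) * TO_KG[unit]
        except ValueError:
            cleaned["issue"] = f"unparseable weight: {raw_value}"
    return cleaned


cleaned_rows = [standardize_weight(row) for row in RAW_ROWS]
ready = [row for row in cleaned_rows if row["issue"] is None]
needs_review = [row for row in cleaned_rows if row["issue"] is not None]

print(f"{len(ready)} rows ready, {len(needs_review)} rows need review")
for row in needs_review:
    print(row["site"], "->", row["issue"])
```

Routing problem rows to a review list keeps the gaps visible to the line of business that owns the data, which fits the collaborative intent described above.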

Finally, turn this foundation into a valuable asset by leveraging the Dataiku data catalog and data tags to organize disaggregated data into your preferred categories, including sustainability themes, data readiness levels, or ESRS subcategories to facilitate eXtensible Business Reporting Language (XBRL) tagging. Taking these steps, Dataiku can help to deliver an intelligible and easily searchable pool of golden-copy data, available for replication and reuse by permitted profiles tasked with the next stage of CSRD readiness. We will cover how Dataiku can support the transformation of data to ESRS expected output in our next blog.
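The tagging idea can be sketched independently of any tool. The plain-Python example below is not the Dataiku catalog API. The dataset names and the "esrs:" and "readiness:" tag prefixes are hypothetical conventions chosen for illustration. It builds an index from tags to datasets so that a user can look up, for example, every validated dataset relevant to a given ESRS topic.

```python
from collections import defaultdict

# Hypothetical catalog: dataset name -> set of tags.
# E1 (climate change) and E5 (resource use and circular economy) are ESRS
# topical standards; the tag format itself is an assumption of this sketch.
catalog = {
    "packaging_weights_clean": {"esrs:E5", "readiness:validated"},
    "transport_emissions_raw": {"esrs:E1", "readiness:raw"},
    "supplier_energy_survey": {"esrs:E1", "readiness:validated"},
}


def build_tag_index(catalog):
    """Invert the catalog so each tag points at the datasets carrying it."""
    index = defaultdict(set)
    for dataset, tags in catalog.items():
        for tag in tags:
            index[tag].add(dataset)
    return index


def find_datasets(index, *tags):
    """Return the sorted names of datasets that carry all of the requested tags."""
    matches = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*matches)) if matches else []


tag_index = build_tag_index(catalog)
print(find_datasets(tag_index, "esrs:E1", "readiness:validated"))
print(find_datasets(tag_index, "esrs:E1"))
```

Combining a topic tag with a readiness tag is one way to separate golden-copy data from work in progress when the same pool serves several teams.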

Let’s Recap

Whether users are building their CSRD data from scratch or relying on a SaaS reporting tool, streamlining data centralization and preparation is fundamental to yielding reliable insights into the organization’s sustainability. Executing on this effectively can support the CSRD’s ambition of bringing socio-environmental impact considerations on par with day-to-day business priorities.

On a practical level, this means bringing lines of business, sustainability, and analytics leads together to meet reporting obligations. We recommend building a data foundation to streamline this, and argue that Dataiku’s reusable, visual, and explainable preparation pipelines can facilitate such multi-stakeholder collaboration. Furthermore, using Dataiku to establish this data foundation has the added benefit of simplifying IT teams’ administration thanks to the platform’s architectural extensibility, embedded permissioning, and scalability.

Next time, we will delve into how Dataiku can help organizations leverage their ESG data foundation to build different types of ESRS data points (from monetary or numerical projections to narratives) while monitoring the quality of data and processes throughout.
