Takeaways from NLP for Sustainable Finance (NLP4SF)

Use Cases & Projects, Scaling AI | Valentine Reltien

In July 2023, the Natural Language Processing for Sustainable Finance research group (NLP4SF) held its launch event, which I had the privilege of attending. The program is a collaboration between the University of Zurich and Oxford University and focuses on how natural language processing (NLP) can help finance professionals better understand the new risks posed by climate change and integrate them more effectively into their decision-making. The day of events was so rich, insightful, and applicable to Dataiku users and prospects that we couldn't resist sharing our highlights.

NLP for Accelerating Sustainable Finance

Organizations are grasping how progress in NLP enables them to quantify typically qualitative environmental factors and thus integrate them alongside traditional financial variables. This capacity to capture complex environmental information (e.g., extreme weather risk exposure, supply chain disruption from resource scarcity, or environmental damage caused) also makes it accessible and usable by other teams (e.g., governance or risk and compliance). Such useful information can now permeate traditional variables and reduce blind spots about the real-world environmental impact underlying financial value.
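As a deliberately simplified illustration of what that integration can look like, the sketch below derives a numeric climate-risk signal from free-text disclosures with a generic zero-shot classifier and joins it to a table of traditional financial metrics. The model checkpoint, column names, and figures are all hypothetical assumptions made for the example.

```python
# Hypothetical sketch: turn qualitative disclosure text into a numeric feature
# and place it alongside traditional financial variables. The checkpoint, column
# names, and numbers are illustrative assumptions, not a reference implementation.
import pandas as pd
from transformers import pipeline

# Generic zero-shot classifier used to score how strongly a disclosure
# signals physical climate risk exposure.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

disclosures = pd.DataFrame({
    "company": ["AlphaCo", "BetaCorp"],
    "disclosure": [
        "Two of our coastal plants were idled by flooding in the last fiscal year.",
        "Our facilities are inland and we saw no weather-related disruptions.",
    ],
})

def physical_risk_score(text: str) -> float:
    """Probability-like score that the text describes physical climate risk exposure."""
    result = classifier(
        text,
        candidate_labels=["physical climate risk exposure", "no climate risk"],
    )
    return dict(zip(result["labels"], result["scores"]))["physical climate risk exposure"]

disclosures["climate_risk_score"] = disclosures["disclosure"].map(physical_risk_score)

# Traditional financial metrics (illustrative figures).
financials = pd.DataFrame({
    "company": ["AlphaCo", "BetaCorp"],
    "ebitda_margin": [0.18, 0.24],
})

# The NLP-derived signal now sits alongside conventional variables.
print(financials.merge(disclosures[["company", "climate_risk_score"]], on="company"))
```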

However, most organizations, be they international development banks or private equity firms, still need help with the foundational basics of data analytics before reaping such NLP fruits. First, organizations need to build safe, robust data sharing systems to streamline the collection, aggregation, and storage of qualitative data in the first place. Next, they must ensure quality control across the data's lifecycle, including sound scaling and shareability.

From there, a growing number of teams can be equipped, whether compelled by regulatory compliance, competitiveness, or an understanding of how much these environmental variables matter to the soundness of their medium-term strategy. Finally, data teams also need help onboarding this emerging group of citizen data scientists on the ins and outs of different NLP use cases, all while ensuring responsible and governed practices.


More generally, data teams need to understand how to connect with, transform, and share both unstructured and structured data, and how to integrate these novel insights alongside their traditional metrics. They also need leaders who grasp the stakes well enough to justify investing time in such projects ahead of the curve.


Keeping a Human in the Loop

My second key takeaway was the resounding emphasis on not getting carried away with automation at the expense of our crucial agency in setting the technology's direction of travel: how we apply it and why. Keeping a "human in the loop" is crucial to monitoring the selection of data sources and the truthfulness of a model's output.

Given the intricate self-supervised learning of LLMs like ChatGPT, the logic by which a model produces its output is often opaque. Responsibility cannot be ensured without clear accountability mechanisms, transparency, and strong governance structures. And considering the stakes and urgency of sustainable finance, it is paramount that a robust chain of responsibility oversee the technology's application. Hence the importance of working with a data platform or tool that enables transparent, explainable, and traceable ML processes.


Translating Reality Into Leverageable Data

Another breakthrough is NLP's ability to structure alternative data, from satellite imagery to integrated financial reports, watchdog reviews, spreadsheets, and more. Whether parties come at it to protect investors from credit default or to help underwriters avoid losses, the expanding field of NLP offers ways to mitigate risk and capture opportunities far more effectively.

A particular application enabled by ClimateBERT stands out for its relevance to cross-sectoral corporate reporting and finance professionals: the Cheap Talk Index (CTi). Peer-reviewed research demonstrates that, in a corporate sustainability report, the ratio of "non-specific" commitment paragraphs to all climate commitment paragraphs yields an accurate representation of the company's effective climate action. A larger CTi correlates with larger year-on-year increases in greenhouse gas emissions, greater exposure to reputational risk, and more media controversy, while a lower CTi is correlated with tangible, positive contributions to decarbonization. This ingenious and impactful library was made open source by its creators. We've been very inspired by its capabilities and suggest you watch out for the next version of our interactive document intelligence for ESG solution.
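For readers curious about what computing such an index could look like, here is a minimal sketch built on the openly released ClimateBERT classifiers. The commitment and specificity checkpoint names and the label strings are assumptions to verify against the library's documentation, not a reproduction of the authors' methodology.

```python
# Minimal Cheap Talk Index-style sketch using open ClimateBERT classifiers.
# Checkpoint names and label strings below are assumptions; verify them against
# each model's config.id2label before relying on the output.
from transformers import pipeline

commitment_clf = pipeline(
    "text-classification", model="climatebert/distilroberta-base-climate-commitment"
)
specificity_clf = pipeline(
    "text-classification", model="climatebert/distilroberta-base-climate-specificity"
)

# Assumed label strings (adjust if the checkpoints use different ones).
COMMITMENT_LABEL = "commitment"
NON_SPECIFIC_LABEL = "non-specific"

def cheap_talk_index(paragraphs: list[str]) -> float:
    """Share of climate commitment paragraphs that are non-specific ("cheap talk")."""
    commitments = [
        p for p in paragraphs
        if commitment_clf(p)[0]["label"].lower() == COMMITMENT_LABEL
    ]
    if not commitments:
        return 0.0
    non_specific = sum(
        1 for p in commitments
        if specificity_clf(p)[0]["label"].lower() == NON_SPECIFIC_LABEL
    )
    return non_specific / len(commitments)

# Toy example: one vague pledge, one measurable target.
report = [
    "We aspire to be a climate leader and care deeply about the planet.",
    "We will cut Scope 1 and 2 emissions 42% by 2030 against a 2019 baseline.",
]
print(f"CTi: {cheap_talk_index(report):.2f}")
```

In practice, report paragraphs would first be filtered for climate relevance and long passages truncated to the models' input limit before scoring.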

Beyond corporate reports, a dedicated session at the symposium elaborated on how practitioners working with satellite imagery are also leveraging progress in NLP to improve the accuracy of asset-level analysis and risk exposure assessment. The field of spatial finance is spearheading this by blending geospatial and financial data to deliver more granular and comparable insights into asset ownership and assets' socio-environmental impact. This is particularly relevant for underwriters and investors who need to fill the gaps in their portfolios' risk assessments amid stringent regulations and stakeholder scrutiny.

Both ClimateBERT's and spatial finance's outputs exemplify how blending NLP with alternative data is making it far easier for investors to close the gap between an asset's purported value and its actual value (factoring in variables that were, until now, "invisible").


Climate Resolve From the Bottom Up and Top Down

From German central bankers and French supervisory authorities to top-tier academic researchers, multidisciplinary tech nonprofits, international development banks, and private equity groups, the resolve to accelerate the financial system's investment in transitioning the economy back within planetary boundaries was felt from all sides at the NLP4SF symposium. Two instances struck me as examples of cross-organizational collaboration pointing to a hunger for change and innovation within large organizations.


Climate Policy Radar is a tech nonprofit helping organizations leverage the power of LLMs, generative AI, and collective intelligence to advance effective policy. Its open source database gathers climate legislation, national (and soon corporate) policy, and global climate litigation case law to "[build] the evidence base for evidence-based decision-making." Its aim is to make these innovative texts more accessible so that decision-makers can adopt, advocate for, or replicate them at their own organization's level.

The Agence Française de Développement's SDG Prospector, for its part, measures, translates, and monitors corporate policy's contribution to the United Nations' Sustainable Development Goals (SDGs). Our present system's unsustainability is leaving the financial system increasingly vulnerable to severe economic shocks. The SDGs comprise a detailed, measurable, and generalizable framework for guiding financial investment and economic activity back onto socio-environmentally tenable and durable terrain.
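To make the idea concrete, the sketch below tags a policy passage against a handful of SDGs using a generic zero-shot classifier. This is not the SDG Prospector's actual model or label set; the checkpoint and the abbreviated SDG labels are assumptions chosen purely for illustration.

```python
# Illustrative SDG tagging with a generic zero-shot classifier.
# Not the SDG Prospector's model: the checkpoint and label subset are assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# A small subset of the 17 SDGs, abbreviated for readability.
sdg_labels = [
    "SDG 7: Affordable and Clean Energy",
    "SDG 8: Decent Work and Economic Growth",
    "SDG 13: Climate Action",
]

passage = (
    "The group will finance two gigawatts of new solar capacity in West Africa "
    "and phase out coal-fired generation from its lending portfolio by 2030."
)

# multi_label=True scores each SDG independently, since a passage can touch several.
result = classifier(passage, candidate_labels=sdg_labels, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```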

Both of these initiatives illustrate LLMs' capacity to uncover, map, and leverage collective intelligence by building bridges between existing efforts across policy, legislation, litigation, and strategy. Their open source nature reduces the cost of innovation and lowers the barrier to entry for adopting and integrating this knowledge into organizations of all sizes. Organizations working with software designed to enable safe, self-service aggregation and analytics of varied data sources for enriched insights will have a head start.
