Top Data Challenges (& Solutions) for Financial Institutions

By Julien Antunes Mendes, Samuel Mahy, Xavier Maréchal

Data science, machine learning, and AI are hot topics for financial institutions, as these techniques can help optimize and automate countless processes and tasks. But that doesn’t mean implementing them is easy — this blog post covers some of the top challenges financial institutions face and how they can be solved.

This is a guest blog post from our partners at Reacfin. 

History of Data Science in Financial Institutions

When the data science boom began, financial institutions hired scores of data scientists to extract more insight from their data. Where data was available and of sufficient quality (legacy systems often made collecting and structuring it difficult), those data scientists developed a number of use cases, generally with open source tools such as R and Python.

The natural question is: were these early attempts successful? It is, of course, difficult to generalize to an entire industry, but by and large, the answer was yes, sometimes… but not overwhelmingly. A set of (recurring) problems emerged:

  1. Data quality was not always in line with expectations, which sometimes led to less-than-robust conclusions.
  2. Data scientists had difficulty connecting with business units and solving real business problems, rather than focusing on the technical aspects of the modeling process.
  3. Developments were sometimes chaotic (no coding guidelines, no centralization) and difficult to operationalize afterwards, even when the proof of concept (POC) was successful.

Because of these issues, many early use cases didn’t provide adequate returns. Today, the situation is evolving fast, and a lot of financial institutions have gotten interesting results from their data initiatives. But we’re not there yet: financial institutions still need to build on lessons learned to standardize their data analytics projects, ultimately allowing them to scale. Case in point: democratization of AI and industrialization of AI platforms are two major trends dominating the Gartner Hype Cycle for Artificial Intelligence 2020.

Lessons Learned

Drawing on our years of experience working with financial institutions, here are the top lessons we have learned from developing data projects:

  • Working in silos for each step of a data project leads to inefficient and sometimes unreliable results. Frequent changes of systems, data formats, etc., throughout the process can lead to errors (e.g., when copy/pasting information or applying manual corrections).
  • Working “agile” does not mean you don’t have to define your problem and the related success metrics upfront. Too often, technicians and data scientists jump into the data analysis directly without a clear view on the value to be delivered at the end.
  • Smooth data sourcing and automated data pretreatment are necessary; otherwise, data scientists will have little time to focus on value-add tasks like modeling, model interpretation, etc.
  • Close collaboration between data practitioners and business is a key success factor that cannot be overstated. Data scientists must think about the operationalization of their models (not just about the best predictive power), while business must integrate data-driven decisions into their daily practices.

Next Steps

These lessons point to the need for a robust and reproducible data analytics workflow, with the following key success factors:

  • Challenging the initial business problem definition to set expectations and pave the way to success (including defining how to measure success).
  • Thinking of the modeling process end-to-end, integrating data extraction, pre-treatment, and model deployment steps (and not only the modeling itself) to reduce leakage and inconsistencies.
  • Monitoring the quality of models and creating a feedback loop in order to assess their continuous relevance and anticipate the need for retraining and/or refreshing.
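
To make the monitoring point more concrete, here is a minimal sketch of one possible feedback-loop check. It compares the distribution of model scores at training time with a recent batch, using the Population Stability Index (PSI), a drift measure often used in credit risk. This is an illustration under stated assumptions, not a description of any specific institution's setup: the function names, the 10 equal-width bins, and the 0.25 alert threshold (a common rule of thumb) are all choices made for this example.

```python
import math


def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference sample and a recent sample of model scores.

    Bins are equal-width over the range of the reference sample
    (an assumption of this sketch; quantile bins are also common).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bin_shares(expected)
    a = bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(expected, actual, threshold=0.25):
    """Flag the model for review when drift exceeds the (assumed) threshold."""
    return population_stability_index(expected, actual) > threshold


if __name__ == "__main__":
    # Hypothetical scores: training-time reference vs. a shifted recent batch.
    reference = [i / 100 for i in range(100)]
    recent = [0.5 + i / 200 for i in range(100)]
    print("PSI:", round(population_stability_index(reference, recent), 3))
    print("Retrain?", needs_retraining(reference, recent))
```

In practice, a check like this would run on a schedule against fresh production data, and a drift alert would be reviewed together with business stakeholders rather than triggering retraining automatically.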
