Moving From Spreadsheets to Dataiku for Financial Modeling

This article presents how financial modeling can be done inside Dataiku. Let’s begin with the context: spreadsheet-based tools like Microsoft Excel are among the most popular tools for financial modeling and are used for all kinds of tasks, including investment analysis, P&L modeling, and risk management. Why is that the case? Spreadsheets are convenient, they have been around for a long time, and they will remain in use for the foreseeable future. However, they also have downsides. Their interfaces are semi-structured and brittle, they struggle with large volumes of data, and they lack built-in scheduling automation.

At their core, spreadsheet-based tools bring data to the forefront and keep the calculation logic in the background, for your reference. You can view the dependency graph for a specific calculation, but within a multi-tab spreadsheet it can be challenging to figure out where the entire story begins and ends, that is, how the calculations and data are interconnected. Real-world spreadsheets tend to grow more complex over time, which makes them harder to understand and maintain. And if you want to do something custom, you have to write VBA macros, which require a completely different skill set.

Spreadsheets are certainly not ideal for enterprise development, where portfolio teams build, maintain, and collaborate on stock selection with full traceability (for example, within a sector, the output needs to flow automatically into a more holistic workstream such as portfolio construction). While many spreadsheet-based tools are now cloud based, they offer no pushdown capabilities to leverage other compute options when operating on large volumes of data pulled from various sources. It’s also difficult to connect them to different types of sources, such as API services.

That brings us to Dataiku, which is structured as a visual pipeline for performing calculations and other operations on data. Dataiku brings the visual flow of calculations to the forefront, so users can view the input data, the operation, and its results by clicking on each step in the flow. That’s the fundamental difference in terms of interface. Dataiku also provides data profiling features that help identify and automate data quality checks to catch human errors, which are a primary cause of spreadsheet problems.

In this blog post, we will outline how three examples can be replicated in Dataiku: NPV/IRR/payback period, P&L construction, and market beta calculation using regression.

NPV/IRR/Payback Period Using a Spreadsheet-Based Tool

In a typical financial modeling example, you are going to find five types of sections. The sections outlined below represent some of the common aspects of any spreadsheet model, even beyond financial modeling.

[Figure: spreadsheet example]

As you can see, the sections are:

Placeholders you can change (i.e., your model’s assumptions)
Static input data
Interim calculations and the calculations you are ultimately interested in
Calculations under various model assumption scenarios
Calculations and scenarios data visualization

The fully built-out Dataiku flow for IRR/NPV/payback period is created in the following steps:

Step 1:

We place our discount rate (the model assumption), future cash flows (the data), and discount rates from 1% to 40% (the discount rate scenarios) into three separate editable datasets. We chose editable datasets because, unlike other Dataiku dataset types, they can be modified directly from the Dataiku UI or through a Dataiku application.
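To make Step 1 concrete outside the Dataiku UI, here is a minimal plain-Python sketch of the three editable datasets. Each dataset is represented as a list of rows. The investment names, cash flow amounts, column names, and the 10% base rate are hypothetical values chosen for illustration. They are not the data behind the screenshots.

```python
# Hypothetical stand-ins for the three editable datasets in Step 1.
# Each dataset is a list of rows (dicts), mirroring a table in the flow.

# 1) Model assumption: a single base discount rate (hypothetical value).
assumptions = [{"discount_rate": 0.10}]

# 2) Static input data: cash flows per investment idea and period (hypothetical).
cash_flows = [
    {"investment": "idea_a", "period": 0, "cash_flow": -100.0},
    {"investment": "idea_a", "period": 1, "cash_flow": 230.0},
    {"investment": "idea_a", "period": 2, "cash_flow": -132.0},
    {"investment": "idea_b", "period": 0, "cash_flow": -500.0},
    {"investment": "idea_b", "period": 1, "cash_flow": 250.0},
    {"investment": "idea_b", "period": 2, "cash_flow": 250.0},
    {"investment": "idea_b", "period": 3, "cash_flow": 250.0},
]

# 3) Scenarios: discount rates from 1% to 40% in one-point steps.
scenario_rates = [{"scenario_rate": pct / 100} for pct in range(1, 41)]

print(len(assumptions), len(cash_flows), len(scenario_rates))  # 1 7 40
```

Keeping the three concerns in separate tables mirrors the spreadsheet sections (assumptions, inputs, scenarios) while making each one independently editable.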

Step 2:

We join the cash flow data to the discount rates to create a single dataset. In a spreadsheet you use cell references. In Dataiku, you instead join disparate datasets with a Join recipe so that you can reference their columns in calculations.
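Because there is no cell to point at, every cash flow row needs the discount rate attached as a column. The sketch below assumes a cross join, which pairs every cash flow row with every scenario rate. The article does not state which join type it uses, and the data and column names are hypothetical. In Dataiku this pairing is configured in the Join recipe rather than coded.

```python
# Hypothetical inputs with the same shape as the Step 1 datasets.
cash_flows = [
    {"investment": "idea_a", "period": 0, "cash_flow": -100.0},
    {"investment": "idea_a", "period": 1, "cash_flow": 230.0},
    {"investment": "idea_a", "period": 2, "cash_flow": -132.0},
]
scenario_rates = [{"discount_rate": pct / 100} for pct in range(1, 41)]


def cross_join(left, right):
    """Pair every row of `left` with every row of `right` (a cross join)."""
    return [{**l_row, **r_row} for l_row in left for r_row in right]


joined = cross_join(cash_flows, scenario_rates)
print(len(joined))  # 3 cash flow rows x 40 rates = 120
print(joined[0])    # {'investment': 'idea_a', 'period': 0, 'cash_flow': -100.0, 'discount_rate': 0.01}
```

After the join, each row carries both the cash flow and the rate, so a row-level formula can use them as columns.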

Step 3:

We then create a formula step in a visual Prepare recipe to calculate discounted cash flow for each investment idea and period.
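The formula in this step is the standard discounting formula, DCF = CF / (1 + r)^t. The Python sketch below applies it to a single joined row. It assumes that period 0 is today (undiscounted) and that cash flows occur at the end of each period. The column names are hypothetical, and this is not Dataiku formula syntax.

```python
def discounted_cash_flow(cash_flow, rate, period):
    """Present value of one cash flow received `period` periods from now."""
    return cash_flow / (1 + rate) ** period


# One hypothetical row from the joined dataset.
row = {"investment": "idea_a", "period": 2, "cash_flow": -132.0, "discount_rate": 0.10}

# Add the computed column, as the Prepare recipe's formula step would.
row["discounted_cash_flow"] = discounted_cash_flow(
    row["cash_flow"], row["discount_rate"], row["period"]
)
print(round(row["discounted_cash_flow"], 4))  # -109.0909
```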

The benefit of doing this is twofold. We can sum the discounted cash flows to arrive at the net present value. We can also track how the cumulative discounted cash flow progresses over time with a Window recipe, which shows the payback period for the investment.
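The two outputs can be sketched in plain Python for a hypothetical four-period investment discounted at a hypothetical 10%. Here `sum` plays the role of the aggregation that yields NPV. `itertools.accumulate` stands in for the running total a Window recipe would compute when rows are ordered by period. In the flow, rows would also be partitioned by investment idea, a step this single-investment example omits.

```python
from itertools import accumulate

RATE = 0.10  # hypothetical discount rate
cash_flows = [-500.0, 250.0, 250.0, 250.0]  # hypothetical, ordered by period

# Step 3 output: one discounted cash flow per period.
discounted = [cf / (1 + RATE) ** t for t, cf in enumerate(cash_flows)]

# NPV: the plain sum of the discounted cash flows.
npv = sum(discounted)

# Running total ordered by period, analogous to a Window recipe cumulative sum.
cumulative = list(accumulate(discounted))

print(round(npv, 2))                      # 121.71
print([round(c, 2) for c in cumulative])  # [-500.0, -272.73, -66.12, 121.71]
```

The last cumulative value always equals the NPV. The period in which the running total turns non-negative is the discounted payback period.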

By visualizing our results, we can show the following:

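IRR: We can see that there are two IRRs for this investment idea. Its two negative cash flows make the cash flow stream change sign more than once, and a stream with multiple sign changes can have more than one rate at which NPV equals zero.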

[Figure: IRR in Dataiku]
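The multiple-IRR effect can be reproduced with a short plain-Python sketch. The three-period cash flow stream is hypothetical (not the article’s data) and changes sign twice. The sketch evaluates NPV on the same 1% to 40% discount rate grid as the scenario dataset. It then looks for sign changes between neighboring rates and refines each bracket by bisection. The function names are illustrative, and this is not a description of how Dataiku computes IRR.

```python
def npv(rate, cash_flows):
    """Net present value, discounting the cash flow in period t by (1 + rate) ** t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def find_irrs(cash_flows, rates, tol=1e-10):
    """Return the rates within the grid where NPV crosses zero.

    Scans neighboring grid rates for a strict sign change in NPV, then narrows
    each bracket by bisection. Roots landing exactly on a grid point are not
    reported by this simple sketch.
    """
    irrs = []
    for lo, hi in zip(rates, rates[1:]):
        f_lo = npv(lo, cash_flows)
        if f_lo * npv(hi, cash_flows) >= 0:
            continue  # no strict sign change inside this bracket
        while hi - lo > tol:
            mid = (lo + hi) / 2
            f_mid = npv(mid, cash_flows)
            if f_lo * f_mid <= 0:
                hi = mid  # root lies in [lo, mid]
            else:
                lo, f_lo = mid, f_mid  # root lies in [mid, hi]
        irrs.append((lo + hi) / 2)
    return irrs


rates = [pct / 100 for pct in range(1, 41)]  # 1% to 40% scenario grid
cash_flows = [-100.0, 230.0, -131.0]  # hypothetical: sign changes twice
irrs = find_irrs(cash_flows, rates)
print([round(r * 100, 2) for r in irrs])  # [3.82, 26.18]
```

A single-value IRR routine would report only one of these rates. Scanning the scenario grid surfaces both.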

NPV:

[Figure: NPV in Dataiku]

Payback period:

[Figure: payback period in Dataiku]
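Below is a minimal sketch of a discounted payback calculation based on the cumulative discounted cash flow. It rests on two assumptions, both labeled in the code. First, the payback period is the first point at which the cumulative discounted cash flow becomes non-negative. Second, the cash flow within the break-even period arrives evenly, so the fractional period is linearly interpolated. The chart above may instead report whole periods, and the cash flows and rate are hypothetical.

```python
from itertools import accumulate


def discounted_payback_period(cash_flows, rate):
    """Periods until cumulative discounted cash flow first becomes non-negative.

    Returns a fractional period (linear interpolation within the break-even
    period, an assumption), or None if the investment is never paid back.
    """
    discounted = [cf / (1 + rate) ** t for t, cf in enumerate(cash_flows)]
    cumulative = list(accumulate(discounted))
    if cumulative[0] >= 0:
        return 0.0  # nothing to recover
    for t in range(1, len(cumulative)):
        if cumulative[t - 1] < 0 <= cumulative[t]:
            # Share of period t's discounted cash flow needed to cover the shortfall.
            return (t - 1) + (-cumulative[t - 1]) / discounted[t]
    return None  # still negative at the end of the horizon


print(round(discounted_payback_period([-500.0, 250.0, 250.0, 250.0], 0.10), 2))  # 2.35
print(discounted_payback_period([-500.0, 100.0, 100.0], 0.10))  # None
```

For streams with several sign changes, the cumulative total can dip below zero again after break-even. This sketch reports only the first crossing.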