Addressing CRM Data Quality with Dataiku

By Alexander Vorslov

Using machine learning (ML) to predict customer growth and churn, and to find insights in the data, is not only a trendy topic but also something that can bring a lot of value. However, any machine learning model has an essential prerequisite, and models for these use cases are no different: good-quality data (in this case, customer relationship management, or CRM, data).

This article focuses on answering two questions:

  • What rules should the sales team follow when putting data into CRM systems?
  • How can you use Dataiku to make sure that the rules are followed?

Having high-quality data in a CRM system is important for creating reports and conducting analyses as well as ensuring the sales process is transparent. And apart from those conventional data quality benefits, it allows for making accurate predictions based on CRM data.

For example, imagine that you would like to analyze your competitors. The goal is to understand which ones you lose deals to and define a strategy for improving the product. However, this analysis is only possible if you can extract accurate competitor information from your CRM system. Having that lets you build a simple visual dashboard from which you can get valuable insights.

Or imagine that you need to predict which customers risk churning next year. It is not an easy task, and probably the best way to tackle it is to build an ML model. However, for the model’s results to be accurate, you need to make sure that the training dataset is extensive and accurate. If this is not the case, the results of the model can be wrong. But how do you ensure that level of quality? One way is to start grading the salespeople on the quality and completeness of the data they enter in the CRM tool, since this data will then be used as a training dataset for the models.

Building a CRM Data Quality Score: The Process

As with any data project, building a CRM data quality score starts with the fundamental step of talking to the business experts (in this case, salespeople). You need to identify what information they have about the customers at each stage of the sales process. The most crucial goal here is to identify the information they do not yet put into the CRM system. For example, you might find out that after approximately three months of working with a prospect, they can estimate how much the prospect likes the product and what competitor products they are evaluating.

The next step is analyzing the responses: you need to identify the information that the team possesses and that would be relevant for further analysis. It’s essential to find the balance between the quality of that information and the time it takes to log it. You only want to ask for what is necessary and make sure the salespeople don’t spend too much time logging what they know.

After the analysis is done and you’ve identified the valuable information, you need to create new fields/properties in the CRM tool and define rules, including which fields need to be populated at which stage of the sales process.

For example, at the earliest stage, you might need to identify at least three contacts at the prospect’s company, and for each of them, you need to fill in their opinion of the product. At the next stage, you might need to identify and describe the use case, and by the last stage, you also might need to specify the competitors. After that is done, this set of rules needs to be validated with the sales leadership.
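A rule set like this can be written down as plain data before it goes into any tool. The following is a minimal sketch in Python, not a prescribed implementation; the stage names, the field names, and the helpers `required_fields` and `missing_fields` are hypothetical and only mirror the example above.

```python
# Hypothetical stage and field names, for illustration only.
STAGE_ORDER = ["discovery", "evaluation", "negotiation"]

# Fields that become required when a deal reaches each stage.
REQUIRED_BY_STAGE = {
    "discovery": ["contacts"],
    "evaluation": ["use_case"],
    "negotiation": ["competitors"],
}

MIN_CONTACTS = 3  # "at least three contacts" from the example rule


def required_fields(stage):
    """Return every field required at this stage, including earlier stages."""
    idx = STAGE_ORDER.index(stage)
    fields = []
    for s in STAGE_ORDER[: idx + 1]:
        fields.extend(REQUIRED_BY_STAGE[s])
    return fields


def missing_fields(deal):
    """List the rule violations for one deal (a plain dict)."""
    issues = []
    for field in required_fields(deal["stage"]):
        value = deal.get(field)
        if field == "contacts":
            contacts = value or []
            if len(contacts) < MIN_CONTACTS:
                issues.append("fewer than 3 contacts")
            if any(not c.get("opinion") for c in contacts):
                issues.append("contact without product opinion")
        elif not value:
            issues.append(f"missing {field}")
    return issues


deal = {
    "stage": "evaluation",
    "contacts": [
        {"name": "A", "opinion": "positive"},
        {"name": "B", "opinion": ""},
    ],
    "use_case": "",
}
print(missing_fields(deal))
```

Keeping the rules in one small structure like this makes them easy to review with the sales leadership and to change later without touching the checking logic.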

As soon as the sales managers approve the set of rules, you can start thinking of a process that might work for your company, for example:

  • Using the set of rules, a score (let’s say, 0 to 100) could be calculated for each salesperson based on their data in the CRM.
  • Once a week, every team member gets an email report with their current score. It helps if that digest mentions exactly which data quality issues are costing them points, as this makes it easier to improve the score. It’s also a good idea to let people check their score whenever they want. For example, you can add a button to the email that generates another email report.
  • Lastly, there is the incentivizing part. Many companies go with some kind of a leaderboard so that people try to improve their scores.

The data quality score can also include some good-to-have CRM properties that you do not want to be required in the system itself. For example, having more than 30 words in the client’s use-case description can result in bonus points. It is also good practice to have some consistency rules, for example: no open deals with close dates in the past and no won deals with close dates in the future.
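The bonus and consistency rules above could be combined into a per-deal score roughly as follows. This is a sketch under stated assumptions: the point values `BONUS_POINTS` and `PENALTY_POINTS`, the `status` and `close_date` field names, and the clamping to the 0 to 100 range are all hypothetical choices, not part of the original process.

```python
from datetime import date

BONUS_WORDS = 30      # threshold taken from the example above
BONUS_POINTS = 10     # hypothetical weight
PENALTY_POINTS = 20   # hypothetical weight per consistency violation


def consistency_issues(deal, today):
    """Return the consistency rules that this deal violates."""
    issues = []
    if deal["status"] == "open" and deal["close_date"] < today:
        issues.append("open deal with close date in the past")
    if deal["status"] == "won" and deal["close_date"] > today:
        issues.append("won deal with close date in the future")
    return issues


def deal_score(deal, base_points, today):
    """Combine base points, bonus, and penalties, clamped to 0..100."""
    score = base_points
    if len(deal.get("use_case", "").split()) > BONUS_WORDS:
        score += BONUS_POINTS
    score -= PENALTY_POINTS * len(consistency_issues(deal, today))
    return max(0, min(100, score))


today = date(2024, 6, 1)
deal = {
    "status": "open",
    "close_date": date(2024, 5, 1),   # already in the past
    "use_case": "word " * 31,         # 31 words, so the bonus applies
}
print(consistency_issues(deal, today))
print(deal_score(deal, base_points=70, today=today))
```

Returning the list of violated rules alongside the score is what later allows the weekly digest to say exactly which issues are costing points.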

Building a CRM Data Quality Score: The Dataiku Flow

Implementing a CRM data quality score model in Dataiku is a simple four-step process, with an optional fifth step.

Step 1

Get data from the CRM. Depending on the tool you use, you might be able to use a dedicated plugin (like the Salesforce plugin), which makes the data import straightforward. Even if there is no plugin available, you can create a Python/R recipe that will connect to the API and import the required data.

In Figure 1, you can see that Dataiku Plugin recipes are used to import the CRM data. The data is then prepared and joined into one dataset that will be used for the scoring using a mix of visual and code recipes.

Figure 1: Data import and preparation for a CRM data quality project in Dataiku
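When no plugin is available, the code recipe mentioned in Step 1 mostly comes down to paging through the CRM API and collecting the records. The sketch below shows only that loop; `fetch_page` is a hypothetical callable standing in for the real HTTP request, and its offset/limit signature is an assumption, since every CRM API paginates differently.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every record from a paginated API.

    fetch_page(offset, limit) is a hypothetical callable that returns a
    list of records, and an empty list once the data is exhausted.
    """
    records, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        records.extend(page)
        offset += len(page)
    return records


# Stand-in for a real HTTP call, so the sketch runs without network access.
FAKE_DEALS = [{"id": i, "stage": "discovery"} for i in range(250)]


def fake_fetch_page(offset, limit):
    return FAKE_DEALS[offset : offset + limit]


deals = fetch_all(fake_fetch_page)
print(len(deals))
```

Inside a Dataiku Python recipe, the collected records would typically be turned into a pandas DataFrame and written to the recipe’s output dataset, for example with `dataiku.Dataset("deals").write_with_schema(df)`, where the dataset name `deals` is an assumption.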

Step 2

Translate the rules you defined to Dataiku and apply them to your data. This can be done exclusively with visual recipes (if you don’t want to code). Prepare recipes are especially useful in this step since they are highly flexible, easy to adjust, very transparent, and easy to use.

For example, in the Prepare recipe in Figure 2, people who are off on vacation are excluded from the score calculation. After that, points are assigned with formula processors.

Figure 2: Implementing business rules for the CRM data quality project in Dataiku
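For readers who prefer code, the logic described in Step 2 (exclude people on vacation, then assign points) can be approximated in a few lines of Python. This is not the Prepare recipe itself, only a sketch of equivalent logic; the column names, the `WEIGHTS` values, and the choice to average points across a person’s deals are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows, one per deal, as they might come out of the joined dataset.
rows = [
    {"owner": "alice", "on_vacation": False, "has_use_case": True, "has_competitors": False},
    {"owner": "alice", "on_vacation": False, "has_use_case": True, "has_competitors": True},
    {"owner": "bob", "on_vacation": True, "has_use_case": False, "has_competitors": False},
]

# Hypothetical weights; they sum to 100 so a fully filled deal scores 100.
WEIGHTS = {"has_use_case": 60, "has_competitors": 40}


def row_points(row):
    """Points earned by one deal: the sum of weights for populated fields."""
    return sum(weight for col, weight in WEIGHTS.items() if row[col])


def scores_by_owner(rows):
    """Average deal points per salesperson, skipping people on vacation."""
    active = [r for r in rows if not r["on_vacation"]]
    points = defaultdict(list)
    for r in active:
        points[r["owner"]].append(row_points(r))
    return {owner: round(sum(p) / len(p)) for owner, p in points.items()}


print(scores_by_owner(rows))
```

Here `bob` is absent from the result because his only row is flagged as on vacation, which matches the exclusion step in the recipe.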

Step 3

Create an email alert. Depending on the preferences of the sales team, this can be replaced with a chatbot alert instead. Such an alert can be implemented either with a Dataiku send-email plugin or with a code recipe.
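If you go the code-recipe route, the digest itself can be assembled with Python’s standard library. The sketch below only builds the message; the `build_digest` helper, the sender address, and the wording of the email are hypothetical.

```python
from email.message import EmailMessage


def build_digest(name, address, score, issues, sender="crm-quality@example.com"):
    """Build a plain-text digest email for one salesperson."""
    msg = EmailMessage()
    msg["Subject"] = f"Your CRM data quality score: {score}/100"
    msg["From"] = sender  # hypothetical sender address
    msg["To"] = address
    lines = [f"Hi {name},", "", f"Your current score is {score}/100."]
    if issues:
        lines += ["", "Issues costing you points:"]
        lines += [f"  - {issue}" for issue in issues]
    msg.set_content("\n".join(lines))
    return msg


msg = build_digest(
    "Alice", "alice@example.com", 80, ["missing competitors on 1 deal"]
)
print(msg["Subject"])
print(msg.get_content())
```

Sending is left out on purpose, since it depends on your mail setup; with a reachable SMTP server, `smtplib.SMTP(host).send_message(msg)` is the usual standard-library way to deliver a message built like this.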

Step 4

Create a scenario (requires at least a Dataiku Business edition), which will launch your flow once a week. On every launch, the scenario will download fresh data, apply the scoring rules, and send the alerts to the team. You can also record the past scores in a dataset in case you would like to have a global leaderboard.
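Recording past scores for a leaderboard amounts to appending one row per person on every run and ranking by the latest value. The following is a minimal sketch using plain Python lists in place of a dataset; the `record_scores` and `leaderboard` helpers and the row layout are hypothetical.

```python
from datetime import date


def record_scores(history, run_date, scores):
    """Append one row per person for this run; history is a list of dicts."""
    for owner, score in scores.items():
        history.append({"run_date": run_date, "owner": owner, "score": score})
    return history


def leaderboard(history):
    """Rank people by their most recent score, highest first."""
    latest = {}
    for row in sorted(history, key=lambda r: r["run_date"]):
        latest[row["owner"]] = row["score"]  # later runs overwrite earlier ones
    return sorted(latest.items(), key=lambda kv: kv[1], reverse=True)


history = []
record_scores(history, date(2024, 6, 3), {"alice": 70, "bob": 85})
record_scores(history, date(2024, 6, 10), {"alice": 90, "bob": 80})
print(leaderboard(history))
```

Because the full history is kept rather than overwritten, the same data can also show how each person’s score changes from week to week.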

*Bonus* Step 5

If you would like the team to have the ability to check their scores on the fly, and not just once a week, you can also use the API node. You can create a simple API service that, when queried, would launch the scenario responsible for sending the alert.

The API endpoint can also accept parameters (like a person’s name), and each email can contain a link specific to this person, so when they click the link, the email is sent to just one team member and not to everyone. You can also deploy the flow on the Automation Node. That would let you continuously improve the project without affecting the current version.
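The person-specific link is simply the endpoint URL with the person’s name added as a query parameter. A small sketch of building and reading such a link is shown below; `BASE_URL` and the `person` parameter name are hypothetical placeholders, not the address format of any real API service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint URL; replace with the address of your own API service.
BASE_URL = "https://api.example.com/crm-score/send-report"


def personal_link(person):
    """Build the link embedded in one person's email."""
    return f"{BASE_URL}?{urlencode({'person': person})}"


def person_from_link(link):
    """Recover the person parameter on the receiving side."""
    query = parse_qs(urlsplit(link).query)
    return query["person"][0]


link = personal_link("Alice Smith")
print(link)
print(person_from_link(link))
```

Using `urlencode` rather than string concatenation keeps names with spaces or special characters intact when they travel through the URL.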

Figure 3 shows how the complete project flow looks in Dataiku. In this flow, we first get the data, then transform it to the right format with a mix of visual and code recipes. Then, we set up two email alerts — one for the team and one for managers (this is why there are two Python recipes in this flow).

Figure 3: The CRM data quality project in Dataiku

Benefits & Business Value

This simple process can significantly improve the quality of data in the CRM. It does not take much time to set up and the potential benefits are great. The newly collected data can be used for forecasting, identifying sales process pain points, and reporting.

As you can see in the previous sections, doing such a project with Dataiku is easy, especially because you can use a mix of visual and code recipes depending on your preferences and technical level. Another benefit is that several people can work on the project at the same time, due to Dataiku’s collaborative nature. And of course, you can grant access to the project to anyone with an account on your instance, to use as a reference.
