In its simplest form, churn (or attrition) is when customers leave. Companies in nearly every industry have to address it because it can flatten the growth of any business, even one that is gaining customers quickly. The most successful companies address it by building models that accurately predict churn, then acting on those predictions through targeted marketing campaigns or product changes designed to prevent it.
There are two basic types of churn: subscription churn and non-subscription churn.
Subscription churn happens in businesses where users or customers are on contract for a set period of time (monthly, annually, etc.; think cable, network, or phone providers), and customers choose not to come back after that contract is up. It is comparatively easy to define, predict, and prevent since there’s a clear, defined window of churn risk on which marketing activities can be focused.
Non-subscription churn happens when users or customers can end their relationship with your business at any time; they come and go at will. A customer may gradually reduce their purchase frequency over time, or they may suddenly never buy again. This article focuses on the process for preventing non-subscription churn because:
- It is a prime candidate for prediction since there is no set renewal time and because it’s not clear when this audience will need or be receptive to marketing materials aimed at preventing churn.
- It requires collaboration among several teams to predict accurately - the business side generally defines churn (lack of action after weeks, months, or years), and then it’s a back-and-forth, iterative process with data teams to arrive at the right model.
Note that some industries might deal with a combination of both types of churn, for example, banks where basic services are non-subscription, but credit cards with annual fees might be subscription-based.
How?
Tackling churn by successfully predicting who will churn comes down to following the seven fundamental steps of a data project. Some nuances and details particular to churn prediction are covered for each step below:
1. Understand the Business
How will your specific business define churn? This step is crucial. Defining a churn period that is too long produces artificially low churn rates, so the model captures too few people and defeats the purpose of predictive modeling. Defining a churn period that is too short makes it difficult for marketing teams to evaluate churn prevention campaigns, because they ultimately can’t distinguish between organic actions (users or customers who would have come back anyway without intervention) and effective campaigns.
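To make the trade-off concrete, here is a minimal sketch of how the choice of churn period changes the measured churn rate. The function names (`label_churners`, `churn_rate`), the sample customers, and the candidate windows are all hypothetical, and "churned" is assumed to mean "no interaction within the last N days."

```python
from datetime import date, timedelta

def label_churners(last_seen, as_of, churn_days):
    """Map each customer to True if their last interaction is older than the window."""
    cutoff = as_of - timedelta(days=churn_days)
    return {cid: seen < cutoff for cid, seen in last_seen.items()}

def churn_rate(labels):
    """Share of customers labeled as churned (0.0 for an empty input)."""
    return sum(labels.values()) / len(labels) if labels else 0.0

# Hypothetical last-interaction dates, for illustration only.
last_seen = {
    "a": date(2024, 1, 5),
    "b": date(2024, 3, 20),
    "c": date(2024, 2, 10),
    "d": date(2024, 3, 28),
}
as_of = date(2024, 4, 1)

# The same customers look very different depending on the window chosen.
for window in (7, 30, 60, 90):
    rate = churn_rate(label_churners(last_seen, as_of, window))
    print(f"{window:>3}-day window: churn rate {rate:.2f}")
```

Running the same comparison on real data, alongside the business side, is one way to see where a candidate definition starts labeling too many or too few customers.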
It’s also a good idea to do basic analysis upfront (unsupervised/clustering) to decide which users should even be considered in the churn analysis. For example, if someone used the product or service only one time, are they considered a churner after that? Or is there some minimum threshold after which a user should be considered and included in churn analysis?
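As a rough illustration of a minimum threshold, the sketch below keeps only customers with at least a given number of interactions. The threshold of two and the `eligible_customers` helper are assumptions for the example; the right cutoff depends on the business, and this simple count stands in for the clustering analysis mentioned above rather than replacing it.

```python
from collections import Counter

def eligible_customers(events, min_interactions=2):
    """Return the customers with at least min_interactions recorded events."""
    counts = Counter(customer_id for customer_id, _ in events)
    return {cid for cid, n in counts.items() if n >= min_interactions}

# Hypothetical (customer_id, interaction_date) pairs.
events = [
    ("a", "2024-01-05"), ("a", "2024-02-01"),
    ("b", "2024-03-20"),
    ("c", "2024-01-10"), ("c", "2024-01-17"), ("c", "2024-02-10"),
]

# Customer "b" interacted only once and is left out of the churn analysis.
print(sorted(eligible_customers(events, min_interactions=2)))
```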
Additionally, before moving on to any other steps, it’s essential to decide first what the churn predictions will be used for. The marketing and product teams should be fully looped in and have a concrete plan for using predictions to prevent churn. Otherwise, there is a risk of wasting time and resources modeling churn predictions that go unused. Predictions can be used for short-term solutions like marketing campaigns to re-engage likely churners (more on this later), or they can help uncover potential deeper drivers of churn that can be addressed long term. For example, maybe there is an issue with the product that is blocking customers’ ability to come back easily or there are in-product improvements (or potential new features) to be made to prevent attrition.
2. Get Your Data
The minimum data required to predict churn is simply some form of customer identification and a date/time of that customer’s last interaction. This data, though not incredibly detailed, would allow you to build models to predict churn at a basic level.
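Even this minimum dataset yields a usable feature. The sketch below assumes the data arrives as (customer ID, timestamp) pairs and derives the number of whole days since each customer's last interaction; `recency_days` and the sample events are hypothetical.

```python
from datetime import datetime

def recency_days(events, as_of):
    """Whole days between each customer's most recent interaction and as_of."""
    last = {}
    for customer_id, ts in events:
        if customer_id not in last or ts > last[customer_id]:
            last[customer_id] = ts
    return {cid: (as_of - ts).days for cid, ts in last.items()}

# Hypothetical interaction log.
events = [
    ("a", datetime(2024, 1, 5, 9, 30)),
    ("a", datetime(2024, 3, 1, 18, 0)),
    ("b", datetime(2024, 3, 20, 12, 0)),
]

print(recency_days(events, as_of=datetime(2024, 4, 1)))
```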
In practice, however, adding data on top of this minimum dataset is highly recommended. More relevant data generally leads to better churn predictions, so if available, also include things like static demographic information about users, details on specific types of user actions, etc. Additional sources usually help.
3. Explore and Clean Data
Remember that this step can account for a large share of the total time spent on the project (up to 80% by some estimates), so don’t be discouraged as you get your data into a usable format. Take time to ensure you understand what all the different variables in your data mean before moving on to cleaning up inconsistent spellings or missing data so that everything is homogeneous. Thoroughly exploring and cleaning will save time in subsequent steps, particularly when it comes time for prediction.

4. Enrich Data
If you’re working with a more advanced dataset than simply customer identification and date/time of last interaction (which is, as mentioned, highly recommended for better prediction), this is the time to enrich that data and join it to get down to the essentials. For example, if you have one dataset with customer identification and date/time of last interaction and another with customer identification and demographic information, you’ll want to join these into one set of data.
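The join described above might look like the following sketch, written with plain dictionaries so it has no dependencies. The `left_join` helper, the column names, and the sample records are assumptions for illustration. A left join is used so that customers without demographic data are kept rather than silently dropped.

```python
def left_join(activity, demographics, defaults):
    """Attach demographic fields to each activity row, keyed on customer_id.

    Customers with no demographic record receive the default values.
    """
    joined = []
    for row in activity:
        extra = demographics.get(row["customer_id"], defaults)
        joined.append({**row, **extra})
    return joined

# Hypothetical datasets sharing a customer_id key.
activity = [
    {"customer_id": "a", "last_interaction": "2024-03-01"},
    {"customer_id": "b", "last_interaction": "2024-03-20"},
]
demographics = {"a": {"country": "FR", "age_band": "25-34"}}
defaults = {"country": None, "age_band": None}

rows = left_join(activity, demographics, defaults)
for row in rows:
    print(row)
```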
While certain data can enrich your analysis, be aware that group/pivot operations could generate hundreds or thousands of features if done blindly. So ensure the data you’re using has ultimate value for your defined churn goals (see step 6 before adding too many features), and get ready to build some very large datasets!
5. Visualization
Now that you have explored and gotten to know your data by digging in, cleaning, and enriching it, it’s time to visualize. Visualization is an important step in the process because it gives end users (in the case of churn, the marketing team and/or the product team) a way to consume the data quickly and easily.
Ensure you are aligned with your end-users here and give them visualizations in a format that is actually helpful for them. Some helpful visualizations for marketing and product teams with regard to churn might be to show:
- The evolution of churn over time and targeted churners
- Which product features have an impact on churn
- Descriptive statistics of those key features for easy reference or visual simulations illustrating how changing features would impact churn probability
- Additional insights about the chosen churn model
Additionally, you may consider exploring visualizations not as a product for end-users but as a way to uncover additional insights and trends you may want to explore or explain with predictive modeling. For example, maybe by creating a churn visualization on a map, you find that certain geographies churn at a higher rate than others, and you would like to explain why.
6. Get Predictive
When building a predictive model, you have to be careful that it actually learns what you want. For instance, a common pitfall in churn modeling projects is letting future information leak into training, that is, building features from events that would not yet have happened at the moment the prediction is made. To avoid this mistake, put yourself in the position you’ll be in when the model is deployed into production: What data will be available to you? What horizon should the prediction cover: next week, next month?
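One way to keep past and future apart is to fix a cutoff date, build features only from events before it, and derive the label only from a window after it. The sketch below assumes a 30-day prediction horizon and a hypothetical `build_training_rows` helper; customers with no activity before the cutoff are skipped because nothing would be known about them at prediction time.

```python
from datetime import date, timedelta

def build_training_rows(events, cutoff, horizon_days):
    """Features come from events before cutoff; the label from the horizon after it."""
    horizon_end = cutoff + timedelta(days=horizon_days)
    past = {}              # customer_id -> interaction dates strictly before the cutoff
    active_after = set()   # customers seen in [cutoff, horizon_end)
    for customer_id, day in events:
        if day < cutoff:
            past.setdefault(customer_id, []).append(day)
        elif day < horizon_end:
            active_after.add(customer_id)

    rows = []
    for customer_id, days in sorted(past.items()):
        rows.append({
            "customer_id": customer_id,
            "n_events": len(days),                       # feature: past data only
            "recency_days": (cutoff - max(days)).days,   # feature: past data only
            "churned": customer_id not in active_after,  # label: future data only
        })
    return rows

# Hypothetical interaction log.
events = [
    ("a", date(2024, 1, 5)), ("a", date(2024, 2, 20)), ("a", date(2024, 3, 10)),
    ("b", date(2024, 1, 15)),
    ("c", date(2024, 3, 12)),  # first seen after the cutoff, so not in the training rows
]

rows = build_training_rows(events, cutoff=date(2024, 3, 1), horizon_days=30)
for row in rows:
    print(row)
```

Sliding the cutoff back in time produces additional training snapshots without ever mixing features and labels from the same period.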
An important part of the predictive process is the interaction and iteration between predictive modeling and feature engineering. In step 4, you enriched your data and generated features. Now it’s time to see if the features you’ve added are actually valuable to your model. Try keeping the feature set relatively small at first and then run your model(s) to evaluate performance. Little by little, continue to add features and evaluate their effect on the accuracy of the model.
When in this design stage testing features and iterating, note that it’s not necessary to run complex models at this point in time. Instead, focus on optimizing for the right features first and running simple models. Later, once you know you have the best features, you can optimize and find the best model. This will save time and resources in the long run.
Regarding finding the best model, another critical step in this process is choosing how you evaluate which model is best. You want to choose an evaluation metric that fits the business needs. For example, is the goal to identify everyone who has a higher than N likelihood of churning, so that the marketing or product teams can address all of those individuals? Or will those teams only address individuals with more than a certain lifetime value? Or do you need to identify only the top N most likely churners, since the marketing team only has the budget to try to retain a small set of likely churners?
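For the "top N most likely churners" case, a metric such as precision at k may fit better than overall accuracy, since it only looks at the customers the team can afford to contact. The sketch below is one possible implementation; `precision_at_k`, the scores, and the outcomes are hypothetical.

```python
def precision_at_k(scores, churned, k):
    """Fraction of the k highest-scored customers who actually churned."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    if not top_k:
        return 0.0
    return sum(churned[cid] for cid in top_k) / len(top_k)

# Hypothetical model scores and observed outcomes.
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.2}
churned = {"a": True, "b": False, "c": True, "d": False}

# With budget for two contacts, only the top two scores matter.
print(precision_at_k(scores, churned, k=2))
```

If the teams instead plan to act on everyone above a probability threshold, a threshold-based metric such as recall at that threshold would be the analogous choice.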
If you’re a beginner when it comes to machine learning and algorithms, you can use a tool like Dataiku to run basic open source algorithms to predict churn in a clickable interface without having to write any code.
7. Iterate and Deploy
This is where the interplay between data science and business is strongest: work together to determine if the model is actually effective. In particular, ensure models generalize well, which means using training, validation, and testing sets that are not specific to a certain time period or to a certain type of customer. For example, you would not want to train or test on a dataset from a time period where a pricing change or some other factor caused churn rates to be different than usual.
Most important is deploying a churn solution into production. Looking at churn one time and evaluating models but not taking any real action to set up a continuous churn prevention strategy doesn’t do much good!
What Next?
Once you have a good churn prediction model in place, the job is only half complete. The final (and perhaps most important) step is to take action based on predictions. But where to begin? What’s the best way to tackle large swaths of potential churners? Many businesses make the mistake of simply taking those who scored the highest (i.e., are most likely to churn) and targeting them. A more effective approach comes down to two decisions:
- Decide how to reach likely churners: Short term, marketing campaigns (particularly those offering special deals or discounts) are typically among the most effective means of re-engaging predicted churners. You’ll need to decide exactly how you will reach these customers. Emailing? On-site promotions? Some other way?
- Decide which of the likely churners to target: Realistically, not every single customer who churns will come back. To save time and resources, effective teams go one step further and use uplift modeling and client clustering to drive return on investment (ROI) for churn marketing campaigns. In other words, you should only spend time and resources targeting those churners who will respond positively to your campaign.
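To give a feel for the idea behind uplift, the sketch below compares retention between customers who received a past campaign and a control group who did not, segment by segment. It is a simple two-group difference under the assumption that such a controlled campaign exists, not a full uplift model; `uplift_by_segment`, the segment names, and the records are hypothetical.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """Retention rate of treated customers minus that of controls, per segment.

    records: iterable of (segment, treated, retained) tuples.
    Segments missing either a treated or a control group are skipped.
    """
    # segment -> {treated flag: [retained_count, total_count]}
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, retained in records:
        cell = counts[segment][treated]
        cell[0] += int(retained)
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        (t_ret, t_n), (c_ret, c_n) = groups[True], groups[False]
        if t_n and c_n:
            uplift[segment] = t_ret / t_n - c_ret / c_n
    return uplift

# Hypothetical results of a past campaign with a control group.
records = (
    [("bargain", True, True)] * 3 + [("bargain", True, False)] * 1
    + [("bargain", False, True)] * 1 + [("bargain", False, False)] * 3
    + [("loyal", True, True)] * 3 + [("loyal", True, False)] * 1
    + [("loyal", False, True)] * 3 + [("loyal", False, False)] * 1
)

print(uplift_by_segment(records))
```

In this made-up data both segments retain equally well when contacted, but only one of them changes behavior because of the campaign, which is the group worth spending budget on.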