Churn prediction, distribution optimization modeling, segmentation and cohort analytics

With the introduction and growing use of digital technologies in the retail industry, retail chains generate massive volumes of data. The data stores collect includes customer demographics, transactional records, sales quantities, geographical data, coupon and discount usage, checkout register telemetry, supply chain logistics, and more. This gives retailers an opportunity to gain insight into customer preferences, trends and shopping behaviors, and to act on it through pricing adjustments, better inventory management, optimized processes, increased sales and an improved customer experience. The goal of this project was to use Machine Learning and advanced analytics to identify important customer segments and cohorts, predict customer churn, quantify item quality, analyze register telemetry, and optimize the distribution logistics of item delivery to the stores. The work resulted in several analyses, a knowledge graph database, multiple Machine Learning models deployed to production, and a set of dashboards and visualizations. 

Services

Data Science,
Partnerships

Project Length

12 Months

Client

Tus Trgovine

Our Planning Process

Since the project was planned to produce multiple deliverables over its duration, the solution was organized as several separate tasks. The data was stored in a graph database, so feature engineering and some of the analyses were performed directly in Neo4j. Segmentation was performed with clustering algorithms, churn prediction and distribution optimization with Deep Learning neural network models in Python and Jupyter notebooks, telemetry analysis with the process mining tool Disco, and visualizations and dashboards were created with Jupyter notebooks, QGIS and Qlik Sense. 

What we did for Tus Trgovine

The following products are a selection of everything developed during the collaboration: 

  • Knowledge graph: All data for the analyses and models lived in a graph database. The required feature engineering and data transformations were performed inside the Neo4j database, and the results were then extracted to wherever they were needed. The knowledge graph covered customer demographics, shopping habits, store analytics, item properties, item sales, and more. Many features were geospatially enriched through context, and new advanced features were computed directly in the graph, such as our own Item Quality metric. (A sketch of the extraction step follows this list.) 

  • Churn prediction: The task was to identify customers with a high probability of no longer shopping at this retailer. The first-generation model was built with XGBoost; the second generation improved on it using Deep Learning with Convolutional Neural Networks. The model produces a list of high-probability churners, which we use to detect patterns, analyze behaviors and draw insights, and which the retailer uses for interviews, targeted advertising and more. (A sketch of the second-generation model follows this list.) 

  • Segmentation of customers: Customer segmentation and cohort analysis were performed based on customer demographics, shopping habits and more. Important segments were analyzed, and problem maps, heatmaps and other visualizations were generated. The segmentation was performed in Python using various clustering algorithms. (A clustering sketch follows this list.)  

  • Distribution optimization: Retail stores, depending on their size, frequently order fruits and vegetables from the distribution centers. Each shop manager must estimate how much of every fruit and vegetable will be sold before the next batch arrives. Item sales are affected by weekends, the weather, holidays, the season, item quality and much more, so this is not a trivial task. Ordering too much results in wastage and loss for the company; ordering too little means lost sales and dissatisfied customers. A Deep Learning model using CNNs was built to predict how much of each item needs to be ordered for every store, taking into account all features that might affect sales as well as the item's current stock in that store. (A sketch of the order-quantity step follows this list.) 

  • Process mining: Checkout counters at the stores also collect large amounts of data, from item scanning speed to the time needed to serve one customer, the time needed to process payments, and every error that can occur at a checkout counter. The checkout is a flow with a defined order: starting a new receipt, scanning items, applying discounts, taking payment and printing the receipt. Using Python and process mining with the tool Disco, we detected anomalies in the flow, errors, time-consuming steps, and unusual or extraordinary loops. The findings were reported to the client with the goals of improving the checkout software, introducing cashier training where needed, detecting items that are slow to scan, and more. (A sketch of the event-log preparation follows this list.) 

  • Dashboarding and visualizations: Throughout all of these workstreams, dashboards, visualizations and charts had to be produced for the client. The dashboards were built mainly in Qlik Sense, several visualizations were produced in Jupyter notebooks, and geographical data was visualized with QGIS. 
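
To make the knowledge graph item above more concrete, here is a minimal sketch of the extraction step: features are computed with a Cypher query and pulled into Python through the official Neo4j driver. The node labels, property names and connection details are hypothetical placeholders; the client's actual graph model was richer and is not reproduced here.

```python
from neo4j import GraphDatabase
import pandas as pd

# Hypothetical connection details and graph model (labels, relationship
# types, property names); not the client's actual schema.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

FEATURE_QUERY = """
MATCH (c:Customer)-[:MADE]->(t:Transaction)-[:CONTAINS]->(i:Item)
WITH c,
     count(DISTINCT t)          AS n_transactions,
     sum(t.total)               AS total_spent,
     count(DISTINCT i.category) AS n_categories,
     max(t.date)                AS last_purchase
RETURN c.id AS customer_id, n_transactions, total_spent,
       n_categories, last_purchase
"""

def extract_features() -> pd.DataFrame:
    """Run the feature query and return the result as a DataFrame."""
    with driver.session() as session:
        records = session.run(FEATURE_QUERY)
        return pd.DataFrame([r.data() for r in records])

features = extract_features()
features.to_csv("customer_features.csv", index=False)  # handed off to the ML pipelines
```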
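
For the churn prediction item, the sketch below shows a second-generation-style model: a 1D Convolutional Neural Network over a customer's weekly activity history, built with Keras. The input shape, channels, layer sizes and dummy data are assumptions for illustration, not the production configuration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Assumed input: for each customer, a 52-week history with a few activity
# channels (visits, spend, items bought).
N_WEEKS, N_CHANNELS = 52, 3

model = tf.keras.Sequential([
    layers.Input(shape=(N_WEEKS, N_CHANNELS)),
    layers.Conv1D(32, kernel_size=4, activation="relu"),
    layers.Conv1D(64, kernel_size=4, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # probability of churn
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])

# Dummy data just to show the training call; real labels would come from
# observed inactivity windows.
X = np.random.rand(1000, N_WEEKS, N_CHANNELS).astype("float32")
y = np.random.randint(0, 2, size=(1000,))
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2)

churn_scores = model.predict(X)  # ranked to produce the list of likely churners
```

Scores like churn_scores above are what gets ranked and thresholded to produce the list of high-probability churners handed to the retailer.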
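
For the segmentation item, a minimal clustering sketch with scikit-learn, assuming the feature table exported from the knowledge graph. K-means with a silhouette-based choice of cluster count is shown as one representative of the various clustering algorithms used.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical feature table; column names are placeholders.
features = pd.read_csv("customer_features.csv")
X = StandardScaler().fit_transform(
    features[["n_transactions", "total_spent", "n_categories"]]
)

# Try several cluster counts and keep the best silhouette score.
best_k, best_score = None, -1.0
for k in range(3, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

features["segment"] = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X)
print(features.groupby("segment").mean(numeric_only=True))  # profile each segment
```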
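
For the distribution optimization item, the sketch below shows only the final order-quantity step that would sit on top of the demand model: the CNN predicts expected sales per store and item until the next delivery, and the quantity to order is the predicted demand minus what is already on the shelf, never below zero. The safety factor and function name are hypothetical additions for illustration.

```python
import numpy as np

SAFETY_FACTOR = 1.05  # hypothetical small buffer against under-ordering

def order_quantity(predicted_demand: np.ndarray, current_stock: np.ndarray) -> np.ndarray:
    """Units to order per (store, item), given predicted demand and current stock."""
    needed = predicted_demand * SAFETY_FACTOR - current_stock
    return np.ceil(np.clip(needed, 0, None)).astype(int)

# Example: three items in one store.
predicted = np.array([120.0, 40.0, 8.0])    # kg expected to sell until the next batch
in_stock  = np.array([30.0, 55.0, 2.0])     # kg currently on the shelf
print(order_quantity(predicted, in_stock))  # -> [96  0  7]
```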
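
For the process mining item, a sketch of how the register telemetry can be reshaped in Python into the case/activity/timestamp event log that Disco imports. The raw column names are assumptions, since the actual telemetry schema is not described here.

```python
import pandas as pd

# Hypothetical shape of the raw register telemetry; column names are placeholders.
raw = pd.read_csv("register_events.csv")

event_log = (
    raw.rename(columns={
            "receipt_id": "case_id",    # one receipt = one case
            "event_type": "activity",   # e.g. new_receipt, scan_item, discount, payment
            "event_time": "timestamp",
        })
       .loc[:, ["case_id", "activity", "timestamp", "register_id", "cashier_id"]]
       .sort_values(["case_id", "timestamp"])
)

event_log.to_csv("checkout_event_log.csv", index=False)  # imported into Disco
```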

Final Results

The development of these products brought several benefits to the company. The analyses deepened their knowledge of their customers, surfaced important customer segments, and improved customer targeting. The ML models were put into production: with the churn model they can predict in advance which customers are at high risk of churning and target them with discounts or interviews; with the distribution model they achieve less waste and higher profit; and the process mining uncovered many issues with the registers. Across all deliverables, multiple dashboards and visualizations were created for monitoring and control.