A ‘Sales Analytics’ product architecture series — Part I : Customer Segmentation using sales data

Pramod Krishna
4 min readDec 25, 2020

--

Part II : Credit Risk analysis from ledger data (Tally output)

Part III : Architecture for a ‘Sales Analytics’ product.

Proposed Dashboard : click here

Three considerations that I kept in my mind while I wrote the code and designed the architecture for the product are:

  1. Fast execution
  2. The output should have the potential for visualizing the story of each customer
  3. The code must be able to handle any Tally outputs, irrespective of the format.

The final dashboard is interactive and you can select customers from the ‘Customer ID’ drop-down to visualize the story of a particular customer (like in Fig 2) . The ‘Customer Ids’ marked in green are recent customers.

Fig 1
Fig 2

In Part I, we will be doing an RFM analysis of customers in a different approach, to produce an output that can tell the story of each customer over a time period . (As the methodology ‘RFM’, we will be calculating Recency, Frequency and Monetary Value of customers.)

The First issue in RFM modeling would be to decide the recency period. In some businesses, a customer who has purchased within a year would be considered a recent customer. In some other cases, a month will be considered as a recent customer . Here, we are going to take a ‘Quarter’ (3 months) as the time frame for a recent purchase.

Secondly, every business user would like to know if their customers are coming back to purchase after the recency period (here, a Quarter) or in other words whether they have retained the customer.

Taking both into consideration we shall focus on finding ‘Frequency’ and ‘Monetary Value’ of each customer in each Quarter. Thus, we will be able to plot how each of the customers fares in each quarter.

Let’s go through the process and code.

Sales Data

Fig 3 shows the format of the data that we use as input

Fig 3

Preprocessing

First, we shall import all the relevant libraries and create a new column called ‘profit’. Also, we shall convert the ‘Date’ column into date format. (Fig 4)

Fig 4

We shall now create a ‘Year-Quarter’ column from the ‘date’ column and create one data frame each for each quarter. This will help us find how customers fared in each quarter. (Fig 5)

Fig 5

Let’s now find how much profit a customer has contributed as well as the frequency (Fig 6). The ‘order_cluster’ function, is to order the clusters that we created. Since we are using K-Means clustering, outliers will have a good effect on how clusters are formed, hence we will generate 9 clusters for now.(Creating a high number of clusters to nullify the effect of outliers is not a good method, but here this will do the trick. Or as an alternative you can use methods like DBSCAN which works well even in the presence of outliers)

Fig 6

Next, append the data frames to a list and create a new data frame.

Fig 7

I like to create copies of data frames in between the code just to make things easier if I mess something in between. Now, we shall scale the ‘frequency_cluster’ and ‘profit_cluster’. (Fig 8 & Fig 9)

Fig 8
Fig 9

Now, we shall create a Profit Segmentation column and Frequency Segmentation column. Here we will segment customers into ‘High Value’, ‘Average’, ‘Below Average’ & ‘Low Value’ based on the Profit and Frequency Scores.

Fig 10

Finally, we shall save it into a CSV file. Once we are done finding the ’Credit Scores’ and ‘Credit Segment’ of customers, we will be using this CSV file to create final output. Before we end Part I, let’s find the ‘First Purchase’ date and ‘Last Purchase’ date of each customer and store it in a CSV as well.

Fig 11

Fasten your belts for the next part because the preprocessing of ledger data is a bit complicated.

link to Part II :Credit Risk analysis

--

--

No responses yet