A ‘Sales Analytics’ product architecture series — Part II : Credit Risk analysis from ledger data (Tally output)
Part I : Customer Segmentation using sales data
Part III : Architecture for a ‘Sales Analytics’ product.
The final dashboard is interactive and you can select customers from the ‘Customer ID’ drop-down to visualize the story of a particular customer (like in Fig 2) . The ‘Customer Ids’ marked in green are recent customers.
In this Part II, we shall now find out the credit worthiness of each customer. This is a pretty hard take, because of two reasons :
- We don’t have a previous credit history or rating criteria for customers
- There is no credit period information in the data set.
This is a pretty common problem in any SMEs (Small and Medium Enterprises), and there lies the beauty as well, we shall try to create something out of nothing.
Why Credit Rating?
In some cases (especially in the wholesale sector) customers will not be paying the amount upfront. They prefer paying it in installments, we cannot say ‘NO’ to such a system, because if we don’t provide credit to our customers they will leave our service.At the same time, we cannot allow them to exploit the credit system. So, our analyses and ratings would come in handy to the business user when issuing credits, later on.
Okay now, let’s dive deep in. This is an exciting challenge!
Data Set
The figure shown above is a screenshot of the ledger data that will be taken as input. This is a standard ledger format and this was exported directly out of Tally.
As you can see, the data is not at all in our preferred format.
Pre-processing Ideas
First ,we shall create a new column customer_name and store the customer name that is written to the right side of ‘Ledger Cell’ in that column against each line item.
Second, we shall fill NA values in between the cells of the credit and debit column with ‘0'.
Third, we shall drop other NA rows and the rows containing ‘Closing Balance’.
For simplicity, I have written the whole preprocessing associated with it as a single function (Fig 4)
How to arrive at a credit rating?
First, let’s get familiar with a few terms and columns used in the data-set:
Credit Column : This is the amount that was given by the buyer to the seller.
Debit Column : This contains data regarding the purchase done by the buyer.
Opening balance : If the first row of a customer’s sub-table contains this row, it means that either buyer or the seller owes a sum to the other. This can be positive or negative.Meaning, if the amount is under the credit column that means the seller owes a sum to the buyer and if the amount is under the debit column it means the seller owes a sum to the buyer.
Now we should be calculating ‘Opening balance’ only in the beginning. We have 10 years of data exported from Tally with us, so it’s very much possible that a buyer will be having multiple opening balances in the sheet. Hence, we will be taking only the first ‘Opening balance’, since we plan to calculate a running balance till the end.
Let’s do that now.
First, we shall import all the relevant libraries and read the preprocessed data.
Now we shall remove all the irrelevant opening balance.
Next, we shall create a data-frame for each quarter so that we could gauge how each customer behaved in each quarter.
Now comes the most exciting and longest part, assigning a credit score value. For that, we will find the credit exposure per day of each customer and cluster it, to arrive at a score.
What’s credit exposure per day?
Credit exposure is a measure of the maximum possible loss to a lender, if a borrower defaults on a loan. If it’s positive then it’s good for the seller, as he has already provided an advance amount, if it’s negative it’s not so good as the buyer owes the seller an amount.
We shall create a column to calculate the difference between days of payment/purchase. Then we shall create a column called gross_total and deduct the ‘debit’ column from the ‘credit’ column. This will help us maintain a ‘balance’ amount column against customer_name.
One key point we used is, we performed set operations wherever it is possible and avoided loops. Because loops take time to execute in large datasets. Here we have limited the entire code to just one single loop. (Fig 8 to 10)
Now we shall cluster and assign a cluster value based on CEP (Credit Exposure per Day) [Fig 11 to13]
Now let’s append all the individual quarter data frames and assign segments. [Fig 14]
Hurray! we have assigned a credit score to each customer_name in each quarter.
Next, we shall merge both the datasets — the final output from Part I and the final output from this series to create the final datasets for visualization. Like the one shown below.
In the final part of this series (Part III), we shall discuss how to package all of this together and make it into a product.