The future of default prediction: A comparison of machine learning model performance
Deciding whether a loan request should be approved is an important risk management tool. Granting loans based on an inaccurate view of the probability of default is risky, as it can harm profitability. This makes the loan decision process a delicate task, and one that always leaves room for further improvement in accuracy.
Break with the past
Traditional credit scoring methods are usually based on credit scorecards. A scorecard grades several aspects of a customer and typically features between ten and twenty characteristics, for example the income or occupation of the borrower. The lender then combines all the results into a single score that determines the loan decision.
The exact method used to pool the characteristics is usually based on logistic regression or a similar regression technique. Logistic regression is reliable as it is thoroughly researched, widely used and has a proven history.
Another benefit of the method is that it does not have an inconveniently large number of parameters to tune. However, its predictive performance is limited. We expect to be able to increase the accuracy of loan decisions by using different innovative machine learning algorithms.
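As a rough illustration of such a benchmark, the sketch below fits a logistic regression with scikit-learn. The features and default labels are synthetic and purely illustrative, not taken from any real loan portfolio.

```python
# Minimal logistic-regression credit-scoring sketch (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical scorecard characteristics, e.g. income, loan amount, age
X = rng.normal(size=(1000, 3))
# Synthetic default labels (1 = default), loosely driven by the features
y = (X @ np.array([-1.0, 0.8, -0.3]) + rng.normal(size=1000) > 1.0).astype(int)

model = LogisticRegression()
model.fit(X, y)
probabilities = model.predict_proba(X)[:, 1]  # estimated probability of default
```

The fitted coefficients play the same role as the point values in a scorecard: each characteristic contributes to a single score, which is then mapped to a probability of default.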
The magic of machine learning: Mr. Know It All?
Machine learning is a technique in which models are not programmed to carry out a specific task, but instead programmed to be able to learn. The algorithms used for our research are all based on so-called supervised learning. This form of machine learning uses historical data to learn to predict a specific label or outcome. We used the open-source Python library scikit-learn to implement the machine learning algorithms.
In the case of credit scoring, the labels are default and non-default. Training is done by feeding the algorithms historical loans—showing whether they have been paid back or went into default. The algorithms then try to capture the structure underlying the characteristics of the loans and their relation to the default status.
Machine learning can often find details in this underlying structure that conventional methods cannot. One important reason is that machine learning can capture non-linear relationships and use non-continuous decision borders. This makes it possible to find connections in the data that would normally not show up at first sight.
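A small sketch of this difference, on synthetic XOR-patterned data (purely illustrative): a linear model has no single straight decision border that separates the two classes, while a decision tree, with its non-continuous, axis-aligned splits, captures the pattern.

```python
# Linear vs non-linear decision borders on an XOR-style pattern.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(2000, 2))
# Class 1 when exactly one coordinate is positive: a non-linear pattern
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

linear = LogisticRegression().fit(X, y)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

print("linear accuracy:", linear.score(X, y))  # close to 0.5: no linear border works
print("tree accuracy:  ", tree.score(X, y))    # axis-aligned splits capture the pattern
```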
Depending on data
Data is often seen as the basis for current and future innovations in many fields. One reason for this is the increasing amount of data that is generated by all kinds of improved processes.
Think for example about electronic transactions: making payments through mobile applications is only becoming easier. This has led to a surge in the number of payments and, therefore, also a steep increase in generated data. Another reason for the increased importance of data is the availability of computing power, which enables better and quicker processing of the generated data.
Big data is a huge opportunity for analytics as it gives an accurate insight into all kinds of topics. When more data is available, increasingly complex data structures can be found. This helps you to get a more accurate view of whatever generates the data, be it customers or something else.
To handle the high volume of data, new technologies capable of finding such complex data structures are needed. Machine learning is exactly the technology made for this task.
Let’s put it to the test: experimenting with machine learning algorithms
To determine whether machine learning is a technology ready for implementation in your loan organisation, you must gather information on how accurate the predictions are. The best way to do so is to apply machine learning to default predictions and performance measurements.
In preparation for our experiment, we implemented nine different machine learning algorithms, as shown below. Logistic regression, which as mentioned before is popular in current solutions, served as the benchmark against which the other algorithms could be compared.
We implemented the following algorithms in the process:
- Logistic Regression
- k Nearest Neighbors
- Random Forest
- Neural Network
- Decision Tree
- Gradient Boosting
- Naïve Bayes
- AdaBoost
- Support Vector Machine
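One way to instantiate these nine algorithms in scikit-learn is sketched below. The hyperparameters shown are defaults or minimal tweaks, not the tuned settings from our experiment.

```python
# The nine algorithms as scikit-learn estimators (illustrative defaults).
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, AdaBoostClassifier)
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

algorithms = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "k Nearest Neighbors": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Neural Network": MLPClassifier(max_iter=500),
    "Decision Tree": DecisionTreeClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Naive Bayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(),
    # probability=True is needed to score SVC with a probability-based metric
    "Support Vector Machine": SVC(probability=True),
}
```

Keeping the estimators in one dictionary makes it straightforward to loop over them and apply the same training and evaluation procedure to each.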
Aside from implementing the algorithms, we needed test data to run them on. For this experiment, we used two publicly available data sets with a total of 115,964 loans: 23 characteristics per loan in the first data set and 45 in the second. Several features were examined in detail to get a feel for the data; two of them are discussed below.
As shown in figure one below, income clearly has an impact on the default rate (DR), but only up to a certain level, after which we observed significant differences. The box plot in figure two compares applicants whose income could be verified with those whose income could not: applicants with a verifiable income have a lower DR and, on average, borrow larger amounts.
It’s all about preparation: the approach to test data
Before using a data set for machine learning, you must thoroughly prepare it. This ensures that the data is of good quality and that benchmarking the algorithms is done in a structured and reliable manner.
- Quality: The first step is to ensure the quality of the data. Missing values and other deficiencies will lead to an algorithm that performs below par. Depending on the type of data issue, several methods are available to handle the deficiency.
- Training: After the quality of the data is ensured, we split the data into three separate sets: training (70 percent), testing (20 percent) and validation (10 percent). See below.
- Conclusions: Finally, on the validation set, we compared predicted default statuses with the true default statuses, based on which we calculated a final performance metric.
Training: We used the training data as historical data, which we fed to the algorithms for learning.
Testing: Testing data was used as if it consisted of new loans. The algorithms made predictions on these loans, and based on the results of those predictions, the settings of the algorithms were updated.
Validation: When the final settings of an algorithm were determined, the validation set was used once as new loans to make predictions on.
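The 70/20/10 split described above can be sketched with two calls to scikit-learn's `train_test_split`; the data here is synthetic and only the proportions matter.

```python
# Splitting data into 70% training, 20% testing, 10% validation.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))   # synthetic loan characteristics
y = rng.integers(0, 2, size=1000)  # synthetic default labels

# First carve off 70% for training, then split the remaining 30% into
# testing (20% of the total) and validation (10% of the total).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.7, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=1/3, random_state=0)

print(len(X_train), len(X_test), len(X_val))  # 700 200 100
```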
To make an objective measurement, you should have a clearly defined metric; we used the Area Under the Curve (AUC). The value of this metric ranges from 0.50, no better than random guessing, to 1.00, a perfect predictor. The goal was to determine the predictive performance of the nine algorithms listed before.
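As a toy illustration of the metric, scikit-learn's `roc_auc_score` computes the AUC from true default labels and predicted default probabilities; the numbers below are invented.

```python
# AUC on a handful of invented loans (1 = default).
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.3, 0.7, 0.8, 0.4, 0.2]  # predicted default probabilities

# AUC equals the fraction of (default, non-default) pairs where the
# default is given the higher score: here 7 of 9 pairs, so AUC = 7/9.
auc = roc_auc_score(y_true, y_score)
print(round(auc, 2))  # 0.78
```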
All credits to the winning algorithm
The table below shows the results of applying the algorithms to the two different data sets.
* Please note that Support Vector Machine is only applied to the first data set due to infeasible computation times on the second, larger, data set.
The best performing algorithms were Neural Network and Gradient Boosting. We were happy to see that these two algorithms performed best on both data sets, which underscores their consistency in the tested circumstances.
It was also interesting to observe how similarly the algorithms performed on the first data set: the worst and best performances were very close, between 0.74 and 0.78. The differences on the second data set were substantially larger, between 0.66 and 0.81. We hypothesized that this was caused by the complexity of the data sets.
The second data set had substantially more features per loan (45 versus 23), leading to a more complex data structure. Since the strong suit of machine learning is to learn complex data structures, we already expected that the differences would be larger on the second, more complex, data set.
The best performing models tend to become a decision-making black box
Our observation that the more complex models performed best leads to one of the main issues in machine learning: the best performing models tend to become a decision-making black box.
It is nearly impossible to determine why a certain decision is made, and regulators are extremely hesitant to allow the use of black boxes. However, a new field of machine learning is starting to take shape: explainable machine learning. Within this field, algorithms are examined or altered in such a way that their decisions can be explained better.
The future of credit scoring is all about machine learning and complex data
From this experiment, we can conclude that it's indeed possible to use machine learning to improve the default prediction accuracy.
In the first data set, the difference between the best performing algorithms and logistic regression, the algorithm most commonly used today, was negligible.
For the second, more complex data set, the differences were substantial. Machine learning therefore proves to be an interesting technology whose relevance should be determined case by case. Cases with a high number of characteristics and high complexity are the most suitable candidates for the use of machine learning.
Given the increasing amount of available data, we will see more complex data and, consequently, an important role for machine learning at financial services providers. However, setting up the right infrastructure for using machine learning in your processes is a daunting task. Be thorough, prepare well and trust your data.
Need help conceptualizing, testing, and implementing machine learning to handle your data? We are here to help!