Why is benchmarking needed for credit risk modeling?

By: Bart Baesens, Seppe vanden Broucke

This Q&A first appeared in Data Science Briefings, the DataMiningApps newsletter, as a “Free Tweet Consulting Experience”, where we answer a data science or analytics question of 140 characters maximum. Want to submit your own question? Just tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.


You asked: Why is benchmarking needed for credit risk modeling?

Our answer:

Benchmarking is an important quantitative validation activity. The idea is to compare the output and performance of the analytical PD, LGD or EAD model with a reference model or benchmark. This is recommended as an extra validity check to make sure that the current credit risk model is the best one to use. Various credit risk measurements can be benchmarked, for example credit scores, ratings, calibrated risk measurements such as PDs, LGDs and CCFs, or even migration matrices. Various benchmarking partners can be considered as well, such as credit bureaus, rating agencies, data poolers and even internal experts. A simple benchmarking exercise would be to compare an application score against a FICO score.
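One way to make the application score versus FICO score comparison concrete is to check whether both scores rank the same applicants in a similar order. The sketch below computes Spearman's rank correlation using only the Python standard library. The scores are made up for illustration, and the names application_score and benchmark_score are our own; it is one possible agreement measure, not a prescribed one.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for eight applicants (illustrative only, not real data).
application_score = [512, 640, 587, 701, 455, 623, 690, 530]
benchmark_score = [580, 700, 690, 760, 540, 655, 745, 610]

rho = spearman(application_score, benchmark_score)
print(f"Spearman rank correlation: {rho:.3f}")
```

A value close to 1 indicates that the internal score and the benchmark order applicants similarly. A low value does not say which of the two is wrong, only that the disagreement deserves a closer look.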

The benchmark can be externally or internally developed. Various problems arise when doing external benchmarking. A first one is that there is no guarantee that external ratings are of good quality. Think about what happened during the credit crisis, when many rating agencies were criticized because their ratings turned out to be overly optimistic. Next, the external partner might have a different portfolio composition and adopt different model development methodologies and/or processes, making a comparison less straightforward. Different rating philosophies might also be used, whereby the benchmark rating system is either more point-in-time or more through-the-cycle than the internal one. The default and loss definitions might differ, and different LGD weighting schemes, discount factors, collection policies, etc. can be adopted. External benchmarking might also be complicated by legal constraints, whereby information cannot be exchanged due to banking secrecy regulation. Credit risk is typically also an endogenous phenomenon, which is highly dependent upon the internal credit culture and/or process, so an external benchmark may not carry over to the institution's own portfolio. Finally, there is a risk of cherry-picking, whereby a closely matching external benchmark is selected without critically evaluating it any further.

Given these complications with external benchmarking, the idea of internal benchmarking has been advocated. It was first introduced by the Hong Kong Monetary Authority (HKMA), as illustrated by this quote (HKMA 2006):

Where a relevant external benchmark is not available (e.g., PD for SME and retail exposures, LGD, and EAD), an AI [authorized institution] should develop an internal benchmark. For example, to benchmark against a model-based rating system, an AI might employ internal rating reviewers to re-rate a sample of credit on an expert-judgment basis.

The internal benchmark can be a statistical or an expert-based benchmark. Consider, for example, a PD model built using a plain vanilla logistic regression. You can then build a neural network as a benchmark and contrast the performance of the two models. Although the neural network is a black box model, and thus cannot be used as the final credit risk model, the result of this benchmarking exercise will tell you whether there are any non-linear effects in the data. If the neural network performs better than the logistic regression, you can start looking for non-linear effects or interactions and try to add them to your logistic regression model to further boost its performance. The benchmark can also be expert-based. Remember, an expert-based benchmark is a qualitative model based upon expert experience and/or common sense. An example could be an expert committee ranking a set of small and medium-sized enterprises (SMEs) in terms of default risk by merely inspecting their balance sheet and financial statement information in a subjective, expert-based way. The ranking obtained from the expert-based rating system can then be compared to the ranking obtained from the logistic regression.
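To illustrate the last step, the sketch below compares an expert committee's ranking of SMEs with the ranking implied by a model, using Kendall's tau (the tau-a variant, which assumes there are no ties). The two rankings are hypothetical, and the function name kendall_tau is our own; other rank agreement measures would work equally well.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a for two rankings of the same items (no ties assumed)."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        # Positive product: both rankings order the pair the same way.
        direction = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings of six SMEs (1 = lowest default risk).
expert_rank = [1, 2, 3, 4, 5, 6]  # expert committee
model_rank = [2, 1, 3, 4, 6, 5]   # ranking implied by the model's PDs

tau = kendall_tau(expert_rank, model_rank)
print(f"Kendall's tau: {tau:.3f}")
```

A tau of 1 means the experts and the model agree on every pair of firms, and -1 means they disagree on every pair. The pairs on which they disagree are natural candidates for a case-by-case review.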

A champion-challenger approach can be used when doing benchmarking. The current model is the champion, which is challenged by the benchmark. If the benchmark beats the champion in performance, it can become the new champion. This way, models are continuously challenged and further improved.
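As a minimal sketch of such a comparison, the code below scores a champion and a challenger on the same hold-out sample and compares their AUC (area under the ROC curve). Several things here are our own assumptions rather than part of the approach itself: the choice of AUC as the performance measure, the min_gain threshold that the challenger must exceed, the function names, and the toy data.

```python
def auc(scores, defaults):
    """AUC as the share of (defaulter, non-defaulter) pairs in which the
    defaulter received the higher score; tied pairs count as one half."""
    bad = [s for s, d in zip(scores, defaults) if d == 1]
    good = [s for s, d in zip(scores, defaults) if d == 0]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


def pick_champion(champion_pd, challenger_pd, defaults, min_gain=0.01):
    """Return the winner's label and both AUCs. The challenger wins only if
    it improves the AUC by more than min_gain (an assumed threshold)."""
    champion_auc = auc(champion_pd, defaults)
    challenger_auc = auc(challenger_pd, defaults)
    if challenger_auc - champion_auc > min_gain:
        winner = "challenger"
    else:
        winner = "champion"
    return winner, champion_auc, challenger_auc


# Toy hold-out sample: 1 = default, 0 = non-default (illustrative only).
defaults = [0, 0, 1, 0, 1, 0, 0, 1]
champion_pd = [0.02, 0.10, 0.30, 0.25, 0.12, 0.05, 0.08, 0.40]
challenger_pd = [0.03, 0.08, 0.35, 0.15, 0.22, 0.04, 0.09, 0.45]

winner, champion_auc, challenger_auc = pick_champion(
    champion_pd, challenger_pd, defaults
)
print(f"champion AUC={champion_auc:.3f}, challenger AUC={challenger_auc:.3f}")
print(f"winner: {winner}")
```

In practice the decision would rest on more than a single discrimination measure on one sample. Calibration, stability and interpretability would typically be weighed as well, especially when the challenger is a black box model.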