This article first appeared in Data Science Briefings, the DataMiningApps newsletter.
Evaluating the performance of predictive models is an important step in any data-driven application. In business scenarios, where such models are used to make decisions, taking into account the expected profits resulting from their deployment has an obvious benefit: it puts an actual value on the model itself. Thus, the business reality becomes a part of the evaluation. The profit-centric Expected Maximum Profit measure (EMP) was developed with this objective, as it considers the business reality of binary classification problems when evaluating model performance (Verbraken et al., 2013). It has been formulated for two scenarios, taking into account the specific requirements of each: churn prediction and credit scoring. In the case of churn prediction, where the goal is to identify the customers that are most likely to leave the company, the measure maximizes the expected profit of a retention campaign, taking into account the losses and gains of misclassifying customers, the cost of the campaign and the retention offer (Verbraken et al., 2013). Here, we focus on the second scenario for the Expected Maximum Profit measure: credit scoring.
Credit scoring has a long history in predictive modelling. It is important to financial institutions, for example, because it distinguishes defaulters from non-defaulters when granting loans. By building predictive models on historical data, a probability score is assigned to each customer, with a higher score meaning that the customer is more likely to default on the loan. By choosing a threshold, or cut-off value, on the scores, the financial institution subsequently grants a loan to everyone whose score is lower than the cut-off and rejects the application if the score is higher. The Expected Maximum Profit measure takes into account the profits earned from the non-defaulters and the losses caused by the defaulters. It computes an EMP value that is expressed as a percentage of the total loan amount and measures the incremental profit relative to not building a credit scoring model (Verbraken et al., 2014).
In addition to returning a value for the Expected Maximum Profit, the measure also provides a profit-maximizing fraction, which indicates the proportion of customers who should be denied a loan in order to achieve the maximum profit. This fraction is a unique property of the measure and allows the modeller to compute the profit of the model. Before computing the model profit, some information is needed. First, the benefit of correctly identifying a defaulter needs to be estimated. In the EMP it is typically defined as
λ := (LGD·EAD) / A
where LGD is the loss given default, EAD the exposure at default and A is the principal of the loan. Furthermore, ROI, the return on investment of the loan, is assumed to equal the cost of incorrectly classifying a good applicant as a defaulter.
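As a quick illustration of the formula, the sketch below computes λ for a single hypothetical loan. The function name and the loan figures (an exposure of 8,000 on a principal of 10,000) are made up for the example; only LGD = 0.7 matches the value used later in this article.

```python
def benefit_of_catching_defaulter(lgd: float, ead: float, principal: float) -> float:
    """Return lambda = (LGD * EAD) / A, the benefit of correctly
    identifying a defaulter, as a fraction of the loan principal."""
    if principal <= 0:
        raise ValueError("principal must be positive")
    return lgd * ead / principal


# Hypothetical loan: principal A = 10,000 with an exposure at default of 8,000.
lam = benefit_of_catching_defaulter(lgd=0.7, ead=8_000.0, principal=10_000.0)
print(f"lambda = {lam:.2f}")
```

Because λ is divided by the principal, it is on the same relative scale as ROI, which is what allows the two quantities to be traded off against each other.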
We will now demonstrate how to compute the model profit using the example dataset in table 1.
We assume that a credit scoring model has already been built using a training set and that the data in table 1 is the corresponding test set, which contains 5000 observations for evaluating the model. The first step is to apply the model to the test set to produce the predicted scores in the column ‘Predicted Score’. Next, the Expected Maximum Profit measure is computed using the columns ‘ActualClass’ and ‘Predicted Score’, which in this case gives the result
EMP = 1.65%, EMP fraction = 0.0544.
The second value indicates that 5.44% of the loan applications should be rejected in order to achieve the EMP of 1.65%. This means that in the test set, 0.0544 · 5000 = 272 loans should be rejected. We label the customers accordingly by assigning the label ‘Default’ to the 272 customers with the highest scores and ‘Non-default’ to the rest.
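The labelling step can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used to produce table 1: the function name is hypothetical, the scores are synthetic, and rounding the number of rejections to the nearest integer and breaking ties by original order are assumptions.

```python
def label_by_fraction(scores, reject_fraction):
    """Label the highest-scoring share of customers 'Default'
    and everyone else 'Non-default'."""
    # Number of applications to reject (rounding is an assumption).
    n_reject = round(reject_fraction * len(scores))
    # Customer indices ordered from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rejected = set(order[:n_reject])
    return ["Default" if i in rejected else "Non-default"
            for i in range(len(scores))]


# Synthetic scores standing in for the 5000 test-set predictions.
scores = [i / 5000 for i in range(5000)]
emp_class = label_by_fraction(scores, reject_fraction=0.0544)
print(emp_class.count("Default"))
```

With 5000 observations and a fraction of 0.0544, the sketch marks 272 customers as ‘Default’, matching the count above.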
These labels are shown in the column ‘EMPClass’ in table 1. The next step is to compute the profit (or loss) generated by each customer. We use the confusion matrix in table 2, which shows how much profit is achieved when non-defaulters and defaulters are classified correctly and incorrectly. Note that in this case LGD = 0.7 and ROI = 0.2644. The profit of each customer can be seen in the column ‘Profit’. Finally, adding together all the values in the column ‘Profit’ gives the model profit.
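To make the final summation concrete, here is a rough sketch of the per-customer profit and the model profit. The entries of table 2 are not reproduced in this text, so the payoff rules below are assumptions: an accepted non-defaulter earns ROI times the principal, an accepted defaulter loses LGD times the exposure at default, and a rejected application yields neither profit nor loss. The `Loan` class, its field names and the four example loans are likewise hypothetical.

```python
from dataclasses import dataclass

LGD = 0.7     # loss given default, as stated in the text
ROI = 0.2644  # return on investment, as stated in the text


@dataclass
class Loan:
    actual_class: str  # 'Default' or 'Non-default'
    emp_class: str     # label assigned from the EMP fraction
    principal: float   # A, the principal of the loan
    ead: float         # exposure at default


def customer_profit(loan: Loan, lgd: float = LGD, roi: float = ROI) -> float:
    """Profit or loss for one customer under the ASSUMED payoff rules."""
    if loan.emp_class == "Default":
        return 0.0  # application rejected: assumed no gain and no loss
    if loan.actual_class == "Non-default":
        return roi * loan.principal  # accepted and repaid
    return -lgd * loan.ead  # accepted but defaulted


def model_profit(loans) -> float:
    """Sum the per-customer profits, as in the 'Profit' column."""
    return sum(customer_profit(loan) for loan in loans)


# Four hypothetical loans, one for each cell of the confusion matrix.
loans = [
    Loan("Non-default", "Non-default", 10_000.0, 0.0),
    Loan("Default", "Non-default", 10_000.0, 8_000.0),
    Loan("Default", "Default", 10_000.0, 8_000.0),
    Loan("Non-default", "Default", 10_000.0, 0.0),
]
print(f"Model profit: {model_profit(loans):.2f}")
```

If the actual payoff matrix differs, for instance by charging the forgone ROI for a rejected good applicant, only `customer_profit` needs to change.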
Above we have shown that the Expected Maximum Profit measure can easily be applied to compute the profit of credit scoring models. As a result, the modeller obtains a measure of performance that is not only highly relevant for the business user but also easy to comprehend.
- Verbraken, T., Verbeke, W., & Baesens, B. (2013). A novel profit maximizing metric for measuring classification performance of customer churn prediction models. IEEE Transactions on Knowledge and Data Engineering, 25(5), 961-973.
- Verbraken, T., Bravo, C., Weber, R., & Baesens, B. (2014). Development and application of consumer credit scoring models using profit-based classification measures. European Journal of Operational Research, 238(2), 505-513.