Classification: A Profit versus Resource Allocation Perspective

Contributed by Prof. dr. Bart Baesens and Prof. dr. Seppe vanden Broucke

Key takeaways:

  • Two common goals of classification are discrimination and calibration;
  • The profit perspective directly optimizes profit. In this article, we review CSlogit as a cost-sensitive extension of logistic regression;
  • We also discuss the resource allocation perspective, which incorporates the limited amount of resources (e.g., fraud investigators) into the classification problem formulation.

Introduction

Classification models aim at classifying instances into a predefined number of classes or categories.  Classification problems are ubiquitous in, e.g., churn prediction, fraud detection, credit scoring, response modeling, etc.  A classification problem can be specified in various ways depending upon the goal it needs to fulfil.  The goal of discrimination is to separate one class from another as well as possible by assigning well-estimated scores, on any range, that provide a good ordinal ranking of instances in terms of their behavior (e.g., credit risk, churn risk, fraud, etc.).  In certain business applications, one needs more than just an ordinal measure or score reflecting the behavior of interest.  More specifically, there might be a need for well-calibrated probabilities that precisely measure the relative chance of occurrence of the event on a scale between zero and one.  In other words, next to an ordinal measure, one also needs an exact, well-calibrated cardinal measure.  An example is credit risk, where banks need to estimate exact probabilities of default.  In this article, we zoom in on two additional perspectives: the profit perspective and the resource allocation perspective.
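To make the distinction concrete, the numpy sketch below contrasts a purely ordinal discrimination metric (AUC) with a calibration-oriented one (the Brier score).  The toy labels and scores are illustrative assumptions, not data from any of the cited studies.

```python
import numpy as np

def auc(y, scores):
    """Discrimination: chance that a random positive outranks a random negative."""
    pos, neg = scores[y == 1], scores[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def brier(y, probs):
    """Calibration-oriented: mean squared gap between probabilities and outcomes."""
    return np.mean((probs - y) ** 2)

y = np.array([0, 0, 1, 1])
p_a = np.array([0.10, 0.40, 0.35, 0.80])  # one ranking inversion
p_b = np.array([0.20, 0.30, 0.60, 0.90])  # perfect ordinal ranking

print(auc(y, p_a))    # 0.75
print(auc(y, p_b))    # 1.0
print(brier(y, p_b))  # 0.075
```

Note that two score vectors can have identical AUC yet very different Brier scores: halving p_b leaves the ranking (and AUC) unchanged but distorts the probabilities themselves, which matters when, e.g., banks need exact probabilities of default.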

Classification: the profit perspective

The goal of a classification model can also be to directly optimize profit.  In other words, the parameters of the model (e.g., the weights in a logistic regression, the splits in a decision tree, etc.) are determined so as to directly optimize profit rather than a statistical objective function (e.g., maximum likelihood for logistic regression, information gain for decision trees).  Obviously, this requires a profit function, which is application specific.  In what follows, we give the example of fraud detection.  Suppose we want to classify claims or (e.g., credit card) transactions as fraudulent or legitimate.  The cost matrix with the accompanying costs is depicted in Table 1 (Höppner et al., 2022).

                                    Actual negative (legit)   Actual positive (fraud)
                                    y = 0                     y = 1
Predicted negative (legit), c = 0   C_TN = 0                  C_FN = Amount (A_i)
Predicted positive (fraud), c = 1   C_FP = C_a                C_TP = C_a

Table 1 Cost matrix for fraud.

When a legitimate transaction is classified as legitimate, the cost is obviously zero, as it successfully passes.  When a fraudulent transaction is classified as legitimate, the cost equals the amount A_i of the transaction, claim, etc., which is instance dependent.  When a transaction is predicted as fraudulent, the cost is always C_a, which represents the fixed administrative cost of investigating the transaction (e.g., contacting the customer, asking for extra verification, etc.).  The transaction is then either halted when it turns out to be actually fraudulent, or successfully executed in case the investigation doesn’t signal any fraud.
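This cost structure is easy to compute directly.  The numpy sketch below is a minimal illustration; the transaction amounts and the fixed administrative cost C_a are hypothetical toy values.

```python
import numpy as np

def total_cost(y, c, amounts, c_a):
    """Total cost under the fraud cost matrix of Table 1:
    each alert (c=1) costs the fixed investigation fee c_a,
    each missed fraud (c=0, y=1) costs the transaction amount."""
    y, c = np.asarray(y), np.asarray(c)
    amounts = np.asarray(amounts, dtype=float)
    return float(np.sum(c * c_a + (1 - c) * y * amounts))

# Three transactions: a caught fraud, a missed fraud, a false alarm on a legit one
y = [1, 1, 0]                   # actual labels
c = [1, 0, 1]                   # predicted labels
amounts = [500.0, 200.0, 50.0]  # transaction amounts
print(total_cost(y, c, amounts, c_a=10.0))  # 10 + 200 + 10 = 220.0
```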

Let’s now assume we start from a data set S with N transactions.  Assume we have a classification model f() (e.g., logistic regression, decision tree, XGBoost) trained on S.  The classifier f() generates a predicted label c_i for each transaction i, with i ranging from 1 to N.  The total cost of the classification model then becomes

Cost(S; f) = Σ_{i=1}^{N} [ c_i ( y_i C_TP + (1 − y_i) C_FP ) + (1 − c_i) ( y_i C_FN + (1 − y_i) C_TN ) ].
Using the numbers of the cost matrix above (C_TN = 0, C_FN = A_i, C_FP = C_TP = C_a), this reduces to

Cost(S; f) = Σ_{i=1}^{N} [ c_i C_a + (1 − c_i) y_i A_i ].
We can now train a cost-sensitive logistic regression model, CSlogit, which directly minimizes the following objective function:

Cost(β_0, β) = (1/N) Σ_{i=1}^{N} [ s_i(β_0, β) C_a + (1 − s_i(β_0, β)) y_i A_i ].
The s_i(β_0, β) values represent the outcome of a logistic regression model with intercept β_0 and weight vector β.  Note that a LASSO penalization term can also be added to avoid overfitting.  The parameters β_0 and β can then be directly optimized using gradient descent, based on the partial derivatives of the cost function with respect to each β_j parameter:

∂Cost/∂β_j = (1/N) Σ_{i=1}^{N} (C_a − y_i A_i) s_i (1 − s_i) x_ij,
where x_ij is an element of the data matrix X, which is defined such that its first column consists of ones to cater for the intercept term β_0.  As shown in (Höppner et al., 2022), the second-order gradient of the cost function can also be easily calculated, such that both the first- and second-order derivatives can be plugged into the XGBoost optimizer, resulting in CSBoost.  In (Höppner et al., 2022), we evaluated CSlogit and CSBoost for credit card and payment transaction fraud and illustrated how both outperform cost-insensitive logistic regression and XGBoost in terms of the costs saved by detecting fraud.
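As an illustration, the gradient-descent recipe above fits in a few lines of numpy.  This is a minimal sketch rather than the implementation of (Höppner et al., 2022): the learning rate, number of iterations, and synthetic data are illustrative assumptions, and the LASSO term is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_cslogit(X, y, amounts, c_a, lr=0.01, n_iter=5000):
    """Cost-sensitive logistic regression: minimizes
    (1/N) * sum_i [ s_i * c_a + (1 - s_i) * y_i * A_i ]
    with s_i = sigmoid(beta_0 + x_i . beta), by plain gradient descent."""
    N = X.shape[0]
    Xb = np.hstack([np.ones((N, 1)), X])  # first column of ones -> intercept beta_0
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        s = sigmoid(Xb @ beta)
        # dCost/dbeta_j = (1/N) * sum_i (c_a - y_i * A_i) * s_i * (1 - s_i) * x_ij
        grad = Xb.T @ ((c_a - y * amounts) * s * (1.0 - s)) / N
        beta -= lr * grad
    return beta

# Toy data: feature 0 drives fraud; fraudulent transactions carry an amount of 100
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)
amounts = np.where(y == 1, 100.0, 0.0)
beta = fit_cslogit(X, y, amounts, c_a=5.0)
```

Because the objective weighs the fixed alert cost C_a against each transaction’s amount A_i, high-amount frauds dominate the fit, in contrast to a likelihood-based logistic regression that treats all errors equally.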

Classification: the resource allocation perspective

(Vanderschueren et al., 2024) provide yet another perspective on specifying the goal of classification.  Consider the case of credit card fraud detection.  Fraud analysts can only investigate a limited number of transactions per time unit (e.g., day, week, etc.), and it is not known upfront whether investigating a transaction will uncover a fraudulent case.  The challenge is thus how to optimally allocate limited resources to maximize profit, i.e., how to assign fraud analysts to suspicious transactions so as to minimize fraud losses.  In traditional classification, one typically adopts a predict-then-optimize approach: a model (e.g., logistic regression, XGBoost) is first estimated to predict which instances are most likely to be fraudulent, and these are then subsequently investigated.  (Vanderschueren et al., 2024) argue that this approach is suboptimal when resources (i.e., fraud analysts) are limited, because a classification model does not take capacity limitations into account.  They introduce a novel learning-to-rank method that directly optimizes the assignment’s expected profit given the limited, stochastic capacity of the resources.  By considering the available capacity during optimization, the model focuses on correctly ranking the most promising tasks, proportional to their likelihood of being processed under limited capacity.  In other words, the learning-to-rank method explicitly considers a task’s relevance and pay-off in comparison to the other available tasks.  The key characteristic of this approach is that only relative positions in the ranking matter, corresponding to the need to prioritize tasks relative to each other.  The approach is summarized in Figure 1.  Note how the cumulative profit decreases for every red or unsuccessfully processed task.  The problem formulation itself essentially boils down to a balanced linear assignment problem.
The authors empirically demonstrate the superiority of their learning-to-rank method over the traditional predict-then-optimize approach in terms of expected profit for churn prediction, credit scoring, direct marketing and fraud detection.
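The intuition can be illustrated with a toy Python example.  This is not the authors’ learning-to-rank method; it merely shows how a ranking that ignores pay-offs wastes scarce capacity.  The fraud probabilities, amounts, outcomes, and the capacity of two analysts are all hypothetical.

```python
import numpy as np

def captured_amount(order, y, amounts, capacity):
    """Fraud amount averted when analysts work through `order`
    but can only process the first `capacity` tasks."""
    picked = order[:capacity]
    return float(np.sum(y[picked] * amounts[picked]))

# Six flagged transactions, but only two analysts available
p = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])                # predicted fraud probabilities
amounts = np.array([10.0, 20.0, 30.0, 500.0, 400.0, 50.0])  # transaction amounts
y = np.array([1, 1, 1, 1, 1, 0])                            # actual outcomes (toy)

by_probability = np.argsort(-p)                 # predict-then-optimize: rank by P(fraud)
by_expected_value = np.argsort(-(p * amounts))  # capacity-aware: rank by expected loss averted

print(captured_amount(by_probability, y, amounts, capacity=2))     # 30.0
print(captured_amount(by_expected_value, y, amounts, capacity=2))  # 900.0
```

With unlimited capacity both rankings would eventually cover the same transactions; it is the capacity constraint that makes the choice of ranking objective matter.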

Figure 1 Learning to rank (Vanderschueren et al., 2024).

Conclusion

Two common goals of classification are discrimination and calibration.  In this article, we also discussed the profit perspective and reviewed CSlogit as a cost-sensitive extension of logistic regression.  Finally, the resource allocation perspective incorporates the limited amount of resources (e.g., fraud investigators) into the classification problem formulation.

References

  • Höppner, S., Baesens, B., Verbeke, W., Verdonck, T., Instance-dependent cost-sensitive learning for detecting transfer fraud, European Journal of Operational Research, 2022, https://doi.org/10.1016/j.ejor.2021.05.028.
  • Vanderschueren, T., Baesens, B., Verdonck, T., Verbeke, W., A new perspective on classification: optimally allocating limited resources to uncertain tasks, Decision Support Systems, 2024, https://doi.org/10.1016/j.dss.2023.114151.