What package can we use to calculate weights of evidence and information value in R?

By: Bart Baesens, Seppe vanden Broucke

This Q&A first appeared in Data Science Briefings, the DataMiningApps newsletter. Want to submit your own question? Just tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.


You asked: What package can we use to calculate weights of evidence and information value in R?

Our answer:

I suggest the Information package, written by Kim Larsen. Here is a small example that uses the hmeq (home equity loans) data set, which can be downloaded from http://www.creditriskanalytics.net/:

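library(Information)  # WOE and IV package by Kim Larsen
# Adjust the path to wherever you saved hmeq.csv
hmeq <- read.csv("c:/temp/hmeq.csv")
# BAD is the binary target in hmeq (1 = defaulted loan)
IV <- create_infotables(data = hmeq, y = "BAD")
# Variables ranked by information value, highest first
print(head(IV$Summary))
# WOE plot for LOAN; newer releases of the package may expose this as plot_infotables()
MultiPlot(IV, "LOAN")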

The result is as follows:

Variable                  IV
DEBTINC            1.8771930
DELINQ             0.5653247
VALUE              0.4703797
DEROG              0.3471889
CLAGE              0.2301126
LOAN               0.1630072

You can see that DEBTINC is the most predictive variable in terms of information value.
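To make the IV column less of a black box, here is a minimal base R sketch of how an information value can be computed by hand for one binned variable. The bin labels and counts are made up for illustration and do not come from hmeq. The sketch uses the goods-over-bads form of WOE; the Information package may bin the data differently and may define the ratio the other way around, so expect its numbers to differ in detail.

```r
# Toy counts per bin; these numbers are made up for illustration only.
bins <- data.frame(
  bin   = c("low", "mid", "high"),
  goods = c(100, 200, 100),
  bads  = c(40, 40, 20)
)

# Share of all goods and share of all bads that fall in each bin.
dist_good <- bins$goods / sum(bins$goods)
dist_bad  <- bins$bads  / sum(bins$bads)

# WOE per bin (goods over bads) and each bin's contribution to IV.
bins$woe <- log(dist_good / dist_bad)
bins$iv  <- (dist_good - dist_bad) * bins$woe

# The information value of the variable is the sum over its bins.
iv_total <- sum(bins$iv)
print(bins)
print(iv_total)
```

Each bin's contribution is non-negative, because the difference in shares and the log ratio always have the same sign. A variable whose bins separate goods from bads more sharply therefore ends up with a larger IV.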

The weights of evidence (WOE) plot for the LOAN variable looks as follows:

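Remember that under the usual credit scoring convention, where WOE is the log of the share of goods over the share of bads in a bin, positive (negative) weights of evidence mean less (more) risk. Check which class your package puts in the numerator before reading the plot, because the sign flips if the ratio is defined the other way around.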
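A quick way to keep the sign straight is to compute the WOE of a single bin under both definitions. The two helper functions and the shares below are hypothetical and only serve as an illustration; they are not part of the Information package.

```r
# Hypothetical helpers: WOE of one bin from the shares of goods and bads in it.
woe_good_over_bad <- function(dist_good, dist_bad) log(dist_good / dist_bad)
woe_bad_over_good <- function(dist_good, dist_bad) log(dist_bad / dist_good)

# A bin holding 30% of all goods but only 10% of all bads (made-up shares).
w1 <- woe_good_over_bad(0.30, 0.10)  # positive: the bin is safer than average
w2 <- woe_bad_over_good(0.30, 0.10)  # same bin, inverted ratio: negative
print(c(w1, w2))
```

The two values are equal in magnitude and opposite in sign, so the shape of a WOE plot is the same under either definition, only mirrored around zero.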