Does it make sense to do categorization of continuous variables?

By: Bart Baesens, Seppe vanden Broucke

You asked: Does it make sense to do categorization of continuous variables?

Our answer:

For continuous variables, categorization can be very beneficial. Consider, for example, the age variable and its relation to risk, as depicted in the figure below. Clearly, the relation between risk and age is non-monotonic. If a non-linear model (e.g. a neural network or a support vector machine) is used, this non-linearity can be captured directly. However, if a regression model is used (which is typically more common because of its interpretability), age enters the model as a single linear term, so the model can only capture a monotonic effect and will miss the non-monotonicity. By categorizing the variable into ranges and giving each range its own coefficient (e.g. through dummy variables), part of the non-monotonicity can be taken into account in the regression. Hence, categorization of continuous variables can be useful for modeling non-linear effects in linear models.
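As a rough illustration, here is a minimal, dependency-free Python sketch under stated assumptions: the U-shaped age-risk curve is synthetic (it is not the data behind the figure), and the cut-points in `BIN_EDGES` and the helper names (`make_synthetic_data`, `age_to_dummies`, `ols`) are hypothetical. The sketch fits the same least-squares linear regression twice, once on raw age and once on one dummy variable per age range, so the only difference between the two models is the categorization. A plain least-squares fit stands in for whatever regression model is used in practice (e.g. logistic regression for a binary target).

```python
import random

# Hypothetical cut-points: ranges [18,30), [30,45), [45,60), [60,80].
BIN_EDGES = [30, 45, 60]


def make_synthetic_data(n=2000, seed=0):
    """Synthetic (hypothetical) data: risk is high for young and old, low in between."""
    rng = random.Random(seed)
    ages, risks = [], []
    for _ in range(n):
        age = rng.uniform(18, 80)
        risk = 0.0004 * (age - 45) ** 2 + 0.05 + rng.gauss(0, 0.02)
        ages.append(age)
        risks.append(risk)
    return ages, risks


def age_to_dummies(age, edges=BIN_EDGES):
    """One indicator per age range (full coding, so no separate intercept)."""
    idx = sum(age >= e for e in edges)  # index of the range the age falls in
    row = [0.0] * (len(edges) + 1)
    row[idx] = 1.0
    return row


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, row):
    return sum(b * x for b, x in zip(beta, row))


def mse(X, y, beta):
    return sum((predict(beta, row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


ages, risks = make_synthetic_data()
X_raw = [[1.0, a] for a in ages]           # intercept + raw age: a single straight line
X_bin = [age_to_dummies(a) for a in ages]  # one coefficient per age range
beta_raw = ols(X_raw, risks)
beta_bin = ols(X_bin, risks)
mse_raw = mse(X_raw, risks, beta_raw)
mse_bin = mse(X_bin, risks, beta_bin)

for age in (22, 50, 75):
    print(f"age {age}: raw {predict(beta_raw, [1.0, age]):.3f}, "
          f"binned {predict(beta_bin, age_to_dummies(age)):.3f}")
print(f"MSE raw {mse_raw:.4f}, binned {mse_bin:.4f}")
```

With one indicator per range and no separate intercept, the least-squares coefficients are simply the mean risk within each range, so the categorized model is piecewise constant in age: its predictions can fall and then rise again across ranges, which the single slope on raw age cannot do. The price is coarseness within each range and the need to choose the cut-points.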