QA: What are the top performing techniques used in predictive analytics?

By: Bart Baesens, Seppe vanden Broucke

This QA first appeared in Data Science Briefings, the DataMiningApps newsletter, as a “Free Tweet Consulting Experience”, where we answer a data science or analytics question of 140 characters maximum. Also want to submit your question? Just Tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.


You asked: What are the top performing techniques used in predictive analytics?

Our answer: In our experience, regression models, decision trees and random forests are the most popular and powerful predictive analytics techniques.

Both linear and logistic regression are simple to understand yet powerful. Decision trees add the benefit of being interpretable as a set of If-Then rules. If you care only about predictive performance rather than interpretability, and want to spend less time on feature engineering than logistic regression typically requires, then random forests are the method of choice.
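To make the If-Then reading of a decision tree concrete, here is a minimal sketch in plain Python. The tree itself, its feature names (income, age), thresholds and labels are all hypothetical and chosen only for illustration; they do not come from any of our studies. Each path from the root to a leaf becomes one rule.

```python
# A tiny hand-written decision tree (hypothetical credit-scoring example).
# Internal nodes test "feature <= threshold"; leaves hold a predicted label.
tree = {
    "feature": "income", "threshold": 30000,
    "left": {"label": "reject"},
    "right": {
        "feature": "age", "threshold": 25,
        "left": {"label": "reject"},
        "right": {"label": "accept"},
    },
}


def extract_rules(node, conditions=()):
    """Return one If-Then rule per leaf by walking every root-to-leaf path."""
    if "label" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    test_true = f"{node['feature']} <= {node['threshold']}"
    test_false = f"{node['feature']} > {node['threshold']}"
    return (extract_rules(node["left"], conditions + (test_true,))
            + extract_rules(node["right"], conditions + (test_false,)))


rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

A random forest averages many such trees, which is why it usually gives up this rule-level readability in exchange for predictive performance.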

Throughout all of our benchmarking studies in credit scoring, marketing analytics and fraud detection, we have found that random forests consistently outperform the other predictive analytics techniques.
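If you want to run a comparison of this kind yourself, the sketch below shows one possible setup using scikit-learn. It is not the protocol of our benchmarking studies: the data set is synthetic (generated with make_classification), and the hyperparameters, the 5-fold cross-validation and the AUC metric are assumptions made for illustration. On a toy data set like this the ranking of the three techniques may well differ from what we report above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption, not real credit,
# marketing or fraud data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# The three techniques discussed in the answer, with illustrative settings.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean area under the ROC curve over 5 cross-validation folds.
scores = {}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    scores[name] = auc.mean()
    print(f"{name}: mean AUC = {scores[name]:.3f}")
```

For a fair comparison on your own data, use the same folds and the same metric for every model, and tune each technique before drawing conclusions.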