On the Role of Simplicity in Analytics

By: Bart Baesens, Seppe vanden Broucke

This article first appeared in Data Science Briefings, the DataMiningApps newsletter.


William of Ockham (c. 1287–1347) was an English Franciscan friar and scholastic philosopher who had already stated the basic principle of successful analytics as “Entia non sunt multiplicanda praeter necessitatem”, literally translated as “entities should not be multiplied unless absolutely necessary”. During the Middle Ages, multiplication was considered a complex operation, and it was only performed when strictly necessary.

Nowadays, multiplication is simple thanks to computers, but the complexity of the operations we perform has increased substantially. Translated to an analytical setting, Ockham’s principle states that analytical models should be as simple as possible, free of any unnecessary complexities and/or assumptions. In other words, analytical models should be parsimonious and compact, giving clear insight into the underlying patterns in the data. Ockham’s principle is also often referred to as Ockham’s razor, since the razor is used to “shave away” unnecessary complexities and/or assumptions. In this article, we demonstrate the relevance of Ockham’s razor in analytics.

Analytical techniques come in many flavors, ranging from complex mathematical models to simple, easy-to-understand, parsimonious models. An example of a more complex model in analytics could be the following:

Probability of Churn = 2.34
  - 3.58 * (age of customer)^2 * (income)
  - 6.22 * (days since last purchase)^3
  - …
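To make the structure of this formula concrete, here is a minimal Python sketch that evaluates only the terms actually shown above. The function and parameter names are hypothetical, and the terms elided as "…" in the formula are deliberately not reconstructed. Note that the shown terms on their own do not keep the result between 0 and 1, so this is purely illustrative of how opaque such an expression is to a business user.

```python
def partial_churn_score(age: float, income: float, days_since_last_purchase: float) -> float:
    """Evaluate only the visible terms of the example churn formula.

    Hypothetical helper: the original formula continues with further
    terms (written as "..."), which are intentionally left out here.
    """
    return (
        2.34
        - 3.58 * age**2 * income  # age and income interact non-linearly
        - 6.22 * days_since_last_purchase**3  # cubic effect of recency
        # ... remaining terms are elided in the original formula
    )
```

Even with only three terms, it is hard to say what an individual coefficient means for a given customer, which is exactly the interpretability problem discussed next.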

The above model could have been built and calibrated using a very complex analytical technique (e.g., a neural network). Such a model is clearly a black box and therefore offers little insight. A more comprehensible model could be based on a set of if-then business rules, such as:

IF income > 2000 AND age of customer > 45
THEN Churn = FALSE
IF days since last purchase < 30 AND gender = female
THEN Churn = FALSE
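As a minimal sketch, the two rules above translate almost one-to-one into Python. The function name, parameter names and the `"female"` string encoding are hypothetical. The original rule set does not say what to predict when no rule fires, so this sketch returns `None` in that case as a labeled assumption rather than guessing a default.

```python
from typing import Optional


def churn_rules(income: float, age: int, days_since_last_purchase: int, gender: str) -> Optional[bool]:
    """Apply the two example if-then rules in order.

    Returns False (predicted non-churner) when a rule fires, and None when
    no rule applies; the None fallback is an assumption, not part of the rules.
    """
    # Rule 1: customers with income above 2000 who are older than 45 do not churn.
    if income > 2000 and age > 45:
        return False
    # Rule 2: female customers who purchased within the last 30 days do not churn.
    if days_since_last_purchase < 30 and gender == "female":
        return False
    # Assumed fallback: neither rule covers this customer.
    return None
```

For example, `churn_rules(2500, 50, 90, "male")` returns `False` via the first rule, while `churn_rules(1500, 30, 60, "male")` returns `None` because neither rule applies. Each prediction can be traced back to a single readable condition.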

These rules are clearly a lot more comprehensible. In analytics, interpretable models are highly preferred because they give a thorough understanding of the underlying problem. An important reason for this is that analytics should be actionable: the models should not only predict, and do so accurately, which customers will churn, for instance, but also give us insight into what actions could be undertaken to improve customer retention. Such actions are much easier to infer from white-box models than from complex black boxes.

Unfortunately, interpretability often involves a trade-off with model performance. Consider, for example, the complex mathematical formula depicted above: it may well predict customer churn more accurately than a set of simple if-then rules. This trade-off typically needs to be evaluated by the decision maker in collaboration with the data scientist, and the experience and education of both will play a crucial role in resolving it. It is important to note that although simplicity is an important focus, it should not become a blind obsession. When reality is genuinely complex, the analytical model should be given enough room to take this complexity appropriately into account.

Continuing our pursuit of simplicity and understandability in analytics, we strongly advocate that analytics focus on solutions rather than techniques. Far too many software tools and consulting offerings in the industry today provide a whole range of complex analytical techniques without making it clear how these can be applied to solve practical problems. To bridge the gap to decision makers and increase the chances of success, software should be oriented towards solutions, such as managing churn or fraud or dealing with credit risk, rather than towards techniques.

Finally, to manage the successful deployment of analytical models and guarantee their simplicity, corporate governance and management oversight are needed. Appropriate organizational procedures should be put in place to safeguard simplicity and thus comprehensibility. A very useful idea here is the analytical model board: a group of people that closely follows the development of analytical models. It includes all business users who will end up using the models. A simple rule of thumb is that if people don’t understand a model, they will simply not use it! By involving these users from the very start of model development, and thereby making sure that they clearly understand the model (and its limitations!), we improve our chances of successfully deploying it.

To summarize, the key takeaways of this article are:

  • Ockham’s razor is key to successful analytics: models should be kept as simple as possible, but also not simpler than that.
  • Analytical software and consulting solutions should be solution-oriented rather than technique-oriented. Tools (and data scientists) should ask which question you aim to answer, not which technique you aim to apply.
  • Corporate governance and management oversight are key to the success of analytics! One way to achieve this is by using analytical model boards.