The Future isn’t What it Used to be: Hierarchical Forecasting on All Levels

Contributed by: Tine Van Calster, Wilfried Lemahieu, Bart Baesens

This article first appeared in Data Science Briefings, the DataMiningApps newsletter.

Sales forecasting has been a popular subject for decades, both in industry and in research. The search for accurate predictions is a source of frustration for many, as the range of methods and parameters seems endless. Questions such as ‘What data do I need and how do I gain access to it?’ and ‘How far back do I need to look?’ are always raised, together with many others. The truth is that there is no single perfect answer to these questions, as every business has its own particular issues to deal with.

First, we should take a look at the requirements for the forecast, as they tend to be different in every situation. For example, being able to predict extremely accurately might be more important than being able to predict extremely fast, or vice versa. The ultimate goal, of course, is an optimal combination of both aspects, but in reality we usually have to settle for a trade-off between them. Next, the operational use of the forecast determines which methods should be applied. If a company wants to measure the impact of particular strategic planning parameters on the estimate, it can only use transparent techniques whose behavior can be explained. If the estimate will only be used as a number that requires no further clarification, we can turn to machine learning techniques that act as a black box, or we can simply make an estimate based on time series analysis, which does not require any external factors at all. Many situational factors guide the estimation process, which makes it difficult to outline a general framework.

For sales forecasting in particular, another interesting aspect comes into play. When companies sell multiple products, the concept of product hierarchies is introduced, as the business can make use of different levels of specificity. The first question that comes to mind concerns the suitable level for the forecast. Generally, the main focus lies on the lower levels of the hierarchy, as most day-to-day decisions are made on more local geographical or product levels. However, exactly these levels have proven to be the most challenging to forecast, as many external parameters affect the accuracy of the estimate. In addition, there are usually more models to train on the lower levels, resulting in a larger computation cost. The goal of our research, however, is to achieve good forecasts on as many levels as possible by combining techniques, while also ensuring the usability of the final model in terms of accuracy, transparency and cost.

In order to tackle hierarchical forecasting, two general points of view are often suggested and compared using time series analysis. Firstly, we can forecast in a top-down direction by estimating our predictions at a high level and dividing this prediction according to the past proportions of the local levels. The advantage of this approach is obvious: as we only estimate one model (or a limited number of models), the computation time is greatly reduced. However, the downside of this technique is the loss of information, because we do not take into account any peculiarities of the local levels. For example, if we want to forecast the sales for Belgium based on a model for the global sales, the overall sales forecast for next month might be higher than last year, but that does not necessarily mean that sales go up for Belgium. We can address this problem of information loss by reversing the direction of estimation and conducting our forecast in a bottom-up manner. We estimate our sales on more local levels and simply aggregate the results to achieve a global forecast. However, now we have to estimate a larger number of models, which leads to a higher computation cost. This bottom-up method has proven to be more accurate on local levels, as it uses as much local information as is available, but this does not necessarily mean that the technique performs equally well for higher-level forecasts.
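To make the difference between the two directions concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sales figures are invented toy numbers, the country names are only examples, and `naive_forecast` is a deliberately simple stand-in for a real time series model, not the method used in our research.

```python
# Hypothetical monthly sales per country (toy numbers, for illustration only).
history = {
    "Belgium": [100, 102, 98, 100],      # roughly flat
    "Netherlands": [200, 210, 220, 230],  # growing
}


def naive_forecast(series):
    """Stand-in forecaster: the mean of the last two observations."""
    return sum(series[-2:]) / 2


def bottom_up(history):
    """Forecast every local series, then sum the results for the global level."""
    local = {name: naive_forecast(series) for name, series in history.items()}
    return local, sum(local.values())


def top_down(history):
    """Forecast the global series, then split it by past proportions."""
    n_periods = len(next(iter(history.values())))
    global_series = [
        sum(series[t] for series in history.values()) for t in range(n_periods)
    ]
    global_fc = naive_forecast(global_series)
    total = sum(global_series)
    # Each local share is that level's part of all historical sales.
    local = {
        name: global_fc * sum(series) / total for name, series in history.items()
    }
    return local, global_fc


bu_local, bu_global = bottom_up(history)
td_local, td_global = top_down(history)
print("bottom-up:", bu_local, bu_global)
print("top-down: ", td_local, td_global)
```

In this toy data, the top-down split gives Belgium a higher forecast than its own flat history supports, because the growth in the global series is spread over all countries in proportion to their past sales. The two global forecasts coincide here only because the stand-in forecaster is linear; with a real time series model they will generally differ.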

As a first step in our research, we have focused on the accuracy of global forecasts obtained by bottom-up forecasting with multiple hierarchies. A benchmark with time series analysis suggests that aggregating has its limits for the accuracy of a high-level forecast. As we do not take into account any external factors, the estimates on the most local levels are unstable and provide inaccurate aggregations for higher-level estimates. However, if we limit the hierarchies to intermediate and higher levels, the combination of the time series techniques and the bottom-up forecasting method leads to accurate forecasts with a low computation cost. This balance can be achieved by pinpointing the lowest level that is able to deal with a lack of external factors. The selection of this level, however, is context-dependent and extremely data-driven, so we expect some variation between use cases. This finding implies that for a significant proportion of the hierarchy levels, we do not need any external parameters at all, so we can eliminate (or at least restrict) the troublesome process of finding this data for the levels concerned.

The next step in the process of finding the optimal combination of models focuses on the more local levels of the hierarchies. Our approach deals with the two main issues of local forecasts: the correct amount of external data to include without creating too much variance in the forecasts, and the substantial computational cost that is caused by the large number of models to train. We tackle the first problem by incrementally adding parameters to the models for each level, while making sure that the overall model remains usable and generalizable for all instances. The second problem requires us to move into the world of machine learning techniques, where we can reduce the number of models to train by means of clustering similar items of the same level.
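The idea of training one model per group of similar items, rather than one per item, can be sketched as follows. This is an illustration under stated assumptions, not our actual method: the item names and sales figures are invented, the items are compared on the shape of their sales history after scaling by the mean, and a simple greedy threshold rule stands in for whichever clustering technique is eventually chosen.

```python
import math

# Hypothetical item-level sales histories (toy numbers, for illustration only).
items = {
    "item_a": [10, 12, 14, 16],
    "item_b": [100, 120, 140, 160],  # same shape as item_a, larger volume
    "item_c": [50, 40, 30, 20],
    "item_d": [5, 4, 3, 2],          # same shape as item_c, smaller volume
}


def profile(series):
    """Scale a series by its mean so items are compared on shape, not volume."""
    mean = sum(series) / len(series)
    return [x / mean for x in series]


def distance(p, q):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cluster(items, threshold=0.2):
    """Greedy clustering: an item joins the first cluster whose seed profile
    lies within the threshold; otherwise it starts a new cluster."""
    clusters = []  # list of (seed_profile, member_names)
    for name, series in items.items():
        p = profile(series)
        for seed, members in clusters:
            if distance(p, seed) <= threshold:
                members.append(name)
                break
        else:
            clusters.append((p, [name]))
    return [members for _, members in clusters]


groups = cluster(items)
print(groups)
print(f"{len(groups)} models to train instead of {len(items)}")
```

With these toy series, the four items fall into two groups, so only two models would need to be trained. The threshold value of 0.2 is arbitrary and would have to be tuned on real data.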

