Optimizing Maintenance with Causal Machine Learning

This article first appeared in Data Science Briefings, the DataMiningApps newsletter.

Contributed by: Toon Vanderschueren.

Asset maintenance poses a significant challenge for companies. Ensuring that assets are well-maintained is crucial for attaining production goals, streamlining the supply chain, and satisfying customers. Additionally, maintenance is a major cost, estimated at 15% to 40% of total production costs (Dunn, 1987; Lofsten, 2000), and a key driver of profit (Alsyouf, 2007). Therefore, companies continuously aim to scrutinize, improve and optimize their maintenance operations.

The objective of effective maintenance planning is to avoid asset failures and costly overhauls while minimizing expenses related to maintenance activities, such as proactive replacement of parts or allocation of technicians. A straightforward and widely adopted approach is periodic preventive maintenance, where maintenance is performed at regular intervals over time. The challenge then becomes determining the optimal frequency of maintenance. Should it be every day, every week, or every month? While more frequent maintenance leads to fewer failures, it also incurs higher maintenance costs.

Figure 1: What is the optimal frequency of preventive maintenance to balance the cost of maintenance and the cost of asset failures?

Existing work typically establishes a mathematical model of an asset’s failure behavior and determines the optimal frequency of preventive maintenance within that model. However, these existing methods rely on several questionable assumptions. First, they model the asset’s overhaul and failure rate using a stochastic process whose parameters are hard to verify empirically. Second, they assume a certain deterministic or stochastic factor to represent the impact of maintenance on the failure and overhaul rate. Third, they neglect asset-specific characteristics and instead assume uniform hazard rates and maintenance effects, even though older, worn-down assets might require more maintenance than brand-new ones. Because these assumptions do not align with reality, the resulting maintenance frequency will likely not be optimal.

We suggest a novel approach: instead of constructing a mathematical model of the effect of preventive maintenance, why not learn the effect from historical, observational data on assets we maintained in the past? The challenge lies in the fact that, for each asset, we only observe what happens for one single maintenance frequency. We never observe what would have happened if more or less maintenance had been applied. If only we knew this, determining the optimal maintenance frequency would be straightforward. Fortunately, by combining the principles of causal inference with machine learning, we can answer counterfactual questions such as “What would have happened to an asset if it had received more or less maintenance?”

Figure 2: We only observe the failures for one specific maintenance frequency: this constitutes the factual outcome. Unfortunately, we never observe what would have happened if a different maintenance frequency had been applied. These unobserved outcomes are referred to as counterfactual outcomes.

Causal inference provides a principled approach for preventive maintenance. Ideally, we would answer these counterfactual questions through experiments or randomized controlled trials. We would divide our assets into different groups and randomly assign a different maintenance frequency to each group. After a period of time, we would compare the failure frequencies across the groups. Although this experimental approach is straightforward in theory, it can be difficult to implement in practice, prohibitively expensive, or even unethical (consider maintaining life-support equipment in hospitals).

Fortunately, it is possible to perform causal inference using historical, observational data on previously maintained assets, provided that some requirements are met. The advantage of this data is that it is cheap and often readily available. However, observational data presents an additional challenge. Past maintenance frequencies were not assigned randomly, but rather based on a pre-existing policy or the maintenance department’s expertise, potentially taking into account the characteristics of the asset. For example, older assets might have received more frequent maintenance. If we do not account for this non-random treatment assignment, using observational data can result in biased estimates of maintenance effects and, as a result, suboptimal maintenance decisions.
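The bias from non-random assignment can be illustrated with a small simulation. The sketch below is not the method used in our paper: it uses plain linear regression on invented data, and every number in it (the age range, the effect sizes, the noise levels) is an assumption chosen only for illustration. Older assets receive more maintenance and also fail more often, so a naive comparison that ignores age makes maintenance look harmful, while adjusting for age recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
age = rng.uniform(0, 10, size=n)                     # asset age in years
# Non-random assignment: older assets were given more frequent maintenance.
freq = 1.0 + 0.5 * age + rng.normal(0, 0.5, size=n)  # maintenance visits per year
# True effect: each extra visit per year removes 0.5 failures; age adds failures.
true_effect = -0.5
failures = 5.0 + true_effect * freq + 1.0 * age + rng.normal(0, 0.5, size=n)


def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


naive_effect = ols([freq], failures)[1]          # ignores age (confounded)
adjusted_effect = ols([freq, age], failures)[1]  # controls for age

print(f"true effect:       {true_effect:+.2f}")
print(f"naive estimate:    {naive_effect:+.2f}")
print(f"adjusted estimate: {adjusted_effect:+.2f}")
```

In this simulated setup the naive slope comes out positive, suggesting that more maintenance causes more failures, whereas the age-adjusted slope lands close to the true negative effect. Real maintenance data will rarely be this simple, which is why more flexible estimators are needed in practice.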

In our research, we use machine learning to address the issue of non-random assignment of maintenance frequencies in observational data and mitigate the resulting biases. Machine learning provides us with the flexibility to learn the maintenance effects from historical, observational data without needing to impose strict parametric assumptions. This allows us to infer the relationships between an asset’s characteristics, its maintenance frequency, and the resulting outcomes. Specifically, we rely on a methodology called SCIGAN (Bica et al., 2020), which uses a generative model to simulate the counterfactual outcomes for the assets in the training data.

By combining the principles of causal inference and machine learning, our approach can predict how often a certain asset will fail and how many overhauls it would require for a certain preventive maintenance frequency. These predictions allow us to choose the maintenance frequency that minimizes the overall expected cost resulting from failures, overhauls, and preventive maintenance. In our paper, we illustrate this approach empirically using real-world data on over 4,000 machines. The results show that our method is capable of assigning maintenance frequencies that are close to optimal in terms of accuracy and cost.
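Once predictions of failures and overhauls are available, the decision step amounts to a search over candidate frequencies. The following is a minimal sketch under stated assumptions: `predict_outcomes` is a hypothetical stand-in for a trained causal model with an invented functional form, and the unit costs and candidate grid are illustrative values, not figures from the paper.

```python
import numpy as np

# Hypothetical unit costs (illustrative assumptions, not values from the paper).
COST_PM = 100.0         # cost of one preventive maintenance visit
COST_FAILURE = 500.0    # cost of one failure
COST_OVERHAUL = 2000.0  # cost of one overhaul


def predict_outcomes(age, freq):
    """Stand-in for a trained causal model: expected failures and overhauls
    per year for an asset of a given age under maintenance frequency `freq`.
    The functional form is invented purely for illustration."""
    failures = (1.0 + 0.5 * age) * np.exp(-0.3 * freq)
    overhauls = 0.1 * age * np.exp(-0.2 * freq)
    return failures, overhauls


def best_frequency(age, candidate_freqs):
    """Return the candidate frequency with the lowest expected total cost."""
    failures, overhauls = predict_outcomes(age, candidate_freqs)
    total_cost = (COST_PM * candidate_freqs
                  + COST_FAILURE * failures
                  + COST_OVERHAUL * overhauls)
    best = int(np.argmin(total_cost))
    return candidate_freqs[best], total_cost[best]


candidates = np.arange(0, 25)  # 0 to 24 visits per year
for age in (1.0, 5.0, 10.0):
    freq, cost = best_frequency(age, candidates)
    print(f"age {age:>4.1f} years -> {freq} visits/year, expected cost {cost:.0f}")
```

Because the predictions depend on the asset’s characteristics, the selected frequency differs per asset. With the assumed curves above, older assets are assigned more frequent maintenance than newer ones.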

Causal machine learning presents a cutting-edge solution for preventive maintenance optimization. By leveraging historical data on assets that were previously maintained, our approach harnesses the power of causal inference and machine learning to determine the optimal maintenance frequency in a completely data-driven manner. Our recent publication in the International Journal of Production Economics, “Optimizing the preventive maintenance frequency with causal machine learning” (Vanderschueren et al., 2023), provides more information and insights into this innovative approach.

References:

  • Dunn (1987). Advanced maintenance technologies. Plant Engineering, 40(12), 80-82.
  • Lofsten, H. (2000). Measuring maintenance performance – in search for a maintenance productivity index. International Journal of Production Economics, 63, 47-58.
  • Alsyouf, I. (2007). The role of maintenance in improving companies’ productivity and profitability. International Journal of Production Economics, 105(1), 70-78.
  • Bica, I., Jordon, J., & van der Schaar, M. (2020). Estimating the effects of continuous-valued interventions using generative adversarial networks. Advances in Neural Information Processing Systems, 33, 16434-16445.
  • Vanderschueren, T., Boute, R., Verdonck, T., Baesens, B., & Verbeke, W. (2023). Optimizing the preventive maintenance frequency with causal machine learning. International Journal of Production Economics, 108798.