Relational Learning in Telco: a Short Introduction

Contributed by: María Óskarsdóttir, Jan Vanthienen, Bart Baesens

This article first appeared in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to receive our feature articles, or follow us @DataMiningApps. Do you also wish to contribute to Data Science Briefings? Shoot us an e-mail over at briefings@dataminingapps.com and let’s get in touch!


Customer retention is a concern for many telco providers. Their goal is to approach and offer promotions to clients who are about to cancel their contracts, i.e. to churn. By using predictive analytics, they can build models to predict which clients are likely to churn. Usually, the prediction is a probability of belonging to the class of churners; every client whose probability exceeds a predetermined cut-off value is then assigned to the class of churners and the rest to the class of non-churners.
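
As a minimal sketch of this last step, assuming churn probabilities have already been produced by some model (the probabilities and the cut-off below are purely illustrative):

```python
import numpy as np

# Hypothetical churn probabilities produced by some upstream classifier
churn_probabilities = np.array([0.12, 0.85, 0.47, 0.63])

# Predetermined cut-off value; 0.5 is chosen here only for illustration
cutoff = 0.5

# Clients whose probability exceeds the cut-off are flagged as churners
predicted_churners = churn_probabilities > cutoff
print(predicted_churners)  # [False  True False  True]
```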

Classical methods in telco churn prediction use demographic and usage data of clients as input variables for classifiers such as logistic regression and decision trees. Although these models give good results, they are missing an important part of the clients’ behaviour: their relationships with other clients. Recent research has shown that using information gathered from social networks generated by call detail records enhances the performance of churn prediction models. This can be done by enriching the dataset with network variables, such as link-based variables or network metrics like degree and centrality, before applying the classifiers.
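
One way to compute such network metrics, assuming the call graph has already been loaded as a networkx graph (the toy graph below is illustrative only), could look like this:

```python
import networkx as nx

# Toy call graph: nodes are clients, edges mean two clients call each other
call_graph = nx.Graph()
call_graph.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])

# Network metrics that can be appended as extra columns to the classical dataset
degree = dict(call_graph.degree())                # number of contacts per client
closeness = nx.closeness_centrality(call_graph)   # how central a client is in the network
pagerank = nx.pagerank(call_graph)                # influence-style score

network_features = {n: (degree[n], closeness[n], pagerank[n]) for n in call_graph.nodes()}
print(network_features)
```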

In addition, there is a different kind of method, which classifies clients based on the type of clients they are in contact with. These methods are known as relational learners, because they learn a classification model using the relationships between clients. The relationships are derived from call detail records (CDR), which document every phone call and text message between clients. Thus, it is possible to construct a client network, where links exist between clients who call or text each other. Weights are assigned to the links using information about how often or for how long people talk to each other. Finally, each client belongs to one of two classes: churner or non-churner.
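
A sketch of how such a weighted client network could be built, assuming the CDRs have been reduced to simple (caller, callee, duration) tuples, which is a deliberate simplification of real CDR data:

```python
import networkx as nx

# Hypothetical, heavily simplified CDR records: (caller, callee, call duration in seconds)
cdr_records = [
    ("A", "B", 120), ("A", "B", 300), ("B", "C", 60),
    ("C", "D", 600), ("A", "C", 30),
]

# Build an undirected client network; link weights accumulate the total call duration
network = nx.Graph()
for caller, callee, duration in cdr_records:
    if network.has_edge(caller, callee):
        network[caller][callee]["weight"] += duration
    else:
        network.add_edge(caller, callee, weight=duration)

# Known class labels: 1 = churner, 0 = non-churner
labels = {"A": 0, "B": 1, "C": 0, "D": 1}
nx.set_node_attributes(network, labels, "churn")
```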

Assume now that there are clients in the network for which the class is unknown. Relational learners can be used to predict to which class these clients belong. Relational learning consists of two components that work together: relational classifiers and collective inference methods.

Relational classifiers are node-centric, i.e. they consider one client at a time. They observe the link weights and class membership of all clients to which the given client is linked and then, using some rule, infer a churn probability or class label. For example, the Weighted Vote Relational Classifier determines a score as the weighted average of the labels of the linked clients. Other relational classifiers include the Class Distribution Relational Neighbor Classifier, the Network-only Link Based Classifier and the Spreading Activation Relational Classifier.
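
A minimal sketch of such a weighted-vote rule, reusing the toy network and labels built above (the function name and the handling of unlabelled neighbours are simplifying choices of this sketch, not part of the original formulation):

```python
def weighted_vote(network, node, labels):
    """Score `node` as the weighted average of its neighbours' known labels (0 or 1)."""
    total_weight, weighted_sum = 0.0, 0.0
    for neighbour in network.neighbors(node):
        label = labels.get(neighbour)
        if label is None:
            continue  # skip neighbours whose class (or score) is still unknown
        w = network[node][neighbour]["weight"]
        total_weight += w
        weighted_sum += w * label
    return weighted_sum / total_weight if total_weight > 0 else 0.0

# Example: score client "C" as if its class were unknown, using only its neighbours' labels
known = {k: v for k, v in labels.items() if k != "C"}
print(weighted_vote(network, "C", known))
```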

Clearly, relational classifiers operate rather locally, i.e. they only infer a probability for one client at a time. When a relational classifier is applied in turn to each node of a network, the probabilities and labels of the nodes may change. To infer more accurate class labels, the classifier should therefore be applied iteratively until stability is reached. Collective inference methods control these iterations and decide how the final labels are determined. In addition, collective inference methods determine in which order the nodes are considered and how the unknown values are estimated jointly.

Collective inference methods apply a relational classifier multiple times while keeping track of the probabilities in each iteration and inferring the final probability of each node at the end. The Relaxation Labelling collective inference method updates the scores of the nodes in each step and uses the new scores in the next classification iteration. In contrast, Gibbs Sampling sums the probabilities after each iteration and normalizes them at the end. Together, relational classifiers and collective inference methods infer a stable score for each node in the network, which can be used directly as a churn probability.
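
A simplified sketch of a relaxation-labelling-style loop around the weighted-vote rule above; the uninformative 0.5 starting score for unlabelled clients and the fixed iteration budget are simplifying assumptions rather than part of the original method:

```python
def relaxation_labelling(network, known_labels, n_iterations=10):
    """Iteratively re-score unlabelled nodes, feeding each iteration's scores into the next."""
    scores = dict(known_labels)
    unknown = [n for n in network.nodes() if n not in known_labels]
    for n in unknown:
        scores[n] = 0.5  # uninformative starting score for unlabelled clients
    for _ in range(n_iterations):
        # Compute all new scores from the previous iteration's scores, then update at once
        scores.update({n: weighted_vote(network, n, scores) for n in unknown})
    return {n: scores[n] for n in unknown}

# Example: treat clients "C" and "D" as unlabelled and infer their churn scores
print(relaxation_labelling(network, {"A": 0, "B": 1}))
```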

Relational learners can be used on their own to predict churn, or they can be combined with classical classifiers. The resulting class probabilities can, for example, be added as an extra feature before applying a standard churn prediction method. Relational learners gather valuable knowledge about the clients which, in combination with already existing data, leads to a more complete model and more accurate predictions of customer churn.
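
A sketch of this combination, assuming classical features and relational scores are already available for the same clients (all numbers below are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical classical features per client, e.g. tenure in months and monthly usage
classical_features = np.array([[24, 310.0], [3, 55.0], [48, 120.5], [6, 80.0]])

# Relational churn scores produced by a network-based learner such as the one above
relational_scores = np.array([0.1, 0.8, 0.2, 0.7]).reshape(-1, 1)

# Enrich the classical dataset with the relational score as an additional column
X = np.hstack([classical_features, relational_scores])
y = np.array([0, 1, 0, 1])  # observed churn labels

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:, 1])  # combined churn probabilities
```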

References

  1. W. Verbeke, D. Martens, and B. Baesens. Social network analysis for customer churn prediction. Applied Soft Computing, 14:431–446, 2014.
  2. Q. Lu and L. Getoor. Link-based classification. In ICML, volume 3, pages 496–503, 2003.
  3. K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A. A. Nanavati, and A. Joshi. Social ties and their relevance to churn in mobile telecom networks. In Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pages 668–677. ACM, 2008.
  4. S. A. Macskassy and F. Provost. Classification in networked data: A toolkit and a univariate case study. The Journal of Machine Learning Research, 8:935–983, 2007.