This article first appeared in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to receive our feature articles, or follow us @DataMiningApps. Do you also wish to contribute to Data Science Briefings? Shoot us an e-mail over at firstname.lastname@example.org and let’s get in touch!
Exploiting the word-of-mouth effect has always been an important consideration in marketing campaigns. In particular, referral programs employed in retail are seen as a powerful tool for new customer acquisition. Detecting potential referrers in a customer base can bring many benefits, ranging from better estimation of referral reward costs to better customer retention management.
The modern world is heavily interconnected: networks are everywhere. Leveraging network information has proven beneficial in domains such as credit scoring, churn prediction and fraud detection [4, 5, 6]. The underlying idea of using information from networks is the concept of homophily, illustrated by the expression “Birds of a feather flock together”. Relationships express channels of influence, and the connections between people can tell a lot about an individual, serving as an additional source of information. The retail domain is no exception: customer networks are widespread and of high importance for a business! For example, knowing that an individual comes from a big family with many family members living in the same neighborhood and being economically dependent on each other gives us additional information about the influence possessed by this individual. Regarding marketing referral reward programs, this individual potentially has a high capacity for bringing new people into the customer base if targeted with a promotion. Customers with such a referral capacity can be seen as influencers, and the network is assumed to play an important role in determining these influencers. Correctly predicting potential influencers is a way to optimize targeted marketing campaigns and improve customer acquisition rates.
The question, however, is how to incorporate network information into the prediction model. Technically speaking, a network (also referred to as a graph) is a collection of nodes (vertices) along with identified pairs of nodes called edges. Both nodes and edges can have features associated with them. Traditional machine learning models expect the data to be presented in a Euclidean space, which is not the case for network data. Hence, network information should be extracted and presented in a form the model accepts. The process of encoding graph topology information is called graph representation learning. The network topology can be encoded simply with neighborhood metrics, centrality metrics and collective inference algorithms. More advanced approaches include Graph Neural Networks (GNNs), which learn representations of the network with neural networks. In particular, the representation of a network can be learned in a similar manner as image representations are learned with Convolutional Neural Networks (CNNs). Graph Convolutional Networks (GCNs) can be seen as a generalization of CNNs operating on graph data: they inspect neighboring nodes and aggregate their information in order to obtain a representation of a node. Other types of GNNs exist for representation learning, e.g., GraphSAGE, Graph Attention Networks (GATs) and Graph Isomorphism Networks (GINs), that utilize different architectures for learning network representations [8, 9, 10].
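To make the aggregation step concrete, here is a minimal sketch of a single graph-convolution layer in NumPy, following the symmetric normalization of Kipf and Welling [7]. The toy graph, features and weight matrix are invented purely for the example:

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One graph-convolution step: each node averages its neighbors'
    (and its own) features, then applies a linear map and a ReLU."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = np.diag(a_hat.sum(axis=1) ** -0.5)
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt     # symmetric normalization
    return np.maximum(a_norm @ features @ weights, 0.0)  # ReLU

# Toy graph: a path 0-1-2-3, 2-dim node features, 3-dim embeddings
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = np.arange(8, dtype=float).reshape(4, 2)
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 3))
embeddings = gcn_layer(adj, features, weights)
print(embeddings.shape)  # (4, 3): one embedding per node
```

Stacking several such layers lets information from multi-hop neighborhoods flow into a node's representation.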
The network representation learned with GNNs can be used for downstream tasks, which include the aforementioned problem of influencer detection. But what if the network changes over time and learning a representation once is not sufficient? How can the dynamics of the network be taken into consideration? Here come Recurrent Neural Networks (RNNs)! The Long Short-Term Memory network (LSTM) and the Gated Recurrent Unit network (GRU) are the most renowned RNN types capable of solving temporal problems on sequential data [11, 12]. Both LSTMs and GRUs have a kind of “memory”, as they utilize information from prior inputs to influence the current output.
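The “memory” mechanism can be illustrated with a minimal NumPy sketch of a single GRU cell [12], where gates decide how much of the previous hidden state to keep; the dimensions and random weights below are arbitrary, chosen just to run the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, params):
    """One GRU step: gates blend the old state with a candidate state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(x @ Wz + h @ Uz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1 - z) * h + z * h_tilde           # new hidden state

rng = np.random.default_rng(1)
d_in, d_h = 3, 4
params = [rng.normal(scale=0.1, size=s)
          for s in [(d_in, d_h), (d_h, d_h)] * 3]  # Wz,Uz,Wr,Ur,Wh,Uh
h = np.zeros(d_h)
for t in range(5):                 # process a sequence of 5 inputs
    x = rng.normal(size=d_in)
    h = gru_cell(x, h, params)
print(h.shape)  # (4,): the final hidden state summarizes the sequence
```

The final hidden state carries information from every earlier input, which is exactly the property we need for sequences of network snapshots.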
Now that it is clear how to encode the temporal features of the data, we can combine the best of both worlds to work with graphs presented as a time series of snapshots. The first part of the problem is encoding the network topology with a GNN to obtain a static network representation – we call this part the Encoder. The next step is to bring in the dynamic component, i.e., an RNN model, to learn a dynamic representation based on the embeddings generated by the GNN at each timestamp. The final step is to apply a simple fully-connected network (FCN) that predicts the final probability of a node being an influencer. We call the combination of RNN and FCN the Decoder.
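The overall pipeline can be sketched as follows. For brevity, this NumPy sketch stands in a mean-aggregation convolution for the GNN encoder and a plain tanh RNN for the LSTM/GRU decoder; all sizes, weights and snapshots are invented for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_snapshot(adj, feats, w):
    """Encoder: one mean-aggregation graph convolution per snapshot."""
    a_hat = adj + np.eye(len(adj))                    # self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True) # mean over neighbors
    return np.tanh(a_norm @ feats @ w)

def decode(embed_seq, u, w, w_out):
    """Decoder: a plain tanh RNN over the snapshot embeddings,
    followed by a one-layer FCN giving an influencer probability."""
    h = np.zeros((embed_seq[0].shape[0], u.shape[1]))
    for e in embed_seq:                               # roll over time
        h = np.tanh(e @ w + h @ u)
    return sigmoid(h @ w_out)                         # per-node probability

rng = np.random.default_rng(2)
n, d_f, d_e, d_h = 5, 3, 4, 4
w_enc = rng.normal(scale=0.3, size=(d_f, d_e))
w_rnn = rng.normal(scale=0.3, size=(d_e, d_h))
u_rnn = rng.normal(scale=0.3, size=(d_h, d_h))
w_out = rng.normal(scale=0.3, size=(d_h, 1))

snapshots = []
for t in range(3):                  # 3 snapshots of an evolving graph
    adj = (rng.random((n, n)) < 0.4).astype(float)
    adj = np.triu(adj, 1)
    adj = adj + adj.T               # undirected, no self-loops
    feats = rng.normal(size=(n, d_f))
    snapshots.append(encode_snapshot(adj, feats, w_enc))

probs = decode(snapshots, u_rnn, w_rnn, w_out)
print(probs.shape)  # (5, 1): influencer probability per node
```

In the actual framework, the encoder is a trained GCN or GAT and the decoder a trained LSTM or GRU plus FCN, but the data flow — snapshot embeddings fed through a recurrent state into a per-node probability — is the same.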
Now it is time to discuss how we used the methodology described above in our recent paper. We were confronted with the problem of influencer detection on a dynamic network of attributed SuperApp users, with colored edges between them (colors represent different edge types) and nodes labelled as either influencer or non-influencer. Users who influenced at least one other user to acquire a credit card were considered influencers.
Once the network has been constructed, it is time to decide on the encoder and decoder parts of the framework. As encoders we chose two of the most popular GNNs in the graph literature: the Graph Convolutional Network (already described above) and the Graph Attention Network (GAT). GAT makes several improvements over GCN: it assigns importance to edges via the attention mechanism and is capable of incorporating edge feature information during network representation learning. For the decoder part, we used LSTM and GRU. Hence, we get four encoder-decoder combinations: GCN-LSTM, GCN-GRU, GAT-LSTM and GAT-GRU. We compared our proposed models with six baselines: using only node features in the encoder and applying RNN models (both LSTM and GRU) in the decoder part, optionally adding a PageRank metric as an additional node feature. We also benchmarked our approach against static GNNs.
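For reference, the PageRank feature used in the baselines can be computed by standard power iteration; this NumPy sketch (with an invented toy graph) shows one common way to do it:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-8):
    """PageRank by power iteration; the resulting scores can be
    appended to the node feature matrix as a centrality feature."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    trans = np.zeros_like(adj)
    np.divide(adj, out_deg, out=trans, where=out_deg > 0)  # row-stochastic
    trans[out_deg.ravel() == 0] = 1.0 / n                  # dangling nodes
    scores = np.full(n, 1.0 / n)
    while True:
        new = (1 - damping) / n + damping * trans.T @ scores
        if np.abs(new - scores).sum() < tol:
            return new
        scores = new

# Star graph: nodes 1-3 all point to node 0, which should rank highest
adj = np.zeros((4, 4))
adj[1:, 0] = 1.0
scores = pagerank(adj)
print(scores.argmax())  # 0
```

This is the kind of hand-crafted centrality metric the GNN-based encoders are compared against.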
In our research we found that the neighbor feature representations captured by GNNs are more important than centrality measures such as PageRank, especially when it comes to generalizing to unseen nodes, where using multi-head attention in the encoder boosts performance. Another finding is that capturing the dynamics of the network plays an important role and improves performance. We believe that our research is a good starting point for leveraging network data for influencer detection, and applying our framework in practice will facilitate more informed decision-making and optimize referral program management.
For more details, we invite you to check out our paper here.
- Rosen, E. (2002). The anatomy of buzz: How to create word of mouth marketing.
- Ryu, G., & Feick, L. (2007). A penny for your thoughts: Referral reward programs and referral likelihood. Journal of marketing, 71(1), 84-94.
- Baesens, B., Van Vlasselaer, V., & Verbeke, W. (2015). Fraud analytics using descriptive, predictive, and social network techniques: a guide to data science for fraud detection. John Wiley & Sons.
- Óskarsdóttir, M., Bravo, C., Sarraute, C., Vanthienen, J., & Baesens, B. (2019). The value of big data for credit scoring: Enhancing financial inclusion using mobile phone data and social network analytics. Applied Soft Computing, 74, 26-39.
- Óskarsdóttir, M., Bravo, C., Verbeke, W., Sarraute, C., Baesens, B., & Vanthienen, J. (2017). Social network analytics for churn prediction in telco: Model building, evaluation and network architecture. Expert Systems with Applications, 85, 204-220.
- Van Belle, R., Mitrović, S., & De Weerdt, J. (2020). Representation learning in graphs for credit card fraud detection. In Workshop on Mining Data for Financial Applications (pp. 32-46). Springer, Cham.
- Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
- Hamilton, W., Ying, Z., & Leskovec, J. (2017). Inductive representation learning on large graphs. Advances in neural information processing systems, 30.
- Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
- Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2018). How powerful are graph neural networks?. arXiv preprint arXiv:1810.00826.
- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
- Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
- Tiukhova, E., Penaloza, E., Óskarsdóttir, M., Garcia, H., Bahnsen, A. C., Baesens, B., … & Bravo, C. (2022). Influencer Detection with Dynamic Graph Neural Networks. arXiv preprint arXiv:2211.09664.