How can networked data be leveraged for analytics?

By: Bart Baesens, Seppe vanden Broucke

This Q&A first appeared in Data Science Briefings, the DataMiningApps newsletter, as a “Free Tweet Consulting Experience”, where we answer a data science or analytics question of 140 characters maximum. Want to submit your own question? Just tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.

You asked: How can networked data be leveraged for analytics?

Our answer:

Network-based relationships can be explicit or implicit. Examples of explicit networks are calls between customers, shared board members between firms, and social connections (e.g. family or friends). Explicit networks can be readily distilled from underlying data sources (e.g. call logs), and their key characteristics can then be summarized using featurization procedures. In our previous research (Verbeke et al., 2014; Van Vlasselaer et al., 2017), we found network data to be highly predictive for both customer churn prediction and fraud detection.

Implicit networks, or pseudo networks, are a lot more challenging to define and featurize. Using data from a major bank, Martens and Provost (2016) built a network of customers in which two customers are linked when they transferred money to the same entities (e.g. retailers). When combined with non-network data, this innovative way of defining a network based upon similarity instead of explicit social connections gave a better lift and profit for almost any targeting budget.

In another, award-winning study, a geosimilarity network among users was built from location-visitation data in a mobile environment (Provost et al., 2015). More specifically, two devices are considered similar, and thus connected, when they share at least one visited location. They are more similar the more locations they share and the fewer people visit those locations. This implicit network can then be leveraged to target advertisements to the same user on different devices or to users with similar tastes, or to improve online interactions by selecting users with similar tastes.

Both of these examples clearly illustrate the potential of implicit networks as an important data source. A key challenge here is to creatively think about how to define these networks based upon the goal of the analysis.
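To make the idea of an implicit network more tangible, here is a minimal Python sketch of the geosimilarity construction described above, followed by one simple featurization step. It is an illustration under stated assumptions, not the method used in the cited studies: the function names, the toy data and the labels are hypothetical, and the link weight is an Adamic-Adar style score (1 / log of the number of visitors per shared location) that we swap in because it rewards more shared locations and penalizes popular ones.

```python
from collections import defaultdict
from itertools import combinations
from math import log


def build_similarity_network(visits):
    """Turn (device, location) pairs into weighted device-device links.

    Two devices are linked when they share at least one location.
    Each shared location adds 1 / log(number of visitors), so rarely
    visited locations count more (assumed weighting, Adamic-Adar style).
    """
    devices_at = defaultdict(set)
    for device, location in visits:
        devices_at[location].add(device)

    weights = defaultdict(float)
    for devices in devices_at.values():
        if len(devices) < 2:
            continue  # a location with a single visitor links nobody
        w = 1.0 / log(len(devices))
        for a, b in combinations(sorted(devices), 2):
            weights[(a, b)] += w
    return dict(weights)


def neighbor_label_share(weights, labels):
    """Featurize each node as the weighted share of its labeled
    neighbors that carry label 1 (e.g. known churners or fraudsters).
    Nodes without any labeled neighbor get no feature value.
    """
    total = defaultdict(float)
    positive = defaultdict(float)
    for (a, b), w in weights.items():
        for node, other in ((a, b), (b, a)):
            if other in labels:
                total[node] += w
                positive[node] += w * labels[other]
    return {node: positive[node] / total[node] for node in total}


# Hypothetical toy data: four devices and three locations.
visits = [
    ("d1", "cafe"), ("d2", "cafe"),
    ("d1", "gym"), ("d2", "gym"), ("d3", "gym"),
    ("d3", "mall"), ("d4", "mall"),
]
network = build_similarity_network(visits)

# Hypothetical known labels for two of the devices.
labels = {"d1": 1, "d2": 0}
share = neighbor_label_share(network, labels)
```

In this toy example, d1 and d2 share two locations and therefore end up more strongly connected than d1 and d3, which share only the busier gym. The resulting neighbor-based feature can be added as an extra column next to non-network variables in a churn or fraud model.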

  • Martens D., Provost F., Mining Massive Fine-Grained Behavior Data to Improve Predictive Analytics, MIS Quarterly, Volume 40, Number 4, pp. 869-888, 2016.
  • Provost F., Martens D., Murray A., Finding Similar Mobile Consumers with a Privacy-Friendly Geosocial Design, Information Systems Research, Volume 26, Issue 2, pp. 243-265, 2015.
  • Van Vlasselaer V., Eliassi-Rad T., Akoglu L., Snoeck M., Baesens B., GOTCHA! Network-based Fraud Detection for Social Security Fraud, Management Science, forthcoming, 2017.
  • Verbeke W., Martens D., Baesens B., Social network analysis for customer churn prediction, Applied Soft Computing, Volume 14, pp. 431-446, 2014.