This article first appeared in Data Science Briefings, the DataMiningApps newsletter.
Across the world, personal cell phones have become an essential part of people’s lives, as is evident from the rapid increase in mobile phone subscriptions, from 12% in 2000 to 96% in 2014. In most developed countries, virtually everyone owns and uses a cell phone, whereas in developing countries ownership has reached 90%. For billing purposes, telecommunication providers keep logs of all the traffic that goes through each mobile phone, both phone calls and text messages. The logs record the caller, the callee, the time and the duration, in addition to the antenna that carried the call, which gives the approximate location of the phone. These enormous datasets are known as Call Detail Records (CDRs), and they contain valuable information about real-time interactions between millions of people. As such, they provide a cross section of society, with people represented regardless of age, race and social status.
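To make the record structure described above concrete, here is a minimal sketch of how a single CDR could be represented and parsed. The field names, the comma-separated layout and the sample line are assumptions made for illustration; real operator logs differ in format and typically contain more fields.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One logged event; field names are illustrative, not an operator schema."""
    caller: str        # pseudonymised identifier of the phone that initiated the event
    callee: str        # pseudonymised identifier of the receiving phone
    start: datetime    # time at which the call or text message took place
    duration_s: int    # call duration in seconds (assumed 0 for text messages)
    antenna_id: str    # antenna that carried the event, a proxy for location
    kind: str          # "call" or "sms"


def parse_cdr_line(line: str) -> CallDetailRecord:
    """Parse one line of a hypothetical CSV export into a record."""
    caller, callee, start, duration, antenna, kind = line.strip().split(",")
    return CallDetailRecord(
        caller=caller,
        callee=callee,
        start=datetime.fromisoformat(start),
        duration_s=int(duration),
        antenna_id=antenna,
        kind=kind,
    )


# Hypothetical sample line, not taken from any real dataset.
record = parse_cdr_line("A17,B42,2014-03-01T09:15:00,125,ANT_0031,call")
print(record)
```

Note that the record holds only the antenna identifier, not the phone's coordinates, which is why the location it provides is approximate.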
Over the last decade, CDRs have become a valuable source for research in various disciplines, such as physics, sociology, epidemiology, transportation and networking. This is evident from the large number of publications, which have been summarized and discussed in several recent surveys on the state of the art in the analysis of mobile traffic [1, 2, 3].
In an extensive overview of the current literature on large-scale mobile traffic analysis across multiple disciplines, Naboulsi et al. [1] defined mobile traffic analysis as “the study of massive traffic datasets collected by mobile network operators to improve the understanding of natural or technological phenomena occurring at large scales, and to design solutions to issues they may yield.” In addition, they separated the studies in this rich field into three categories, which we describe below.
Social Analysis: The research in this category has contributed to the field of sociology in various ways. Firstly, by building call graphs from the CDRs, researchers can analyse the structure of social interactions between people. In addition, the call graphs can be related to other social features, such as demographics. Other studies investigate how geographical and temporal features interact with the communication structure in the call graph. Finally, the movements of people can be used to understand how diseases spread, thus giving new insight into epidemiology.
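As a rough illustration of what building a call graph involves, the sketch below aggregates caller and callee pairs into weighted, undirected edges and counts each person's number of distinct contacts. The sample pairs are invented, and treating edges as undirected and weighting them by the number of events are simplifying assumptions; studies in the literature make different choices here.

```python
from collections import Counter, defaultdict

# Hypothetical (caller, callee) pairs extracted from CDRs.
calls = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D"), ("A", "B")]


def build_call_graph(calls):
    """Return a mapping from undirected edge (u, v) to the number of events on it."""
    weights = Counter()
    for caller, callee in calls:
        if caller == callee:
            continue  # ignore self-calls, which carry no social information
        edge = tuple(sorted((caller, callee)))  # A->B and B->A share one edge
        weights[edge] += 1
    return weights


def degrees(weights):
    """Number of distinct contacts per person in the call graph."""
    neighbours = defaultdict(set)
    for u, v in weights:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {node: len(contacts) for node, contacts in neighbours.items()}


graph = build_call_graph(calls)
degree = degrees(graph)
print(dict(graph))
print(degree)
```

Edge weights of this kind are one simple proxy for the strength of a relationship, and the degree of a node is one of the most basic structural properties that can then be compared against features such as demographics.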
Mobility Analysis: As mentioned, CDRs contain information about the location of cell phones, and they have become a popular source of knowledge about people’s movements. In this regard, a distinction can be made between human mobility, which is the analysis of the movements of people, both individually and collectively, and specialized movement patterns, which concern people’s usage of transportation systems, such as roads and trains. In addition, there is a vast literature on the validity of using mobile data for these purposes. This usage has been criticised on the grounds that CDRs do not provide the exact location of the users, only the location of the closest antenna.
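The following sketch shows, under simplified assumptions, how a coarse movement trace could be derived from CDRs: each event is mapped to the coordinates of the antenna that carried it, and consecutive events at the same antenna are merged. The antenna identifiers, coordinates and events are all invented for illustration.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical lookup table from antenna identifier to (latitude, longitude).
antenna_location = {"ANT_1": (50.88, 4.70), "ANT_2": (50.85, 4.35)}

# Hypothetical events: (user, ISO timestamp, antenna that carried the event).
events = [
    ("A", "2014-03-01T08:00:00", "ANT_1"),
    ("A", "2014-03-01T12:30:00", "ANT_2"),
    ("A", "2014-03-01T08:45:00", "ANT_1"),
    ("B", "2014-03-01T09:10:00", "ANT_2"),
]


def trajectories(events, antenna_location):
    """Return, per user, the time-ordered antenna positions with repeats merged."""
    per_user = defaultdict(list)
    # ISO timestamps in a single fixed format sort chronologically as strings.
    for user, _timestamp, antenna in sorted(events, key=lambda e: (e[0], e[1])):
        per_user[user].append(antenna)
    return {
        user: [antenna_location[a] for a, _ in groupby(antennas)]
        for user, antennas in per_user.items()
    }


traces = trajectories(events, antenna_location)
print(traces)
```

The trace contains antenna positions rather than the user's own positions, and only at moments when the phone was used, which is exactly the limitation raised by critics of this kind of analysis.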
Networking Analysis: Apart from understanding the social structure of mobile networks and studying the movements of mobile phone users, CDRs can be used to understand and improve mobile network systems. Various applications take this more technical standpoint in order to evolve the network infrastructure so that it better accommodates the dynamics of mobile demand. The two main research directions in this category are, on the one hand, the characterization of mobile service usage and, on the other hand, the design and evaluation of solutions for networking, marketing and privacy.
One of the obstacles when working with CDRs is the preprocessing of these enormous datasets, which often contain billions of records. Deciding which information to use for a given application is not easy and can have a great effect on the results. However, it is clear from this brief overview that there is a lot of interesting and valuable information hidden in CDRs, which further research will hopefully help reveal.
- D. Naboulsi, M. Fiore, S. Ribot, and R. Stanica. Large-scale mobile traffic analysis: a survey. IEEE Communications Surveys & Tutorials, 18(1):124–161, 2015.
- J. Saramäki and E. Moro. From seconds to months: Multi-scale dynamics of mobile telephone calls. arXiv preprint arXiv:1504.01479, 2015.
- V. D. Blondel, A. Decuyper, and G. Krings. A survey of results on mobile phone datasets analysis. EPJ Data Science, 4(1):1, 2015.