Big Data versus Teenage Sex

Posted on November 24, 2015

Contributed by: Bart Baesens, Wilfried Lemahieu, Monique Snoeck, Seppe vanden Broucke

This article first appeared in Data Science Briefings, the DataMiningApps newsletter.

Big Data and Analytics: terms that frequently pop up in newspapers, magazines, airports or even during pub chats to spice up a conversation. These days, everybody talks about it, nobody knows how to tackle it, everybody thinks the others are doing it and hence claims to do it as well but, as the title suggests, only the fortunate have (positive) experience(s) with it. But what is Big Data, actually? Let’s start by narrowing down the perspective to size only and consider more than 1 terabyte (TB) of data. Do we know of any successful business cases that create added value by storing, managing and analysing more than 1 TB of data? Of course we do, but they are too often limited to domains such as astronomy and bioinformatics (e.g. genomics), and are only rarely found in business applications such as risk management, fraud detection, marketing, or supply chain management. In this contribution, we would like to share some of our experiences from research partnerships that we recently initiated with a diverse set of firms and institutions operating in sectors such as banking, retail, and government.

A first issue concerns the organizational aspect. How can this new technology be successfully embedded into a company’s DNA? A first option would be to set up a company-wide Analytical Centre of Excellence and staff it with data scientists who handle all Big Data & Analytics requests from the various departments. It is our experience that such a centralized approach often simply doesn’t work. Fully leveraging and competing on analytics requires business knowledge, implying that the data scientists should be close to the business. In an earlier column, we described the ideal skill mix of a data scientist as follows: quantitative skills, ICT skills (e.g. programming), business knowledge, communication and presentation skills, and creativity. In other words, a data scientist has a multidisciplinary profile, and to fully exploit this unique skill set another organizational approach is needed, based upon the principle of subsidiarity.

The main idea is that a centralized unit should only manage the issues that cannot be successfully managed by the local business units, such as the ICT environment (both hardware and software), privacy rules, model governance, documentation, etc. A substantial number of data scientists should be directly embedded into the individual business units, such that the analytics projects can be well-focused and fed with the right business knowledge. Business ownership of every analytical project is essential, but cross-fertilization between data scientists across business units is important as well. A well-focused, centralized analytics unit can play a key role in evangelizing, stimulating and communicating good practices and lessons learned (and, conversely, in preventing repeated rookie mistakes). A closely related attention point concerns the sharing of (different types of) data across business units, since this is precisely where the added value lies! Hence, we prefer not only to consider size when talking about Big Data, but also to take into account the new insights that can originate from coalescing different data sources, both structured (e.g. transactional data) and unstructured (e.g. server logs, click streams or social media feeds).

A second attention point concerns the economic value of Big Data & Analytics investments. Firms only invest in a new technology when a positive return is anticipated. Although the costs of an analytical project are fairly easy to grasp (e.g. acquisition and (post-)ownership costs), the benefits are far less evident. Our experiences indicate that firms primarily invest in Big Data & Analytics under competitive pressure, rather than out of a firm belief in its positive return. Yet we are convinced that such a positive return does exist. Just think about the new strategic opportunities created by better targeting customer segments, identifying new product needs, or anticipating customer behavior. These benefits are, however, hard to quantify precisely upfront, and it’s our belief that the fruits of the investment are harvested about 3 to 5 years after the initial investment. Reaping some low-hanging fruit in the initial stages of a project is nevertheless also an explicit concern, if only for the sake of management buy-in. At our university (and undoubtedly many others as well), we teach our students to adopt a long-term perspective when making investments. Unfortunately, due to both internal and external (often stock market) pressure for immediate results, companies are far too often short-sighted, thereby impeding the adoption of new technologies (e.g. Big Data & Analytics) that could foster sustainable growth.
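As an illustration only, the effect of the evaluation horizon can be sketched with a simple net present value (NPV) calculation in Python. The yearly cash flows and the 8% discount rate below are invented for this sketch and do not come from the partnerships described above; the function names are likewise hypothetical. The point is merely that the same project can look unattractive when judged over two years and worthwhile when judged over five.

```python
from typing import Optional, Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value of yearly cash flows; cash_flows[0] occurs today (year 0)."""
    return sum(cf / (1.0 + rate) ** year for year, cf in enumerate(cash_flows))


def discounted_payback_year(rate: float, cash_flows: Sequence[float]) -> Optional[int]:
    """First year in which the cumulative discounted cash flow is non-negative."""
    cumulative = 0.0
    for year, cf in enumerate(cash_flows):
        cumulative += cf / (1.0 + rate) ** year
        if cumulative >= 0:
            return year
    return None  # the investment is not recovered within the horizon


# Hypothetical net cash flows per year (benefits minus costs), years 0 to 5.
# Year 0 holds the acquisition cost; early years only yield some low-hanging
# fruit; the larger benefits are assumed to arrive from year 3 onwards.
CASH_FLOWS = [-100.0, -10.0, 10.0, 60.0, 70.0, 70.0]
DISCOUNT_RATE = 0.08  # assumed cost of capital

if __name__ == "__main__":
    short_horizon = npv(DISCOUNT_RATE, CASH_FLOWS[:3])  # years 0 to 2 only
    long_horizon = npv(DISCOUNT_RATE, CASH_FLOWS)       # years 0 to 5
    print(f"NPV over 2 years: {short_horizon:.1f}")
    print(f"NPV over 5 years: {long_horizon:.1f}")
    print("Discounted payback year:",
          discounted_payback_year(DISCOUNT_RATE, CASH_FLOWS))
```

With these assumed figures, the two-year NPV is negative while the five-year NPV is positive, which is the short-sightedness problem in miniature. In practice the hard part is estimating the benefit figures, not the arithmetic.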

To summarize, in this column we highlighted some of the organizational and economic challenges that come with Big Data & Analytics. We welcome any shared experiences (both confirming and contradicting). We are also happy to refer to our Master of Information Management programme (www.kuleuven.be/ma/mimel), in which the above topics are covered in more detail.