Some have argued that big data is fundamentally about data “plumbing” and not about insights or deriving interesting patterns. It is argued that value (the 5th V) can just as easily be found in “small”, normal, or “weird” data sets (i.e., data sets that wouldn’t have been considered before). What are your thoughts on this?

By: Bart Baesens, Seppe vanden Broucke

This Q&A first appeared in Data Science Briefings, the DataMiningApps newsletter. Want to submit a question too? Just Tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.


You asked: Some have argued that big data is fundamentally about data “plumbing” and not about insights or deriving interesting patterns. It is argued that value (the 5th V) can just as easily be found in “small”, normal, or “weird” data sets (i.e., data sets that wouldn’t have been considered before). What are your thoughts on this?

Our answer:

This is an open question. We have certainly seen many cases where clever analytics were performed on data sets that are not “big”, i.e., well-structured, tabular, and relatively small data (even millions of records can be considered smallish). In fact, most business analytics applications are still found in this realm. Think of marketing analytics such as churn prediction or propensity modeling, HR analytics, fraud analytics, credit risk modeling, and so on.

Fast Company has published an interesting article, “Your Garbage Data Is A Gold Mine”, which discusses various applications of novel, “weird”, or “exhaust” data sets. These are not necessarily big, but they are interesting nonetheless. For example, network data captures relationships and other signals from social networks, geospatial data lends itself to mapping, and survey data captures people’s viewpoints. As another example, the hedge fund BlackRock uses satellite images of China, taken every five minutes, to better understand industrial activity and to obtain an independent reading on reported data.

The takeaway is that big data and analytics can go hand in hand, though this is not a given. Just because you are dumping massive amounts of data at high speed into a Hadoop data lake does not mean that insights and analytics will appear out of nowhere. On the other hand, you don’t need huge volumes of data to extract valuable insights.