This article first appeared in Data Science Briefings, the DataMiningApps newsletter.
It has been an interesting year for data science and analytics. With Google and Facebook putting their weight behind deep learning, we saw DeepMind’s AlphaGo beating world champion Lee Sedol, Google Research figuring out the general semantics behind human languages, conversational commerce taking off, and a push towards new advances in reinforcement learning, with the announcement of OpenAI’s Universe still fresh in our minds.
Here’s what we think were the biggest defining trends of 2016.
Trend 1: Companies began to realise there’s more to analytics than big data plumbing
Did you know that Hadoop turned ten years old at the beginning of 2016? Depending on who you ask, the actual birth date of Hadoop is somewhat fuzzy, but the fact remains that Hadoop has become a relatively outdated concept. That is, companies have realised that the “big data hype” has passed, and have come to understand that big data is, fundamentally, nothing more than “data plumbing”, i.e. making sure that we can move around huge volumes of data, often at a lightning-fast pace. In fact, Hadoop was never designed for high-speed, interactive analytics and reporting.
While this has caused some sour feelings among early adopters (and investors) who had hoped for more spectacular results right out of the box, the good news is that managers are now fully aware that it’s really all about analytics and intelligence, and that even small (or even weird) data can contain big insights. As such, we have observed a continued adoption of Apache Spark as the preferred distributed analytics stack. With Hadoop’s limitations (such as the limited supply of skilled MapReduce programmers or the difficulty of building a SQL interface on top of Hadoop) now becoming clear, Spark has set out to offer a replacement which is designed from the start to handle analytics and intelligence workflows and tasks.
In 2016, we also saw a dramatic rise of various real-time and streaming solutions. Apache Kafka, for instance, had a great year thanks to the emerging requirements for analysing fast-moving data, as did Apache Beam, Flink, Spark Streaming, and Apache Ignite. Expect various companies to burn themselves again in this field by eagerly adopting technologies before the dust has settled.
Trend 2: Deep learning remained the next big thing in research, but is slowly finding its way to business as well
It’s impossible to escape the hype surrounding deep learning, at least in academia. With convolutional neural networks already being “a thing of the past” that anyone with a hefty GPU and some free time on their hands can build, more attention was given in 2016 to reinforcement learning, natural language processing and generative adversarial networks. And indeed, the results are looking very promising. How about image-to-image translation, super-resolution of images, or 3D volume generation, just to name a few examples?
Nevertheless, deep learning remains a fairly foreign concept in the enterprise, for several reasons. First of all, since the research field is still in its infancy, talented engineers who can train and tune deep models are still few and far between. Second, the time, data and capital investments are very steep, with larger models taking weeks to train and requiring hundreds of thousands of input instances as well as beefy machines filled with GPUs (by the way: NVIDIA has rebranded itself as an AI provider instead of a player in the computer games space, a move that made its stock price take off like a rocket in 2016). Third, and perhaps most important, the business opportunities still seem rather scarce. Deep learning models are incredibly hard to interpret or explain (they’re the epitome of black boxes), and most companies cannot immediately think of use cases requiring computer vision or hearing.
That said, application areas are coming, as evidenced by recent try-outs from some innovative players. Some companies are already using deep learning models for voice analytics, e.g. to assess the sentiment of a customer calling a help line, or even to predict what a customer is going to buy based on what their face is telling.
Trend 3: As the public perception regarding analytics and AI is shifting, regulatory compliance, privacy and ethics were as important as ever
There’s a reason why Cathy O’Neil’s Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy is popping up on so many non-fiction best-seller lists. Models are becoming increasingly complex, i.e. deeper and harder to explain, while also ruling so many aspects of our lives, from credit scoring to employment, all the way down to predicting recidivism. As a result, people are becoming increasingly aware of what is being done with their data and are becoming more protective of their privacy and their right to challenge a model’s conclusion. Even the White House released a statement regarding the promises and dangers of analytics: Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights.
2016 was also the year in which the European Union adopted its data protection reform, i.e. the General Data Protection Regulation (GDPR), which will start to apply in 2018. The GDPR comes with some key changes which will heavily impact the way companies run analytics projects. Companies are already scrambling to figure out how to set up data anonymization layers, or how to deal with the requirement to delete certain data if asked to do so by an individual.
Based on these findings, we formulate our analytics predictions for 2017 as follows:
- As companies take up new application areas and domains, analytics in the enterprise will continue to grow
No surprises here. Expect real-time and streaming analytics to grow, as well as more buzz around conversational intelligence, bots, smart agents, deep learning, and so on. Our general advice remains the same, however: search for solutions for your problem, not for solutions that will become problems.
- As analytics and AI become more personal, the concept of explainable AI will become more pressing as well
Not only “personal” in the form of personal assistants like Alexa and so on, but also in terms of having an impact on your life. Clients, customers and partners will expect more transparency from your models going forward, as well as the right to challenge and tune them. Explainable artificial intelligence (XAI) will continue to rise.
- Governance and privacy will remain hot topics
Never mind that novel bleeding-edge algorithm, that new programming library or that API opening up so many new possibilities: companies have realised that data governance, privacy and compliance, managing teams, ensuring reproducibility, and monitoring and maintenance are all crucial to making an analytics initiative succeed. The “things that happen after the model code is done”, so to speak, will remain hot topics going forward.