The Analytics Year in Review and Looking Forward to 2018

Posted on January 24, 2018

Contributed by: Seppe vanden Broucke, Bart Baesens

This article first appeared in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to receive our feature articles, or follow us @DataMiningApps. Do you also wish to contribute to Data Science Briefings? Shoot us an e-mail over at briefings@dataminingapps.com and let’s get in touch!

It has been an interesting year for data science and analytics. Reinforcement learning and related deep learning approaches continued to improve, with DeepMind’s AlphaZero mastering not only Go but chess as well. Meanwhile, lots of new deep learning frameworks popped up, with Facebook releasing PyTorch, TensorFlow reaching 1.0, and Amazon announcing Gluon, building on top of MXNet.

We have seen clever AI approaches in medicine, where they can help to detect skin cancer or diagnose irregular heartbeats, as well as more creative applications, such as spotting forgeries.

AI is surrounded by hype, but with hype comes failure as well. IBM Watson was last year’s prime example of overhyped marketing that failed to deliver corresponding results. Lots of writers have been hating on IBM Watson, which is not surprising after its repeated failures in healthcare. Another story that quickly went haywire was the one on Facebook, headlined “Researchers shut down AI that invented its own language”. The title couldn’t have been further from the truth, but this didn’t stop media outlets all around the world from picking up a juicy story.

Another issue we’ve seen relates to the reproducibility of all this deep learning spectacle. Deep learning models rely on a huge number of hyperparameters which must be optimized in order to achieve results that are good enough to publish, and several researchers have raised concerns about the reproducibility of academic paper results. In “Deep Reinforcement Learning that Matters”, researchers showed that the same algorithms taken from different code bases achieve vastly different results with high variance. In “Are GANs Created Equal?”, researchers showed that a well-tuned GAN using expensive hyperparameter search can beat more sophisticated approaches that claim to be superior. In “On the State of the Art of Evaluation in Neural Language Models”, researchers showed that simple LSTM architectures, when properly regularized and tuned, can outperform more recent models.
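
To make the variance concern concrete, here is a minimal sketch of reporting a result over several random seeds rather than from a single run. It is not taken from any of the papers mentioned: the function train_and_evaluate, its learning_rate parameter and the toy noise model are all hypothetical stand-ins for a real training run.

```python
import random
import statistics


def train_and_evaluate(seed: int, learning_rate: float) -> float:
    """Hypothetical stand-in for a full training run; returns a final score."""
    rng = random.Random(seed)
    # Toy assumption: the score depends on the hyperparameter plus
    # seed-dependent noise (weight initialisation, data order, and so on).
    base = 1.0 - abs(learning_rate - 0.01) * 10
    return base + rng.gauss(0.0, 0.1)


def summarize(scores):
    """Return the mean and sample standard deviation of the scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Repeat the same configuration with different seeds and report the spread.
seeds = range(10)
scores = [train_and_evaluate(seed, learning_rate=0.01) for seed in seeds]
mean, std = summarize(scores)
print(f"mean={mean:.3f} std={std:.3f} over {len(scores)} seeds")
```

Reporting the mean together with the spread makes it easier to judge whether a claimed improvement is larger than the run-to-run noise.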

Finally, as the world goes crazy over AI, a lot has been said regarding the ethics and morality of AI applications, too. More and more concern is being raised over bias present in algorithms and data, and how we should deal with it. Many articles were written about Facebook’s uncanny ability to construct microprofiles, allowing advertisers to fine-tune which people to select. This also opens the door for nefarious actors to silence critics or influence democracy. In Something is wrong on the internet, James Bridle wrote about the disturbing way in which algorithms automatically construct content to get clicks from young YouTube viewers, without any regard for morality. To quote: “This is being done by people and by things and by a combination of things and people. Responsibility for its outcomes is impossible to assign but the damage is very, very real indeed.” We are getting better at predicting, but fail to understand. Combining this with all the research being done on adversarial attacks is enough to feel worried. And if that doesn’t sound scary enough, people using deep learning to paste your face onto promiscuous videos also have many worrying about the implications…

Let’s take a look back at our predictions for 2017, and how they stacked up:

As companies take up new application areas and domains, analytics in the enterprise will continue to grow
No surprises here. Expect real-time and streaming analytics to grow, as well as more buzz around conversational intelligence, bots, smart agents, deep learning, and so on. Our general advice remains the same, however: search for solutions for your problem, not for solutions that will become problems.

This was an easy prediction to make, of course, but one which was certainly correct. Companies went crazy over chatbots, are experimenting with deep learning applications, and are looking at the “architectures of Giants” to learn how to extract insights from real-time event data.

As analytics and AI become more personal, the concept of explainable AI will become more pressing as well
Not only “personal” in the form of personal assistants like Alexa and so on, but also in terms of having an impact on your life. Clients, customers and partners will expect more transparency from your models going forward, as well as the right to challenge and tune them. Explainable artificial intelligence (XAI) will continue to rise.

Here in Europe, the aspect of transparency certainly has many organizations worrying, as this is a big part of the EU General Data Protection Regulation, which is set to be enforced soon. In the realm of consumer goods, “personal assistants” have become a lot better, though perhaps not in the way we’d like…

Governance and privacy will remain hot topics
Setting aside that novel bleeding-edge algorithm, new programming library or API opening up so many new possibilities, companies have realised that data governance, privacy and compliance, managing teams, ensuring reproducibility, and monitoring and maintenance are crucial to making an analytics initiative succeed. The “things that happen after the model code is done”, so to speak, will remain hot topics going forward.

Again, with the GDPR looming, this prediction certainly held true, though the “governance” of data science, including reproducibility, deployment, monitoring and maintenance, remains hard to tackle in all but the best-engineered enterprises.

With that out of the way, here are our predictions for what we’re going to see in 2018:

The hype will not (yet) die out, though applications will become more serious
Contrary to some, we still think we’ll see a lot of sensationalized headlines in the media regarding AI, robots and deep learning, and how they relate to society, employment, and so on. That said, we do believe that we’ll start seeing some “real” use cases and progress in industry in terms of AI and DL adoption.

Adoption will be driven through controlled deployment, novel hardware, and new modelling types
One of the main hurdles in terms of adoption of AI in industry lies in the fact that no one really knows how to deploy a deep learning environment at scale, save for larger entities such as Google, Facebook, and so on. We expect to see more companies revealing their approach towards deployment, which will pave the way towards general best practices others will adopt, similar to what we saw a couple of years ago in the realm of big data solutions.
In addition, we expect not only NVIDIA but also others such as Google and Intel to continue announcing specialized hardware for training AI models, though this will spell trouble as well: many smaller hardware startups will probably fail. Intel will continue to disappoint in this space, especially since it has already been losing a lot of goodwill and trust after the Spectre and Meltdown vulnerabilities that surfaced recently.
Finally, we’ll continue to see new interesting types of models and architectures, especially in the areas of self-play (like AlphaZero), meta-learning and generative models, and not only for traditional data types such as voice, imagery and video, but for the modeling and prediction of other complex data structures as well, which will make these new approaches especially interesting in the enterprise.

Ethics, transparency and explainability will remain hot topics in AI
Not very surprising, given our year-in-review overview above. Stories and requirements regarding ethics in data science and AI will increase even more. The population is becoming more and more aware of the disastrous effects of automation when applied incorrectly. Facebook, Twitter, Amazon and others might be facing strong headwinds in 2018.
Face recognition is already being used in very dangerous settings, such as detecting a person’s sexual orientation. Algorithms are creating media that is indistinguishable from reality (also see this article on fake footage), which is going to become a major problem, especially in today’s media setting, which is all about rapid clicks, “engagement”, and being the first to put something out without much regard for the truth. Policy makers are very much behind, so expect only slow reactions on this front.
Organizations, meanwhile, will try to improve goodwill and trust by scrambling to figure out how to make their models transparent and explainable, but mainly only to a believable extent, without exposing “intellectual property” or “trade secrets”. Combining this “marketable explainability” with the fact that getting explanations from deep models is still an incredible challenge, we’re going to see solutions (separate models, even) that aim at creating “fake explanations”: explanations of a prediction that a human target is highly likely to find comfortable or intuitive, without being the true or full explanation.