Web Picks (week of 12 June 2017)

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • If Your Company Isn’t Good at Analytics, It’s Not Ready for AI
    Management teams often assume they can leapfrog best practices for basic data analytics by going directly to adopting artificial intelligence and other advanced technologies. But companies that rush into sophisticated artificial intelligence before reaching a critical mass of automated processes and structured analytics can end up paralyzed.
  • How to Call BS on Big Data: a Practical Guide
    While data can be used to tell remarkably deep and memorable stories, its apparent sophistication and precision can effectively disguise a great deal of bullshit.
  • 8 Ways Machine Learning Is Improving Companies’ Work Processes
    Today’s leading organizations are starting to experiment with more-advanced uses of AI. Corporate investment in AI is predicted to triple in 2017, becoming a $100 billion market by 2025. This will no doubt have profound effects on the workplace. Here are some concrete examples of how AI and machine learning are creating value in companies today.
  • A simple neural network module for relational reasoning (paper)
    The authors of this (very accessible) paper propose a new neural network architecture that natively understands how various entities relate to one another. (A minimal sketch of its core module follows this list.)
  • Google Brain Residency
    Ryan Dahl is an accomplished software engineer who got to spend a year in the Google Brain Residency Program. In his post, Ryan describes what it’s like to dive into machine learning from an engineering perspective. His Conclusions section, especially, is a great take-away!
  • The Strange Loop in Deep Learning
    “Coincidentally enough, the [self-referential mechanism] is in fact the fundamental reason for what Yann LeCun describes as “the coolest idea in machine learning in the last twenty years.””
  • Reasons to Switch from TensorFlow to CNTK
    “Deep learning has revolutionized AI over the past few years. Following Microsoft’s vision that AI shall be accessible for all instead of a few elite companies, we created the Microsoft Cognitive Toolkit (CNTK), an open source deep learning toolkit free for anyone to use. Today, it is the third most popular deep learning toolkit in terms of GitHub stars, behind TensorFlow and Caffe, and ahead of MxNet, Theano, Torch, etc. Given TensorFlow’s extreme popularity, we often encounter people asking us: why would anyone want to use CNTK instead of TensorFlow?”
  • Python For Finance: Algorithmic Trading
    “Technology has become an asset in finance: financial institutions are now evolving into technology companies rather than staying occupied with the financial aspect alone. Besides bringing innovation and a competitive edge, technology drives the speed and frequency of financial transactions; this, together with the large data volumes involved, has made technology a main enabler in finance. Among the hottest programming languages for finance, you’ll find R and Python. In this tutorial, you’ll learn how to get started with Python for finance.” (A minimal moving-average crossover sketch follows this list.)
  • Mosaic: processing a trillion-edge graph on a single machine
    Unless your graph is bigger than Facebook’s, you can process it on a single machine.
  • Hacks and optimizations for neural nets
    Training neural networks is as much an art as a science. Here are some assorted performance-related techniques that may come in useful when designing neural nets. (Two representative tricks are sketched after this list.)
  • A Comparison of Advanced, Modern Cloud Databases
    In the last few years we’ve seen the emergence of some impressive cloud DBMS technology. It can be a little hard to keep track of the new entrants and how exactly they differ from one another, so here I’ve tried to summarize the various offerings and how they compare.
  • Further Exploring Common Probabilistic Models
    “While there are many potential themes of probabilistic models we might explore, we’ll herein focus on two: generative vs. discriminative models, and “fully Bayesian” vs. “lowly point estimate” learning. We will stick to the supervised setting as well.” (A toy comparison of the two model families follows this list.)
  • Detecting Fraud at 1 Million Transactions per Second (video presentation)
    Interesting presentation from the R/Finance conference on predicting loan delinquency.
  • gcForest
    A Python 2.7 implementation of Z.-H. Zhou and J. Feng, “Deep Forest: Towards an Alternative to Deep Neural Networks”. (A stripped-down cascade-forest sketch follows this list.)
  • Unsupervised real-time anomaly detection for streaming data (paper)
    “One fundamental capability for streaming analytics is to model each stream in an unsupervised fashion and detect unusual, anomalous behaviors in real-time. Early anomaly detection is valuable, yet it can be difficult to execute reliably in practice. We propose a novel anomaly detection algorithm based on an online sequence memory algorithm called Hierarchical Temporal Memory (HTM).” (A much simpler streaming baseline is sketched after this list.)
  • Self-Normalizing Neural Networks (paper)
    “Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow and, therefore, cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations. While batch normalization requires explicit normalization, neuron activations of SNNs automatically converge towards zero mean and unit variance. The activation function of SNNs are “scaled exponential linear units” (SELUs), which induce self-normalizing properties. This convergence property of SNNs allows to (1) train deep networks with many layers, (2) employ strong regularization, and (3) to make learning highly robust.” (A short NumPy demo of SELU’s self-normalizing behavior follows this list.)
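
A few of this week’s picks lend themselves to quick code sketches, so here they are, in order of appearance. First, the relational-reasoning paper: its core module, the Relation Network, embeds every pair of objects with a small network g, sums the pair embeddings, and maps the result through a second network f. The sketch below is a minimal NumPy rendering, with random single-layer stand-ins for the learned MLPs; all sizes and names are illustrative, not the paper’s.

```python
import numpy as np

# Minimal Relation Network sketch; g_theta and f_phi are random
# single-layer stand-ins for the learned MLPs in the paper.
np.random.seed(0)
n_objects, obj_dim, hidden = 6, 8, 32

def layer(in_dim, out_dim):
    W = 0.1 * np.random.randn(in_dim, out_dim)
    return lambda x: np.maximum(x @ W, 0.0)  # linear map + ReLU

g_theta = layer(2 * obj_dim, hidden)  # embeds one (object_i, object_j) pair
f_phi = layer(hidden, 1)              # maps aggregated relations to an output

def relation_network(objects):
    # Embed every ordered pair with g_theta, sum the pair embeddings
    # (making the module order-invariant), then apply f_phi.
    pairs = [np.concatenate([objects[i], objects[j]])
             for i in range(len(objects)) for j in range(len(objects))]
    return f_phi(np.stack([g_theta(p) for p in pairs]).sum(axis=0))

objects = np.random.randn(n_objects, obj_dim)  # e.g. CNN feature vectors
print(relation_network(objects))
```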
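
From the Python-for-finance tutorial, the canonical starter strategy is a moving-average crossover. The sketch below runs it on a synthetic random-walk price series (real tutorials pull actual quotes, e.g. via pandas-datareader); the window lengths are arbitrary.

```python
import numpy as np
import pandas as pd

# Moving-average crossover on a synthetic price series.
np.random.seed(42)
prices = pd.Series(100 + np.random.randn(500).cumsum(),
                   index=pd.date_range("2015-01-01", periods=500))

fast = prices.rolling(window=40).mean()   # short moving average
slow = prices.rolling(window=100).mean()  # long moving average

# Long (1) when the fast average is above the slow one, flat (0) otherwise;
# shift by one day so the signal only uses information available at the time.
signal = (fast > slow).astype(int).shift(1).fillna(0)

strategy_returns = prices.pct_change() * signal
cumulative = (1 + strategy_returns.fillna(0)).prod() - 1
print("Cumulative return: %.2f%%" % (100 * cumulative))
```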
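
For the neural-net hacks post, here are two bread-and-butter tricks in that genre, He initialization and global-norm gradient clipping. Whether the post covers these two specifically is our guess, so treat this as a companion sketch rather than a summary.

```python
import numpy as np

np.random.seed(0)

def he_init(fan_in, fan_out):
    # "He" initialization keeps activation variance stable through ReLU layers.
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

def clip_by_global_norm(grads, max_norm=5.0):
    # Rescale all gradients jointly if their global L2 norm exceeds max_norm.
    total = np.sqrt(sum((g ** 2).sum() for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

W1, W2 = he_init(784, 256), he_init(256, 10)
grads = [10 * np.random.randn(*W.shape) for W in (W1, W2)]  # fake, large grads
clipped = clip_by_global_norm(grads)
print(np.sqrt(sum((g ** 2).sum() for g in clipped)))  # now <= 5.0
```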
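
For the probabilistic-models post, a toy scikit-learn contrast of the generative/discriminative split it discusses: Gaussian naive Bayes models the joint p(x, y), while logistic regression models p(y | x) directly. The dataset and model choices here are ours, not the post’s.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Generative (models p(x, y)) vs. discriminative (models p(y | x)).
for model in (GaussianNB(), LogisticRegression()):
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(model).__name__, round(acc, 3))
```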
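
For gcForest, a stripped-down scikit-learn sketch of the cascade-forest idea (independent of the linked Python 2.7 implementation): each level trains forests whose out-of-fold class-probability vectors are appended to the features for the next level. The paper’s multi-grained scanning stage is omitted, and the level count and forest sizes are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

aug_tr, aug_te = X_tr, X_te
for level in range(2):
    probas_tr, probas_te = [], []
    for Forest in (RandomForestClassifier, ExtraTreesClassifier):
        clf = Forest(n_estimators=100, random_state=level)
        # Out-of-fold predictions avoid leaking training labels downstream.
        probas_tr.append(cross_val_predict(clf, aug_tr, y_tr, cv=3,
                                           method="predict_proba"))
        probas_te.append(clf.fit(aug_tr, y_tr).predict_proba(aug_te))
    # Next level sees the raw features plus each forest's class probabilities.
    aug_tr = np.hstack([X_tr] + probas_tr)
    aug_te = np.hstack([X_te] + probas_te)

final = RandomForestClassifier(n_estimators=200, random_state=0)
print("accuracy:", final.fit(aug_tr, y_tr).score(aug_te, y_te))
```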
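
The HTM algorithm in the anomaly-detection paper is too involved to reproduce here, so as a point of contrast for the same task, below is about the simplest unsupervised streaming detector one can write: flag a point that sits several standard deviations away from a rolling window’s statistics. This is emphatically not HTM, just a baseline illustrating the problem setup.

```python
from collections import deque
import math
import random

def streaming_zscore_anomalies(stream, window=100, threshold=4.0):
    # Keep only the last `window` points; flag values far from their mean.
    buf = deque(maxlen=window)
    for t, x in enumerate(stream):
        if len(buf) == window:
            mean = sum(buf) / window
            std = math.sqrt(sum((v - mean) ** 2 for v in buf) / window) or 1e-9
            if abs(x - mean) / std > threshold:
                yield t, x       # anomaly: report position and value
        buf.append(x)            # update the model online, one point at a time

random.seed(0)
data = [random.gauss(0, 1) for _ in range(1000)]
data[700] = 12.0                 # inject an obvious spike
print(list(streaming_zscore_anomalies(data)))
```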
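
Finally, the SELU activation from the self-normalizing networks paper is easy to state: lambda * x for positive x, and lambda * alpha * (exp(x) - 1) otherwise, with the two constants derived in the paper so that activations converge to zero mean and unit variance. The demo below pushes noise through 20 SELU layers with the paper’s 1/n weight-variance initialization and checks the resulting moments.

```python
import numpy as np

# SELU constants as given in the paper.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(x))

# Standardized noise through many SELU layers with 1/n weight variance:
# the mean stays near 0 and the variance near 1 (the self-normalizing property).
np.random.seed(0)
x = np.random.randn(10000, 64)
for _ in range(20):
    W = np.random.randn(64, 64) * np.sqrt(1.0 / 64)
    x = selu(x @ W)
print("mean %.3f, var %.3f" % (x.mean(), x.var()))
```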