Web Picks (week of 24 July 2017)

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • Jefferies gives IBM Watson a Wall Street reality check
    IBM’s Watson unit is receiving heat today in the form of a scathing equity research report from Jefferies’ James Kisner. The group believes that IBM’s investment in Watson will struggle to return value to shareholders. The discussion on Hacker News is also worth reading for quotes such as “IBM vastly over promises with their marketing. It is so frustrating to have to answer questions from the CEO about why we don’t solve all our problems with magic beans from IBM’s Watson.”
  • The Business of Artificial Intelligence
    This online special edition of the Harvard Business Review explores what AI can and cannot do for your organization. New parts are being added through July 26. So far, we have:
    1) What’s driving the machine learning explosion
    2) Inside Facebook’s AI workshop
    3) AI can be a troublesome teammate
    4) How AI fits into your data science team
  • Technical Debt in Machine Learning
    “Experienced teams know when to back up seeing a piling debt, but technical debt in machine learning piles extremely fast. You can create months worth of debt in a matter of one working day and even the most experienced teams can miss a moment when the debt is so huge that it sets them back for half a year, which is often enough to kill a fast-pacing project.”
  • A list of artificial intelligence tools you can use today — for businesses
    This list contains companies working on artificial intelligence and machine learning products primarily for business use, non-specific to any industry.
  • How Checkers Was Solved
    The story of a duel between two men, one who dies, and the nature of the quest to build artificial intelligence.
  • I ask 100 information questions to four digital assistants
    All of them fail at least 50% of the questions.
  • Cardiologist-Level Arrhythmia Detection With Convolutional Neural Networks
    “We develop a model which can diagnose irregular heart rhythms, also known as arrhythmias, from single-lead ECG signals better than a cardiologist.” Very impressive!
  • Robust Adversarial Examples
    Impressive (and scary) research from OpenAI: “We’ve created images that reliably fool neural network classifiers when viewed from varied scales and perspectives. This challenges a claim from last week that self-driving cars would be hard to trick maliciously since they capture images from multiple scales, angles, perspectives, and the like.”
  • Facets: An Open Source Visualization Tool for Machine Learning Training Data
    From Google Research: “Working with the PAIR initiative, we’ve released Facets, an open source visualization tool to aid in understanding and analyzing ML datasets. Facets consists of two visualizations that allow users to see a holistic picture of their data at different granularities. Get a sense of the shape of each feature of the data using Facets Overview, or explore a set of individual observations using Facets Dive. These visualizations allow you to debug your data which, in machine learning, is as important as debugging your model.”
  • How to make a racist AI without really trying
    “My purpose with this tutorial is to show that you can follow an extremely typical NLP pipeline, using popular data and popular techniques, and end up with a racist classifier that should never be deployed.”
  • Biased Algorithms Are Everywhere, and No One Seems to Care
    The big companies developing them show no interest in fixing the problem.
  • Synthesizing Obama: Learning Lip Sync from Audio
    “Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip.”
  • Microsoft releases Seeing AI, a free iOS app that narrates the world around you
    This looks very helpful!
  • CatBoost is an open-source gradient boosting library with categorical features support
    CatBoost is an algorithm for gradient boosting on decision trees. Developed by Yandex researchers and engineers, it is the successor of the MatrixNet algorithm that is widely used within the company for ranking tasks, forecasting and making recommendations. It is universal and can be applied across a wide range of areas and to a variety of problems.
  • Quilt is a data package manager
    Manage and version your data repositories like code.
  • Using Deep Learning to Create Professional-Level Photographs
    There are areas where objective evaluations are not available. For example, whether a photograph is beautiful is measured by its aesthetic value, which is a highly subjective concept. To explore how ML can learn subjective concepts, Google Research introduces an experimental deep-learning system for artistic content creation.
  • Our quest for robust time series forecasting at scale
    “We were part of a team of data scientists in Search Infrastructure at Google that took on the task of developing robust and automatic large-scale time series forecasting for our organization. In this post, we recount how we approached the task, describing initial stakeholder needs, the business and engineering contexts in which the challenge arose, and theoretical and pragmatic choices we made to implement our solution.”
  • Two Decades of Recommender Systems at Amazon.com
    Amazon is well-known for personalization and recommendations, which help customers discover items they might otherwise not have found. In this paper, Amazon discusses their recommendation system over the years.
  • When Not to Use Deep Learning
    Is the hype real or are linear models really all we need?
  • Recommendation System Algorithms Overview
    The Statsbot team has prepared an overview of the main existing recommendation system algorithms.
  • Keras Backend Benchmark: Theano vs TensorFlow vs CNTK
    The performance of 3 different Keras backends (Theano, TensorFlow, and CNTK) is compared on 4 different GPUs (K80, M60, Titan X, and 1080 Ti) across various neural network tasks. CNTK comes out looking pretty good.
  • Predictive learning vs. representation learning
    “When you take a machine learning class, there’s a good chance it’s divided into a unit on supervised learning and a unit on unsupervised learning. I’d argue that this is deceptive. I think real division in machine learning isn’t between supervised and unsupervised, but what I’ll term predictive learning and representation learning.”
  • How to Visualize Your Recurrent Neural Network with Attention in Keras
    “In this tutorial, we will write an RNN in Keras that can translate human dates (“November 5, 2016”, “5th November 2016”) into a standard format (“2016–11–05”). In particular, we want to gain some intuition into how the neural network did this.”
  • Interpreting neurons in an LSTM network
    What do individual neurons in an LSTM network actually learn? How are they used to make decisions?
  • keras: Deep Learning in R
    With the rise in popularity of deep learning, CRAN has been enriched with more R deep learning packages.
  • Introducing tidygraph
    “I’m very pleased to announce that my new package tidygraph is now available on CRAN. As the name suggests, tidygraph is an entry into the tidyverse that provides a tidy framework for all things relational (networks/graphs, trees, etc.).”
  • Modeling documents with Generative Adversarial Networks
    “I presented some preliminary work on using Generative Adversarial Networks to learn distributed representations of documents at the recent NIPS workshop on Adversarial Training. In this post I provide a brief overview of the paper and walk through some of the code.”
  • A Modern Development Environment for Deep Learning
    DeepForge is a development environment for deep learning designed for simplicity, collaboration and reproducibility of experiments.
  • Alibaba Cloud
    A new cloud player enters the market. “ChinaConnect” looks interesting: it is Alibaba Cloud’s one-stop solution to help international companies do business in China, including ICP license support.
  • I trained an A.I. to generate British placenames
    The results were predictable.