Web Picks (week of 17 April 2017)

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • Teaching Machines to Draw
    In a recent paper, Google presents a generative recurrent neural network capable of producing sketches of common objects, with the goal of training a machine to draw and generalize abstract concepts in a manner similar to humans.


  • How You Battle the “Data Wheel of Death” in Growth
    “Data Isn’t Constantly Maintained -> Data Becomes Irrelevant / Flawed -> People Lose Trust -> They Use Data Less — If the above looks familiar, you’re not alone. I estimate that greater than ⅔ of data efforts at companies fail. This is trouble because data plays a key horizontal role in the growth process and mindset. Without good data, it’s not possible to run a legitimate experimentation cycle.” This must-read looks at four reasons why well-meaning data efforts fail so often, and what you can do about it.


  • Jupyter Notebook 5.0
    This is the first major release of the Jupyter Notebook since version 4.0 and the “Big Split” of IPython and Jupyter. This release adds some long-awaited features, such as cell tagging, customizing keyboard shortcuts, copying & pasting cells between notebooks, and a more attractive default style for tables.


  • Unsupervised sentiment neuron
    “We’ve developed an unsupervised system which learns an excellent representation of sentiment, despite being trained only to predict the next character in the text of Amazon reviews.”


  • Google shuts down Burger King’s cunning TV ad
    Just under three hours after Burger King unveiled an advertisement designed to hijack Google Home devices into reading a long-winded description of its Whopper burger, Google disabled the functionality.


  • Understanding the limits of deep learning
    Artificial intelligence has reached peak hype. News outlets report that companies have replaced workers with IBM Watson and that algorithms are beating doctors at diagnoses. New AI startups pop up every day, claiming to solve all your personal and business problems with machine learning. Much of the AI hubbub is generated by reporters who’ve never trained a neural network and by startups hoping to be acqui-hired for engineering talent despite not having solved any real business problems. No wonder there are so many misconceptions about what AI can and cannot do.


  • Introducing Pixie, an advanced graph-based recommendation system
    At Pinterest, a primary engineering challenge is helping people discover and do things every day, which means serving the right idea to the right person at the right time. While most other recommender systems choose from a small pool of candidates, for instance 100,000 movies, Pinterest has to recommend in real time from more than 100 billion ideas saved by 150 million people around the world. We set a performance goal of 60 milliseconds p99 latency, and to achieve it we built Pixie, a flexible, graph-based system for making personalized recommendations in real time. Pixie now powers recommendations across Pinterest in Related Pins, home feed and Explore, and accounts for about half of all Pins saved.
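    Pinterest's post centers on random walks over its object graph. The exact Pixie algorithm is not reproduced here, but the core idea — run short random walks with restart from a query node and rank other nodes by visit count — can be sketched in a few lines of Python (the graph, node names, and parameters below are purely illustrative, not Pinterest's):

    ```python
    import random
    from collections import Counter

    def random_walk_recommend(graph, start, walk_length=1000, restart_prob=0.15, seed=42):
        """Toy random walk with restart on an undirected graph.

        Counts how often each node is visited; frequently visited nodes
        other than the query node itself make good recommendation candidates.
        """
        rng = random.Random(seed)
        counts = Counter()
        node = start
        for _ in range(walk_length):
            if rng.random() < restart_prob or not graph[node]:
                node = start  # jump back to the query node
            else:
                node = rng.choice(graph[node])
            counts[node] += 1
        counts.pop(start, None)  # don't recommend the query node itself
        return [n for n, _ in counts.most_common()]

    # Tiny Pinterest-style graph: pins connected through shared boards.
    graph = {
        "pin_a": ["board_1"],
        "pin_b": ["board_1", "board_2"],
        "pin_c": ["board_2"],
        "board_1": ["pin_a", "pin_b"],
        "board_2": ["pin_b", "pin_c"],
    }
    recs = random_walk_recommend(graph, "pin_a")
    ```

    The restart probability keeps walks localized around the query node, which is what makes this kind of recommendation personalized rather than a global popularity ranking.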


  • Transfer Learning – Machine Learning’s Next Frontier
    In recent years, we have become increasingly good at training deep neural networks to learn a very accurate mapping from inputs to outputs, whether they are images, sentences, label predictions, etc. from large amounts of labeled data. What our models still frightfully lack is the ability to generalize to conditions that are different from the ones encountered during training.


  • GAN by Example using Keras on Tensorflow Backend
    “In this article, we discuss how a working DCGAN can be built using Keras 2.0 on Tensorflow 1.0 backend in less than 200 lines of code. We will train a DCGAN to learn how to write handwritten digits, the MNIST way.”
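    The article's Keras/TensorFlow code is not reproduced here, but the adversarial training loop at the heart of any GAN can be illustrated on a toy 1-D problem with hand-derived gradients. Everything below (the linear generator and discriminator, learning rate, step count) is an illustrative simplification, not the DCGAN from the article:

    ```python
    import math
    import random

    # Toy 1-D GAN: generator g(z) = a*z + b should learn to emit samples
    # near the real data's mean; discriminator D(x) = sigmoid(w*x + c)
    # tries to tell real samples from generated ones.

    def sigmoid(u):
        return 1.0 / (1.0 + math.exp(-u))

    rng = random.Random(0)
    a, b = 1.0, 0.0          # generator parameters
    w, c = 0.0, 0.0          # discriminator parameters
    lr = 0.05

    for step in range(3000):
        x = rng.gauss(4.0, 1.0)      # real sample from N(4, 1)
        z = rng.uniform(-1.0, 1.0)   # latent noise
        g = a * z + b                # fake sample

        # Discriminator ascent on log D(x) + log(1 - D(g)).
        d_real, d_fake = sigmoid(w * x + c), sigmoid(w * g + c)
        w += lr * ((1 - d_real) * x - d_fake * g)
        c += lr * ((1 - d_real) - d_fake)

        # Generator ascent on the non-saturating objective log D(g).
        d_fake = sigmoid(w * g + c)
        grad_g = (1 - d_fake) * w    # d log D(g) / dg
        a += lr * grad_g * z
        b += lr * grad_g

    # After training, the generator's offset b drifts toward the real
    # mean of 4 even though it only ever sees the discriminator's score.
    ```

    The alternating update structure is the same one a DCGAN uses; the article swaps the linear models for convolutional networks and the 1-D samples for MNIST images.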


  • Data validation with the assertr package
    assertr is an R package that helps you identify common dataset errors. More specifically, it lets you easily spell out your assumptions about how the data should look and alerts you to any deviation from those assumptions.
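    assertr itself is R, but the idea translates directly to other languages. The sketch below mimics it in Python: `verify` and `within_bounds` borrow their names from assertr's real verbs, but the implementations are entirely illustrative stand-ins, not part of any library:

    ```python
    # Declare assumptions about a dataset up front and fail loudly,
    # with row numbers, when any record violates them.

    def verify(rows, predicate, description):
        """Return rows unchanged, or raise listing every offending row."""
        bad = [i for i, row in enumerate(rows) if not predicate(row)]
        if bad:
            raise ValueError(f"assumption violated ({description}) at rows {bad}")
        return rows

    def within_bounds(key, low, high):
        return lambda row: low <= row[key] <= high

    cars = [
        {"mpg": 21.0, "cyl": 6},
        {"mpg": 33.9, "cyl": 4},
    ]

    # Chain checks like an assertr pipeline; each stage passes the data through.
    checked = verify(
        verify(cars, within_bounds("mpg", 0, 60), "mpg in [0, 60]"),
        lambda row: row["cyl"] in (4, 6, 8),
        "cyl is 4, 6 or 8",
    )
    ```

    Because each check returns the data unchanged on success, the assertions compose into a pipeline that documents your assumptions while enforcing them.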


  • Which machine learning algorithm should I use?
    This resource is designed primarily for beginning data scientists or analysts who are interested in identifying and applying machine learning algorithms to address the problems of their interest.


  • Best Practices for Applying Deep Learning to Novel Applications (paper)
    This report is targeted to groups who are subject matter experts in their application but deep learning novices. It contains practical advice for those interested in testing the use of deep neural networks on applications that are novel for deep learning. We suggest making your project more manageable by dividing it into phases. For each phase this report contains numerous recommendations and insights to assist novice practitioners.


  • A Gentle Intro To Graph Analytics With GraphFrames
    Graph traversal is nothing like SQL: you think about the two in completely different ways, and the ergonomics of graph traversals are inherently harder to get used to. This is compounded when using a graph analytics library like GraphX, where being forced to work with RDDs (not exactly beginner-friendly) while also adding the paradigm of graphs on top is too much for the uninitiated. It would be much easier to comprehend if we could go from a table-like structure to a graph and run the same queries for comparison. GraphFrames lets us do exactly that: it is an API for doing graph analytics on Spark DataFrames. This way, we can recreate SQL queries on graphs and get a better grasp of graph concepts.
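    The table-to-graph idea is easy to see even without Spark. GraphFrames builds a graph from a vertices table and an edges table (its columns are conventionally named `id`, `src`, and `dst`) and supports motif queries like `"(a)-[]->(b); (b)-[]->(c)"`. As a plain-Python sketch of the same idea, the two-hop motif is just a self-join of the edge table; the data and helper below are illustrative:

    ```python
    # Table-shaped inputs, mirroring GraphFrames' vertices/edges DataFrames.
    vertices = [
        {"id": "a", "name": "Alice"},
        {"id": "b", "name": "Bob"},
        {"id": "c", "name": "Carol"},
    ]
    edges = [
        {"src": "a", "dst": "b"},
        {"src": "b", "dst": "c"},
    ]

    def find_two_hop_paths(edges):
        """Self-join of the edge list: the graph analogue of SQL's
        `edges e1 JOIN edges e2 ON e1.dst = e2.src`."""
        return [
            (e1["src"], e1["dst"], e2["dst"])
            for e1 in edges
            for e2 in edges
            if e1["dst"] == e2["src"]
        ]

    paths = find_two_hop_paths(edges)
    ```

    Seeing the motif query as a join over table rows is exactly the mental bridge the article advocates: same data, queried once as tables and once as a graph.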