Web Picks (week of 2 May 2016)

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • DeepMind moves to TensorFlow
    The DeepMind team announces that it will use TensorFlow in future projects and move away from Torch.
  • How to Prevent a Plague of Dumb Chatbots?
    Staying in the world of AI bots: MIT Technology Review notes that “the best (and least annoying) chatbots will be those that recognize their limitations and occasionally turn to humans for help.”
  • Explorable explanations
    What if a book didn’t just give you old facts, but gave you the tools to discover those ideas for yourself and invent new ones? What if, while reading a blog post, you could insert your own knowledge, challenge the author’s assumptions, and build things the author never even thought of… all inside the blog post itself? Explorable explanations is an attempt at answering some of those questions, and it’s a great way to visualize ideas and concepts.
  • Back to the Future of Handwriting Recognition
    The reason we mention “explorable explanations”: this post describes and shows how the Graphical Input Language software system (GRAIL) worked, a handwriting recognition system from fifty years ago!
  • The amazing power of word vectors
    “For today’s post, I’ve drawn material not just from one paper, but from five! The subject matter is ‘word2vec’ – the work of Mikolov et al. at Google on efficient vector representations of words (and what you can do with them).”
  • Modern pandas
    Best practices and more in this modern pandas series. Worth a read for anyone starting out with pandas.
  • OpenAI Gym
    OpenAI announces Gym: a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Go.
  • csv-schema: analyzes a CSV file and generates a database table schema, all within the browser
    This application parses CSV files (including huge ones) within the browser. It analyzes each field to suggest the best database field type, max length, and whether or not there are any null values. From there, you can rename fields, ignore them, override field types/lengths, etc. and generate database table creation SQL for MySQL, MariaDB, Postgres, Oracle, or SQLite3.
  • TensorFlow Examples
    Code examples for some popular machine learning algorithms, using the TensorFlow library. This tutorial is designed to help you dive into TensorFlow through examples. It includes both notebooks and code with explanations.
  • BetaGo: AlphaGo for the masses
    BetaGo lets you run your own Go engine. It downloads Go games for you, preprocesses them, trains a model on the data (for instance a neural network using Keras), and serves the trained model to an HTML front end, which you can use to play against your own Go bot.
  • On Nested Models
    “Current data science practice is quietly losing statistical power through inappropriate re-use of data in different stages of the process (the analyst looking, variable pruning, variable treatment, dimension reduction, and so on).”
  • Sketch Simplification
    “We present a novel technique to simplify sketch drawings based on learning a series of convolution operators. In contrast to existing approaches that require vector images as input, we allow the more general and challenging input of rough raster sketches such as those obtained from scanning pencil sketches.”
  • Where Will Your Country Stand in World War III?
    “In the recent Panama Papers scandal, journalists analyzed 11.5 million documents using network graphs to trace the use of offshore tax structures. In this chapter, we use a network graph technique called Social Network Analysis (SNA) to map weapons transfer between countries. By analyzing bilateral weapons trade, a network of multilateral ties can be distilled, providing insights into the complex arena of international politics.”
  • Machine Learning Meets Economics, Part 2
    Follow-up to the first part, zooming in on classification where a reject option is available, i.e. where the classifier can choose not to classify an instance.
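
For readers who have not met a reject option before, here is a minimal sketch of the idea in Python. It is our own illustration, not code from the linked post: the function name `classify_with_reject` and the 0.8 confidence threshold are hypothetical choices, and a real application would pick the threshold from the costs of errors versus rejections.

```python
from typing import Optional, Sequence


def classify_with_reject(probs: Sequence[float],
                         threshold: float = 0.8) -> Optional[int]:
    """Return the index of the most probable class, or None to reject.

    `probs` holds one predicted probability per class. The threshold is a
    hypothetical value chosen for illustration only.
    """
    # Find the class with the highest predicted probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Only commit to a label when the classifier is confident enough;
    # otherwise decline to classify (e.g. hand the instance to a human).
    return best if probs[best] >= threshold else None


print(classify_with_reject([0.05, 0.90, 0.05]))  # confident prediction
print(classify_with_reject([0.40, 0.35, 0.25]))  # too uncertain, rejected
```

Raising the threshold makes the classifier reject more instances and err less often on the ones it does label; lowering it does the opposite.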