Web Picks (week of 14 December 2015)

Posted on December 24, 2015

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

The current state of machine intelligence 2.0
A great article on the upcoming changes in the ML ecosystem.

Jupyter, Zeppelin, Beaker: The Rise of the Notebooks
The idea of computer notebooks has been around for a long time, starting with the early days of Matlab and Mathematica in the mid-to-late 80s. This interesting post shows off modern data science implementations of this idea, which include not only Jupyter but others as well.

Implementing a CNN for Text Classification in TensorFlow
Looking for more examples to learn TensorFlow? This post describes how to implement a model similar to Yoon Kim’s Convolutional Neural Networks for Sentence Classification.

Scikit Flow
Scikit Flow is a simplified interface for TensorFlow, to get people started on predictive analytics and data mining.

Finding the Right Color Palettes for Data Visualizations
This interesting post shows that even a seemingly trivial choice such as the color palette for your charts can in fact be crucial to how well your visualisations communicate. See also the great talk “How We Designed Matplotlib’s New Default Colormap (and You Can Too)” if you haven’t already.

Why Percentiles Don’t Work the Way you Think
“People assume, Averages are bad, and percentiles are great – let’s calculate percentile metrics and put them into our time series databases, right? Not so fast.”
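To make the warning concrete, here is a minimal sketch of one pitfall that comes up when percentiles are stored per time window and later combined: the average of per-window percentiles is generally not the percentile of the underlying data. That this is what the linked post focuses on is our assumption, and the latency figures, window sizes and the `percentile` helper below are hypothetical, chosen only for illustration.

```python
import math
import random


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


rng = random.Random(0)

# Hypothetical latencies in ms: a busy minute of fast requests
# and a quiet minute of slow requests.
busy_minute = [rng.uniform(10, 20) for _ in range(10_000)]
quiet_minute = [rng.uniform(100, 200) for _ in range(100)]

# What a dashboard might do: store one p99 per minute, then average them.
avg_of_p99 = (percentile(busy_minute, 99) + percentile(quiet_minute, 99)) / 2

# What we actually want: the p99 over all requests in both minutes.
true_p99 = percentile(busy_minute + quiet_minute, 99)

print(f"average of per-minute p99s: {avg_of_p99:.1f} ms")
print(f"p99 of the pooled data:     {true_p99:.1f} ms")
```

Because the quiet minute holds only about 1% of the requests, it barely moves the pooled p99, yet it counts for half of the averaged figure. Keeping the raw values (or a mergeable summary such as a histogram) avoids the problem.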

Better Computer Go Player with Neural Network and Long-term Prediction
Both Facebook and Google are hard at work on creating an AI that can beat humans at Go. This research paper describes Facebook’s latest effort.

Denoising Dirty Documents with R
Interesting post series on denoising badly scanned documents.

The NYT’s best data visualizations of the year
We especially enjoyed the economic yield curve.

Introducing OpenAI
Sam Altman, Elon Musk, and other influential tech legends announce OpenAI, a non-profit to advance research in AI.

The Quartz guide to bad data
“As a reporter your world is full of data. And those data are full of problems. This guide presents thorough descriptions and possible solutions to many of the kinds of problems that you will encounter when working with data.”

d3.compose
Compose complex, data-driven visualizations from reusable charts and components with d3.

d3-shape
Another d3 library: “Visualizations typically consist of discrete graphical marks. While the rectangles of a bar chart may be easy enough to generate directly, other shapes are complex, such as rounded annular sectors and centripetal Catmull–Rom splines. This module provides a variety of shape generators for your convenience.”

This holiday season, give me real insights
Yanir argues against the misuse of “insights” to report the what while forgetting about the why. We agree completely.

Wikipedia-Mining Algorithm Reveals World’s Most Influential Universities
An algorithm’s list of the most influential universities contains some surprising entries.

How Much Memory Does A Data Scientist Need?
Nine times out of ten, give a data scientist a beefy machine with 1 TB of RAM rather than a Hadoop stack.

Monopoly Simulations
A fun one to close with: this post describes a Monopoly simulation model to figure out imbalances in the game.
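
For readers who want to try the idea themselves, here is a minimal sketch of such a simulation. It is not the linked post's model: it assumes a 40-square board, two six-sided dice and a single "Go To Jail" square, and it ignores Chance and Community Chest cards, doubles and everything else. The function and constant names are our own.

```python
import random

BOARD_SIZE = 40   # squares on a standard board, with Go at index 0
JAIL = 10         # index of the Jail square
GO_TO_JAIL = 30   # index of the "Go To Jail" square


def simulate(n_rolls, seed=0):
    """Count where a single token ends each turn over n_rolls dice rolls."""
    rng = random.Random(seed)
    visits = [0] * BOARD_SIZE
    position = 0
    for _ in range(n_rolls):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        position = (position + roll) % BOARD_SIZE
        if position == GO_TO_JAIL:
            # Simplification: move straight to Jail and carry on next turn.
            position = JAIL
        visits[position] += 1
    return visits


visits = simulate(200_000)
most_visited = max(range(BOARD_SIZE), key=visits.__getitem__)
print("most visited square index:", most_visited)
print("share of turns ending there:", visits[most_visited] / sum(visits))
```

Even this stripped-down version shows that squares are not visited equally often, which is the kind of imbalance a fuller model can quantify per property.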