Web Picks (week of 7 March 2016)

Posted on March 14, 2016

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Inside the Artificial Intelligence Revolution: A Special Report, Pt. 1
“We may be on the verge of creating a new life form, one that could mark not only an evolutionary breakthrough, but a potential threat to our survival as a species.” Rolling Stone investigates the AI revolution.

A Visual Look at 2 Million Chess Games
“We’ll take a look at more than 2 million games. I was interested to see what kind of visualizations I can do, and what patterns would be revealed by considering so many games.”

Introducing GraphFrames
Databricks announces the release of GraphFrames, a graph processing library for Apache Spark. GraphFrames support general graph processing, similar to Apache Spark’s GraphX library. However, GraphFrames are built on top of Spark DataFrames, resulting in some key advantages.

Four pitfalls of hill climbing
Interesting article discussing four distinct pitfalls that can emerge from an over-reliance on hill climbing.

Recommending for the World
The Netflix team highlights the four most interesting challenges they encountered in making their algorithms operate globally and, most importantly, how this improved their ability to connect members worldwide.

Where the f*** can I park?
Fun, short read about Manuel Garrido’s quest to extract data from a decidedly non-open government data repository.

A Billion Taxi Rides in Redshift
Now this is what you can call Big Data: the author analyses the geospatial metadata of 1.1 billion taxi journeys made in New York City between 2009 and 2015 using Amazon Redshift.

Google’s Computers Are Making Thousands as Artists
Google’s artificial intelligence, or shall I say “painters,” raked in thousands for charity over the weekend in San Francisco at an art show called “DeepDream: The art of neural networks.”

Discover Hong Kong through the lens of Instagram
An interesting case study on how social media can be used to gain socio-demographic and political insights.

Analyzing the Language of the Presidential Debates
This article looks at the presidential candidates’ debate transcripts and sees what can be gained by applying statistical and natural language processing (NLP) techniques to their words.

Visualizing the Clinton Email Network in R
Staying in the realm of politics, though the article states: “This isn’t a post about politics. When the WSJ folks made it possible to search the Clinton email releases I thought it would be fun to get the data into R to show how well the igraph and ggnetwork packages could work together, and also show how to use svgPanZoom to make it a bit easier to poke around the resulting hairball network.”

Do Brand Colors Translate to Instagram?
“Starbucks green. UPS brown. Target red. Some of the most recognized brands have built their visual identity on just one color. We were curious: how does this color-centric visual identity carry over to social media and, more specifically, Instagram?”

Watch Tiny Neural Nets Learn
In this post I’ll show you some animations of tiny neural nets learning.

Google Unveils Neural Network with “Superhuman” Ability to Determine the Location of Almost Any Image
Not a tiny neural net here — guessing the location of a randomly chosen Street View image is hard, but Google’s latest artificial-intelligence machine manages it with relative ease.

Scaling Knowledge at Airbnb
Airbnb discusses how they make their knowledge shareable and easily transmittable, focussing on reproducibility, quality, consumability, discoverability and learning — interesting read.

Dora – an exploratory data analysis toolkit for Python
Dora is a new Python library designed to automate the painful parts of exploratory data analysis. The library contains convenience functions for data cleaning, feature selection & extraction, visualization, partitioning data for model validation, and versioning transformations of data.

A Practical Guide to Anonymizing Datasets with Python & Faker
Interesting article discussing how to deal with the anonymization issue when working with real-life datasets.

Picture combining using Neural Networks
Well, this is strange. This image gallery shows off yet another neural network, one that is able to combine two source photographs into one.

Decision Forests, Convolutional Networks and the Models in-Between [pdf]
This paper investigates the connections between two state-of-the-art classifiers: decision forests (DFs, including decision jungles) and convolutional neural networks (CNNs).

Dynamic Memory Networks for Visual and Textual Question Answering
“Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules.”