Web Picks (week of 1 June 2015)

Posted on June 3, 2015

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

The Unreasonable Effectiveness of Recurrent Neural Networks
Andrej Karpathy presents a captivating introduction to the magic of recurrent neural networks, trains them to write Shakespeare, and even releases the full source code. Worth a full read!

Top 10 data mining algorithms in plain English
We’re missing the “random forest” explanation, but a good introduction overall.

Nobody ever got fired for using Hadoop on a cluster [pdf]
A paper from Microsoft Research, with the title of course being a pun on the famous marketing phrase “No one ever got fired for buying IBM.” The article argues that memory and solid state drives are becoming so cheap that managers might be wrongly implementing Hadoop clusters for use cases where such solutions are not needed. Also amusing: yourdatafitsinram.com.

Any P-Value Distinguishable from Zero is Insufficiently Informative
Another article definitely worth reading from Cosma Rohilla Shalizi, reminding us of the crucial statistical facts that “Any Non-Zero Mean Will Become Arbitrarily Significant” and “Any Non-Zero Regression Coefficient Will Become Arbitrarily Significant” as the sample size grows.
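
To make the point concrete, here is a minimal sketch (ours, not taken from the article) of how the p-value of a simple two-sided z-test shrinks as the sample size grows while the effect stays fixed and tiny. The effect size of 0.01, the standard deviation of 1, and the helper name two_sided_p_value are illustrative assumptions, and the sample mean is simply set equal to the true mean instead of being simulated.

```python
import math

def two_sided_p_value(mean, sigma, n):
    """Two-sided z-test p-value for H0: mean = 0 (hypothetical helper).

    Assumes the observed sample mean equals `mean` and that the
    standard deviation `sigma` is known.
    """
    z = mean * math.sqrt(n) / sigma          # test statistic grows like sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal

# Assumed tiny but non-zero effect: mean 0.01 with standard deviation 1.
effect, sigma = 0.01, 1.0
sample_sizes = [100, 10_000, 1_000_000]
p_values = [two_sided_p_value(effect, sigma, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:>9,d}  p = {p:.3g}")
```

The effect is identical in all three rows; only the sample size changes, yet the p-value moves from clearly non-significant to vanishingly small.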

Torch
Torch is a scientific computing framework with wide support for machine learning algorithms, based on Lua.

Keras
Another framework: Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses Theano under the hood. It was developed with a focus on enabling fast experimentation.

RStudio 0.99.441 released
Finally, we end with this absolute must-upgrade for data scientists: RStudio 0.99.441 has been released. It contains a vastly improved data viewer with support for large datasets, filtering, searching, and sorting; a complete overhaul of the code completion engine; and a new diagnostics provider to warn you of potential errors as you work.