Web Picks (week of 9 January 2017)

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • Blockchains & Platforms: shaping the future of Insurance and Liabilities
    “Despite the outlook on risk management — and generally on insurance business — is changing a lot on a global scale, we are all extremely surprised about how the giants in the insurance industry are sitting in peaceful thinking that their industry is — essentially — protected from any disruption.”
  • A Kaggler’s Guide to Model Stacking in Practice
    “Stacking (also called meta ensembling) is a model ensembling technique used to combine information from multiple predictive models to generate a new model. Here I provide a simple example and guide on how stacking is most often implemented in practice.”
  • Jupyter + Pachyderm — Part 1, Exploring and Understanding Historical Analyses
    “In other words, the multi-format, exploratory functionality of Jupyter could be that much more powerful if there were a system, with which Jupyter could be paired, that would enable Jupyter notebooks to interact with chronological records of works and/or be versioned themselves. Enter Pachyderm! Pachyderm, with its data versioning plus data pipelining functionality, can expand the possibilities and increase the significance of applications like Jupyter and nteract.” Cool!
  • The Instant Rise of Machine Intelligence?
    “While I strongly believe in the fascinating opportunities around deep learning for image recognition, natural language processing and even end-to-end “intelligent” systems (e.g. chat bots), I wanted to get a better feeling of the recent technological progress.”
  • Symbolic Machine Learning
    This post aims to provide a unifying approach to symbolic and non-symbolic techniques of artificial intelligence.
  • Beautiful thematic maps with ggplot2
    “In this blog post, I am going to explain step by step how I (eventually) achieved this result – from a very basic, useless, ugly, default map to the publication-ready and (in my opinion) highly aesthetic choropleth.”
  • Generating Videos with Scene Dynamics (paper)
    “We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction).”
  • Learning from Simulated and Unsupervised Images through Adversarial Training (paper)
    “With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulator’s output using unlabeled real data, while preserving the annotation information from the simulator.”
  • What Neural Network Can Tell About Your Doodles?
    “I spent 3 weeks analyzing them, observing, looking for patterns. And I found a few. These patterns could be used for a deeper analysis of a thought process behind drawing and the way people use their brain.”
  • Spatial analysis pipelines with simple features in R
    In November, sf, the new simple features package for R, hit CRAN. The package is like rgdal, sp, and rgeos rolled into one, is much faster, and allows for data processing with dplyr verbs! Also, because sf objects are represented in a much simpler way than sp objects, it allows for spatial analysis in R within magrittr pipelines.
  • Handy Python Libraries for Formatting and Cleaning Data
    Cleaning data may be time-consuming, but lots of tools have cropped up to make this crucial duty a little more bearable. The Python community offers a host of libraries for making data orderly and legible—from styling DataFrames to anonymizing datasets.
  • Design Better Data Tables
    After being the bread and butter of the web for most of its early history, tables were cast aside by many designers for newer, trendier layouts. But while they might be making fewer appearances on the web these days, data tables still collect and organize much of the information we interact with on a day-to-day basis.
  • ggraph – Graph visualization for messy data
    This is a library built on top of D3 with the goal of improving how we work with large and messy graphs. It extends the notion of nodes and links with groups of nodes. This is useful when multiple nodes are in fact the same thing or belong to the same group.
  • A Guide to Deep Learning
    This guide is for those who know some math, know some programming language and now want to dive deep into deep learning.