Web Picks (week of 2 October 2017)

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • Is AI Riding a One-Trick Pony?
    Just about every AI advance you’ve heard of depends on a breakthrough that’s three decades old. Keeping up the pace of progress will require confronting AI’s serious limitations.
  • New Theory Cracks Open the Black Box of Deep Learning
    A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn…
  • A Brain Built From Atomic Switches Can Learn
    A tiny self-organized mesh full of artificial synapses recalls its experiences and can solve simple problems. Its inventors hope it points the way to devices that match the brain’s energy-efficient computing prowess.
  • The Ten Fallacies of Data Science
    There’s a gap between the idealized data science projects that students are exposed to and what actually happens in the real world. This article explores ten of the most important surprises.
  • Why SQL is beating NoSQL, and what this means for the future of data
    After years of being left for dead, SQL today is making a comeback. How come? And what effect will this have on the data community?
  • When Websites Design Themselves
    Today, we’re on the verge of another revolution, as artificial intelligence and machine learning turn the graphic design field on its head again.
  • I asked Tinder for my data. It sent me 800 pages of my deepest, darkest secrets
    “The dating app knows me better than I do, but these reams of intimate information are just the tip of the iceberg. What if my data is hacked – or sold?”
  • Microsoft: “Announcing tools for the AI-driven digital transformation”
    Microsoft announces new AI tooling.
  • Apache Arrow and the “10 Things I Hate About pandas”
    “In this post I hope to explain as concisely as I can some of the key problems with pandas’s internals and how I’ve been steadily planning and building pragmatic, working solutions for them. To the outside eye, the projects I’ve invested in may seem only tangentially-related: e.g. pandas, Badger, Ibis, Arrow, Feather, Parquet. Quite the contrary, they are all closely-interrelated components of a continuous arc of work I started almost 10 years ago…”
  • The traveling metallurgist
    An interactive visualization and blog post exploring simulated annealing applied to the traveling salesman problem…
  • PixelNN: Example-based Image Synthesis
    “We present a simple nearest-neighbor (NN) approach that synthesizes high-frequency photorealistic images from an “incomplete” signal such as a low-resolution image, a surface normal map, or edges…”
  • Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning
    “Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology which offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report the first deep learning model – Chiron – that can directly translate the raw signal to DNA sequence without the error-prone segmentation step…”
  • How You Can Use the New Stack Overflow Bot from Microsoft
    “So when Microsoft showed us how they were bringing AI to every developer through their platforms and tools, and asked if they could partner with us to create an AI driven experience for developers to use and learn with, we of course said yes…”
  • europilot: A toolkit for controlling Euro Truck Simulator 2 with python to develop self-driving algorithms.
    Europilot is an open source project that leverages the popular Euro Truck Simulator 2 (ETS2) to develop self-driving algorithms. Think of europilot as a bridge between the game environment and your favorite deep-learning framework, such as Keras or TensorFlow. With europilot, you can capture the game screen input and programmatically control the truck inside the simulator…
  • Predicting NFL Plays with the xgboost Decision Tree Algorithm
    “Enter – the play predictor. This tool aims to enhance in-game NFL decision making with a tool capable of predicting the type of play the opposing team will run at high accuracy in real-time. On average this tool is able to predict pass or run at 73.6% accuracy, with varying performance dictated by teams playing and mostly game situation.”
  • Learning to Optimize with Reinforcement Learning
    “There is a paradox in the current paradigm: the algorithms that power machine learning are still designed manually. This raises a natural question: can we learn these algorithms instead?”
  • 3D Face Reconstruction from a Single Image
    “This is an online demo of our paper Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression.”
  • Visualizing Distributions
    “Many charting taxonomies include distributions, but they only present a few options. Let’s remedy that with a post on the many. We’ll use a single (completely fake) data set so we can easily compare how each chart type displays the same data.”
  • The Complex World of Data Scientists and Black-Box Algorithms
    Hilary Mason speaks to audiences around the world about data, machine learning, AI, and how to build real, functional, and robust products. She is the founder of Fast Forward Labs and the Data Scientist in Residence at Accel. In this interview, Hilary discusses topics including careers, data products, and black-box deep learning algorithms.
  • LEGO color themes as topic models
    This post uses text mining techniques to explore the color themes across a corpus of LEGO sets. It’s a fun read with nice visualizations along the way.
  • Franchise – an open-source notebook for SQL
    Franchise is an open-source SQL tool with a notebook interface. It supports CSV, JSON, and XLSX files and offers a variety of ways to explore your data.
  • STYLE2PAINTS: Converting sketches to paintings
    The AI can paint a sketch according to a given color style.