Web Picks (week of 3 December 2018)

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • Aequitas
    An open source bias audit toolkit for machine learning developers, analysts, and policymakers to audit machine learning models for discrimination and bias, and make informed and equitable decisions around developing and deploying predictive risk-assessment tools.
  • The deepest problem with deep learning
    “Some reflections on an accidental Twitterstorm, the future of AI and deep learning, and what happens when you confuse a schoolbus with a snow plow.”
  • How to Create Value with Machine Learning
    “A General-Purpose Framework for Defining and Solving Meaningful Problems in 3 Steps”
  • How AI automation could boost employment: The role of demand
    “So, why does automation lead to employment growth in some industries at particular times, while leading to job losses at other times and in other industries?”
  • GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
    “The #GANpaint app works by directly activating and deactivating sets of neurons in a deep network trained to generate images. Each button on the left (“door”, “brick”, etc) corresponds to a set of 20 neurons. The app demonstrates that, by learning to draw, the network also learns about objects such as trees and doors and rooftops. By switching neurons directly, you can observe the structure of the visual world that the network has learned to model.” Make sure to take a look at the examples!
  • BlackRock shelves unexplainable AI liquidity models
    Risk USA: Neural nets beat other models in tests, but results could not be explained
  • Better People Analytics
    “But hype, as it often does, has outpaced reality. The truth is, people analytics has made only modest progress over the past decade. A survey by Tata Consultancy Services found that just 5% of big-data investments go to HR, the group that typically manages people analytics.”
  • Angela Bassa discusses managing data science teams and much more
    Angela Bassa, the Director of Data Science at iRobot, talks about how to get into data science management, how to hire data scientists, and much more.
  • How Cheap Labor Drives China’s A.I. Ambitions
    “If China is the Saudi Arabia of data, as one expert says, these businesses are the refineries, turning raw data into the fuel that can power China’s A.I. ambitions.”
  • Bias-Variance Decomposition
    “Often, researchers use the terms bias and variance or “bias-variance tradeoff” to describe the performance of a model — i.e., you may stumble upon talks, books, or articles where people say that a model has a high variance or high bias. So, what does that mean? In general, we might say that “high variance” is proportional to overfitting, and “high bias” is proportional to underfitting.”
  • Twitter’s Kafka adoption story
    “In terms of performance, we saw that Kafka had significantly lower latency, regardless of the amount of throughput as measured by the timestamp difference from the time the message was created to when the consumer read the message.”
  • Amazon’s own ‘Machine Learning University’ now available to all developers
    “Today, I’m excited to share that, for the first time, the same machine learning courses used to train engineers at Amazon are now available to all developers through AWS.”
  • AWS Lambda Supports Python 3.7
    “You can now develop your AWS Lambda functions using Python 3.7 in addition to the already supported 2.7 and 3.6 versions. Python 3.7 is the newest major release of the Python language, and it contains many new features such as support for data classes, customization of access to module attributes, and typing enhancements.”
  • Google shut out privacy and security teams from secret China project
    “The objective, code-named Dragonfly, was to build a search engine for China that would censor broad categories of information about human rights, democracy, and peaceful protest.”
  • Reading Minds with Deep Learning
    “In this post, I’ll demonstrate how you can predict what people are doing from fluctuations in their brain voltage readings using some basic deep learning techniques.”
  • NYU, Facebook release massive MRI dataset as part of ongoing AI project
    “The New York University (NYU) School of Medicine’s department of radiology is releasing a knee MRI dataset of more than 1.5 million anonymous images as part of its ongoing collaboration with Facebook to make MRI scans 10 times faster with artificial intelligence (AI). The collaboration, known as fastMRI, involves NYU’s Center for Advanced Imaging Innovation and Facebook AI Research (FAIR).”
  • Facebook Knew Russia Was Harvesting Data in 2014, U.K. Lawmaker Says
    Facebook Inc. knew that Russian-linked entities were using a feature on the social network that let advertisers harvest large amounts of data as early as October 2014, according to an internal email a U.K. lawmaker said he had reviewed.
  • The Lesser Known Stars of the Tidyverse
    An interesting overview of some lesser-known R packages.
  • Making Your Neural Network Say “I Don’t Know”
    Bayesian neural networks using Pyro and PyTorch.
  • Tensorflow 2.0: models migration and new design
    TensorFlow 2.0 will be a major milestone for the most popular machine learning framework: lots of changes are coming, and all with the aim of making ML accessible to everyone.
  • Things I Wish I’d Known About Spark When I Started
    The Enigma team outlines some lesser-known tips to keep in mind when working with Spark.
  • Translating Between Statistics and Machine Learning
    “Statistics and machine learning often use different terminology for similar concepts. I recently confronted this when I began reading about maximum causal entropy as part of a project on inverse reinforcement learning. Many of the terms were unfamiliar to me, but as I read closer, I realized that the concepts had close relationships with statistics concepts. This blog post presents a table of connections between terms that are standard in statistics and their related counterparts in machine learning.”
  • Imaginary worlds dreamed by BigGAN
    “These are some of the most amazing generated images I’ve ever seen. Introducing BigGAN, a neural network that generates high-resolution, sometimes photorealistic, imitations of photos it’s seen. None of the images below are real – they’re all generated by BigGAN.”
  • Varying Speaking Styles with Neural Text-to-Speech
    “When people speak, they use different speaking styles depending on context. A TV newscaster, for example, will use a very different style when conveying the day’s headlines than a parent will when reading a bedtime story. Amazon scientists have shown that our latest text-to-speech (TTS) system, which uses a generative neural network, can learn to employ a newscaster style from just a few hours of training data. This advance paves the way for Alexa and other services to adopt different speaking styles in different contexts, improving customer experiences.”
  • Q: Query CSV files with SQL
    q is a command line tool that allows direct execution of SQL-like queries on CSVs/TSVs (and any other tabular text files).
  • Translate Pictures of Food into Recipes with Deep Learning
    “In this tutorial I will show how to train deep convolutional neural networks with Keras to classify images into food categories and to output a matching recipe.”
  • Facebook Filed A Patent To Predict Your Household’s Demographics Based On Family Photos
    “Facebook’s proposed technology would analyze your #wifey tags, shared IP addresses, and photos to predict whom you live with.”
  • Generating Classical Music with Neural Networks
    “Christine McLeavey Payne may have finally cured songwriter’s block. Her recent project Clara is a long short-term memory (LSTM) neural network that composes piano and chamber music. Just give Clara a taste of your magnum-opus-in-progress, and Clara will figure out what you should play next.”
  • Singapore to test facial recognition on lampposts, stoking privacy fears
    “In the not too distant future, surveillance cameras sitting atop over 100,000 lampposts in Singapore could help authorities pick out and recognize faces in crowds across the island-state.”
  • Using graph theory to find cheap flights
    “We could model the relationship between airports as a graph, and then look for a cycle in the graph of size 3, with two nodes we already know.”
  • Researchers Created Fake ‘Master’ Fingerprints to Unlock Smartphones
    It’s the same principle as a master key, but applied to biometric identification with a high rate of success.
  • Python Data Visualization 2018: Why So Many Libraries?
    “There is a huge range of visualization functionality available for Python, with a diversity in approach and focus that is reflected in the large number of libraries available.”
  • MLflow v0.8.0 released
    This release features an improved experiment UI and deployment tools.
  • CakeML is a functional programming language and an ecosystem of proofs and tools built around the language
    The ecosystem includes a proven-correct compiler that can bootstrap itself.
  • AI Playbook
    “This site is designed as a resource for anyone asking those questions, complete with examples and sample code to help you get started.”
  • One of the fathers of AI is worried about its future
    Yoshua Bengio wants to stop talk of an AI arms race and make the technology more accessible to the developing world.
  • Flying for Thanksgiving
    “When is the best day to fly for Thanksgiving? And when is the worst day?”
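
A note on the "Using graph theory to find cheap flights" pick above: the quoted idea (a cycle of size 3 in which two nodes are already known) can be illustrated with a minimal sketch. This is not the linked article's code. The fare table, the airport codes, the function name `three_cycles`, and the choice to place the unknown stopover on the outbound leg are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical one-way fares keyed by (origin, destination).
# Airport codes and prices are invented for this example.
FARES: Dict[Tuple[str, str], float] = {
    ("BRU", "LIS"): 60.0,
    ("LIS", "JFK"): 210.0,
    ("JFK", "BRU"): 250.0,
    ("BRU", "DUB"): 40.0,
    ("DUB", "JFK"): 180.0,
    ("BRU", "JFK"): 400.0,
}


def three_cycles(
    fares: Dict[Tuple[str, str], float], home: str, destination: str
) -> List[Tuple[str, float]]:
    """List every cycle home -> stopover -> destination -> home.

    Returns (stopover, total_price) pairs, cheapest first.
    The two known nodes are `home` and `destination`; the third
    node of the cycle is the stopover we are searching for.
    """
    back = fares.get((destination, home))
    if back is None:
        # Without a return edge there is no cycle at all.
        return []

    # Every airport that appears in the fare table is a candidate stopover.
    airports = {airport for edge in fares for airport in edge}

    results = []
    for stop in airports - {home, destination}:
        leg1 = fares.get((home, stop))
        leg2 = fares.get((stop, destination))
        if leg1 is not None and leg2 is not None:
            results.append((stop, leg1 + leg2 + back))

    # Sort by price, then by airport code to keep ties deterministic.
    return sorted(results, key=lambda r: (r[1], r[0]))


if __name__ == "__main__":
    for stop, price in three_cycles(FARES, "BRU", "JFK"):
        print(f"BRU -> {stop} -> JFK -> BRU: {price:.2f}")
```

With real data the fare table would come from a flight search source, and the same loop could be run with the stopover on the return leg instead.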