Web Picks (week of 28 May 2018)

Posted on June 10, 2018

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Ten red flags signalling your analytics program will fail
Struggling to become analytics-driven? One or more of these issues is likely what’s holding your organization back.
Intelligent Machines
The US military is funding an effort to catch deepfakes and other AI trickery, but DARPA’s technologists admit that it might be a losing battle.
Delayed Impact of Fair Machine Learning
“In this post, we talk about our recent work on aligning decisions made by machine learning with long term social welfare goals. Commonly, machine learning models produce a score that summarizes information about an individual in order to make decisions about them. For example, a credit score summarizes an individual’s credit history and financial activities in a way that informs the bank about their creditworthiness. Let us continue to use the lending setting as a running example.” Very insightful!
Reporting is a Gateway Drug
“If executed poorly, the Analytics team can turn into a team of reporting monkeys – we all know what that is like. Here is some advice on how to use reporting as a means to create strong stakeholder relationships in your organization.”
When algorithms surprise us
When machine learning algorithms solve problems in unexpected ways, programmers find them, okay yes, annoying sometimes, but often purely delightful.
Zuckerberg didn’t make any friends in Europe today
“However there was little transparency or accountability on show during the session, given the upfront questions format which saw Zuckerberg cherry-picking a few comfy themes to riff on after silently absorbing an hour of MEPs’ highly specific questions with barely a facial twitch in response.”
How will the GDPR impact machine learning?
Answers to the three most commonly asked questions about maintaining GDPR-compliant machine learning programs.
To Build Truly Intelligent Machines, Teach Them Cause and Effect
“Judea Pearl, a pioneering figure in artificial intelligence, argues that AI has been stuck in a decades-long rut. His prescription for progress? Teach machines to understand the question why.”
Categorizing Listing Photos at Airbnb
“Large-scale deep learning models are changing the way we think about images of homes on our platform.”
Machine Learning Breaking Bad – addressing Bias and Fairness in ML models
“Looking ahead to 2018, rising awareness of the impact of bias, and the importance of fairness and transparency, means that data scientists need to go beyond simply optimizing a business metric.”
A.I. Is Harder Than You Think
“The reason Google Duplex is so narrow in scope isn’t that it represents a small but important first step toward such goals. The reason is that the field of A.I. doesn’t yet have a clue how to do any better.”
Amazon Pushes Facial Recognition to Police. Critics See Surveillance Risk
“Amazon promotes its facial recognition technology on the company’s website, saying that the service can track people in a video even when their faces are not visible.”
Employers are monitoring computers, toilet breaks – even emotions. Is your boss watching you?
“From microchip implants to wristband trackers and sensors that can detect fatigue and depression, new technology is enabling employers to watch staff in more and more intrusive ways. How worried should we be?”
Microsoft, too, has been working on voice-ready, calling-capable bots (video)
In Mandarin, even.
Growing up with AI
How can families play and learn with their new smart toys and companions?
20 Questions to Ask Prior to Starting Data Analysis
It is crucial to ask the right questions and understand the problem before beginning data analysis.
Top 20 R Libraries for Data Science in 2018
An infographic of the top 20 R packages for data science, covering each library's main features and GitHub activity.
Datasette Facets
“Datasette 0.22 is out with the most significant new feature I’ve added since the initial release: faceted browse. Datasette lets you deploy an instant web UI and JSON API for any SQLite database. csvs-to-sqlite makes it easy to create a SQLite database out of any collection of CSV files. Datasette Publish is a web app that can run these combined tools against CSV files you upload from your browser. And now the new Datasette Facets feature lets you explore any CSV file using faceted navigation with a couple of clicks.”
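To make the idea of faceted browsing a bit more concrete, here is a minimal sketch in plain Python. It is not Datasette's or csvs-to-sqlite's actual code: it only assumes that a facet can be thought of as a count of rows per distinct value in a column. The sample data and the helper names `load_csv` and `facet_counts` are hypothetical and exist purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """name,city,category
Alice,Leuven,student
Bob,Leuven,staff
Carol,Ghent,student
Dave,Leuven,student
"""

def load_csv(conn, table, text):
    """Create a table from CSV text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)

def facet_counts(conn, table, column):
    """Return (value, count) pairs for one column, most frequent first."""
    query = (
        f'SELECT "{column}", COUNT(*) AS n FROM "{table}" '
        f'GROUP BY "{column}" ORDER BY n DESC, "{column}"'
    )
    return conn.execute(query).fetchall()

# Illustrative usage on an in-memory database (an assumption, not a real deployment).
conn = sqlite3.connect(":memory:")
load_csv(conn, "people", SAMPLE_CSV)
city_facets = facet_counts(conn, "people", "city")
category_facets = facet_counts(conn, "people", "category")
print(city_facets)
print(category_facets)
```

Each facet is just a grouped count, so a browsing UI can list the values of a column next to the number of matching rows and let the reader click one to filter.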
If We All Left to “Go Back Where We Came From”
“Based on data from the 2012-2016 American Community Survey, there were about 315 million people who lived in the 50 states or District of Columbia. Here are all of them. Each dot represents a person.”
Introducing state of the art text classification with universal language models
“This post is a lay-person’s introduction to our new paper, which shows how to classify documents automatically with both higher accuracy and less data requirements than previous approaches. We’ll explain in simple terms: natural language processing; text classification; transfer learning; language modeling; and how our approach brings these ideas together.”
LocationSmart API Vulnerability
“On May 16th, I found a vulnerability in the LocationSmart website which allowed anyone, with no prior authentication or consent, to obtain the realtime location of any cellphone in the US to within a few hundred feet.”
ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus
“Pearl dismisses most of what we do in ML as curve fitting. While I believe that’s an overstatement (conveniently ignores RL for example), it’s a nice reminder that most productive debates are often triggered by controversial or outright arrogant comments. Calling machine learning alchemy was a great recent example. After reading the article, I decided to look into his famous do-calculus and the topic causal inference once again.”
OpenAI’s Gym Retro
“We’re releasing the full version of Gym Retro, a platform for reinforcement learning research on games. This brings our publicly-released game count from around 70 Atari games and 30 Sega games to over 1,000 games across a variety of backing emulators. We’re also releasing the tool we use to add new games to the platform.”
Raytracing The Washington Monument in R
For natural-looking shadows, let’s simulate nature. Let’s build a raytracer.