Web Picks (week of 25 June 2018)

Posted on July 3, 2018

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Neural scene representation and rendering
An amazing new architecture from DeepMind is able to render a 3D representation of a scene using only a handful of 2D pictures. Impressive stuff!
Model Tuning and the Bias-Variance Tradeoff
Great visualization. “The goal of modeling is to approximate real-life situations by identifying and encoding patterns in data. Models make mistakes if those patterns are overly simple or overly complex.”
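To make the "overly simple or overly complex" point concrete, here is a minimal sketch using k-nearest-neighbour regression in plain Python. It is not taken from the linked article: the ground-truth curve, the noise level, the sample sizes and all function names are assumptions chosen for illustration. With k = 1 the model memorizes the training set (too complex), and with k equal to the training set size it predicts one constant everywhere (too simple).

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of a k-NN model fitted on `train`, evaluated on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def make_data(n, rng):
    """Hypothetical ground truth: y = x^2 plus Gaussian noise (an assumption)."""
    points = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        points.append((x, x * x + rng.gauss(0, 0.1)))
    return points


rng = random.Random(0)
train = make_data(40, rng)
test = make_data(40, rng)

# k = 1 is the most flexible model, k = len(train) the most rigid one.
for k in (1, 5, len(train)):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.4f}  test MSE={mse(train, test, k):.4f}")
```

The training error for k = 1 is zero by construction, which says nothing about how the model does on unseen data. Comparing the train and test columns across values of k is the kind of tuning the visualization walks through.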
Bias detectives: the researchers striving to make algorithms fair
As machine learning infiltrates society, scientists are trying to help ward off injustice.
Visualizing ML Models with LIME
“This post demonstrates how to use the lime package to perform local interpretations of ML models.”
Ethical Machine Learning: Spotting and Preventing Proxy Bias
“Machine learning systems often inherit biases against protected classes and historically disparaged groups via their training data. Though some biases in features are straightforward to detect (ex: age, gender, race), others are not explicit and rely on subtle correlations in machine learning algorithms to detect. The incorporation of unintended bias into predictive models is called proxy discrimination.”
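As a rough illustration of what spotting a proxy can look like, the sketch below flags features that correlate strongly with a protected attribute. This is not the method from the linked post: the toy data, the feature names, the 0.7 threshold and the helper functions are all made up for the example. Pairwise Pearson correlation also only catches linear, single-feature proxies.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def flag_proxies(features, protected, threshold=0.7):
    """Names of features whose |correlation| with the protected attribute
    reaches the (arbitrary, assumed) threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, protected)) >= threshold
    ]


# Made-up toy data: a binary protected attribute and two candidate features.
protected = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "zip_code_income_rank": [1, 2, 2, 3, 7, 8, 8, 9],
    "years_experience": [3, 8, 5, 1, 4, 7, 2, 6],
}

print(flag_proxies(features, protected))
```

On this toy data only the zip-code-derived feature is flagged. In practice a proxy can hide in a combination of features, so a check like this is a first screen at best.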
This AI program could beat you in an argument—but it doesn’t know what it’s saying
The latest human-versus-machine matchup involves an argumentative AI system.
AI Nationalism
“The central prediction I want to make and defend in this post is that continued rapid progress in machine learning will drive the emergence of a new kind of geopolitics; I have been calling it AI Nationalism.”
The Machine Fired Me
No human could do a thing about it!
So your data science project isn’t working
Every data science project is a high-risk project at its core.
Facebook open sources DensePose
Today, Facebook AI Research (FAIR) open sourced DensePose, a real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body.
Programming Best Practices For Data Science
Often, the entire data science life cycle ends up as an arbitrary mess of notebook cells in either a Jupyter Notebook or a single messy script. But there’s a better way!
Visualizing social network with Networkx and Basemap
When it comes to complicated networks, it remains a challenge to illustrate attributes of the network visually, descriptively and effectively. NetworkX and Basemap provide an all-in-one solution, from creating network graphs and calculating various measures to producing neat visualizations.
Is There a Smarter Path to Artificial Intelligence? Some Experts Hope So
Benjamin Grosof is the chief scientist at Kyndi, a Silicon Valley start-up that is using a decades-old programming language to develop software that can generate facts, concepts and inferences from small amounts of data.
Using Deep Learning to Help Pathologists Find Tumors
From Baidu: “We have proposed a new deep learning algorithm that takes not just one individual patch but a grid of neighboring patches as input to jointly predict whether they are tumor cells or normal cells. This technique could be compared to a pathologist zooming out to see the larger field and make more confident judgements. The spatial correlations between neighboring patches are modelled through a specific type of probabilistic graphical model named conditional random fields. The whole deep learning framework can be trained end-to-end on GPU without any post processing.”
Google Engineers Refused to Build Security Tool to Win Military Contracts
A work boycott from the Group of Nine is yet another hurdle to the company’s efforts to compete for sensitive government work.
Add Constrained Optimization To Your Toolbelt
“My constrained optimization package of choice is the python library pyomo, an open source project for defining and solving optimization problems.”
Augmented space planning: Using procedural generation to automate desk layouts (paper)
“We developed a suite of procedural algorithms for space planning in commercial offices. These algorithms were benchmarked against 13,000 actual offices designed by human architects. The algorithm performed as well as an architect on 77% of offices, and achieved a higher capacity in an additional 6%, all while following a set of space standards.”
Automated Feature Engineering
“In this notebook, we will look at an exciting development in data science: automated feature engineering. A machine learning model can only learn from the data we give it, and making sure that data is relevant to the task is one of the most crucial steps in the machine learning pipeline (this is made clear in the excellent paper “A Few Useful Things to Know about Machine Learning”).”
Getting data from pdfs using the pdftools package
“It is often the case that data is trapped inside pdfs, but thankfully there are ways to extract it from the pdfs. A very nice package for this task is pdftools (Github link) and this blog post will describe some basic functionality from that package.”
Make an impact with your location data
From Uber engineering: Kepler.gl is a powerful open source geospatial analysis tool for large-scale data sets.
SNIPER, an efficient multi-scale object detection algorithm
SNIPER is an efficient multi-scale training approach for instance-level recognition tasks like object detection and instance-level segmentation. Instead of processing all pixels in an image pyramid, SNIPER selectively processes context regions around the ground-truth objects (a.k.a. chips).
Vaex: Lazy Out-of-Core DataFrames for Python.
Visualize and explore big tabular datasets. A billion rows per second on a single computer.
Realtime tSNE Visualizations with TensorFlow.js
“In Linear tSNE Optimization for the Web, we present a novel approach to tSNE that heavily relies on modern graphics hardware. Furthermore, we are releasing this work as an open source library in the TensorFlow.js family in the hopes that the broader research community finds it useful.”
Machine Learning: The High Interest Credit Card of Technical Debt
A refresher from a paper published in 2014: “Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning.”
Retro Contest: Results
From OpenAI: “The first run of our Retro Contest — exploring the development of algorithms that can generalize from previous experience — is now complete. Though many approaches were tried, top results all came from tuning or extending existing algorithms such as PPO and Rainbow. There’s a long way to go: top performance was 4,692 after training while the theoretical max is 10,000. These results provide validation that our Sonic benchmark is a good problem for the community to double down on: the winning solutions are general machine learning approaches rather than competition-specific hacks, suggesting that one can’t cheat through this problem.”