Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Early images of Netflix.com show how far the service has come in its 20 years
“Its recommendations, now powered by algorithms, were curated by staffers then. Collections of movies grouped by common themes included Complicated Couples, Serious Spy Action, High-Tech Horrors, and Bad People, Great Movies—precursors to the remarkably specific personalized rows of categories now featured on Netflix.”
- Mark Zuckerberg is either ignorant or deliberately misleading Congress
Again and again, before both Senate and House committees, Zuckerberg pleaded ignorance about the company he created and has controlled for 14 years.
- The Scientific Paper Is Obsolete
Here’s what’s next.
- The Mathematics of 2048: Optimal Play with Markov Decision Processes
“In this post, we’ll use a mathematical framework called a Markov Decision Process to find provably optimal strategies for 2048 when played on the 2×2 and 3×3 boards, and also on the 4×4 board up to the 64 tile.”
- Differentiable Plasticity: A New Method for Learning to Learn
Biological brains exhibit plasticity—that is, the ability for connections between neurons to change continually and automatically throughout life, allowing animals to learn quickly and efficiently from ongoing experience.
- Continuous Integration for Machine Learning
What do you call a machine-learning-focused data scientist who tests, versions, and documents their code like a professional software engineer?
- Introducing TensorFlow Probability
“At the 2018 TensorFlow Developer Summit, we announced TensorFlow Probability: a probabilistic programming toolbox for machine learning researchers and practitioners to quickly and reliably build sophisticated models that leverage state-of-the-art hardware.” This looks great!
- Kafka, GDPR and Event Sourcing
You probably already know that the EU has approved this nice piece of legislation called GDPR. From a technical point of view, if you have bought into Event Sourcing and Kafka, GDPR’s “right to erasure” (a.k.a. forget everything that you know about me) is of special interest, as it is at odds with the idea of an immutable event log that does not forget anything.
- Pseudonymisation is helping firms comply with a new EU privacy law
By stripping out identifying information, they are still able to do research.
- This is how Cambridge Analytica’s Facebook targeting model really worked — according to the person who built it
The method was similar to the one Netflix uses to recommend movies — no crystal ball, but good enough to make an effective political tool.
- Making AI Interpretable with Generative Adversarial Networks
“In this post, we share a framework we use for expanding the interpretability of our complex machine learning models.”
- Python & Big Data: Airflow & Jupyter Notebook with Hadoop 3, Spark & Presto
In this blog post I’ll take a single-node Hadoop installation, get Jupyter Notebook running and show how to create an Airflow job that can take a weather data feed, store it on HDFS, convert it into ORC format and then export it into a Microsoft Excel-formatted spreadsheet.
- How to easily Detect Objects with Deep Learning on Raspberry Pi
The real world poses challenges like having limited data and having tiny hardware like mobile phones and Raspberry Pis which can’t run complex Deep Learning models. This post demonstrates how you can use a Raspberry Pi to detect objects such as cars on a road, oranges in a fridge, signatures in a document and Teslas in space.
- Deep Time-to-Failure
Survival analysis and time-to-failure predictive modeling using Weibull distributions and Recurrent Neural Networks in Keras
- Generative Adversarial Networks for Extreme Learned Image Compression
Powerful compression using deep learning.
- Evolutionary Computation Bestiary
A bestiary of evolutionary, swarm and other metaphor-based algorithms… what’s your favorite entry?
- GARS: a clustering-based Genetic Algorithm for the identification of Robust Subsets of variables in high-dimensional and challenging datasets
“Herein, we propose an efficient, robust and fast method that adopts stochastic optimization approaches for high-dimensional data.”
- Lessons Learned Reproducing a Deep Reinforcement Learning Paper
“A key enabler of the switch to thinking more was keeping a much more detailed work log. Working without a log is fine when each chunk of progress takes less than a few hours, but anything longer than that and it’s easy to forget what you’ve tried so far and end up just going in circles.”
- Deep Painterly Harmonization
Code and data for the paper “Deep Painterly Harmonization”.
Create mathematical art with R
- An Introduction to Rocker: Docker Containers for R (pdf)
The Rocker project was launched in October 2014 as a collaboration between the authors to provide high-quality Docker images containing the R environment.
- Word2Vec applied to Recommendation: Hyperparameters Matter
“Results reveal that optimizing neglected hyperparameters, namely negative sampling distribution, number of epochs, subsampling parameter and window-size, significantly improves performance on a recommendation task, and can increase it up to a factor of 10.”