Web Picks (week of 9 March 2020)

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • The Illustrated Self-Supervised Learning
    “If intelligence is a cake, the bulk of the cake is self-supervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning (RL).”
  • Facebook Now Lets You Turn Any 2D Photo into a 3D Image Using AI
    Facebook just expanded 3D photo posting to phones that don’t actually capture depth data. Using the magic of machine learning (i.e. artificial intelligence), Facebook taught a neural network how to “infer 3D structures from 2D photos,” even if those photos were taken with a single lens camera.
  • CNN-generated images are surprisingly easy to spot…for now
    “We demonstrate that, with careful pre- and post-processing and data augmentation, a standard image classifier trained on only one specific CNN generator (ProGAN) is able to generalize surprisingly well to unseen architectures, datasets, and training methods (including the just released StyleGAN2). Our findings suggest the intriguing possibility that today’s CNN-generated images share some common systematic flaws, preventing them from achieving realistic image synthesis.”
  • Aleph is a powerful tool for people who follow the money
    It helps investigators securely access and search large amounts of data, whether it comes from a government database or a leaked email archive.
  • Andreessen-Horowitz craps on “AI” startups from a great height
    “They use all the buzzwords (my personal bete-noir; the term “AI” when they mean “machine learning”), but they’ve finally publicly noticed certain things which are abundantly obvious to anyone who works in the field. For example, gross margins are low for deep learning startups that use “cloud” compute. Mostly because they use cloud compute.”
  • Mapping coronavirus, responsibly
    “When you throw in a developing human health story the ingredients are ripe for maps to take centre stage, as they have become with the ongoing coronavirus outbreak. Let’s take a look at how maps can help shape the narrative and, as concern (fear?) grows, how to map the data responsibly.”
  • The fastai book – draft
    These draft notebooks cover an introduction to deep learning, fastai, and PyTorch. fastai is a layered API for deep learning.
  • How We Improved Data Discovery for Data Scientists at Spotify
    To enable Spotifiers to make faster, smarter decisions, we’ve developed a suite of internal products to accelerate the production and consumption of insights. One of these products is Lexikon, a library of data and insights that helps employees find and understand the data and knowledge generated by members of our insights community.
  • Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: a prospective study
    “For model development and validation, 46,096 anonymous images from 106 admitted patients, including 51 patients of laboratory confirmed COVID-19 pneumonia and 55 control patients of other diseases in Renmin Hospital of Wuhan University (Wuhan, Hubei province, China) were retrospectively collected and processed.”
  • A Primer in BERTology: What we know about how BERT works
    “Transformer-based models are now widely used in NLP, but we still do not understand a lot about their inner workings. This paper describes what is known to date about the famous BERT model (Devlin et al. 2019), synthesizing over 40 analysis studies. We also provide an overview of the proposed modifications to the model and its training regime. We then outline the directions for further research.”
  • Transformers are Graph Neural Networks
    Graph Deep Learning sounds great, but are there any big commercial success stories? Is it being deployed in practical applications?
  • Universal Data Tool
    Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app
  • SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems
    “We propose SLIDE (Sub-LInear Deep learning Engine) that uniquely blends smart randomized algorithms, with multi-core parallelism and workload optimization.”
  • Unscreen: remove video backgrounds
    Goodbye greenscreen, let’s use deep learning instead.
  • Towards Interactive Weak Supervision with FlyingSquid
    “We wanted to push towards a much more interactive weak supervision loop by reducing this turnaround time from label source creation to working model. We present FlyingSquid, a first step in that direction.”
  • This artwork does not exist
    Trained by Michael Friesen on images of Modern Art