Web Picks (week of 1 November 2021)

Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

  • What Data Folk Were Saying about Zillow
    NOT about what happened at Zillow itself, but about what lots of professional numbers people (data scientists, risk managers, finance folks, quants, and so on) have been saying about and around the situation.
  • Zillow, Prophet, Time Series, & Prices
    “So I made a mildly controversial tweet.”
  • A Non-Technical Guide to Interpreting SHAP Analyses
    With interpretability becoming an increasingly important requirement for machine learning projects, there’s a growing need for the complex outputs of techniques such as SHAP to be communicated to non-technical stakeholders.
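    One fact that helps when explaining SHAP outputs to non-technical stakeholders: for a plain linear model, SHAP values have an exact closed form, with each feature contributing its weight times its deviation from the feature's average. A minimal sketch of this (all data and weights below are hypothetical, not from the linked article):

    ```python
    import numpy as np

    # For a linear model f(x) = w·x + b, the SHAP value of feature i is
    # w_i * (x_i - E[x_i]): its contribution relative to the average prediction.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))       # hypothetical dataset
    w = np.array([2.0, -1.0, 0.5])      # hypothetical model weights
    b = 0.1

    x = X[0]                            # instance to explain
    phi = w * (x - X.mean(axis=0))      # one SHAP value per feature

    # "Local accuracy": base value plus the SHAP values recovers the prediction.
    base = w @ X.mean(axis=0) + b
    assert np.isclose(base + phi.sum(), w @ x + b)
    ```

    That additivity ("the explanation sums to the prediction") is usually the easiest property to communicate to a non-technical audience.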
  • Avoiding Data Disasters
    Things can go disastrously wrong in data science and machine learning projects when we undervalue data work, use data in contexts that it wasn’t gathered for, or ignore the crucial role that humans play in the data science pipeline.
  • How and why we built a custom gradient boosted-tree package
    In order to make accurate and fast travel-time predictions, Lyft built a gradient boosted tree (GBT) package from the ground up. It is slower to train than off-the-shelf packages, but can be customized to treat space and time more efficiently and yield less volatile predictions.
  • Machine learning is not nonparametric statistics
    Not only can I never get a consistent definition of what “nonparametric” means, but the jump from statistics to machine learning is considerably larger than most expect.
  • I’ve Stopped Using Box Plots. Should You?
    “After having explained how to read box plots to thousands of workshop participants, I now believe that they’re poorly conceived”
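    One concrete reason box plots can mislead: very different distributions can share the same five-number summary, so their box plots look identical. A small sketch with toy data:

    ```python
    import numpy as np

    # Two samples with identical five-number summaries (min, Q1, median, Q3, max)
    # but very different shapes: `a` is evenly spread, `b` clusters at 3 and 7.
    a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    b = np.array([1, 3, 3, 3, 5, 7, 7, 7, 9])

    def five_number_summary(x):
        return np.percentile(x, [0, 25, 50, 75, 100])

    print(five_number_summary(a))  # [1. 3. 5. 7. 9.]
    print(five_number_summary(b))  # [1. 3. 5. 7. 9.]
    # A box plot of either sample looks the same, hiding b's bimodality.
    ```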
  • Coloring your ggplot2 art like a pro
    ggplot2 makes strong assumptions about how many different color scales there can be in a plot (exactly one) and how color palettes should be designed.
  • The Breathing K-Means Algorithm
    Breathing k-means is an approximation algorithm for the k-means problem that, on average, achieves both higher solution quality and lower CPU time than k-means++.
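    For context, the k-means++ baseline it is compared against seeds cluster centers by D² sampling: each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far. A minimal sketch of that seeding step (toy data; this is the baseline, not the breathing k-means algorithm itself):

    ```python
    import numpy as np

    def kmeanspp_seeds(X, k, rng):
        """k-means++ seeding: pick each new center with probability
        proportional to squared distance from the nearest chosen center."""
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            # Squared distance of every point to its nearest existing center.
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        return np.array(centers)

    rng = np.random.default_rng(0)
    # Three well-separated toy clusters in 2-D.
    X = np.vstack([rng.normal(loc, 0.1, size=(50, 2)) for loc in (0, 3, 6)])
    C = kmeanspp_seeds(X, k=3, rng=rng)
    ```

    Because already-chosen points have zero distance to themselves, D² sampling spreads the seeds out, which is what breathing k-means improves upon via its "breathe in / breathe out" center insertion and deletion.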
  • Improving a Machine Learning System
    Making measurable improvements to a mature machine learning system is extremely difficult. In this post, we will explore why.
  • Hierarchical Transformers Are More Efficient Language Models (paper)
    “These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility. We postulate that having an explicit hierarchical architecture is the key to Transformers that efficiently handle long sequences.”
  • Large Language Models: A New Moore’s Law?
    Yet, should we be excited about this mega-model trend? I, for one, am not. Here’s why.
  • Just Ask for Generalization
    Generalizing to what you want may be easier than optimizing directly for what you want. We might even ask for “consciousness”.
  • Ask Delphi
    Delphi is a research prototype designed to investigate the promises and more importantly, the limitations of modeling people’s moral judgments on a variety of everyday situations.
  • Where are China’s Recent AI Ethics Guidelines Coming From?
    Examining Three Guidelines from the Ministry of Science and Technology’s Recent Document in Context
  • The theory-practice gap
    I want to claim that most of the action in theoretical AI alignment is people proposing various ways of getting around these problems by having systems do things that are human-understandable instead of things that are justified by working well.
  • AI can see through you: CEOs’ language under machine microscope
    CEOs and other managers are increasingly under the microscope as some investors use artificial intelligence to learn and analyse their language patterns and tone, opening up a new frontier of opportunities to slip up.
  • A Gentle Introduction to Graph Neural Networks
    Neural networks have been adapted to leverage the structure and properties of graphs. We explore the components needed for building a graph neural network – and motivate the design choices behind them.
  • Bayesian histograms for rare event classification
    Bayesian histograms are a stupidly fast, simple, and nonparametric way to find how rare event probabilities depend on a variable (with uncertainties!).
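    The core idea is simple enough to sketch: bin the variable, and in each bin place a Beta posterior on the event probability, which yields both an estimate and an uncertainty. A minimal version with a uniform Beta(1, 1) prior and synthetic data (all numbers below are made up for illustration; see the linked article for the real method):

    ```python
    import numpy as np

    # With a Beta(1, 1) prior, observing s events in n trials gives the
    # posterior Beta(1 + s, 1 + n - s) on the per-bin event probability.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=5000)              # feature (synthetic)
    y = rng.uniform(size=5000) < 0.01 + 0.05 * x  # rare event, rate grows with x

    edges = np.linspace(0, 1, 11)                 # 10 equal-width bins
    idx = np.digitize(x, edges[1:-1])             # bin index 0..9 per point

    means = []
    for b in range(10):
        n = int((idx == b).sum())
        s = int(y[idx == b].sum())
        a0, b0 = 1 + s, 1 + n - s                 # Beta posterior parameters
        mean = a0 / (a0 + b0)
        sd = np.sqrt(a0 * b0 / ((a0 + b0) ** 2 * (a0 + b0 + 1)))
        means.append(mean)
        print(f"bin {b}: p = {mean:.3f} +/- {sd:.3f} (n={n})")
    ```

    Everything is counting plus a closed-form posterior, which is why the approach is so fast, and the posterior standard deviation gives the uncertainty for free.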
  • AI Recognises Race in Medical Images
    Previous studies have shown that AI can predict your sex and age from looking at an eye scan, or your race from a chest X-ray.
  • DeepMind AI predicts incoming rainfall with high accuracy
    Having flexed its muscles in predicting kidney injury, toppling Go champions and solving 50-year-old science problems, artificial intelligence company DeepMind is now dipping its toes in weather forecasting.
  • ClickHouse vs TimescaleDB