Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Too much efficiency makes everything worse: overfitting and the strong version of Goodhart’s law
“This same counterintuitive relationship between efficiency and outcome occurs in machine learning, where it is called overfitting.” This links in nicely with many of the topics we also discuss in our Model Risk book!
- The past, present, and future of notebooks
Data science notebooks have come a long way since they were first introduced back in 1988. Here’s the 101 on how we got here, where the market is at, and predictions for the future.
- How Federated Learning Protects Privacy
“With federated learning, it’s possible to collaboratively train a model with data from multiple users without any raw data leaving their devices.”
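The core idea can be sketched with federated averaging (FedAvg) on a toy linear model — the model, client data, and hyperparameters below are illustrative assumptions, not code from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "device" trains locally on its private data; only weights are shared.
def local_update(weights, X, y, lr=0.1, epochs=5):
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

# Simulate three clients, each holding a private dataset from the same true model.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

# Server loop: aggregate locally trained weights, weighted by dataset size.
# The raw (X, y) pairs never leave their client.
global_w = np.zeros(2)
for _ in range(20):
    updates = [local_update(global_w, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    global_w = np.average(updates, axis=0, weights=sizes)

print(global_w)  # approaches the true weights [2.0, -1.0]
```

Real systems (secure aggregation, differential privacy, stragglers) add much more machinery on top of this loop, which is exactly what the article covers.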
- What Good Data Self-Serve Looks Like
“I once was tasked with figuring out how to ‘democratize data’ for internal employees. No other instructions, solely a general pain point of ‘the data team is stuck doing ad-hoc tickets’ and ‘stakeholders want to get data on their own.’”
- WeightedSHAP: analyzing and improving Shapley based feature attributions
“This repository provides an implementation of the paper WeightedSHAP: analyzing and improving Shapley based feature attributions accepted at NeurIPS 2022. We show the suboptimality of SHAP and propose a new feature attribution method called WeightedSHAP.”
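As a refresher on what such methods build on, here is a minimal brute-force computation of exact Shapley values for a toy three-feature model — the model and baseline are illustrative assumptions, not the paper’s code:

```python
from itertools import combinations
from math import factorial

# Toy model to explain: a linear function of three features (illustrative).
def model(x):
    return 3 * x[0] + 2 * x[1] - 1 * x[2]

x = [1.0, 1.0, 1.0]          # instance to explain
baseline = [0.0, 0.0, 0.0]   # "feature absent" reference values

def value(subset):
    """Model output with features outside `subset` set to their baseline."""
    z = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(z)

def shapley(i, n):
    """Exact Shapley value of feature i: weighted marginal contributions
    over all coalitions S of the other features."""
    total = 0.0
    others = [j for j in range(n) if j != i]
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (value(set(S) | {i}) - value(set(S)))
    return total

attributions = [shapley(i, 3) for i in range(3)]
print(attributions)  # for a linear model these recover the coefficients
```

WeightedSHAP’s contribution, per the abstract, is to question the fixed coalition weighting above and learn a better one from data.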
- FLAN-T5, a yummy model superior to GPT-3
“Sometimes some artificial intelligence models go unnoticed despite their worth. This is the case with FLAN-T5, a model developed by Google and with a name as appetizing as its NLP power.”
- Very Large Language Models and How to Evaluate Them
Enabling zero-shot evaluation of language models on the Hub
- teex: a toolbox for evaluating XAI explanations
This project aims to provide a simple way of evaluating individual black box explanations against ground truth.
- Explainable AI: Explaining poker with LIME
Determine the rules of poker using LIME!
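For readers new to LIME, its recipe — perturb the instance, query the black box, fit a proximity-weighted linear surrogate — can be sketched as follows; the black-box function and kernel width here are illustrative assumptions, not the post’s poker model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative black box: a logistic score over three numeric features.
def black_box(X):
    return 1 / (1 + np.exp(-(X @ np.array([2.0, -1.0, 0.5]))))

x = np.array([1.0, 0.0, 1.0])  # instance to explain

# 1. Perturb the instance and query the black box.
Z = x + rng.normal(scale=0.5, size=(1000, 3))
preds = black_box(Z)

# 2. Weight samples by proximity to the original instance (RBF kernel).
weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / 0.5)

# 3. Fit a weighted least-squares linear surrogate around x;
#    its coefficients are the local explanation.
A = np.column_stack([Z, np.ones(len(Z))])  # add an intercept column
W = np.diag(weights)
coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ preds)
print(coef[:3])  # local importance (sign and magnitude) of each feature
```

The recovered signs and relative magnitudes mirror the black box’s true weights near `x`, which is all a local surrogate promises.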
- No, you don’t need MLOps
“Keep It Simple: the complexity of full MLOps is rarely needed”
- A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
“This hands-on introduction is aimed to provide the reader a working understanding of conformal prediction and related distribution-free uncertainty quantification techniques with one self-contained document.”
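The split-conformal recipe at the heart of that tutorial fits in a few lines — the data and point predictor below are illustrative assumptions, chosen only to keep the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: y = 2x + noise.
X = rng.uniform(-1, 1, size=500)
y = 2 * X + rng.normal(scale=0.3, size=500)

# Split the data: fit a model on one half, calibrate on the other.
X_fit, y_fit = X[:250], y[:250]
X_cal, y_cal = X[250:], y[250:]

# Any point predictor works; here, least squares through the origin.
slope = (X_fit @ y_fit) / (X_fit @ X_fit)
predict = lambda x: slope * x

# Conformity scores: absolute residuals on the held-out calibration set.
scores = np.abs(y_cal - predict(X_cal))
alpha = 0.1  # target 90% coverage
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Distribution-free prediction interval for a new point.
x_new = 0.5
lo, hi = predict(x_new) - q, predict(x_new) + q
print(lo, hi)
```

The coverage guarantee holds for any predictor and any data distribution, provided the calibration and test points are exchangeable — which is what makes the framework “distribution-free.”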
- AI and the Future of Pixel Art
“The impact that AI will have and is already having on creativity cannot be understated.”
- Data persistency, large-scale data analytics and visualizations
“If you have been using NetworkX for one of your projects, you’ve probably noticed some of its limitations. Each time you want to change something in the dataset or run another algorithm, you have to load the dataset all over again.”
- On the biggest bottlenecks in robotics and reinforcement learning
“The train distribution and the test distribution are not always the same. The real world changes over time.”