Every so often, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Neural Chip Plays Doom Using a Thousandth of a Watt
“The purpose of the demo is not to show how well it can play Doom, but to demonstrate how efficient the NDP200 is at ‘bounding-box person detection’ which would normally require a much more powerful processor.”
- High-resolution image reconstruction with latent diffusion models from human brain activity
Wild: “We demonstrate that our simple framework can reconstruct high-resolution images from brain activity with high semantic fidelity, without the need for training or fine-tuning of complex deep generative models.”
- Indirect Prompt Injection Threats
“If allowed by the user, Bing Chat can see currently open websites. We show that an attacker can plant an injection in a website the user is visiting, which silently turns Bing Chat into a Social Engineer who seeks out and exfiltrates personal information.”
- Keep your AI claims in check
“Marketers should know that — for FTC enforcement purposes — false or unsubstantiated claims about a product’s efficacy are our bread and butter.”
- Pixel Perfect: RTX Video Super Resolution Now Available for GeForce RTX 40 and 30 Series GPUs
AI-powered upscaling feature enhances streaming video content in Google Chrome and Microsoft Edge browsers.
- Diffusion With Offset Noise
Fine-tuning against a modified noise enables Stable Diffusion to generate very dark or light images easily.
- Beating OpenAI CLIP with 100x less data and compute
Efficient pre-training of Vision-Language transformers for Semantic Search.
- 200-Year-Old Math Opens Up AI’s Mysterious Black Box
Fourier analysis provides ideas on how to quickly train more accurate neural networks.
- AI Tool Reveals How Celebrities’ Faces Have Been Photoshopped
“Within Health shows how celebrity faces are routinely warped in commercial photos and the increasingly unrealistic representations of beauty in the media.”
- Internet Explorer: Targeted Representation Learning on the Open Web
“We propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand”
- LLMs are compilers
“Sooner or later, code LLMs will achieve the same level of reliability, truly making them compilers from English to working code.”
- Understanding the attention mechanism in sequence models
“A unique context vector for every decoder time-step based on different weighted aggregations across all of the encoder hidden states.”
- Building an Efficient Machine Learning API
“Learn the techniques we used to build a performant and efficient product categorization endpoint that will be used within our product data pipeline.”
- Poisoning Web-Scale Training Datasets is Practical
“By exploiting specific invalid trust assumptions, we show how we could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60 USD.”
- Most Data Work Seems Fundamentally Worthless
“There is a flavor of despair I’ve become accustomed to, so deeply ingrained in the hearts of myself and my colleagues that it has settled into a hopeless passivity.”
Oxen helps you version your datasets like you version your code (kind of like DVC).
A collection of jailbreak prompts.
- Scribble Diffusion
Turn your sketch into a refined image using AI.