- Quo vadis, LLM benchmarks?
- QED-Nano: Teaching a Tiny Model to Prove Hard Theorems - a Hugging Face Space by lm-provers
- Why Diffusion Language Models Are the Future | Dimitri von Rütte
- The magic kernel | John Costella
- Aristotelian - MLBio Lab
- Beating GPT-2 for <$100: the nanochat journey
- Synthetic Pretraining | Vintage Data
- Self-Distilled Reasoner: On-Policy Self-Distillation | Siyan Zhao
- Video Encoding 101: A Comprehensive Guide
- A Visual Introduction to Rectified Flows - Alec Helbling
- A Technical Deep Dive on Moondream2
- Cameras and Lenses | Bartosz Ciechanowski
- Continuous batching from first principles
- Weights & Biases gets a new terminal UI
- sws - Minimal, predictable, footgun-free config library - lucasb-eyer
- RL Learning with LoRA: A Diverse Deep Dive | kalomaze's kalomazing blog
- OlmoEarth: A new state-of-the-art Earth observation foundation model family | Ai2
- NeurIPS 2025 Papers
- the bug that taught me more about PyTorch than years of using it | Elana Simon
- Evaluating Long Context (Reasoning) Ability | wh
- State of Vision-Language-Action (VLA) Research at ICLR 2026 – Moritz Reuss
- State of AI Report 2025
- Maintain the unmaintainable - a Hugging Face Space by transformers-community
- LoRA Without Regret - Thinking Machines Lab
- How to Detect, Track, and Identify Basketball Players with Computer Vision
- Astronaut Photo Interactive Map
- CMU Advanced NLP Spring 2025 (16): Parallelism and Scaling
- Online versus Offline RL for LLMs
- What is a color space? | Making Software
- AI just Broke Trackmania's most Legendary Record
- Defeating Nondeterminism in LLM Inference - Thinking Machines Lab
- Attention Is All You Need | Why Self-Attention
- Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
- Inside vLLM: Anatomy of a High-Throughput LLM Inference System - Aleksa Gordić
- FineVision: Open Data is All You Need - a Hugging Face Space by HuggingFaceM4
- How To Become A Mechanistic Interpretability Researcher
- Big O
- PrimeIntellect | Environments Hub
- Adventures in State Space
- Do LLMs Have Good Music Taste?
- How Social Media Shortens Your Life - by Gurwinder
- The Circuits Research Landscape: Results and Perspectives - August 2025 | Neuronpedia
- How Attention Sinks Keep Language Models Stable
- avatarl: training language models from scratch with pure reinforcement learning
- blogs and resources 101 | by @himanshustwts
- How Does A Blind Model See The Earth? - by henry
- There Are No New Ideas in AI… Only New Datasets
- Efficient MultiModal Data Pipeline
- All AI Models Might Be The Same - by Jack Morris
- The Era of Exploration | Yiding's blog
- The Case for More Ambition - Jack Morris
- DJing and its potential Neurophysiological Implications
- Open-sourcing circuit-tracing tools \ Anthropic
- Why We Think | Lil'Log
- DumPy: NumPy except it's OK if you're dum
- On the speed of ViTs and CNNs
- Multimodal Dataloaders go brrrrrrr - by Haoli Yin
- Vision Language Models (Better, faster, stronger)
- Neel Nanda - How I Think About My Research Process: Explore, Understand, Distill
- Is Gemini now better than Claude at Pokémon?
- My dream VLM
- torch.compile, the missing manual - Google Docs
- Dario Amodei - The Urgency of Interpretability
- Prof. Judy Fan: Cognitive Tools for Making the Invisible Visible
- The Colors Of Her Coat - by Scott Alexander
- attention is logarithmic, actually
- Factorio Learning Environment
- The Genius of DeepSeek’s 57X Efficiency Boost [MLA]
- Learning Pokémon With Reinforcement Learning | Pokémon RL
- Feather - lightweight, efficient, and locally hosted YouTube Music TUI built with Rust
- GRPO Judge Experiments: Findings & Empirical Observations | kalomaze's kalomazing blog
- Attention Is Off By One - Evan Miller
- darkspark
- geohints
- Minimind-V
- SkalskiP - VLMs zero to hero
- moondream/moondream/torch at main · vikhyat/moondream · GitHub
- Removing Jeff Bezos From My Bed
- Being a High-Leverage Generalist - char.blog
- kudzueye/boreal-hl-v1 · Hugging Face
- What if Eye...?
- A calculator app? Anyone could make that.
- The Breakthrough Behind Modern AI Image Generators | Diffusion Models Part 1
- Everyone knows your location
- WikiTok
- I’m Lovin’ It: Exploiting McDonald’s APIs to hijack deliveries and order food for a penny
- Attribution-based parameter decomposition
- Mapping the Latent Space of Llama 3.3 70B - Goodfire Papers
- Learning CUDA by optimizing softmax: A worklog | Maharshi's blog
- Understanding LSTM Networks -- colah's blog
- Dino-V2 Large Microscope
- AI and Stress
- model merging
- The Best Tacit Knowledge Videos on Every Subject
- Long-Term Thinking, 2nd Order Consequences & Effect Horizons
- Weighted Skip Connections are Not Harmful for Deep Nets
- 2024 letter | Zhengdong
- Things we learned about LLMs in 2024
- Building Machine Learning Systems for a Trillion Trillion Floating Point Operations
- History of Residuals and a Word of Caution
- dev.log - Gazing in the Latent Space with Sparse Autoencoders
- Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers — LessWrong
- Visualizing transformers and attention | Talk for TNG Big Tech Day '24
- The Octalysis Framework for Gamification & Behavioral Design
- Can we control AI?
- Building effective agents \ Anthropic
- Fast LLM Inference From Scratch
- You could have designed state of the art positional encoding
- NERSC SC23 DL Tutorial
- Where do LLMs spend their FLOPs?
- How to make LLMs go fast
- FSDP & CUDACachingAllocator: an outsider newb perspective