-
Solaris: Building a Multiplayer Video World Model in Minecraft
by Georgy Savva et al.
-
Soft Contamination Means Benchmarks Test Shallow Generalization
by Ari Spiesberger et al.
-
Visually Prompted Benchmarks Are Surprisingly Fragile
by Haiwen Feng et al.
-
BabyVision: Visual Reasoning Beyond Language
by Liang Chen et al.
-
Vision Encoders in Vision-Language Models: A Survey
by Han Xiao
-
Next-Embedding Prediction Makes Strong Vision Learners
by Sihan Xu et al.
-
What Kind of Reasoning (if any) is an LLM actually doing? On the Stochastic Nature and Abductive Appearance of Large Language Models
by Luciano Floridi et al.
-
Olmo 3
by Team Olmo
-
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
by Charlie Zhang et al.
-
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
by Weihao Tan et al.
-
Questioning the Stability of Visual Question Answering
by Amir Rosenfeld et al.
-
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
by Yanqing Liu et al.
-
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
by Alex Cloud et al.
-
The Term 'Agent' Has Been Diluted Beyond Utility and Requires Redefinition
by Brinnae Bent
-
Vision Language Models are Biased
by An Vo et al.
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
by GLM-4.5 Team et al.
-
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
by Han Lin et al.
-
Seed1.5-VL Technical Report
by Dong Guo et al.
-
Emerging Properties in Unified Multimodal Pretraining
by Chaorui Deng et al.
-
Harnessing the Universal Geometry of Embeddings
by Rishi Jha et al.
-
Transfer between Modalities with MetaQueries
by Xichen Pan et al.
-
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging
by Shiqi Chen et al.
-
Rethinking Visual Layer Selection in Multimodal LLMs
by Haoran Chen et al.
-
Perception Encoder: The best visual embeddings are not at the output of the network
by Daniel Bolya et al.
-
Scaling Laws for Native Multimodal Models
by Mustafa Shukor et al.
-
Science-T2I: Addressing Scientific Illusions in Image Synthesis
by Jialuo Li et al.
-
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
by Xu Ma et al.
-
Scaling Language-Free Visual Representation Learning
by David Fan et al.
-
A Decade's Battle on Dataset Bias: Are We There Yet?
by Zhuang Liu et al.
-
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models
by Samuel Stevens et al.
-
Pretrained Transformers as Universal Computation Engines
by Kevin Lu et al.
-
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
by Jonas Geiping et al.
-
Are Vision Language Models Texture or Shape Biased and Can We Steer Them?
by Paul Gavrikov et al.
-
Open Problems in Mechanistic Interpretability
by Lee Sharkey et al.
-
Why Do We Need Weight Decay in Modern Deep Learning?
by Francesco D'Angelo et al.
-
Vision-Language Models Do Not Understand Negation
by Kumail Alhamoud et al.
-
ICONS: Influence Consensus for Vision-Language Data Selection
by Xindi Wu et al.
-
The GAN is dead; long live the GAN! A Modern GAN Baseline
by Yiwen Huang et al.
-
The Unbearable Slowness of Being: Why do we live at 10 bits/s?
by Jieyu Zheng et al.
-
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
by Yohan Mathew et al.
-
Analyzing (In)Abilities of SAEs via Formal Languages
by Abhinav Menon et al.
-
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
by Shengbang Tong et al.