
Month: November 2024

Latest Posts
Pause Tokens for Transformers: Making Language Models Think Before Speaking

Explore how adding special pause tokens during training and inference gives language models extra computation cycles, improving performance across tasks without increasing model size.
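To make the idea concrete, here is a minimal, hedged sketch of the pause-token mechanism as the teaser describes it: learnable <pause> embeddings are appended to the input so the model gets extra forward-pass computation, and outputs at pause positions are simply ignored. All names (PauseAugmentedEncoder, n_pause) are illustrative, not from the paper's code.

```python
import torch
import torch.nn as nn

# Sketch only: append learnable <pause> embeddings so the model gets
# extra computation steps before its answer is read out.
class PauseAugmentedEncoder(nn.Module):  # hypothetical name
    def __init__(self, vocab_size=1000, d_model=64, n_pause=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable <pause> embeddings, one row per appended pause token.
        self.pause = nn.Parameter(torch.randn(n_pause, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):
        x = self.embed(token_ids)                              # (B, T, d)
        pauses = self.pause.unsqueeze(0).expand(x.size(0), -1, -1)
        x = torch.cat([x, pauses], dim=1)                      # (B, T + n_pause, d)
        h = self.encoder(x)
        # Predictions are read only from the real-token positions;
        # the pause positions exist purely to buy extra computation.
        return h[:, : token_ids.size(1), :]

ids = torch.randint(0, 1000, (2, 10))
out = PauseAugmentedEncoder()(ids)
print(out.shape)  # torch.Size([2, 10, 64])
```

Note that the parameter count barely changes: only n_pause extra embedding rows are added, which is why the approach improves performance without meaningfully increasing model size.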

Uncovering Mesa-Optimization in Transformers

Explore Google DeepMind’s research revealing how large language models develop internal optimization algorithms that enable in-context learning without parameter updates.
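As a rough illustration of what "internal optimization without parameter updates" means, the sketch below computes one gradient-descent step on an in-context least-squares problem, the kind of update the mesa-optimization literature argues a (linear) self-attention layer can express through its key/query/value products. This is an assumption-labeled toy, not DeepMind's code.

```python
import numpy as np

# Toy illustration: in-context pairs (x_i, y_i) define a least-squares
# problem; one explicit gradient step on it already predicts the query
# target, with no weight updates to any outer model.
rng = np.random.default_rng(0)
d, n = 4, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))   # in-context inputs
y = X @ w_true                # in-context targets

w = np.zeros(d)               # implicit "inner" weights, start at zero
lr = 1.0 / n
# One GD step on the squared loss sum_i (w . x_i - y_i)^2 -- the kind of
# update a trained linear-attention layer is argued to implement.
grad = X.T @ (X @ w - y)
w = w - lr * grad

x_query = rng.normal(size=d)
print("one-step prediction:", w @ x_query)
print("true value:         ", w_true @ x_query)
```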