Alibaba Qwen Wins “NeurIPS 2025 Best Paper Award” for Breakthrough in Attention Mechanisms

Published on Dec. 1, 2025

The Alibaba Qwen team has received the prestigious Best Paper Award at the 2025 Conference on Neural Information Processing Systems (NeurIPS), one of the world’s premier conferences in machine learning and artificial intelligence. The award recognizes the team’s pioneering research on attention mechanisms in large language models (LLMs).

The winning paper, titled “Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free”, is the first in the industry to systematically examine how attention gating affects the performance and training of large models.

Gating, a mechanism that controls the flow of information through the network, is one of the most widely used techniques in LLM architectures. Functioning like “intelligent noise-canceling headphones” for a model, it helps filter out irrelevant information and boosts overall effectiveness.

To rigorously evaluate the role of gating, the Qwen team conducted an extensive study, comparing over 30 variants of 15B Mixture-of-Experts (MoE) models and 1.7B dense models trained on a 3.5-trillion-token dataset. Research results show that a simple architectural modification – adding a head-specific sigmoid gate after Scaled Dot-Product Attention (SDPA) – consistently improves model performance. This modification enhances training stability, allows for larger learning rates, and improves scaling properties.
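
For readers who want to experiment with the idea, the sketch below shows one way the modification described above can be expressed in PyTorch: an elementwise sigmoid gate, computed per attention head from the layer input, applied to the SDPA output before the output projection. The class name, the choice to condition the gate on the layer input, and all hyperparameters are illustrative assumptions rather than the paper’s reference implementation; the team’s released code is the authoritative source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Multi-head attention with a head-specific sigmoid output gate (illustrative sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv_proj = nn.Linear(d_model, 3 * d_model)
        # Gate projection: one sigmoid gate value per channel of every head,
        # conditioned here on the layer input (an assumption made for this sketch).
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, time, d_head) for scaled dot-product attention.
        q, k, v = (z.reshape(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        # The modification: an elementwise, head-specific sigmoid gate applied
        # after SDPA and before the output projection. It adds non-linearity
        # and allows individual heads to be suppressed.
        gate = torch.sigmoid(self.gate_proj(x))
        return self.out_proj(gate * attn)
```

Because the gate amounts to one extra linear projection and a sigmoid, it adds little compute relative to the attention itself, consistent with the selection committee’s description of the change as easily implemented.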

These findings have already been incorporated into the Qwen3-Next model released in September 2025, which introduced architectural innovations by replacing standard attention with a combination of Gated DeltaNet and Gated Attention. This design improves in-context learning capabilities while increasing computational efficiency.

To support further research and community adoption, the Qwen team has already released the related code and models on GitHub and Hugging Face.

“The main recommendation of the paper is easily implemented, and given the extensive evidence provided in the paper for this modification to LLM architecture, we expect this idea to be widely adopted,” commented the NeurIPS Selection Committee.

“This paper represents a substantial amount of work that is possible only with access to industrial-scale computing resources, and the authors’ sharing of the results of their work, which will advance the community’s understanding of attention in large language models, is highly commendable, especially in an environment where there has been a move away from open sharing of scientific results around LLMs,” added the Selection Committee.
