Publication

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Sikuan Yan; Xiufeng Yang; Zuchao Huang; Ercong Nie; Zifeng Ding; Zonggen Li; Xiaowen Ma; Jinhe Bi; Kristian Kersting; Jeff Z. Pan; Hinrich Schütze; Volker Tresp; Yunpu Ma

In: Maria Liakata; Viviane P. Moreira; Jiajun Zhang; David Jurgens (Eds.). Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2026, San Diego, California, United States, July 2-7, 2026. Annual Meeting of the Association for Computational Linguistics (ACL), Pages 12805-12825, Association for Computational Linguistics, 2026.

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B–14B).
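The abstract names four memory operations and outcome-driven RL training but gives no implementation details. The Python sketch below is illustrative only and is not the authors' code: `Op`, `MemoryBank`, `exact_match_reward`, and `group_relative_advantages` are hypothetical names, the dictionary-backed memory and the exact-match reward are assumptions, and the advantage function shows only the standard group-normalized advantage used by GRPO, not the paper's training loop.

```python
import statistics
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Op(Enum):
    """The four memory operations named in the abstract."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass
class MemoryBank:
    """Hypothetical external memory: integer id -> text entry (assumed layout)."""
    entries: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op: Op, text: Optional[str] = None,
              entry_id: Optional[int] = None) -> None:
        if op is Op.ADD:
            # Store a new fact under a fresh id.
            self.entries[self.next_id] = text
            self.next_id += 1
        elif op is Op.UPDATE:
            # Overwrite an existing entry; unknown ids are an error.
            if entry_id not in self.entries:
                raise KeyError(entry_id)
            self.entries[entry_id] = text
        elif op is Op.DELETE:
            # Remove an entry; raises KeyError if it does not exist.
            del self.entries[entry_id]
        # Op.NOOP: leave the bank unchanged.


def exact_match_reward(prediction: str, gold: str) -> float:
    """Assumed outcome reward: 1.0 if answers match after case/whitespace normalization."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return 1.0 if norm(prediction) == norm(gold) else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: center each reward on its group mean, scale by group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    bank = MemoryBank()
    bank.apply(Op.ADD, "Alice adopted a dog named Max.")
    bank.apply(Op.ADD, "Alice lives in Berlin.")
    bank.apply(Op.UPDATE, "Alice adopted two dogs, Max and Luna.", entry_id=0)
    bank.apply(Op.DELETE, entry_id=1)
    bank.apply(Op.NOOP)
    print(bank.entries)  # {0: 'Alice adopted two dogs, Max and Luna.'}

    # Score a group of sampled answers to one question, then normalize within the group.
    samples = ["max and luna", "Max", "Luna", "Max  and Luna"]
    rewards = [exact_match_reward(p, "Max and Luna") for p in samples]
    print(rewards)  # [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

In this sketch, UPDATE rewrites an entry in place rather than deleting and re-adding it, so the entry keeps its id. Answers above the group mean receive positive advantages, and answers below it receive negative ones. No learned critic is needed for this normalization, which is the usual motivation for GRPO over PPO.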

Further Links

https://aclanthology.org/2026.acl-long.583/

2508.19828v5.pdf (PDF, 3 MB)