Publication
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Sikuan Yan; Xiufeng Yang; Zuchao Huang; Ercong Nie; Zifeng Ding; Zonggen Li; Xiaowen Ma; Jinhe Bi; Kristian Kersting; Jeff Z. Pan; Hinrich Schütze; Volker Tresp; Yunpu Ma
In: Maria Liakata; Viviane P. Moreira; Jiajun Zhang; David Jurgens (Eds.). Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2026, San Diego, California, United States, July 2-7, 2026. Annual Meeting of the Association for Computational Linguistics (ACL), Pages 12805-12825, Association for Computational Linguistics, 2026.
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B–14B).
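The abstract names four memory operations but does not spell out their semantics. As a rough illustration only, the following sketch shows one plausible way such operations could act on an external memory bank. The `MemoryBank` class, the `Op` enum, the integer entry ids, and the example strings are all hypothetical and are not taken from the paper; in Memory-R1 the operation is chosen by a learned Memory Manager, whereas here the caller supplies it directly.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Op(Enum):
    """The four operation names listed in the abstract."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass
class MemoryBank:
    """Hypothetical external memory: integer id -> free-text entry."""
    entries: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op: Op, text: Optional[str] = None,
              entry_id: Optional[int] = None) -> None:
        if op is Op.ADD:
            if text is None:
                raise ValueError("ADD needs text")
            # Store the new entry under a fresh id.
            self.entries[self.next_id] = text
            self.next_id += 1
        elif op is Op.UPDATE:
            if text is None:
                raise ValueError("UPDATE needs text")
            if entry_id not in self.entries:
                raise KeyError(f"no entry with id {entry_id}")
            # Overwrite an existing entry in place.
            self.entries[entry_id] = text
        elif op is Op.DELETE:
            if entry_id not in self.entries:
                raise KeyError(f"no entry with id {entry_id}")
            del self.entries[entry_id]
        # Op.NOOP: leave the bank unchanged.


# Usage with made-up dialogue facts (assumed, not from any benchmark).
bank = MemoryBank()
bank.apply(Op.ADD, text="User adopted a dog named Buddy.")
bank.apply(Op.UPDATE, text="User adopted two dogs, Buddy and Scout.", entry_id=0)
bank.apply(Op.NOOP)
```

In this sketch UPDATE replaces an entry wholesale; whether an update should overwrite or merge with the existing entry is a design choice the abstract leaves open.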
