Memory Systems for Agents — Learning from Experience
Without memory, agents are amnesiacs — every conversation starts from scratch. Memory systems enable agents to remember past interactions, learn from experience, and build knowledge over time.
- Short-Term Memory — Working memory for current conversation
- Long-Term Memory — Persistent storage across sessions
- Episodic Memory — Remembering specific past experiences
Memory is the foundation of intelligence.
Memory Systems for Agents
LLMs have limited context windows and no inherent ability to remember past interactions. Memory systems extend agent capabilities by providing structured storage and retrieval of information across different time scales.
Definition: Agent Memory
Agent memory is a structured system that stores, organizes, and retrieves information for an LLM agent. It encompasses short-term (working) memory for current context, long-term memory for persistent knowledge, and episodic memory for past experiences.
Memory Architecture
Three-Layer Memory Model
Definition: Three-Layer Memory
Three-layer memory organizes agent memory into: (1) Sensory buffer — immediate input, (2) Working memory — current conversation context, (3) Long-term memory — persistent storage with retrieval.
Input -> [Sensory Buffer] -> [Working Memory] -> [Long-Term Memory]
              (1s)           (conversation)         (persistent)
                                    |                     |
                                    v                     v
                               [Attention]           [Retrieval]
                                    |                     |
                                    v                     v
                                [Action]              [Recall]
import time
from collections import deque


class AgentMemory:
    def __init__(self, working_memory_size=10, long_term_db=None):
        # deque(maxlen=...) drops the oldest message automatically
        self.working_memory = deque(maxlen=working_memory_size)
        self.working_memory_size = working_memory_size
        # VectorDB is a placeholder for any store exposing store() and search()
        self.long_term_db = long_term_db if long_term_db is not None else VectorDB()
        self.episodic_memory = []

    def store(self, message, importance=0.5):
        """Store a message in the appropriate memory layers."""
        # Always add to working memory
        self.working_memory.append(message)

        # Store important messages in long-term memory
        if importance > 0.7:
            self.long_term_db.store(message)

        # Record every message in episodic memory
        self.episodic_memory.append({
            "message": message,
            "timestamp": time.time(),
            "importance": importance,
        })

    def search_episodic(self, query, k=5):
        """Naive placeholder: rank episodes by words shared with the query."""
        words = set(query.lower().split())

        def overlap(entry):
            return len(words & set(str(entry["message"]).lower().split()))

        ranked = sorted(self.episodic_memory, key=overlap, reverse=True)
        return [entry["message"] for entry in ranked[:k]]

    def retrieve(self, query, k=5):
        """Retrieve candidate memories from every layer."""
        long_term_results = self.long_term_db.search(query, k=k)
        episodic_results = self.search_episodic(query, k=k)
        # Return all candidates: slicing the concatenation to k would keep
        # only the long-term hits, because they come first in the list.
        return long_term_results + episodic_results + list(self.working_memory)
Short-Term (Working) Memory
Context Window Management
Definition: Working Memory
Working memory is the agent's current context window — the limited set of information the agent can actively process. It includes the conversation history, retrieved context, and current task state.
class WorkingMemory:
    def __init__(self, max_tokens=4096):
        self.max_tokens = max_tokens
        self.messages = []  # dicts with "role" and "content" keys
        self.total_tokens = 0

    def add(self, message):
        """Add a message, evicting old messages if necessary."""
        # count_tokens is assumed to be defined elsewhere (tokenizer-specific)
        msg_tokens = count_tokens(message)
        self.messages.append(message)
        self.total_tokens += msg_tokens
        # Evict oldest messages while over budget, always keeping the newest
        while self.total_tokens > self.max_tokens and len(self.messages) > 1:
            removed = self.messages.pop(0)
            self.total_tokens -= count_tokens(removed)

    def get_context(self):
        """Get formatted context for the LLM."""
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)
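The `count_tokens` helper used by `WorkingMemory` is not defined in this section. As a rough sketch, assuming the common rule of thumb of about four characters per token for English text, it could be approximated as below. The `chars_per_token` parameter is an illustrative assumption; a real tokenizer for the target model will give different counts.

```python
import math


def count_tokens(message, chars_per_token=4):
    """Estimate the token count of a chat message.

    Assumption: roughly 4 characters per token. Treat the result as a
    budget estimate, not an exact count.
    """
    # Count the same text that get_context() would send to the model
    text = f"{message['role']}: {message['content']}"
    return max(1, math.ceil(len(text) / chars_per_token))


msg = {"role": "user", "content": "What did we decide about the launch date?"}
print(count_tokens(msg))
```

An estimate that runs slightly high is usually the safer failure mode here, since it evicts messages early instead of overflowing the context window.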
Summarization for Memory Compression
def summarize_and_compress(memory, llm, target_tokens=500, keep_recent=5):
    """Summarize old messages to compress working memory."""
    def fmt(messages):
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

    old_messages = memory.messages[:-keep_recent]  # Candidates for summarization
    recent_messages = memory.messages[-keep_recent:]  # Kept verbatim
    # Nothing to compress if under budget or there are no older messages
    if memory.total_tokens <= target_tokens or not old_messages:
        return memory.get_context()

    # llm is assumed to expose generate(prompt) -> str
    summary = llm.generate(
        "Summarize this conversation concisely:\n" + fmt(old_messages)
    )
    return f"[Summary]: {summary}\n" + fmt(recent_messages)
Long-Term Memory
Vector Database Storage
Definition: Long-Term Memory
Long-term memory stores information persistently using a vector database. Messages are embedded and indexed for efficient similarity search, enabling the agent to recall relevant past information.
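import hashlib

import chromadb


class LongTermMemory:
    def __init__(self, collection_name="agent_memory"):
        # chromadb.Client() is in-memory; use chromadb.PersistentClient(path=...)
        # to keep memories across sessions
        self.client = chromadb.Client()
        # get_or_create_collection avoids an error when the collection exists
        self.collection = self.client.get_or_create_collection(collection_name)

    def store(self, message, metadata=None):
        """Store a message in long-term memory."""
        # embed_text is assumed to wrap an embedding model: str -> list[float]
        embedding = embed_text(message["content"])
        # Built-in hash() is salted per process, so derive a stable ID instead
        digest = hashlib.sha256(message["content"].encode("utf-8")).hexdigest()
        # upsert overwrites an entry with the same ID rather than duplicating it
        self.collection.upsert(
            documents=[message["content"]],
            embeddings=[embedding],
            metadatas=[metadata or {"role": message["role"]}],
            ids=[f"msg_{digest}"],
        )

    def retrieve(self, query, k=5):
        """Retrieve relevant memories."""
        query_embedding = embed_text(query)
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=k,
        )
        return results["documents"][0]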
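To make the embed-then-search step concrete without a vector database, here is a minimal pure-Python sketch. The `embed_text` below is a toy hashed bag-of-words vector standing in for a real embedding model, and `cosine_similarity` and `top_k` are illustrative names, not part of any library. It captures only word overlap, so it shows the mechanics of similarity search rather than semantic recall.

```python
import hashlib
import math


def embed_text(text, dim=64):
    """Toy embedding: hashed bag-of-words, L2-normalized.

    Stand-in for a real embedding model; it reflects word overlap only.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        # hashlib gives the same bucket on every run, unlike built-in hash()
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine_similarity(a, b):
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    q = embed_text(query)
    scored = [(cosine_similarity(q, embed_text(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


docs = [
    "the user prefers dark mode",
    "the deployment failed on friday",
    "the user lives in berlin",
]
print(top_k("user dark mode", docs, k=1))
```

A vector database performs the same ranking, but with an index that avoids comparing the query against every stored vector.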
Episodic Memory
Experience Storage
Definition: Episodic Memory
Episodic memory stores specific experiences (conversations, tasks, outcomes) indexed by context and time. It enables agents to recall "what happened last time" in similar situations.
import json
import time


class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def record_episode(self, situation, actions, outcome, success):
        """Record a complete episode."""
        self.episodes.append({
            "situation": situation,
            "actions": actions,
            "outcome": outcome,
            "success": success,
            "timestamp": time.time(),
        })

    def recall_similar(self, current_situation, k=3):
        """Find episodes with similar situations."""
        # compute_similarity is assumed to return a higher score for closer matches
        scored = [
            (episode, compute_similarity(current_situation, episode["situation"]))
            for episode in self.episodes
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [episode for episode, _ in scored[:k]]

    def learn_from_experience(self, current_situation, llm):
        """Use past episodes to inform current decisions."""
        similar_episodes = self.recall_similar(current_situation)
        # default=str keeps json.dumps from failing on non-serializable values
        prompt = f"""Based on similar past experiences, what should I do?
Current situation: {current_situation}
Similar past experiences:
{json.dumps(similar_episodes, indent=2, default=str)}
Recommendation:"""
        return llm.generate(prompt)
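`recall_similar` relies on a `compute_similarity` function that this section leaves undefined. One simple possibility, assuming situations are plain strings, is Jaccard similarity over word sets, sketched below. An embedding-based score would likely match paraphrases better; this version is only meant to show the expected contract (two strings in, a score in [0, 1] out).

```python
def compute_similarity(a, b):
    """Jaccard similarity between the word sets of two situation strings.

    Returns a value in [0, 1]; 1.0 means identical word sets.
    """
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


episodes = [
    {"situation": "user asks to reset password", "success": True},
    {"situation": "user reports billing error", "success": False},
]
current = "user asks to reset their password"
best = max(episodes, key=lambda ep: compute_similarity(current, ep["situation"]))
print(best["situation"])  # prints "user asks to reset password"
```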
Retrieval-Augmented Memory
Memory Retrieval Pipeline
class MemoryRetriever:
    def __init__(self, long_term, episodic, working):
        self.long_term = long_term
        self.episodic = episodic
        self.working = working

    def retrieve_for_query(self, query, k=10):
        """Retrieve from all memory sources."""
        results = []
        # Working memory (always relevant)
        for msg in self.working.messages:
            results.append({"source": "working", "content": msg["content"], "score": 1.0})
        # Long-term memory (similarity-based)
        # Scores here are fixed per-source priors, not measured similarities
        for doc in self.long_term.retrieve(query, k=k):
            results.append({"source": "long_term", "content": doc, "score": 0.8})
        # Episodic memory (situation-based)
        for ep in self.episodic.recall_similar(query, k=k):
            results.append({"source": "episodic", "content": str(ep), "score": 0.7})
        # Rank by relevance; rank_by_relevance is assumed to be defined elsewhere
        ranked = rank_by_relevance(query, results)
        return ranked[:k]
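The pipeline ends with a call to `rank_by_relevance`, which is not defined here. A minimal sketch follows, assuming each result is a dict with `content` and `score` keys as built above. It blends the fixed per-source score with the fraction of query words found in the content; the `prior_weight` parameter and the blending formula are illustrative choices, not a prescribed method.

```python
def rank_by_relevance(query, results, prior_weight=0.5):
    """Sort results by a blend of source prior and lexical overlap.

    "score" is the fixed per-source prior. The overlap term is the
    fraction of query words that appear in the result's content.
    """
    query_words = set(query.lower().split())

    def relevance(result):
        content_words = set(result["content"].lower().split())
        if query_words:
            overlap = len(query_words & content_words) / len(query_words)
        else:
            overlap = 0.0
        return prior_weight * result["score"] + (1 - prior_weight) * overlap

    return sorted(results, key=relevance, reverse=True)


results = [
    {"source": "working", "content": "hello there", "score": 1.0},
    {"source": "long_term", "content": "the api key rotates weekly", "score": 0.8},
    {"source": "episodic", "content": "retry fixed the timeout", "score": 0.7},
]
ranked = rank_by_relevance("when does the api key rotate", results)
print([r["source"] for r in ranked])  # prints ['long_term', 'working', 'episodic']
```

With `prior_weight=1.0` the ranking reduces to the fixed source order (working, long-term, episodic), which shows why some query-dependent term is needed for the ranking step to do useful work.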
Memory Consolidation
From Working to Long-Term
Definition: Memory Consolidation
Memory consolidation is the process of moving important information from working memory to long-term memory. This happens periodically based on importance scoring and novelty detection.
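import json


def consolidate_memory(memory, importance_threshold=0.7):
    """Move important working memory items to long-term storage."""
    # memory is an AgentMemory whose long_term_db.store() is assumed to accept
    # a metadata argument, as LongTermMemory.store() does.
    # score_importance is assumed to map a message to a value in [0, 1].
    for message in memory.working_memory:
        importance = score_importance(message)
        if importance > importance_threshold:
            memory.long_term_db.store(message, metadata={"importance": importance})
    # Also consolidate episodic memory
    for episode in memory.episodic_memory:
        if episode["importance"] > importance_threshold:
            metadata = {"type": "episode"}
            # Only attach "success" when the episode recorded one
            if episode.get("success") is not None:
                metadata["success"] = episode["success"]
            memory.long_term_db.store(
                {"role": "system", "content": json.dumps(episode, default=str)},
                metadata=metadata,
            )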
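The definition above mentions novelty detection, which the consolidation function does not implement. One way to sketch it, assuming messages are plain strings and using word overlap as a stand-in for embedding similarity, is to score each candidate by how unlike the already stored items it is. The names `word_overlap`, `novelty`, and `select_for_consolidation`, and the `min_novelty` threshold, are hypothetical.

```python
def word_overlap(a, b):
    """Jaccard similarity over lowercase word sets, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def novelty(candidate, stored):
    """1.0 for content unlike anything stored, 0.0 for an exact repeat."""
    if not stored:
        return 1.0
    return 1.0 - max(word_overlap(candidate, s) for s in stored)


def select_for_consolidation(messages, stored, min_novelty=0.5):
    """Return the messages novel enough to be written to long-term memory."""
    kept = []
    seen = list(stored)
    for text in messages:
        if novelty(text, seen) >= min_novelty:
            kept.append(text)
            seen.append(text)  # later duplicates within the batch are skipped
    return kept


stored = ["the user prefers dark mode"]
batch = [
    "the user prefers dark mode",
    "the user is allergic to peanuts",
    "the user is allergic to peanuts",
]
print(select_for_consolidation(batch, stored))
# prints ['the user is allergic to peanuts']
```

In a fuller system this filter would be combined with the importance threshold, so that a message is consolidated only when it is both important and not already represented in long-term memory.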
Practice Exercises
- Working Memory: Implement a working memory system with summarization compression. How does summarization affect conversation quality?
- Long-Term Retrieval: Build a long-term memory system with vector search. Test retrieval accuracy on a set of 100 past conversations.
- Episodic Learning: Implement episodic memory for a task-completion agent. Does recalling similar past episodes improve performance?
- Memory Consolidation: Design an importance scoring function for memory consolidation. What signals indicate a message should be stored long-term?
Key Takeaways
Summary: Memory Systems for Agents
- Working memory holds current conversation context (limited by context window)
- Long-term memory provides persistent storage with vector-based retrieval
- Episodic memory stores specific experiences indexed by situation
- Memory compression summarizes old messages to fit context limits
- Memory consolidation moves important items from working to long-term memory
- Multi-source retrieval combines memories from all layers
- Importance scoring determines what to store long-term
- Recall enables learning from past experiences
What to Learn Next
-> LLM Agent Frameworks: Building autonomous agents with LLMs.
-> Long Context and Context Window: Handling very long sequences.
-> Retrieval-Augmented Generation: RAG fundamentals and basic implementation.
-> Planning and Reasoning in Agents: How agents plan and execute multi-step tasks.
-> Agent Evaluation and Safety: Measuring and ensuring agent safety.
-> Multi-Agent Systems: Coordinating multiple agents.