Memory Systems for Agents

Memory Systems for Agents — Learning from Experience

Without memory, an agent starts every conversation from scratch. Memory systems let agents remember past interactions, learn from experience, and build knowledge over time.

  • Short-Term Memory — Working memory for current conversation
  • Long-Term Memory — Persistent storage across sessions
  • Episodic Memory — Remembering specific past experiences

Memory is the foundation of intelligence.

Memory Systems for Agents

LLMs have limited context windows and no inherent ability to remember past interactions. Memory systems extend agent capabilities by providing structured storage and retrieval of information across different time scales.

Definition: Agent Memory

Agent memory is a structured system that stores, organizes, and retrieves information for an LLM agent. It encompasses short-term (working) memory for current context, long-term memory for persistent knowledge, and episodic memory for past experiences.

Memory Architecture

Three-Layer Memory Model

Definition: Three-Layer Memory

Three-layer memory organizes agent memory into: (1) Sensory buffer — immediate input, (2) Working memory — current conversation context, (3) Long-term memory — persistent storage with retrieval.

Architecture Diagram
Input -> [Sensory Buffer] -> [Working Memory] -> [Long-Term Memory]
              (1s)            (conversation)      (persistent)
                                   |                      |
                                   v                      v
                              [Attention]            [Retrieval]
                                   |                      |
                                   v                      v
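                              [Action]              [Recall]

import time


def _text(message):
    """Return the searchable text of a message (dict with "content" or plain string)."""
    return message["content"] if isinstance(message, dict) else str(message)


class AgentMemory:
    def __init__(self, working_memory_size=10, long_term_db=None):
        self.working_memory = []  # Recent messages, oldest first
        self.working_memory_size = working_memory_size
        # VectorDB is a placeholder for any store exposing
        # store(message) and search(query, k); it is not defined here.
        self.long_term_db = long_term_db if long_term_db is not None else VectorDB()
        self.episodic_memory = []

    def store(self, message, importance=0.5):
        """Store a message in the appropriate memory layers."""
        # Always add to working memory, evicting the oldest entry when full
        self.working_memory.append(message)
        if len(self.working_memory) > self.working_memory_size:
            self.working_memory.pop(0)

        # Only important messages go to long-term memory
        if importance > 0.7:
            self.long_term_db.store(message)

        # Every message is logged as an episode with its timestamp
        self.episodic_memory.append({
            "message": message,
            "timestamp": time.time(),
            "importance": importance
        })

    def search_episodic(self, query, k=5):
        """Naive keyword search over episodic entries, newest first."""
        terms = set(query.lower().split())
        matches = [
            entry["message"]
            for entry in reversed(self.episodic_memory)
            if terms & set(_text(entry["message"]).lower().split())
        ]
        return matches[:k]

    def retrieve(self, query, k=5):
        """Retrieve relevant memories from all layers."""
        long_term_results = self.long_term_db.search(query, k=k)
        episodic_results = self.search_episodic(query, k=k)

        # Merge without duplicates. Slicing the concatenation to k would
        # return only long-term hits whenever the store yields k results,
        # so each source is capped at k and the merged list is returned whole.
        merged = []
        for memory in long_term_results + episodic_results + self.working_memory:
            if memory not in merged:
                merged.append(memory)
        return merged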
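
The `AgentMemory` class above expects a long-term store with `store(message)` and `search(query, k)` methods but does not define one. For experimenting without a real vector database, a toy stand-in could look like the sketch below. `InMemoryVectorDB` is a hypothetical name, not a library class, and ranking by shared words is a deliberately crude substitute for embedding similarity.

```python
class InMemoryVectorDB:
    """Toy stand-in for a vector store: ranks by shared words, not embeddings."""

    def __init__(self):
        self.items = []

    def store(self, message):
        self.items.append(message)

    def search(self, query, k=5):
        query_words = set(query.lower().split())
        scored = []
        for item in self.items:
            # Accept either {"role": ..., "content": ...} dicts or plain strings
            text = item["content"] if isinstance(item, dict) else str(item)
            overlap = len(query_words & set(text.lower().split()))
            if overlap > 0:
                scored.append((overlap, item))
        # Highest overlap first; the sort is stable, so ties keep insertion order
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:k]]


db = InMemoryVectorDB()
db.store({"role": "user", "content": "My favorite language is Rust"})
db.store({"role": "user", "content": "I live in Lisbon"})
hits = db.search("what programming language is my favorite", k=1)
```

Passing an instance as `long_term_db` lets the rest of the memory logic run end to end; swapping in a real store later only requires the same two methods.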

Short-Term (Working) Memory

Context Window Management

Definition: Working Memory

Working memory is the agent's current context window — the limited set of information the agent can actively process. It includes the conversation history, retrieved context, and current task state.

def count_tokens(message):
    """Rough token estimate: whitespace-separated words in the content.

    This is an approximation for illustration; swap in the model's
    real tokenizer for accurate counts.
    """
    return len(message["content"].split())


class WorkingMemory:
    def __init__(self, max_tokens=4096):
        self.max_tokens = max_tokens
        self.messages = []  # Dicts with "role" and "content", oldest first
        self.total_tokens = 0

    def add(self, message):
        """Add a message, evicting the oldest messages if over budget."""
        self.messages.append(message)
        self.total_tokens += count_tokens(message)

        # Evict oldest first, but always keep the newest message
        while self.total_tokens > self.max_tokens and len(self.messages) > 1:
            removed = self.messages.pop(0)
            self.total_tokens -= count_tokens(removed)

    def get_context(self):
        """Get formatted context for the LLM."""
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)

Summarization for Memory Compression

def summarize_and_compress(memory, llm, target_tokens=500):
    """Summarize old messages to compress working memory.

    `llm` is any object with a generate(prompt) method that returns a string.
    """
    if memory.total_tokens <= target_tokens:
        return memory.get_context()

    # Keep the last 5 messages verbatim and summarize everything older
    old_messages = memory.messages[:-5]
    recent_messages = memory.messages[-5:]
    if not old_messages:
        # Nothing older than the recent window, so there is nothing to summarize
        return memory.get_context()

    summary = llm.generate(
        "Summarize this conversation concisely:\n"
        + "\n".join(f"{m['role']}: {m['content']}" for m in old_messages)
    )

    compressed = f"[Summary]: {summary}\n" + "\n".join(
        f"{m['role']}: {m['content']}" for m in recent_messages
    )
    return compressed

Long-Term Memory

Vector Database Storage

Definition: Long-Term Memory

Long-term memory stores information persistently using a vector database. Messages are embedded and indexed for efficient similarity search, enabling the agent to recall relevant past information.

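import uuid

import chromadb


class LongTermMemory:
    def __init__(self, collection_name="agent_memory"):
        # chromadb.Client() keeps data in memory only. To persist across
        # sessions, use chromadb.PersistentClient with a storage path.
        self.client = chromadb.Client()
        # get_or_create_collection avoids an error if the collection already exists
        self.collection = self.client.get_or_create_collection(collection_name)

    def store(self, message, metadata=None):
        """Store a message in long-term memory."""
        # embed_text is assumed to be defined elsewhere and to return a list of floats
        embedding = embed_text(message["content"])
        self.collection.add(
            documents=[message["content"]],
            embeddings=[embedding],
            metadatas=[metadata or {"role": message["role"]}],
            # hash() of a string changes between Python processes and repeats
            # for identical content, so a random unique ID is used instead
            ids=[f"msg_{uuid.uuid4().hex}"]
        )

    def retrieve(self, query, k=5):
        """Retrieve the k stored documents most similar to the query."""
        query_embedding = embed_text(query)
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=k
        )
        # Results are grouped per query; there is one query, so take the first group
        return results["documents"][0]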
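
The `LongTermMemory` class relies on an `embed_text` function that the lesson does not define. In practice this would call a trained embedding model. To show the mechanics of "embed, then rank by similarity" without any model, here is a minimal sketch that uses the hashing trick as a stand-in. The `dim` parameter and the `cosine_similarity` and `top_k` helpers are illustrative assumptions, and the resulting vectors capture only shared words, not meaning.

```python
import hashlib
import math


def embed_text(text, dim=64):
    """Toy embedding: hash each word into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Empty text yields an all-zero vector, which is returned unchanged
    return [v / norm for v in vec] if norm > 0 else vec


def cosine_similarity(a, b):
    """Dot product of two vectors; equals cosine similarity when both are unit length."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, documents, k=2):
    """Return the k documents whose toy embeddings are closest to the query."""
    query_vec = embed_text(query)
    scored = [(cosine_similarity(query_vec, embed_text(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


notes = ["user prefers dark mode", "meeting moved to friday", "dark mode is enabled"]
closest = top_k("dark mode", notes, k=2)
```

A vector database performs the same ranking, but with an approximate nearest-neighbor index so that it does not have to compare the query against every stored vector.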

Episodic Memory

Experience Storage

Definition: Episodic Memory

Episodic memory stores specific experiences (conversations, tasks, outcomes) indexed by context and time. It enables agents to recall "what happened last time" in similar situations.

import json
import time


class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def record_episode(self, situation, actions, outcome, success):
        """Record a complete episode."""
        self.episodes.append({
            "situation": situation,
            "actions": actions,
            "outcome": outcome,
            "success": success,
            "timestamp": time.time()
        })

    def recall_similar(self, current_situation, k=3):
        """Find the k episodes whose situations are most similar."""
        # compute_similarity is assumed to be defined elsewhere and to
        # return a float where higher means more similar
        similarities = []
        for episode in self.episodes:
            sim = compute_similarity(current_situation, episode["situation"])
            similarities.append((episode, sim))

        similarities.sort(key=lambda x: x[1], reverse=True)
        return [ep for ep, _ in similarities[:k]]

    def learn_from_experience(self, current_situation, llm):
        """Use past episodes to inform current decisions."""
        similar_episodes = self.recall_similar(current_situation)

        # `llm` is any object with a generate(prompt) method that returns a string
        prompt = f"""Based on similar past experiences, what should I do?

Current situation: {current_situation}

Similar past experiences:
{json.dumps(similar_episodes, indent=2)}

Recommendation:"""

        return llm.generate(prompt)
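
`recall_similar` depends on a `compute_similarity` function that is left undefined above. One simple possibility, assuming situations are short text descriptions, is Jaccard similarity over word sets. This is a sketch of one reasonable choice rather than the lesson's intended method; embedding-based similarity would handle paraphrases better.

```python
def compute_similarity(a, b):
    """Jaccard similarity between the word sets of two situation descriptions."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    # Shared words divided by all distinct words across both descriptions
    return len(words_a & words_b) / len(words_a | words_b)


# Three of five distinct words are shared: deploy, api, to
similarity = compute_similarity("deploy api to staging", "deploy api to production")
```

The score ranges from 0.0 (no shared words) to 1.0 (identical word sets), which fits the descending sort in `recall_similar`.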

Retrieval-Augmented Memory

Memory Retrieval Pipeline

class MemoryRetriever:
    def __init__(self, long_term, episodic, working):
        self.long_term = long_term
        self.episodic = episodic
        self.working = working
    
    def retrieve_for_query(self, query, k=10):
        """Retrieve from all memory sources."""
        results = []
        
        # Working memory (always relevant)
        for msg in self.working.messages:
            results.append({"source": "working", "content": msg["content"], "score": 1.0})
        
        # Long-term memory (similarity-based)
        lt_results = self.long_term.retrieve(query, k=k)
        for doc in lt_results:
            results.append({"source": "long_term", "content": doc, "score": 0.8})
        
        # Episodic memory (situation-based)
        ep_results = self.episodic.recall_similar(query, k=k)
        for ep in ep_results:
            results.append({"source": "episodic", "content": str(ep), "score": 0.7})
        
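        # Rank by relevance. rank_by_relevance is assumed to be defined
        # elsewhere and to return `results` ordered best-first for the query.
        # The fixed scores above act only as per-source priors.
        ranked = rank_by_relevance(query, results)
        return ranked[:k]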
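
The retriever calls `rank_by_relevance`, which is not defined in the lesson. Because every working-memory entry carries a fixed score of 1.0, sorting on that score alone would let recent chat crowd out everything else. One hedged sketch blends the per-source score with how many query words each result contains. The equal 0.5 weights are an arbitrary assumption to tune, not a recommended setting.

```python
def rank_by_relevance(query, results):
    """Order results by a blend of source prior and word overlap with the query."""
    query_words = set(query.lower().split())

    def relevance(result):
        content_words = set(result["content"].lower().split())
        # Fraction of query words that appear in the result's content
        overlap = len(query_words & content_words) / len(query_words) if query_words else 0.0
        # The prior keeps trusted sources competitive; overlap rewards
        # entries that actually mention the query terms.
        return 0.5 * result["score"] + 0.5 * overlap

    return sorted(results, key=relevance, reverse=True)


results = [
    {"source": "working", "content": "hello there", "score": 1.0},
    {"source": "long_term", "content": "user prefers dark mode", "score": 0.8},
]
ranked = rank_by_relevance("dark mode", results)
```

Here the long-term entry scores 0.9 (0.4 from its prior plus 0.5 from full overlap) and outranks the working-memory entry at 0.5, even though its source prior is lower.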

Memory Consolidation

From Working to Long-Term

Definition: Memory Consolidation

Memory consolidation is the process of moving important information from working memory to long-term memory. This happens periodically based on importance scoring and novelty detection.

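import json


def consolidate_memory(memory, importance_threshold=0.7):
    """Copy important working memory items to long-term storage.

    `memory` is an AgentMemory-style object with working_memory,
    episodic_memory, and long_term_db attributes.
    """
    # score_importance is assumed to be defined elsewhere and to
    # return a float between 0 and 1
    for message in memory.working_memory:
        importance = score_importance(message)
        if importance > importance_threshold:
            memory.long_term_db.store(message, metadata={"importance": importance})

    # Also consolidate important episodes
    for episode in memory.episodic_memory:
        if episode["importance"] > importance_threshold:
            metadata = {"type": "episode"}
            # Attach "success" only when the episode records it;
            # some stores reject None as a metadata value
            if episode.get("success") is not None:
                metadata["success"] = episode["success"]
            memory.long_term_db.store(
                {"role": "system", "content": json.dumps(episode)},
                metadata=metadata
            )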
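
Consolidation hinges on `score_importance`, which the lesson leaves open. As a starting point only, a rule-based scorer might combine a few cheap signals: who said it, whether it contains phrases that signal a lasting fact or preference, and whether it contains specifics such as numbers. The cue list and the weights below are hypothetical and would need tuning against real conversations. An LLM-based rating is another common option.

```python
import re

# Hypothetical cue phrases suggesting a durable fact or preference
CUE_PATTERNS = [
    r"\bremember\b",
    r"\bmy name is\b",
    r"\bi prefer\b",
    r"\balways\b",
    r"\bnever\b",
    r"\bdeadline\b",
]


def score_importance(message):
    """Heuristic importance in [0, 1] from role, cue phrases, and specifics."""
    content = message["content"].lower()
    score = 0.2  # Baseline for any message
    if message.get("role") == "user":
        score += 0.2  # User statements tend to carry goals and preferences
    if any(re.search(pattern, content) for pattern in CUE_PATTERNS):
        score += 0.4  # Explicit signal that the information should persist
    if re.search(r"\d", content):
        score += 0.2  # Digits often indicate dates, amounts, or identifiers
    return min(score, 1.0)


important = score_importance({"role": "user", "content": "Remember that my deadline is March 3"})
routine = score_importance({"role": "assistant", "content": "Sure, sounds good"})
```

With the default threshold of 0.7, the first message would be consolidated and the second would not.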

Practice Exercises

  1. Working Memory: Implement a working memory system with summarization compression. How does summarization affect conversation quality?

  2. Long-Term Retrieval: Build a long-term memory system with vector search. Test retrieval accuracy on a set of 100 past conversations.

  3. Episodic Learning: Implement episodic memory for a task-completion agent. Does recalling similar past episodes improve performance?

  4. Memory Consolidation: Design an importance scoring function for memory consolidation. What signals indicate a message should be stored long-term?

Key Takeaways

Summary: Memory Systems for Agents

  • Working memory holds current conversation context (limited by context window)
  • Long-term memory provides persistent storage with vector-based retrieval
  • Episodic memory stores specific experiences indexed by situation
  • Memory compression summarizes old messages to fit context limits
  • Memory consolidation moves important items from working to long-term memory
  • Multi-source retrieval combines memories from all layers
  • Importance scoring determines what to store long-term
  • Recall enables learning from past experiences

What to Learn Next

-> LLM Agent Frameworks: Building autonomous agents with LLMs.

-> Long Context and Context Window: Handling very long sequences.

-> Retrieval-Augmented Generation: RAG fundamentals and basic implementation.

-> Planning and Reasoning in Agents: How agents plan and execute multi-step tasks.

-> Agent Evaluation and Safety: Measuring and ensuring agent safety.

-> Multi-Agent Systems: Coordinating multiple agents.
