LLM Agent Frameworks

LLM agents extend language models beyond text generation by enabling them to interact with external tools, plan multi-step tasks, and maintain memory across interactions. This tutorial covers the foundations and practical implementation of agentic systems.

What are LLM Agents?

An LLM agent is a system that uses a large language model as its core reasoning engine. The model is augmented with the ability to observe an environment, take actions using tools, and maintain state across interactions in order to accomplish complex goals.

The agent loop consists of:

  1. Observe: Receive input from the environment
  2. Think: Reason about the current state and goals
  3. Act: Take an action using available tools
  4. Reflect: Evaluate the outcome and update internal state
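The four steps can be sketched as a plain loop. This is a minimal illustration, not a real agent. It assumes a hypothetical `policy` function standing in for the LLM (it reads the state and returns a tool name and argument) and a dictionary of tools. The "reflect" step here only records the outcome and checks a caller-supplied `is_done` condition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical policy standing in for the LLM: reads the state, returns (tool, argument)
Policy = Callable[[List[str]], Tuple[str, str]]

def agent_loop(goal: str, policy: Policy, tools: Dict[str, Callable[[str], str]],
               is_done: Callable[[List[str]], bool], max_steps: int = 5) -> List[str]:
    state: List[str] = [f"Goal: {goal}"]  # internal state kept as a transcript
    for _ in range(max_steps):
        # Observe + Think: the policy reads the current state and picks an action
        tool, arg = policy(state)
        # Act: run the chosen tool (unknown tools produce an error string)
        result = tools[tool](arg) if tool in tools else f"unknown tool {tool}"
        # Reflect: record the outcome and stop once the goal check passes
        state.append(f"{tool}({arg}) -> {result}")
        if is_done(state):
            break
    return state

# Usage with a trivial one-step task
tools = {"upper": str.upper}
trace = agent_loop("shout hello", lambda s: ("upper", "hello"), tools,
                   is_done=lambda s: "HELLO" in s[-1])
print(trace)  # ['Goal: shout hello', 'upper(hello) -> HELLO']
```

The `max_steps` bound matters: without it, a policy that never satisfies `is_done` would loop forever.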

ReAct (Reasoning + Acting)

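ReAct (Yao et al., 2022) interleaves reasoning traces with actions. At each step the model writes a thought, selects an action (typically a tool call), and receives an observation. That observation is added to its context before the next step.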

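ReAct Action-Observation Loop

$$\text{State}_{t+1} = \text{Agent}(\text{State}_t, \text{Observation}_t)$$

Here,

  • $\text{State}_t$ = the agent's context at step $t$: the task plus all previous thoughts, actions, and observations
  • $\text{Agent}$ = the LLM-driven update that reads the state, produces the next thought and action, and incorporates the result
  • $\text{Observation}_t$ = the result returned by the tool invoked at step $t$
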
from typing import List, Dict, Callable, Optional
import json

class ReActAgent:
    def __init__(
        self,
        llm,
        tools: Dict[str, Callable],
        max_steps: int = 10,
        verbose: bool = True
    ):
        self.llm = llm
        self.tools = tools
        self.max_steps = max_steps
        self.verbose = verbose
    
    def run(self, task: str) -> str:
        """Execute task using ReAct framework."""
        history = []
        current_state = f"Task: {task}\n\n"
        
        for step in range(self.max_steps):
            # Ask the LLM for the next thought/action given the task and history
            prompt = self._build_prompt(current_state, history)
            response = self.llm.generate(prompt)
            
            # Parse action from response
            action = self._parse_action(response)
            
            if action["type"] == "finish":
                return action["output"]
            
            # Execute action. Missing fields, unknown tools, and tool errors
            # are fed back to the model as observations instead of crashing.
            tool_name = action.get("tool", "")
            tool_args = action.get("args", {})
            if tool_name in self.tools:
                try:
                    observation = self.tools[tool_name](**tool_args)
                except Exception as e:
                    observation = f"Error: {e}"
            else:
                observation = f"Error: Unknown tool '{tool_name}'"
            
            # Record the step; _build_prompt replays the history into the next prompt,
            # so the observation is not appended to current_state a second time
            history.append({
                "thought": action.get("thought", ""),
                "action": tool_name,
                "action_input": tool_args,
                "observation": observation
            })
            
            if self.verbose:
                print(f"Step {step + 1}:")
                print(f"  Thought: {action.get('thought', 'N/A')}")
                print(f"  Action: {tool_name}")
                print(f"  Observation: {observation}\n")
        
        return "Max steps reached without completion"
    
    def _build_prompt(self, state: str, history: List[Dict]) -> str:
        tools_desc = "\n".join([
            f"- {name}: {func.__doc__ or 'No description'}"
            for name, func in self.tools.items()
        ])
        
        history_text = ""
        for h in history:
            history_text += f"Thought: {h['thought']}\n"
            history_text += f"Action: {h['action']}\n"
            history_text += f"Action Input: {json.dumps(h['action_input'])}\n"
            history_text += f"Observation: {h['observation']}\n\n"
        
        return f"""You are a helpful assistant that solves tasks step by step.

Available tools:
{tools_desc}

{state}

{history_text}What should you do next? Use this format:

Thought: [your reasoning about what to do next]
Action: [tool name]
Action Input: [JSON arguments for the tool]

Or if you have the final answer:
Thought: [final reasoning]
Final Answer: [your answer to the task]"""
    
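    def _parse_action(self, response: str) -> Dict:
        """Parse the first Thought/Action/Action Input block, or a Final Answer."""
        lines = response.strip().split("\n")
        result = {"type": "action"}
        
        for i, line in enumerate(lines):
            line = line.strip()
            if line.startswith("Observation:"):
                # The model began inventing a tool result; ignore everything after it
                break
            if line.startswith("Thought:"):
                result["thought"] = line[len("Thought:"):].strip()
            elif line.startswith("Action:"):
                result["tool"] = line[len("Action:"):].strip()
            elif line.startswith("Action Input:"):
                raw = line[len("Action Input:"):].strip()
                try:
                    args = json.loads(raw)
                except json.JSONDecodeError:
                    args = raw
                # Tools are called with **kwargs, so non-object input is wrapped
                result["args"] = args if isinstance(args, dict) else {"input": args}
            elif line.startswith("Final Answer:"):
                result["type"] = "finish"
                # A final answer may span several lines; keep all of them
                rest = [line[len("Final Answer:"):].strip()] + lines[i + 1:]
                result["output"] = "\n".join(rest).strip()
                break
        
        return result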

Function Calling

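Many modern chat LLM APIs support structured function calling. Instead of free text, the model can return the name of a function to call together with JSON-encoded arguments. The application executes the call and reports the result back as a new message. The example below uses the message shape of OpenAI's original `functions`/`function_call` interface; newer APIs use an equivalent `tools`/`tool_calls` format.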

import json
from typing import Any, Dict, List

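class FunctionCallingAgent:
    def __init__(self, llm, functions: List[Dict[str, Any]]):
        self.llm = llm
        self.functions = {f["name"]: f for f in functions}
        # Only the JSON schemas go to the model; the Python callables stay local
        self.schemas = [
            {k: v for k, v in f.items() if k != "function"} for f in functions
        ]
    
    def run(self, task: str, max_iterations: int = 5) -> str:
        messages = [{"role": "user", "content": task}]
        
        for _ in range(max_iterations):
            # `llm.chat` is assumed to return the assistant message as a dict
            response = self.llm.chat(messages, functions=self.schemas)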
            
            if response.get("function_call"):
                func_name = response["function_call"]["name"]
                func_args = json.loads(response["function_call"]["arguments"])
                
                # Execute function
                result = self._execute_function(func_name, func_args)
                
                messages.append(response)
                messages.append({
                    "role": "function",
                    "name": func_name,
                    "content": json.dumps(result)
                })
            else:
                return response["content"]
        
        return "Max iterations reached"
    
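    def _execute_function(self, name: str, args: Dict) -> Any:
        if name not in self.functions:
            return {"error": f"Unknown function: {name}"}
        try:
            return self.functions[name]["function"](**args)
        except Exception as e:
            # Report tool failures back to the model instead of crashing the loop
            return {"error": str(e)}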

# Define functions for the agent
functions = [
    {
        "name": "search_web",
        "description": "Search the web for information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        },
        "function": lambda query: {"results": [f"Result for: {query}"]}
    },
    {
        "name": "calculate",
        "description": "Perform mathematical calculation",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression"}
            },
            "required": ["expression"]
        },
        # eval() runs arbitrary Python: acceptable for a demo, never for untrusted model output
        "function": lambda expression: {"result": eval(expression)}
    }
]
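To show what the `messages` list holds after one tool round-trip, here is a standalone sketch of the same loop. `ScriptedLLM` is a hypothetical stand-in for a chat model. It replays canned assistant messages in the legacy `function_call` shape, so the exchange can be inspected without calling any API.

```python
import json
from typing import Any, Callable, Dict, List

class ScriptedLLM:
    """Hypothetical stand-in for a chat model: returns canned replies in order."""
    def __init__(self, replies: List[Dict[str, Any]]):
        self.replies = list(replies)

    def chat(self, messages, functions=None) -> Dict[str, Any]:
        return self.replies.pop(0)

def run_function_loop(llm, tools: Dict[str, Callable], task: str, max_iterations: int = 5):
    messages: List[Dict[str, Any]] = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        response = llm.chat(messages)
        call = response.get("function_call")
        if not call:
            messages.append(response)
            return response["content"], messages
        # Arguments arrive as a JSON string and must be decoded before the call
        result = tools[call["name"]](**json.loads(call["arguments"]))
        messages.append(response)
        messages.append({"role": "function", "name": call["name"],
                         "content": json.dumps(result)})
    return "Max iterations reached", messages

llm = ScriptedLLM([
    {"role": "assistant", "content": None,
     "function_call": {"name": "add", "arguments": '{"a": 2, "b": 3}'}},
    {"role": "assistant", "content": "2 + 3 = 5"},
])
answer, transcript = run_function_loop(llm, {"add": lambda a, b: {"sum": a + b}}, "What is 2 + 3?")
print(answer)                           # 2 + 3 = 5
print([m["role"] for m in transcript])  # ['user', 'assistant', 'function', 'assistant']
```

The tool result goes back to the model as a `"function"` message. On the next turn the model sees both its own call and the result before writing the final answer.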

Toolformer

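Toolformer (Schick et al., 2023) fine-tunes a language model to decide when and how to call tools (calculators, search engines, etc.) by learning to insert API calls into its own generations. Training is self-supervised, in four stages:

  1. The model samples candidate API calls in unlabeled text.
  2. The calls are executed.
  3. Only the calls whose results make the following tokens easier to predict are kept.
  4. The model is fine-tuned on the text augmented with the kept calls.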

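Toolformer Perplexity Improvement

$$\Delta \text{PPL} = \text{PPL}_{\text{without tools}} - \text{PPL}_{\text{with tools}}$$

Here,

  • $\text{PPL}_{\text{without tools}}$ = perplexity of the text following a candidate position when no API call is inserted
  • $\text{PPL}_{\text{with tools}}$ = perplexity of the same text when the API call and its result are inserted before it

A positive $\Delta \text{PPL}$ means the tool result made the continuation easier to predict. The paper itself filters calls using a weighted cross-entropy loss and a threshold; the perplexity difference is a simplified form of that criterion.
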
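Perplexity is the exponential of the mean negative log-likelihood per token, so $\Delta \text{PPL}$ can be computed directly from per-token probabilities. The probabilities below are made-up illustrative numbers, not results from the Toolformer paper.

```python
import math
from typing import List

def perplexity(token_probs: List[float]) -> float:
    """exp of the mean negative log-probability of the tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Illustrative probabilities for the same two continuation tokens
ppl_without = perplexity([0.5, 0.5])  # 2.0
ppl_with = perplexity([1.0, 0.5])     # sqrt(2), about 1.414
delta_ppl = ppl_without - ppl_with    # about 0.586 > 0, so the tool call helped
print(round(delta_ppl, 3))            # 0.586
```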
from typing import Dict, List

class ToolformerTrainer:
    """Simplified Toolformer-style training-data generation.

    The helpers _find_tool_positions, _generate_tool_call, _insert_tool_call
    and _compute_perplexity(prefix, continuation) are not shown here.
    """
    def __init__(self, base_model, tokenizer):
        self.model = base_model
        self.tokenizer = tokenizer
    
    def prepare_training_data(self, texts: List[str], tools: List[Dict]):
        """Insert useful tool calls into training data."""
        training_examples = []
        
        for text in texts:
            # Identify candidate positions for a tool call
            positions = self._find_tool_positions(text)
            
            for pos in positions:
                prefix, continuation = text[:pos], text[pos:]
                # Sample a tool call for this position and execute it
                tool_call = self._generate_tool_call(prefix, continuation, tools)
                
                if tool_call and self._is_useful(tool_call, prefix, continuation):
                    # Keep the example only if the tool result helped
                    example = self._insert_tool_call(text, pos, tool_call)
                    training_examples.append(example)
        
        return training_examples
    
    def _is_useful(self, tool_call: Dict, prefix: str, continuation: str) -> bool:
        """Check whether the tool call lowers perplexity on the continuation."""
        # Perplexity of the continuation given only the original prefix
        ppl_without = self._compute_perplexity(prefix, continuation)
        
        # Perplexity of the same continuation with call and result inserted at pos
        prefix_with_tool = prefix + tool_call["call"] + tool_call["result"]
        ppl_with = self._compute_perplexity(prefix_with_tool, continuation)
        
        return ppl_with < ppl_without
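The trainer leaves `_insert_tool_call` unspecified. One possible version is sketched below as a standalone function. It splices the call and its result into the text in the inline `[Tool(input) → result]` style that the Toolformer paper uses for illustration. The function name and the `call`/`result` keys mirror the class above; the exact formatting is an assumption. Real training data would use dedicated special tokens.

```python
from typing import Dict

def insert_tool_call(text: str, pos: int, tool_call: Dict[str, str]) -> str:
    """Splice '[call → result] ' into text at character position pos."""
    annotation = f"[{tool_call['call']} → {tool_call['result']}] "
    return text[:pos] + annotation + text[pos:]

text = "Out of 1400 participants, 400 (or 29%) passed the test."
pos = text.index("29%")
print(insert_tool_call(text, pos, {"call": "Calculator(400 / 1400)", "result": "0.29"}))
# Out of 1400 participants, 400 (or [Calculator(400 / 1400) → 0.29] 29%) passed the test.
```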

Memory Systems

Short-term Memory

Working memory for current task context:

from collections import deque

class ShortTermMemory:
    def __init__(self, max_size: int = 20):
        self.buffer = deque(maxlen=max_size)
        self.summary = ""
    
    def add(self, observation: str, action: str, result: str):
        self.buffer.append({
            "observation": observation,
            "action": action,
            "result": result
        })
    
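    def get_context(self) -> str:
        context = self.summary + "\n\nRecent interactions:\n"
        for item in self.buffer:
            # Include the result so later steps can build on earlier outputs
            context += f"- {item['observation']} → {item['action']} → {item['result']}\n"
        return context
    
    def summarize(self, llm):
        """Fold buffer contents into a running summary to save context."""
        if len(self.buffer) > 10:
            content = "\n".join([
                f"Obs: {b['observation']}, Act: {b['action']}, Res: {b['result']}"
                for b in self.buffer
            ])
            prompt = f"Summarize these interactions:\n{content}"
            if self.summary:
                # Carry the earlier summary forward instead of overwriting it
                prompt = f"Previous summary:\n{self.summary}\n\n{prompt}"
            self.summary = llm.generate(prompt)
            self.buffer.clear()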
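A fixed `maxlen` counts interactions, but context limits are measured in tokens. Below is a sketch of a budget-based window. It keeps the most recent items that fit, using a crude whitespace word count as the token estimate. That estimate is an assumption; a real system would count with the model's own tokenizer.

```python
from typing import List

def fit_to_budget(items: List[str], max_tokens: int) -> List[str]:
    """Keep the most recent items whose estimated token total fits max_tokens."""
    kept: List[str] = []
    used = 0
    for item in reversed(items):      # walk from newest to oldest
        cost = len(item.split())      # crude token estimate (assumption)
        if used + cost > max_tokens:
            break
        kept.append(item)
        used += cost
    return list(reversed(kept))       # restore chronological order

history = ["searched flights", "found three options", "picked the cheapest one"]
print(fit_to_budget(history, max_tokens=7))
# ['found three options', 'picked the cheapest one']
```

Items that fall out of the window are natural candidates for the `summarize` step rather than being discarded outright.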

Long-term Memory

Persistent memory across sessions:

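from typing import Dict, List, Optional

import chromadb

class LongTermMemory:
    def __init__(self, collection_name: str = "agent_memory", path: str = "./agent_memory"):
        # PersistentClient writes to disk at `path`, so memories survive restarts;
        # chromadb.Client() keeps everything in process memory only
        self.client = chromadb.PersistentClient(path=path)
        self.collection = self.client.get_or_create_collection(collection_name)
    
    def store(self, content: str, metadata: Optional[Dict] = None):
        """Store information in long-term memory."""
        # Only pass metadata when there is some to store
        extra = {"metadatas": [metadata]} if metadata else {}
        self.collection.add(
            documents=[content],
            ids=[f"mem_{self.collection.count()}"],
            **extra
        )
    
    def retrieve(self, query: str, k: int = 5) -> List[str]:
        """Retrieve the k memories most similar to the query."""
        # Never ask for more results than the collection holds
        n = min(k, self.collection.count())
        if n == 0:
            return []
        results = self.collection.query(query_texts=[query], n_results=n)
        return results["documents"][0] if results["documents"] else []
    
    def search_and_retrieve(self, context: str, query: str) -> str:
        """Search memory and format for context."""
        memories = self.retrieve(f"{context} {query}")
        return "\n".join([f"- {m}" for m in memories])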
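Conceptually, `collection.query` embeds the query and returns the stored documents whose embeddings are most similar. The sketch below shows that idea without dependencies. It substitutes a toy bag-of-words vector and cosine similarity for Chroma's learned embedding model, so it matches only on shared words, not on meaning.

```python
import math
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a learned model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(memories: List[str], query: str, k: int = 2) -> List[str]:
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:k]

memories = [
    "user prefers window seats",
    "user is allergic to peanuts",
    "meeting moved to friday",
]
print(retrieve(memories, "which seats does the user prefer", k=1))
# ['user prefers window seats']
```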

Agent Frameworks

LangChain

# Legacy LangChain agent API; initialize_agent is deprecated in LangChain 0.1 and later
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.llms import OpenAI

def calculator(expression: str) -> str:
    """Calculate mathematical expressions."""
    # eval() is for demonstration only; never run it on untrusted model output
    return str(eval(expression))

def search(query: str) -> str:
    """Search the web (stub that returns a placeholder string)."""
    return f"Search results for: {query}"

tools = [
    Tool(name="Calculator", func=calculator, description="Calculate math"),
    Tool(name="Search", func=search, description="Search the web")
]

llm = OpenAI(temperature=0)
# ZERO_SHOT_REACT_DESCRIPTION selects the ReAct-style agent
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
result = agent.run("What is 2 + 2 and what is the capital of France?")
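Both calculator tools above pass model-generated text to `eval`, which can execute arbitrary Python. A safer sketch walks the expression's syntax tree with the standard `ast` module. It accepts only numeric literals and a small whitelist of arithmetic operators; the whitelist is an assumption you can adjust. Exponentiation is deliberately left out because inputs like `9**9**9` can exhaust memory.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_calculate(expression: str) -> float:
    """Evaluate a pure arithmetic expression without calling eval()."""
    def _eval(node: ast.AST):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attributes, exponentiation, etc. all end up here
        raise ValueError(f"Unsupported expression: {ast.dump(node)}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calculate("2 + 3 * (4 - 1)"))  # 11
```

A tool wrapper can catch `ValueError` and return the message to the agent as an observation.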

LlamaIndex

# Import paths follow llama_index releases before 0.10; newer releases use llama_index.core
from llama_index.agent import ReActAgent
from llama_index.llms import OpenAI
from llama_index.tools import FunctionTool

def get_weather(city: str) -> str:
    """Get weather for a city (stub returning fixed data)."""
    return f"Weather in {city}: 72°F, sunny"

# FunctionTool wraps a plain Python function as an agent tool
tools = [
    FunctionTool.from_defaults(
        fn=get_weather,
        name="get_weather",
        description="Get current weather for a city"
    )
]

llm = OpenAI(model="gpt-4")
agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)
response = agent.chat("What's the weather in Boston?")

CrewAI

from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool

search_tool = SerperDevTool()  # requires the SERPER_API_KEY environment variable

researcher = Agent(
    role="Researcher",
    goal="Find accurate information",
    backstory="Expert researcher with attention to detail",
    tools=[search_tool],
    verbose=True
)

writer = Agent(
    role="Writer",
    goal="Write compelling content",
    backstory="Experienced writer with clear communication",
    verbose=True
)

research_task = Task(
    description="Research the latest AI developments",
    agent=researcher,
    expected_output="Comprehensive research summary"
)

writing_task = Task(
    description="Write a blog post based on research",
    agent=writer,
    expected_output="1000-word blog post"
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    verbose=True
)

result = crew.kickoff()

Practical Agent Implementation

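import json
import re
from typing import Callable, Dict, List

class AutonomousAgent:
    def __init__(self, llm, tools: List[Callable], memory: ShortTermMemory):
        self.llm = llm
        self.tools = {t.__name__: t for t in tools}
        self.memory = memory
        self.goals = []
        self.plan = []
    
    def set_goal(self, goal: str):
        self.goals.append(goal)
        self.plan = self._create_plan(goal)
    
    def _create_plan(self, goal: str) -> List[str]:
        """Create multi-step plan for achieving goal."""
        prompt = f"""Create a step-by-step plan to achieve this goal:
{goal}

Available tools: {list(self.tools.keys())}

Plan (list steps as numbered items):"""
        
        response = self.llm.generate(prompt)
        # Keep every numbered line ("1.", "2.", ..., "12."), not just the first three
        return [s.strip() for s in response.split("\n") if re.match(r"^\s*\d+\.", s)]
    
    def _parse_action(self, response: str) -> Dict:
        """Parse 'Action: <tool>' and 'Action Input: <JSON>' lines."""
        action = {"tool": None, "args": {}}
        for line in response.strip().split("\n"):
            line = line.strip()
            if line.startswith("Action Input:"):
                try:
                    action["args"] = json.loads(line[len("Action Input:"):].strip())
                except json.JSONDecodeError:
                    pass  # keep empty args; the tool call will report the problem
            elif line.startswith("Action:"):
                action["tool"] = line[len("Action:"):].strip()
        return action
    
    def execute_step(self) -> bool:
        """Execute next step in plan; return True if it succeeded."""
        if not self.plan:
            return False
        
        current_step = self.plan[0]
        context = self.memory.get_context()
        
        prompt = f"""Current task: {current_step}

Context from previous steps:
{context}

Available tools: {list(self.tools.keys())}
Reply in this format:
Action: [tool name]
Action Input: [JSON arguments for the tool]"""
        
        response = self.llm.generate(prompt)
        action = self._parse_action(response)
        
        if action["tool"] in self.tools:
            try:
                result = self.tools[action["tool"]](**action["args"])
            except Exception as e:
                # Record the failure and leave the step in the plan for a retry
                self.memory.add(current_step, action["tool"], f"Error: {e}")
                return False
            self.memory.add(current_step, action["tool"], str(result))
            self.plan.pop(0)
            return True
        
        return False
    
    def run(self, max_steps: int = 20):
        """Execute the plan; max_steps also bounds retries of failed steps."""
        for _ in range(max_steps):
            if not self.plan:
                break
            self.execute_step()
        
        return self.memory.get_context()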
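Plans come back as free text, so it pays to parse them robustly. A standalone sketch of a plan parser follows. It accepts any numbered line (`1.`, `10.`, or `3)`) and strips the numbering so steps can be stored and logged cleanly. The accepted marker formats are an assumption about how the model writes its plan. Lines such as "3.14 is pi" are not treated as steps because a space must follow the marker.

```python
import re
from typing import List

# A step is: optional indent, a number, "." or ")", whitespace, then the step text
_STEP = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)")

def parse_plan(text: str) -> List[str]:
    """Extract numbered steps from an LLM plan, dropping the numbering."""
    steps = []
    for line in text.splitlines():
        match = _STEP.match(line)
        if match:
            steps.append(match.group(2))
    return steps

plan = """Here is the plan:
1. Search for recent papers
2) Summarize each paper
10. Write the report
Good luck!"""
print(parse_plan(plan))
# ['Search for recent papers', 'Summarize each paper', 'Write the report']
```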

When building agents, start with simple ReAct patterns and add complexity incrementally. Overly complex agent architectures can be difficult to debug and may underperform simpler approaches.

Summary

  • LLM agents combine language models with tool use, planning, and memory
  • ReAct interleaves reasoning traces with actions in an observation-action loop
  • Function calling provides structured tool integration
  • Toolformer trains models to autonomously decide when to use tools
  • Short-term memory maintains working context; long-term memory persists across sessions
  • Frameworks like LangChain, LlamaIndex, and CrewAI simplify agent development
  • Start simple and add complexity incrementally for best results

Practice Exercises

  1. ReAct Agent: Implement a ReAct agent with calculator and search tools. Test it on multi-step reasoning tasks.

  2. Function Calling: Build a function-calling agent with 5 tools. Compare its performance with ReAct.

  3. Memory System: Implement both short-term and long-term memory. Test how memory affects agent performance on long tasks.

  4. Multi-Agent: Create a two-agent system where one agent researches and another writes. Compare with single-agent performance.

  5. Tool Learning: Implement a simplified Toolformer that learns to insert tool calls. Measure perplexity improvement.

