LLM Agent Frameworks
LLM agents extend language models beyond text generation by enabling them to interact with external tools, plan multi-step tasks, and maintain memory across interactions. This tutorial covers the foundations and practical implementation of agentic systems.
What are LLM Agents?
An LLM agent is a system that uses a large language model as its core reasoning engine, augmented with the ability to observe an environment, take actions using tools, and maintain state across interactions to accomplish complex goals.
The agent loop consists of:
- Observe: Receive input from the environment
- Think: Reason about the current state and goals
- Act: Take an action using available tools
- Reflect: Evaluate the outcome and update internal state
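The four phases above can be sketched as a plain control loop. The following is a minimal illustration only: `CounterEnvironment` and `LoopAgent` are hypothetical names, and a hard-coded rule stands in for the language model so the flow can be followed without any API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CounterEnvironment:
    """Toy environment (hypothetical): the goal is to raise `value` to `target`."""
    value: int = 0
    target: int = 3

    def observe(self) -> int:
        return self.value

    def apply(self, action: str) -> None:
        if action == "increment":
            self.value += 1


@dataclass
class LoopAgent:
    """Rule-based stand-in for an LLM, used only to show the control flow."""
    log: List[str] = field(default_factory=list)

    def think(self, observation: int, target: int) -> str:
        # Think: choose the next action from the current state and the goal.
        return "increment" if observation < target else "stop"

    def reflect(self, before: int, after: int) -> None:
        # Reflect: record the outcome so later steps can use it.
        self.log.append(f"{before} -> {after}")


def run_loop(env: CounterEnvironment, agent: LoopAgent, max_steps: int = 10) -> int:
    for _ in range(max_steps):
        observation = env.observe()                    # Observe
        action = agent.think(observation, env.target)  # Think
        if action == "stop":
            break
        env.apply(action)                              # Act
        agent.reflect(observation, env.observe())      # Reflect
    return env.value


env, agent = CounterEnvironment(), LoopAgent()
final_value = run_loop(env, agent)
```

In a real agent, `think` would be an LLM call and `apply` a tool invocation; the `max_steps` bound plays the same role as in the agents later in this tutorial.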
ReAct (Reasoning + Acting)
ReAct interleaves reasoning traces with actions:
[Equation: ReAct Action-Observation Loop]
from typing import Callable, Dict, List
import json


class ReActAgent:
    def __init__(
        self,
        llm,
        tools: Dict[str, Callable],
        max_steps: int = 10,
        verbose: bool = True
    ):
        self.llm = llm
        self.tools = tools
        self.max_steps = max_steps
        self.verbose = verbose

    def run(self, task: str) -> str:
        """Execute task using ReAct framework."""
        history = []
        current_state = f"Task: {task}\n\n"
        for step in range(self.max_steps):
            # Generate next action
            prompt = self._build_prompt(current_state, history)
            response = self.llm.generate(prompt)
            # Parse action from response
            action = self._parse_action(response)
            if action["type"] == "finish":
                return action["output"]
            # Execute action; missing fields and tool failures become observations
            tool = action.get("tool", "")
            args = action.get("args", {})
            if tool in self.tools:
                try:
                    observation = self.tools[tool](**args)
                except Exception as exc:
                    observation = f"Error: {exc}"
            else:
                observation = f"Error: Unknown tool '{tool}'"
            # Update history. The prompt is rebuilt from it on every step,
            # so the observation is not appended to current_state as well.
            history.append({
                "thought": action.get("thought", ""),
                "action": tool,
                "action_input": args,
                "observation": observation
            })
            if self.verbose:
                print(f"Step {step + 1}:")
                print(f"  Thought: {action.get('thought', 'N/A')}")
                print(f"  Action: {tool}")
                print(f"  Observation: {observation}\n")
        return "Max steps reached without completion"

    def _build_prompt(self, state: str, history: List[Dict]) -> str:
        tools_desc = "\n".join([
            f"- {name}: {func.__doc__ or 'No description'}"
            for name, func in self.tools.items()
        ])
        history_text = ""
        for h in history:
            history_text += f"Thought: {h['thought']}\n"
            history_text += f"Action: {h['action']}\n"
            history_text += f"Action Input: {json.dumps(h['action_input'])}\n"
            history_text += f"Observation: {h['observation']}\n\n"
        return f"""You are a helpful assistant that solves tasks step by step.

Available tools:
{tools_desc}

{state}
{history_text}What should you do next? Use this format:
Thought: [your reasoning about what to do next]
Action: [tool name]
Action Input: [JSON arguments for the tool]

Or if you have the final answer:
Thought: [final reasoning]
Final Answer: [your answer to the task]"""

    def _parse_action(self, response: str) -> Dict:
        """Parse action from LLM response."""
        lines = response.strip().split("\n")
        result = {"type": "action"}
        for line in lines:
            line = line.strip()
            if line.startswith("Thought:"):
                result["thought"] = line[len("Thought:"):].strip()
            elif line.startswith("Action:"):
                result["tool"] = line[len("Action:"):].strip()
            elif line.startswith("Action Input:"):
                try:
                    result["args"] = json.loads(line[len("Action Input:"):].strip())
                except json.JSONDecodeError:
                    # Fall back to passing the raw text as a single "input" argument
                    result["args"] = {"input": line[len("Action Input:"):].strip()}
            elif line.startswith("Final Answer:"):
                result["type"] = "finish"
                result["output"] = line[len("Final Answer:"):].strip()
        return result
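An agent like this is easier to debug when the model is replaced by canned responses. The sketch below assumes only that the agent calls `llm.generate(prompt)` and expects the `Thought:` / `Action:` / `Action Input:` / `Final Answer:` format shown above; `ScriptedLLM` and the `add` tool are hypothetical helpers, not part of any library.

```python
from typing import List


class ScriptedLLM:
    """Hypothetical test double: returns canned responses in order."""

    def __init__(self, responses: List[str]):
        self._responses = list(responses)
        self.prompts: List[str] = []  # every prompt received, for inspection

    def generate(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._responses:
            raise RuntimeError("ScriptedLLM ran out of scripted responses")
        return self._responses.pop(0)


def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b


# Two scripted turns: one tool call, then the final answer.
script = [
    'Thought: I need to add the numbers.\nAction: add\nAction Input: {"a": 2, "b": 3}',
    "Thought: The tool returned 5.\nFinal Answer: 5",
]
llm = ScriptedLLM(script)
first = llm.generate("Task: What is 2 + 3?")
```

Passing a fresh `ScriptedLLM(script)` and `tools={"add": add}` to the `ReActAgent` class should drive it through one tool call and then the final answer, and `llm.prompts` shows exactly what the agent sent at each step.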
Function Calling
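Many modern LLM APIs support structured function calling: the model returns a function name and JSON-encoded arguments instead of free text, and the caller executes the function and sends the result back: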
import json
from typing import Any, Dict, List


class FunctionCallingAgent:
    def __init__(self, llm, functions: List[Dict[str, Any]]):
        self.llm = llm
        self.functions = {f["name"]: f for f in functions}
        # Schemas sent to the model: everything except the Python callable
        self.schemas = [
            {k: v for k, v in f.items() if k != "function"} for f in functions
        ]

    def run(self, task: str, max_iterations: int = 5) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_iterations):
            response = self.llm.chat(messages, functions=self.schemas)
            if response.get("function_call"):
                func_name = response["function_call"]["name"]
                func_args = json.loads(response["function_call"]["arguments"])
                # Execute function
                result = self._execute_function(func_name, func_args)
                # Record both the model's call and the function's result
                messages.append(response)
                messages.append({
                    "role": "function",
                    "name": func_name,
                    "content": json.dumps(result)
                })
            else:
                return response["content"]
        return "Max iterations reached"

    def _execute_function(self, name: str, args: Dict) -> Any:
        if name in self.functions:
            func = self.functions[name]["function"]
            return func(**args)
        return {"error": f"Unknown function: {name}"}


# Define functions for the agent
functions = [
    {
        "name": "search_web",
        "description": "Search the web for information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        },
        "function": lambda query: {"results": [f"Result for: {query}"]}
    },
    {
        "name": "calculate",
        "description": "Perform mathematical calculation",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression"}
            },
            "required": ["expression"]
        },
        # Warning: eval() executes arbitrary code; acceptable only in a trusted demo
        "function": lambda expression: {"result": eval(expression)}
    }
]
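The `calculate` tool above passes model-generated text to `eval`, which can execute arbitrary code. One possible alternative, sketched here under the assumption that only basic arithmetic is needed, walks the parsed expression tree and rejects anything that is not a number or an allowed operator. The name `safe_calculate` is hypothetical, and exponentiation is deliberately left out because expressions such as `9**9**9` can exhaust CPU and memory.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> Number:
    """Evaluate +, -, *, /, % over numeric literals; reject everything else."""

    def _eval(node: ast.AST) -> Number:
        # Numeric literals only (bool is excluded by the exact type check).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

It could be dropped into the `calculate` entry as `lambda expression: {"result": safe_calculate(expression)}`. Function calls, attribute access, and names all raise `ValueError` instead of running.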
Toolformer
Toolformer trains LLMs to use tools autonomously:
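Toolformer (Schick et al., 2023) fine-tunes a language model to decide when and how to use tools (calculators, search engines, etc.) by learning to insert API calls into its generation. Candidate API calls are sampled from the model itself and executed, and a call is kept only if its result makes the following tokens easier to predict. The model is then fine-tuned on the text augmented with the surviving calls.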
[Equation: Toolformer Perplexity Improvement]
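One way to write the filtering rule that this tutorial's `_is_useful` method applies is given below. It is a simplified statement that assumes whole-sequence perplexity, not the paper's exact formula: Schick et al. compare weighted cross-entropy losses over the tokens that follow the call and require the gain to exceed a threshold.

```latex
\mathrm{PPL}(s) = \exp\!\left(-\frac{1}{N_s}\sum_{i=1}^{N_s}\log p_\theta\!\left(s_i \mid s_{<i}\right)\right),
\qquad
\text{keep call } c \text{ with result } r
\iff
\mathrm{PPL}(x \oplus c \oplus r) < \mathrm{PPL}(x)
```

Here $s$ is any token sequence of length $N_s$, $p_\theta$ is the language model, $x$ is the original text, and $\oplus$ denotes concatenation.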
from typing import Dict, List


class ToolformerTrainer:
    # Sketch only: _find_tool_positions, _generate_tool_call, _insert_tool_call
    # and _compute_perplexity are model-specific and are not defined here.
    def __init__(self, base_model, tokenizer):
        self.model = base_model
        self.tokenizer = tokenizer

    def prepare_training_data(self, texts: List[str], tools: List[Dict]):
        """Insert tool calls into training data."""
        training_examples = []
        for text in texts:
            # Identify potential tool call positions
            positions = self._find_tool_positions(text)
            for pos in positions:
                # Generate tool call
                tool_call = self._generate_tool_call(
                    text[:pos], text[pos:], tools
                )
                if tool_call and self._is_useful(tool_call, text):
                    # Create training example with tool call
                    example = self._insert_tool_call(text, pos, tool_call)
                    training_examples.append(example)
        return training_examples

    def _is_useful(self, tool_call: Dict, context: str) -> bool:
        """Check if tool call reduces perplexity."""
        # Compute perplexity without tool
        ppl_without = self._compute_perplexity(context)
        # Compute perplexity with tool result
        text_with_tool = context + tool_call["call"] + tool_call["result"]
        ppl_with = self._compute_perplexity(text_with_tool)
        return ppl_with < ppl_without
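The `_compute_perplexity` helper is left undefined above. As a rough sketch, assuming the model can return a natural-log probability for each token, perplexity is the exponential of the negative mean log-probability. The function names, the `min_gain` parameter, and the sample numbers below are all hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("Need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def call_is_useful(logprobs_without: Sequence[float],
                   logprobs_with: Sequence[float],
                   min_gain: float = 0.0) -> bool:
    """Keep a tool call only if it lowers perplexity by more than `min_gain`."""
    return perplexity(logprobs_without) - perplexity(logprobs_with) > min_gain


# Hypothetical log-probabilities of the same two continuation tokens,
# scored without and with a tool result in the prefix.
without_tool = [math.log(0.25), math.log(0.25)]
with_tool = [math.log(0.5), math.log(0.5)]
```

With these numbers the perplexity falls from about 4 to about 2, so `call_is_useful(without_tool, with_tool)` is true. Scoring the same continuation tokens in both cases keeps the comparison fair, since it avoids comparing sequences of different lengths.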
Memory Systems
Short-term Memory
Working memory for current task context:
from collections import deque


class ShortTermMemory:
    def __init__(self, max_size: int = 20):
        # deque(maxlen=...) silently drops the oldest entry once full
        self.buffer = deque(maxlen=max_size)
        self.summary = ""

    def add(self, observation: str, action: str, result: str):
        self.buffer.append({
            "observation": observation,
            "action": action,
            "result": result
        })

    def get_context(self) -> str:
        context = self.summary + "\n\nRecent interactions:\n"
        for item in self.buffer:
            context += f"- {item['observation']} → {item['action']}\n"
        return context

    def summarize(self, llm):
        """Summarize buffer contents to save context."""
        if len(self.buffer) > 10:
            content = "\n".join([
                f"Obs: {b['observation']}, Act: {b['action']}, Res: {b['result']}"
                for b in self.buffer
            ])
            # Include the previous summary so earlier history is not lost
            self.summary = llm.generate(
                f"Previous summary:\n{self.summary}\n\n"
                f"Summarize these interactions:\n{content}"
            )
            self.buffer.clear()
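The eviction behaviour of `ShortTermMemory` comes entirely from `deque(maxlen=...)`. A small standalone illustration, with made-up step labels:

```python
from collections import deque

# A deque with maxlen drops the oldest entry automatically when full.
buffer = deque(maxlen=3)
for step in range(1, 6):
    buffer.append(f"step {step}")

recent = list(buffer)  # only the three most recent entries remain
```

Because old entries disappear without warning, `summarize` has to run before the buffer fills if the earlier interactions should survive in condensed form.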
Long-term Memory
Persistent memory across sessions:
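import uuid
from typing import Dict, List, Optional

import chromadb


class LongTermMemory:
    def __init__(self, collection_name: str = "agent_memory",
                 path: str = "./agent_memory_db"):  # example path
        # PersistentClient writes to disk; chromadb.Client() is in-memory only
        self.client = chromadb.PersistentClient(path=path)
        # get_or_create avoids an error when the collection already exists
        self.collection = self.client.get_or_create_collection(collection_name)

    def store(self, content: str, metadata: Optional[Dict] = None):
        """Store information in long-term memory."""
        self.collection.add(
            documents=[content],
            # Some Chroma versions reject empty metadata dicts, so pass None instead
            metadatas=[metadata] if metadata else None,
            # Random IDs cannot collide, unlike IDs derived from count()
            ids=[f"mem_{uuid.uuid4().hex}"]
        )

    def retrieve(self, query: str, k: int = 5) -> List[str]:
        """Retrieve relevant memories."""
        results = self.collection.query(
            query_texts=[query],
            n_results=k
        )
        return results["documents"][0] if results["documents"] else []

    def search_and_retrieve(self, context: str, query: str) -> str:
        """Search memory and format for context."""
        memories = self.retrieve(f"{context} {query}")
        return "\n".join([f"- {m}" for m in memories])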
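To see what a vector store does conceptually, here is a dependency-free sketch of similarity-based retrieval. It is not how Chroma works internally: word counts stand in for learned embeddings, and `ToyMemory`, `embed`, and `cosine` are hypothetical names used only for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyMemory:
    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def store(self, content: str) -> None:
        self._items.append((content, embed(content)))

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # Rank every stored item by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [content for content, _ in ranked[:k]]


memory = ToyMemory()
memory.store("The user prefers metric units")
memory.store("The deployment server runs Ubuntu")
memory.store("The user lives in Boston")
top = memory.retrieve("Which units does the user prefer?", k=1)
```

The query shares the most words with the first stored sentence, so that sentence ranks highest. A real embedding model would also match paraphrases that share no words, which is the main reason to use a vector database instead of keyword overlap.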
Agent Frameworks
LangChain
# Legacy LangChain API; import paths differ in newer releases
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.llms import OpenAI


def calculator(expression: str) -> str:
    """Calculate mathematical expressions."""
    # Warning: eval() executes arbitrary code; acceptable only in a trusted demo
    return str(eval(expression))


def search(query: str) -> str:
    """Search the web."""
    return f"Search results for: {query}"


tools = [
    Tool(name="Calculator", func=calculator, description="Calculate math"),
    Tool(name="Search", func=search, description="Search the web")
]
llm = OpenAI(temperature=0)
# "react-agent" is not a valid agent type; the ReAct-style agent is ZERO_SHOT_REACT_DESCRIPTION
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
result = agent.run("What is 2 + 2 and what is the capital of France?")
LlamaIndex
# Legacy LlamaIndex import paths; newer releases moved these modules
from llama_index.agent import ReActAgent
from llama_index.tools import FunctionTool
from llama_index.llms import OpenAI


def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: 72°F, sunny"


# FunctionTool wraps a Python callable; ToolMetadata alone only describes a tool
tools = [
    FunctionTool.from_defaults(
        fn=get_weather,
        name="get_weather",
        description="Get current weather for a city"
    )
]
llm = OpenAI(model="gpt-4")
agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)
response = agent.chat("What's the weather in Boston?")
CrewAI
from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool

search_tool = SerperDevTool()

researcher = Agent(
    role="Researcher",
    goal="Find accurate information",
    backstory="Expert researcher with attention to detail",
    tools=[search_tool],
    verbose=True
)
writer = Agent(
    role="Writer",
    goal="Write compelling content",
    backstory="Experienced writer with clear communication",
    verbose=True
)

research_task = Task(
    description="Research the latest AI developments",
    agent=researcher,
    expected_output="Comprehensive research summary"
)
writing_task = Task(
    description="Write a blog post based on research",
    agent=writer,
    expected_output="1000-word blog post"
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    verbose=True
)
result = crew.kickoff()
Practical Agent Implementation
import json
import re
from typing import Callable, Dict, List


class AutonomousAgent:
    def __init__(self, llm, tools: List[Callable], memory: ShortTermMemory):
        self.llm = llm
        self.tools = {t.__name__: t for t in tools}
        self.memory = memory
        self.goals = []
        self.plan = []

    def set_goal(self, goal: str):
        self.goals.append(goal)
        self.plan = self._create_plan(goal)

    def _create_plan(self, goal: str) -> List[str]:
        """Create multi-step plan for achieving goal."""
        prompt = f"""Create a step-by-step plan to achieve this goal:
{goal}

Available tools: {list(self.tools.keys())}

Plan (list steps as numbered items):"""
        response = self.llm.generate(prompt)
        # Keep every numbered line ("1.", "2.", ..., "10."), not only the first three
        steps = [s.strip() for s in response.split("\n") if re.match(r"\d+\.", s.strip())]
        return steps

    def _parse_action(self, response: str) -> Dict:
        """Parse a JSON tool call; an empty tool name signals a parse failure."""
        try:
            data = json.loads(response)
        except json.JSONDecodeError:
            return {"tool": "", "args": {}}
        if not isinstance(data, dict):
            return {"tool": "", "args": {}}
        return {"tool": data.get("tool", ""), "args": data.get("args", {})}

    def execute_step(self) -> bool:
        """Execute next step in plan."""
        if not self.plan:
            return False
        current_step = self.plan[0]
        context = self.memory.get_context()
        prompt = f"""Current task: {current_step}

Context from previous steps:
{context}

What tool should you use and with what arguments?
Reply with JSON only: {{"tool": "<tool name>", "args": {{...}}}}"""
        response = self.llm.generate(prompt)
        action = self._parse_action(response)
        if action["tool"] in self.tools:
            result = self.tools[action["tool"]](**action["args"])
            self.memory.add(current_step, action["tool"], str(result))
            self.plan.pop(0)
            return True
        return False

    def run(self, max_steps: int = 20):
        """Execute full plan."""
        for _ in range(max_steps):
            if not self.plan:
                break
            # Stop instead of retrying the same failing step until max_steps
            if not self.execute_step():
                break
        return self.memory.get_context()
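Plan parsing is easy to get wrong when the model numbers steps differently or produces more steps than expected. A small sketch of a more tolerant parser follows; it assumes the model writes one step per line as `1. ...` or `1) ...`, and `parse_plan` is a hypothetical helper name.

```python
import re
from typing import List

# A step line: optional indent, a number, "." or ")", then the step text.
_STEP_PATTERN = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def parse_plan(response: str) -> List[str]:
    """Extract the text of numbered steps ('1. ...' or '1) ...'), in order."""
    steps = []
    for line in response.splitlines():
        match = _STEP_PATTERN.match(line)
        if match:
            steps.append(match.group(2))
    return steps


sample = """Here is the plan:
1. Search for recent papers
2. Summarize the top results
10) Write the report
Good luck!"""
plan = parse_plan(sample)
```

Lines without a leading number, such as the greeting and sign-off in `sample`, are ignored, and the numeric prefix is stripped so each step can be placed directly into a prompt.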
When building agents, start with simple ReAct patterns and add complexity incrementally. Overly complex agent architectures can be difficult to debug and may underperform simpler approaches.
Summary
- LLM agents combine language models with tool use, planning, and memory
- ReAct interleaves reasoning traces with actions in an observation-action loop
- Function calling provides structured tool integration
- Toolformer trains models to autonomously decide when to use tools
- Short-term memory maintains working context; long-term memory persists across sessions
- Frameworks like LangChain, LlamaIndex, and CrewAI simplify agent development
- Start simple and add complexity incrementally for best results
Practice Exercises
- ReAct Agent: Implement a ReAct agent with calculator and search tools. Test it on multi-step reasoning tasks.
- Function Calling: Build a function-calling agent with 5 tools. Compare its performance with ReAct.
- Memory System: Implement both short-term and long-term memory. Test how memory affects agent performance on long tasks.
- Multi-Agent: Create a two-agent system where one agent researches and another writes. Compare with single-agent performance.
- Tool Learning: Implement a simplified Toolformer that learns to insert tool calls. Measure perplexity improvement.