Planning and Reasoning in Agents — Thinking Before Acting
Effective agents do not just react — they plan. Planning enables agents to break down complex goals into manageable steps, anticipate obstacles, and adapt their strategy when things go wrong.
- Task Decomposition — Break complex goals into actionable steps
- ReAct — Interleave reasoning and action for grounded decisions
- Tree-of-Thought — Explore multiple reasoning paths in parallel
Failing to plan is planning to fail.
Planning and Reasoning in Agents
Planning is the ability to formulate a sequence of actions to achieve a goal. For LLM agents, planning involves decomposing tasks, reasoning about dependencies, and selecting the best action at each step.
Definition: Agent Planning
Agent planning is the process of generating a sequence of actions to achieve a goal, considering available tools, environmental constraints, and potential obstacles. Planning enables agents to handle complex, multi-step tasks that require strategic thinking.
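Reasoning about dependencies becomes concrete when each step in a plan names the steps it depends on. An execution order can then be derived from those dependencies. The following is a minimal sketch, assuming a hypothetical `plan` dictionary produced by the agent. It uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+), which also rejects plans that contain circular dependencies.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical plan: each step maps to the set of steps it depends on.
plan = {
    "search_population": set(),
    "search_gdp": set(),
    "compute_gdp_per_capita": {"search_population", "search_gdp"},
    "write_report": {"compute_gdp_per_capita"},
}

def execution_order(plan):
    """Return an order in which every step runs after all of its dependencies."""
    try:
        return list(TopologicalSorter(plan).static_order())
    except CycleError as e:
        # CycleError stores the detected cycle as the second element of args
        raise ValueError(f"Plan has circular dependencies: {e.args[1]}") from e

order = execution_order(plan)
print(order)
```

The two searches are independent, so either may come first. The only guarantee is that every step appears after its prerequisites.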
Task Decomposition
Chain-of-Thought Planning
Definition: Chain-of-Thought Planning
Chain-of-Thought planning generates a step-by-step plan by reasoning through the problem sequentially. Each step builds on the previous one, creating a linear execution path.
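```python
import re

def parse_plan(text):
    """Simple helper: turn numbered or bulleted lines into a list of steps."""
    steps = []
    for line in text.splitlines():
        step = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if step:
            steps.append(step)
    return steps

def chain_of_thought_plan(goal, llm, tools):
    """Generate a step-by-step plan using chain-of-thought reasoning.

    Assumes `llm.generate(prompt)` returns a string and `tools` is a dict
    mapping tool names to tool objects.
    """
    prompt = f"""Create a detailed plan to achieve this goal:
Goal: {goal}
Available tools: {', '.join(tools.keys())}
Think step by step:
1. What information do I need?
2. What actions should I take?
3. What is the order of operations?
Plan:"""
    response = llm.generate(prompt)
    return parse_plan(response)
```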
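A chain-of-thought plan is linear, so one way to execute it is to run the steps in order and pass each result forward. The sketch below assumes a hypothetical plan format in which each step is a `(tool_name, argument_template)` pair. In a template, `{prev}` stands for the previous step's output. The tools are toy stand-ins backed by a small lookup table, not a real API.

```python
def execute_linear_plan(steps, tools):
    """Run steps in order; each step can use the previous step's output.

    Each step is a (tool_name, argument_template) pair. The placeholder
    "{prev}" in a template is replaced by the previous result.
    """
    prev = ""
    trace = []
    for tool_name, template in steps:
        arg = template.replace("{prev}", str(prev))
        prev = tools[tool_name](arg)
        trace.append((tool_name, arg, prev))
    return prev, trace

# Toy tools (hypothetical): a lookup table and a text formatter.
facts = {"capital of France": "Paris", "population of Paris": "about 2.1 million"}
tools = {
    "lookup": lambda q: facts.get(q, "unknown"),
    "report": lambda text: f"Answer: {text}",
}

steps = [
    ("lookup", "capital of France"),
    ("lookup", "population of {prev}"),  # builds on step 1
    ("report", "{prev}"),                # builds on step 2
]
answer, trace = execute_linear_plan(steps, tools)
print(answer)  # Answer: about 2.1 million
```

The trade-off of a linear plan is visible here. If any step returns a bad result, every later step inherits it, because nothing in the loop checks intermediate outputs.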
Hierarchical Task Network
Definition: Hierarchical Task Network (HTN)
HTN planning decomposes tasks hierarchically. High-level tasks are broken into subtasks, which are further decomposed until reaching primitive actions that can be executed directly.
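```python
def is_primitive(task, primitive_verbs=("search", "read", "write", "calculate")):
    """Hypothetical heuristic: a task is primitive if it starts with a tool verb."""
    return task.lower().startswith(primitive_verbs)

def htn_decompose(task, llm, max_depth=3):
    """Hierarchically decompose a task into primitive steps.

    Assumes `llm.generate(prompt)` returns a string. At max_depth == 0 the
    task is returned as-is, even if it is not primitive.
    """
    if max_depth == 0 or is_primitive(task):
        return [task]
    prompt = f"""Break this task into subtasks:
Task: {task}
Subtasks (one per line):"""
    response = llm.generate(prompt)
    subtasks = [s.strip() for s in response.split("\n") if s.strip()]
    if not subtasks:
        return [task]  # nothing to decompose; treat the task as a leaf
    # Recursively decompose each subtask
    all_steps = []
    for subtask in subtasks:
        all_steps.extend(htn_decompose(subtask, llm, max_depth - 1))
    return all_steps
```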
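In `htn_decompose`, the LLM call takes the place of the hand-written decomposition methods used by classical HTN planners. For comparison, here is a minimal sketch of the classical version. It assumes a hypothetical method table that maps each compound task to one ordered list of subtasks, and any task without a method is treated as primitive. Full HTN planners also support several alternative methods per task, each guarded by preconditions. This sketch omits both.

```python
# Hypothetical decomposition methods: compound task -> ordered subtasks.
METHODS = {
    "write report": ["research topic", "draft report", "review draft"],
    "research topic": ["search sources", "summarize sources"],
    "review draft": ["check facts", "fix errors"],
}

def htn_plan(task, methods, max_depth=10):
    """Expand compound tasks depth-first until only primitive tasks remain."""
    if task not in methods:
        return [task]  # primitive: executable directly
    if max_depth == 0:
        # Unlike the LLM version, fail loudly instead of returning a compound task
        raise RecursionError(f"Decomposition too deep at {task!r}")
    steps = []
    for subtask in methods[task]:
        steps.extend(htn_plan(subtask, methods, max_depth - 1))
    return steps

print(htn_plan("write report", METHODS))
# ['search sources', 'summarize sources', 'draft report', 'check facts', 'fix errors']
```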
ReAct: Reasoning + Acting
The ReAct Loop
Definition: ReAct
ReAct (Yao et al., 2023) interleaves reasoning and action. The agent generates a thought (reasoning about what to do next), takes an action (calls a tool), observes the result, and repeats. This grounds reasoning in actual observations.
```text
Thought: I need to find the population of France.
Action: search("population of France 2024")
Observation: France has a population of approximately 68 million.
Thought: Now I need to find the GDP.
Action: search("GDP of France 2024")
Observation: France's GDP is approximately $3.0 trillion.
Thought: I now have both pieces of information.
Action: finish("France has 68 million people and $3.0T GDP.")
```
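```python
import re

def parse_action(action):
    """Parse a call such as search("query") into ("search", "query").

    Returns (None, None) if the text does not look like a tool call.
    """
    match = re.match(r"^\s*(\w+)\((.*)\)\s*$", action, re.DOTALL)
    if match is None:
        return None, None
    name, arg = match.groups()
    return name, arg.strip().strip("\"'")

def react_agent(goal, llm, tools, max_steps=10):
    """ReAct agent that interleaves reasoning and action.

    Assumes `llm.generate(prompt)` returns a string and each tool
    exposes `execute(input)`.
    """
    steps = []
    context = f"Goal: {goal}\n\n"
    for _ in range(max_steps):
        # Generate a thought, then an action conditioned on it. Keep only the
        # first line of each so the model cannot invent its own Observation.
        prompt = context + "Thought:"
        thought = llm.generate(prompt).strip().split("\n")[0]
        prompt += f" {thought}\nAction:"
        action = llm.generate(prompt).strip().split("\n")[0]
        tool_name, tool_input = parse_action(action)
        # finish(...) ends the episode with the final answer
        if tool_name == "finish":
            return tool_input, steps
        # Execute the tool call; report malformed or unknown actions back to the model
        if tool_name in tools:
            observation = tools[tool_name].execute(tool_input)
        else:
            observation = f"Error: could not execute action {action!r}"
        # Update context with the grounded observation
        context += f"Thought: {thought}\nAction: {action}\nObservation: {observation}\n"
        steps.append({"thought": thought, "action": action, "observation": observation})
    return "Max steps reached", steps
```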
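ReAct addresses the weaknesses of both reasoning-only and acting-only approaches. Reasoning without action (plain chain-of-thought) can hallucinate facts it never checks. Action without reasoning has no way to decompose the goal or interpret what it observes. Yao et al. report that ReAct outperforms acting-only baselines on interactive decision-making tasks. They also report that it hallucinates less than chain-of-thought on knowledge-intensive question answering.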
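The sketch below runs a compact ReAct loop end to end without a model or network access. It replaces the LLM with a scripted stand-in and uses a toy calculator tool. Several pieces are assumptions for illustration: `ScriptedLLM`, the `calculate` tool, and a format in which the model emits its `Thought:` and `Action:` lines in a single reply. A real agent would call a model API and would usually set stop sequences so the model cannot write its own `Observation:`.

```python
import re

class ScriptedLLM:
    """Hypothetical stand-in for a model: returns canned replies in order."""
    def __init__(self, replies):
        self.replies = iter(replies)

    def generate(self, prompt):
        return next(self.replies)

def calculate(expression):
    """Toy tool: evaluate 'a op b' for integers with + - * only."""
    a, op, b = expression.split()
    a, b = int(a), int(b)
    return str({"+": a + b, "-": a - b, "*": a * b}[op])

def run_react(goal, llm, tools, max_steps=5):
    context = f"Goal: {goal}\n"
    for _ in range(max_steps):
        reply = llm.generate(context)
        thought = re.search(r"Thought:\s*(.*)", reply).group(1)
        name, arg = re.search(r'Action:\s*(\w+)\("(.*)"\)', reply).groups()
        if name == "finish":
            return arg, context
        observation = tools[name](arg)
        # Append the observation so the next thought is grounded in it
        context += f'Thought: {thought}\nAction: {name}("{arg}")\nObservation: {observation}\n'
    return None, context

llm = ScriptedLLM([
    'Thought: I need 17 times 23.\nAction: calculate("17 * 23")',
    'Thought: Now add 9 to the result, 391.\nAction: calculate("391 + 9")',
    'Thought: I have the answer.\nAction: finish("400")',
])
answer, transcript = run_react("Compute 17 * 23 + 9", llm, {"calculate": calculate})
print(answer)  # 400
```

Printing `transcript` shows the same Thought/Action/Observation pattern as the France example, with each observation produced by the tool rather than by the model.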
Tree-of-Thought
Definition: Tree-of-Thought
Tree-of-Thought (Yao et al., 2023) explores multiple reasoning paths simultaneously, evaluating each path and pruning unpromising branches. This enables more thorough exploration of the solution space.
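```python
import re

def parse_steps(text):
    """Split model output into candidate steps, dropping list markers."""
    steps = []
    for line in text.splitlines():
        step = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if step:
            steps.append(step)
    return steps

def parse_score(text):
    """Take the first number in the reply, clamped to [0, 1]; 0 if none."""
    match = re.search(r"\d*\.?\d+", text)
    return min(max(float(match.group()), 0.0), 1.0) if match else 0.0

def tree_of_thought(problem, llm, num_branches=3, max_depth=4):
    """Explore multiple reasoning paths using breadth-first tree search.

    Assumes `llm.generate(prompt)` returns a string. At each depth, every
    kept path is expanded, all candidates are scored, and only the top
    `num_branches` survive.
    """
    def evaluate_path(path):
        """Ask the LLM how promising a reasoning path is."""
        prompt = f"""Evaluate this reasoning path for solving: {problem}
Path: {' -> '.join(path)}
Score (0-1, where 1 is most promising):"""
        return parse_score(llm.generate(prompt))

    def expand_path(path):
        """Ask the LLM for candidate next steps for a path."""
        prompt = f"""Given this problem: {problem}
Current reasoning: {' -> '.join(path)}
Generate {num_branches} possible next steps:"""
        next_steps = parse_steps(llm.generate(prompt))[:num_branches]
        return [path + [step] for step in next_steps]

    # BFS with pruning (a beam of width num_branches)
    current_paths = [[]]
    best_path = []
    for _ in range(max_depth):
        all_candidates = []
        for path in current_paths:
            all_candidates.extend(expand_path(path))
        if not all_candidates:
            break
        # Score each candidate once, then keep the top paths
        scored = [(path, evaluate_path(path)) for path in all_candidates]
        scored.sort(key=lambda x: x[1], reverse=True)
        current_paths = [path for path, _ in scored[:num_branches]]
        best_path = scored[0][0]
    # Reuse the scores computed above instead of re-querying the LLM
    return best_path
```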
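Only `expand_path` and `evaluate_path` in the search above call the model; the rest of the search is independent of the LLM. The sketch below keeps the same breadth-first, keep-the-top-k structure but substitutes deterministic stand-ins, which makes the pruning easy to inspect. The toy problem and the distance-based score are assumptions chosen for illustration, not part of the Tree-of-Thought method. The problem is to reach a target number from 1 using the moves `+3` and `*2`.

```python
def beam_tree_search(start, target, moves, beam_width=3, max_depth=6):
    """Breadth-first tree search that keeps only the best `beam_width` paths per level.

    A path is a list of (move_name, value) pairs. Paths are scored by how
    close their last value is to `target` (higher is better).
    """
    def value(path):
        return path[-1][1] if path else start

    def score(path):
        return -abs(target - value(path))

    frontier = [[]]
    for _ in range(max_depth):
        # Expand: every surviving path branches once per move
        candidates = [path + [(name, fn(value(path)))]
                      for path in frontier for name, fn in moves.items()]
        # Evaluate and prune: keep only the most promising paths
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
        if value(frontier[0]) == target:
            return frontier[0]
    return frontier[0]

moves = {"+3": lambda x: x + 3, "*2": lambda x: x * 2}
best = beam_tree_search(start=1, target=20, moves=moves)
print([name for name, _ in best], best[-1][1])  # ['+3', '+3', '*2', '+3', '+3'] 20
```

Pruning trades completeness for cost. A four-move solution exists here (`*2, +3, *2, *2`), but the beam discards the branch that leads to it at an early level, so the search returns a five-move path instead.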
Goal-Conditioned Planning
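GOFAI-Style Planning
In the spirit of classical planning from "good old-fashioned AI" (GOFAI), a goal-conditioned planner reasons from an explicit current state toward a goal. When execution fails, it replans from the state actually reached, taking into account the actions already executed and the reason for the failure.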
```python
class GoalConditionedPlanner:
    """Plans from an explicit current state toward a goal, and replans on failure.

    Assumes `llm.generate(prompt)` returns a string, `tools` is a dict of
    available actions, and `parse_plan` turns the LLM text into a list of
    steps (as in the chain-of-thought planner).
    """
    def __init__(self, llm, tools, state_tracker):
        self.llm = llm
        self.tools = tools
        self.state = state_tracker  # tracks the environment state between steps

    def plan(self, goal, current_state):
        """Generate a plan to reach the goal from the current state."""
        prompt = f"""Current state: {current_state}
Goal: {goal}
Available actions: {list(self.tools.keys())}
Generate a plan (list of actions) to reach the goal:"""
        response = self.llm.generate(prompt)
        return parse_plan(response)

    def replan(self, goal, current_state, executed_actions, failure_reason):
        """Replan after a failure, telling the LLM what was tried and why it failed."""
        prompt = f"""Previous plan failed.
Current state: {current_state}
Goal: {goal}
Executed actions: {executed_actions}
Failure reason: {failure_reason}
Generate a new plan considering the failure:"""
        response = self.llm.generate(prompt)
        return parse_plan(response)
```
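Classical (GOFAI) planners search over explicit states using actions that have preconditions and effects, in the style of STRIPS. For contrast with the LLM-based planner, here is a minimal sketch of that idea: a breadth-first search over sets of facts. The `Action` type and the toy research domain are hypothetical. Practical classical planners use heuristic search and a domain description language such as PDDL.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset          # facts that must hold before acting
    add: frozenset                    # facts the action makes true
    delete: frozenset = frozenset()   # facts the action makes false

def plan_bfs(initial, goal, actions):
    """Return the shortest action sequence that makes every goal fact true, or None."""
    start, goal = frozenset(initial), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:
            return plan
        for action in actions:
            if action.preconditions <= state:
                nxt = (state - action.delete) | action.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [action.name]))
    return None  # goal unreachable from the initial state

actions = [
    Action("search_web", frozenset(), frozenset({"have_sources"})),
    Action("summarize", frozenset({"have_sources"}), frozenset({"have_summary"})),
    Action("write_report", frozenset({"have_summary"}), frozenset({"report_done"})),
]
print(plan_bfs(set(), {"report_done"}, actions))
# ['search_web', 'summarize', 'write_report']
```

Replanning here amounts to calling `plan_bfs` again from the state actually reached. For example, starting from `{"have_summary"}` yields just `['write_report']`.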
Adaptive Planning
Plan-Execute-Observe Loop
```python
def plan_execute_observe(goal, llm, tools, max_iterations=5):
    """Main agent loop with planning, execution, and observation.

    generate_plan, needs_replan, execute_step, goal_achieved, and
    synthesize_answer are assumed helpers; generate_plan is expected to
    return a dict with a "steps" list.
    """
    plan = None
    observations = []
    for _ in range(max_iterations):
        # Plan (or replan) when there is no plan, it is used up, or it looks stale
        if plan is None or not plan["steps"] or needs_replan(observations):
            plan = generate_plan(goal, observations, llm, tools)
            if not plan["steps"]:
                break  # the planner has nothing left to try
        # Execute the next step
        current_step = plan["steps"].pop(0)
        try:
            result = execute_step(current_step, tools)
        except Exception as e:
            observations.append({"step": current_step, "error": str(e), "success": False})
            plan = None  # Trigger replanning
            continue
        observations.append({"step": current_step, "result": result, "success": True})
        # Check if goal achieved
        if goal_achieved(goal, observations):
            return synthesize_answer(goal, observations, llm)
    return "Could not achieve goal within iteration limit"
```
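Replanning is easiest to see when a step fails on its first attempt. The sketch below is a self-contained variant of the plan-execute-observe loop that involves no LLM. The planner, the failing tool, and the goal check are hypothetical stand-ins, chosen so that the first plan fails and the second plan, informed by the failure observation, succeeds.

```python
def make_plan(goal, observations):
    """Hypothetical planner: avoid any fetch tool that has already failed."""
    failed = {o["step"] for o in observations if not o["success"]}
    fetch = "fetch_mirror" if "fetch_primary" in failed else "fetch_primary"
    return [fetch, "parse", "answer"]

def fetch_primary(state):
    raise ConnectionError("primary source unavailable")

def fetch_mirror(state):
    state["raw"] = "42"

def parse(state):
    state["value"] = int(state["raw"])

def answer(state):
    state["answer"] = f"The value is {state['value']}"

TOOLS = {f.__name__: f for f in (fetch_primary, fetch_mirror, parse, answer)}

def adaptive_loop(goal, max_iterations=10):
    state, observations, plan = {}, [], []
    for _ in range(max_iterations):
        if not plan:  # no plan yet, plan used up, or plan discarded after a failure
            plan = make_plan(goal, observations)
        step = plan.pop(0)
        try:
            TOOLS[step](state)
        except Exception as e:
            observations.append({"step": step, "success": False, "error": str(e)})
            plan = []  # discard the plan so the next iteration replans
            continue
        observations.append({"step": step, "success": True})
        if "answer" in state:  # goal check
            return state["answer"], observations
    return None, observations

result, log = adaptive_loop("report the value")
print(result)  # The value is 42
print([(o["step"], o["success"]) for o in log])
# [('fetch_primary', False), ('fetch_mirror', True), ('parse', True), ('answer', True)]
```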
Practice Exercises
- Chain-of-Thought Planning: Implement a CoT planner for a research task. How does plan quality compare with executing the task directly, without a plan?
- ReAct Implementation: Build a ReAct agent with web search and calculator tools. Test it on questions that require multiple reasoning steps.
- Tree-of-Thought: Implement ToT for a puzzle-solving task (e.g., the Game of 24). What branching factor and search depth are needed for good performance?
- Adaptive Planning: Test your planner on tasks where the initial plan fails. How quickly does the agent recover by replanning?
Key Takeaways
Summary: Planning and Reasoning in Agents
- Task decomposition breaks complex goals into manageable steps
- Chain-of-Thought planning generates linear step-by-step plans
- HTN planning hierarchically decomposes tasks to primitive actions
- ReAct interleaves reasoning with action for grounded decisions
- Tree-of-Thought explores multiple reasoning paths in parallel
- Goal-conditioned planning generates plans to reach specific goals
- Adaptive planning replans when initial plans fail
- The plan-execute-observe loop is a common core loop for agents, tying planning, execution, and replanning together
What to Learn Next
- Chain-of-Thought Reasoning: Teaching models to reason step by step.
- LLM Agent Frameworks: Building autonomous agents with LLMs.
- Tool Use and Function Calling: Teaching LLMs to use external tools.
- Memory Systems for Agents: Long-term and short-term memory for agents.
- Reinforcement Learning: Training agents through reward signals.
- Multi-Agent Systems: Coordinating multiple agents.