LLM Agents

Planning and Reasoning in Agents — Thinking Before Acting

Effective agents do not just react — they plan. Planning enables agents to break down complex goals into manageable steps, anticipate obstacles, and adapt their strategy when things go wrong.

Task Decomposition — Break complex goals into actionable steps
ReAct — Interleave reasoning and action for grounded decisions
Tree-of-Thought — Explore multiple reasoning paths in parallel

Failing to plan is planning to fail.

Planning and Reasoning in Agents

Planning is the ability to formulate a sequence of actions to achieve a goal. For LLM agents, planning involves decomposing tasks, reasoning about dependencies, and selecting the best action at each step.

Definition: Agent Planning

Agent planning is the process of generating a sequence of actions to achieve a goal, considering available tools, environmental constraints, and potential obstacles. Planning enables agents to handle complex, multi-step tasks that require strategic thinking.
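Because the definition stresses available tools and constraints, it can help to check a proposed plan before running any of it. The sketch below assumes a hypothetical plan format in which each step is a (tool_name, argument) pair. It also treats a step budget as one example constraint. `validate_plan`, `Step` and the tool names are illustrative, not part of any library.

```python
from typing import List, Set, Tuple

Step = Tuple[str, str]  # (tool name, argument); a hypothetical plan format


def validate_plan(plan: List[Step], available_tools: Set[str], max_steps: int = 10) -> List[str]:
    """Return a list of problems with the plan; an empty list means it looks executable."""
    problems = []
    if not plan:
        problems.append("plan is empty")
    # Example environmental constraint (assumed): a cap on plan length
    if len(plan) > max_steps:
        problems.append(f"plan has {len(plan)} steps, limit is {max_steps}")
    # Every step must use a tool the agent actually has
    for i, (tool, _arg) in enumerate(plan, start=1):
        if tool not in available_tools:
            problems.append(f"step {i} uses unavailable tool '{tool}'")
    return problems


plan = [("search", "GDP of France"), ("calculator", "3.0e12 / 68e6"), ("email", "send report")]
print(validate_plan(plan, {"search", "calculator"}))
```

A non-empty result is a natural trigger for asking the planner for a revised plan before any tool is called.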

Task Decomposition

Chain-of-Thought Planning

Definition: Chain-of-Thought Planning

Chain-of-Thought planning generates a step-by-step plan by reasoning through the problem sequentially. Each step builds on the previous one, creating a linear execution path.

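import re

def parse_plan(response):
    """Turn the LLM's plan text into a list of steps, one per non-empty line."""
    steps = []
    for line in response.splitlines():
        # Strip leading list markers such as "1.", "2)", "-", or "*"
        step = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if step:
            steps.append(step)
    return steps

def chain_of_thought_plan(goal, llm, tools):
    """Generate a step-by-step plan using chain-of-thought reasoning.

    `llm` exposes generate(prompt) -> str; `tools` maps tool names to tools.
    """
    prompt = f"""Create a detailed plan to achieve this goal:

Goal: {goal}

Available tools: {', '.join(tools.keys())}

Think step by step:
1. What information do I need?
2. What actions should I take?
3. What is the order of operations?

Plan:"""
    
    response = llm.generate(prompt)
    steps = parse_plan(response)
    return steps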
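The linear execution path can be made concrete by threading each step's output into the next step. In this sketch, each step is a hypothetical (tool_name, template) pair where `{prev}` marks where the previous result is substituted. The toy tools stand in for real ones, and no LLM is involved.

```python
from typing import Callable, Dict, List, Tuple


def run_linear_plan(plan: List[Tuple[str, str]],
                    tools: Dict[str, Callable[[str], str]]) -> List[str]:
    """Execute steps in order, substituting the previous result for {prev}."""
    results = []
    prev = ""
    for tool_name, template in plan:
        tool_input = template.format(prev=prev)  # each step builds on the last
        prev = tools[tool_name](tool_input)
        results.append(prev)
    return results


# Toy tools (hypothetical): deterministic stand-ins for search, formatting, etc.
tools = {
    "lookup": lambda key: {"capital of France": "Paris"}.get(key, "unknown"),
    "upper": lambda text: text.upper(),
    "greet": lambda name: f"Hello, {name}!",
}

plan = [("lookup", "capital of France"), ("upper", "{prev}"), ("greet", "{prev}")]
print(run_linear_plan(plan, tools))
```

The weakness of a purely linear chain is visible here: if any step raises or returns a bad value, every later step inherits the problem. This motivates the observation-driven approaches later in the section.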

Hierarchical Task Network

Definition: Hierarchical Task Network (HTN)

HTN planning decomposes tasks hierarchically. High-level tasks are broken into subtasks, which are further decomposed until reaching primitive actions that can be executed directly.

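import re

def htn_decompose(task, llm, max_depth=3, is_primitive=None):
    """Hierarchically decompose a task into primitive steps.

    `is_primitive(task) -> bool` decides when a task needs no further
    decomposition; if omitted, recursion stops only at max_depth.
    """
    if max_depth == 0 or (is_primitive is not None and is_primitive(task)):
        return [task]
    
    prompt = f"""Break this task into subtasks:

Task: {task}

Subtasks (one per line):"""
    
    response = llm.generate(prompt)
    # Drop blank lines and list markers such as "1." or "-"
    subtasks = [re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", s).strip()
                for s in response.split("\n")]
    subtasks = [s for s in subtasks if s]
    if not subtasks:
        return [task]  # nothing usable came back: treat the task as primitive
    
    # Recursively decompose each subtask
    all_steps = []
    for subtask in subtasks:
        all_steps.extend(htn_decompose(subtask, llm, max_depth - 1, is_primitive))
    
    return all_steps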
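Classical HTN planners decompose tasks with hand-written methods instead of an LLM prompt. The sketch below shows that idea with a hypothetical `METHODS` table and made-up task names. For simplicity, it gives each compound task exactly one method and no preconditions. Full HTN planners allow several methods per task and choose among them based on the current state.

```python
from typing import Dict, List

# Hypothetical method table: each compound task maps to an ordered list of subtasks.
# Any task that is not a key is treated as primitive (directly executable).
METHODS: Dict[str, List[str]] = {
    "write report": ["gather data", "draft report", "review report"],
    "gather data": ["search web", "download dataset"],
    "review report": ["check facts", "fix typos"],
}


def htn_expand(task: str, methods: Dict[str, List[str]], max_depth: int = 10) -> List[str]:
    """Depth-first expansion of a task into primitive actions, in order."""
    if task not in methods:
        return [task]  # primitive action
    if max_depth == 0:
        raise RecursionError(f"decomposition of '{task}' exceeded max_depth")
    steps: List[str] = []
    for subtask in methods[task]:
        steps.extend(htn_expand(subtask, methods, max_depth - 1))
    return steps


print(htn_expand("write report", METHODS))
```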

ReAct: Reasoning + Acting

The ReAct Loop

Definition: ReAct

ReAct (Yao et al., 2023) interleaves reasoning and action. The agent generates a thought (reasoning about what to do next), takes an action (calls a tool), observes the result, and repeats. This grounds reasoning in actual observations.

Example ReAct Trace

Thought: I need to find the population of France.
Action: search("population of France 2024")
Observation: France has a population of approximately 68 million.
Thought: Now I need to find the GDP.
Action: search("GDP of France 2024")
Observation: France's GDP is approximately $3.0 trillion.
Thought: I now have both pieces of information.
Action: finish("France has 68 million people and $3.0T GDP.")
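To reproduce the trace above offline, this sketch replaces the LLM with a scripted list of thoughts and actions and the search tool with a lookup table. `SCRIPT`, `FAKE_RESULTS` and `TOOLS` are hypothetical stand-ins built from the example. In a real agent, each thought and action would be generated from the growing transcript.

```python
import re
from typing import Callable, Dict, List, Tuple

# Canned (thought, action) pairs standing in for LLM output (hypothetical)
SCRIPT: List[Tuple[str, str]] = [
    ("I need to find the population of France.", 'search("population of France 2024")'),
    ("Now I need to find the GDP.", 'search("GDP of France 2024")'),
    ("I now have both pieces of information.",
     'finish("France has 68 million people and $3.0T GDP.")'),
]

# Lookup table standing in for a real search tool (hypothetical)
FAKE_RESULTS: Dict[str, str] = {
    "population of France 2024": "France has a population of approximately 68 million.",
    "GDP of France 2024": "France's GDP is approximately $3.0 trillion.",
}

TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: FAKE_RESULTS.get(query, "No results."),
}


def replay(script: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Run thought -> action -> observation steps; return (answer, transcript)."""
    transcript = ""
    for thought, action in script:
        transcript += f"Thought: {thought}\nAction: {action}\n"
        # Actions follow the tool("argument") convention used in the trace
        name, arg = re.fullmatch(r'(\w+)\("(.*)"\)', action).groups()
        if name == "finish":
            return arg, transcript
        # In a real agent, the next thought would be generated from this transcript
        transcript += f"Observation: {TOOLS[name](arg)}\n"
    return "No answer", transcript


answer, transcript = replay(SCRIPT)
print(transcript)
print("Answer:", answer)
```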

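import re

def parse_action(action):
    """Split an action such as 'search("query")' into (tool_name, argument)."""
    match = re.match(r"\s*(\w+)\((.*)\)\s*$", action)
    if match is None:
        raise ValueError(f"Malformed action: {action!r}")
    name, arg = match.groups()
    return name, arg.strip().strip("\"'")  # drop surrounding quotes

def react_agent(goal, llm, tools, max_steps=10):
    """ReAct agent that interleaves reasoning and action.

    `llm` exposes generate(prompt) -> str; each tool exposes execute(input).
    """
    steps = []
    context = f"Goal: {goal}\n\n"
    
    for i in range(max_steps):
        # Generate thought and action; keep only the first line of each
        # so the model cannot write its own Observation
        prompt = context + "Thought:"
        thought = llm.generate(prompt).strip().split("\n")[0]
        
        prompt += f" {thought}\nAction:"
        action = llm.generate(prompt).strip().split("\n")[0]
        
        try:
            tool_name, tool_input = parse_action(action)
            if tool_name == "finish":
                return tool_input, steps
            if tool_name not in tools:
                raise ValueError(f"Unknown tool: {tool_name}")
            observation = tools[tool_name].execute(tool_input)
        except ValueError as e:
            # Feed the error back as an observation so the agent can recover
            observation = f"Error: {e}"
        
        # Update context
        context += f"Thought: {thought}\nAction: {action}\nObservation: {observation}\n"
        steps.append({"thought": thought, "action": action, "observation": observation})
    
    return "Max steps reached", steps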

ReAct often outperforms both pure reasoning (chain-of-thought alone) and action-only approaches. Reasoning without action can hallucinate facts it never checks, while action without reasoning tends to choose tools and queries poorly and to lose track of progress.

Tree-of-Thought

Definition: Tree-of-Thought

Tree-of-Thought (Yao et al., 2023) explores multiple reasoning paths simultaneously, evaluating each path and pruning unpromising branches. This enables more thorough exploration of the solution space.

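import re

def tree_of_thought(problem, llm, num_branches=3, max_depth=4):
    """Explore multiple reasoning paths using a beam search over thoughts.

    num_branches is both the number of next steps proposed per path and
    the beam width (how many paths survive each level).
    """
    
    def evaluate_path(path):
        """Evaluate how promising a reasoning path is."""
        prompt = f"""Evaluate this reasoning path for solving: {problem}

Path: {' -> '.join(path)}

Score (0-1, where 1 is most promising):"""
        reply = llm.generate(prompt)
        # Models often add text around the number, so take the first number found
        match = re.search(r"\d*\.?\d+", reply)
        return min(max(float(match.group()), 0.0), 1.0) if match else 0.0
    
    def parse_steps(response):
        """One proposed step per non-empty line, with list markers removed."""
        lines = [re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
                 for line in response.splitlines()]
        return [line for line in lines if line][:num_branches]
    
    def expand_path(path):
        """Generate next steps for a path."""
        prompt = f"""Given this problem: {problem}
Current reasoning: {' -> '.join(path)}

Generate {num_branches} possible next steps, one per line:"""
        response = llm.generate(prompt)
        return [path + [step] for step in parse_steps(response)]
    
    # Beam search: expand every kept path, score the children, keep the best
    beam = [([], 0.0)]  # (path, score) pairs
    
    for depth in range(max_depth):
        all_candidates = []
        for path, _ in beam:
            all_candidates.extend(expand_path(path))
        if not all_candidates:
            break  # nothing new was proposed; keep the current beam
        
        # Evaluate and keep top paths
        scored = [(path, evaluate_path(path)) for path in all_candidates]
        scored.sort(key=lambda x: x[1], reverse=True)
        beam = scored[:num_branches]
    
    # Return the best path, reusing the scores computed during the search
    best_path, _ = max(beam, key=lambda x: x[1])
    return best_path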
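The expand, score and prune cycle does not depend on the LLM itself. This sketch swaps the LLM proposer and evaluator for deterministic functions so the search can be inspected directly. The toy problem is hypothetical: each "thought" is an arithmetic operation on a running value, and a path is scored by how close its value lands to a target. It illustrates the beam-search strategy only, not Tree-of-Thought's prompting.

```python
from typing import Callable, List, Tuple

# Hypothetical "thoughts": named operations on a running value
OPS: List[Tuple[str, Callable[[int], int]]] = [
    ("+3", lambda x: x + 3),
    ("*2", lambda x: x * 2),
    ("-1", lambda x: x - 1),
]


def apply_path(start: int, path: List[str]) -> int:
    ops = dict(OPS)
    value = start
    for name in path:
        value = ops[name](value)
    return value


def beam_search(start: int, target: int, beam_width: int = 3, max_depth: int = 4) -> List[str]:
    def score(path: List[str]) -> float:
        # Stand-in for the LLM evaluator: 1.0 when the target is hit exactly
        return 1.0 / (1 + abs(target - apply_path(start, path)))

    beam: List[List[str]] = [[]]
    for _ in range(max_depth):
        # Expand: every kept path branches into every operation
        candidates = [path + [name] for path in beam for name, _ in OPS]
        # Score and prune: keep only the best beam_width paths
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
        if score(beam[0]) == 1.0:
            break  # target reached
    return beam[0]


best = beam_search(start=2, target=16)
print(best, "->", apply_path(2, best))
```

Widening the beam or the depth makes the search more thorough but costs more evaluations. With an LLM evaluator, each of those evaluations is a model call.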

Goal-Conditioned Planning

GOFAI-Style Planning

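class GoalConditionedPlanner:
    """Plan from an explicit current state toward a goal, and replan on failure.

    `llm` exposes generate(prompt) -> str, `tools` maps action names to tools,
    `state_tracker` records the agent's view of the world, and parse_plan is a
    helper that turns the LLM's response into a list of actions.
    """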
    def __init__(self, llm, tools, state_tracker):
        self.llm = llm
        self.tools = tools
        self.state = state_tracker
    
    def plan(self, goal, current_state):
        """Generate a plan to reach the goal from current state."""
        prompt = f"""Current state: {current_state}
Goal: {goal}

Available actions: {list(self.tools.keys())}

Generate a plan (list of actions) to reach the goal:"""
        
        response = self.llm.generate(prompt)
        plan = parse_plan(response)
        return plan
    
    def replan(self, goal, current_state, executed_actions, failure_reason):
        """Replan after a failure."""
        prompt = f"""Previous plan failed.
Current state: {current_state}
Goal: {goal}
Executed actions: {executed_actions}
Failure reason: {failure_reason}

Generate a new plan considering the failure:"""
        
        response = self.llm.generate(prompt)
        new_plan = parse_plan(response)
        return new_plan
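GOFAI planners such as STRIPS encode the current state and the goal as sets of facts. They then search over actions that have preconditions and effects, with no LLM in the loop. The sketch below is a minimal breadth-first version of that classical formulation. The `Action` type and the coffee-fetching domain are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


def plan_bfs(state: FrozenSet[str], goal: FrozenSet[str],
             actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first forward search; returns a shortest action sequence or None."""
    frontier = deque([(state, [])])
    visited = {state}
    while frontier:
        current, path = frontier.popleft()
        if goal <= current:  # every goal fact holds
            return path
        for a in actions:
            if a.pre <= current:
                nxt = (current - a.delete) | a.add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [a.name]))
    return None


def fs(*facts: str) -> FrozenSet[str]:
    return frozenset(facts)


ACTIONS = [
    Action("walk_to_kitchen", fs("at_desk"), fs("at_kitchen"), fs("at_desk")),
    Action("walk_to_desk", fs("at_kitchen"), fs("at_desk"), fs("at_kitchen")),
    Action("brew_coffee", fs("at_kitchen", "have_mug"), fs("have_coffee"), fs()),
    Action("pick_up_mug", fs("at_desk"), fs("have_mug"), fs()),
]

print(plan_bfs(fs("at_desk"), fs("have_coffee", "at_desk"), ACTIONS))
```

In this formulation, replanning after a failure amounts to running the same search again from the newly observed state.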

Adaptive Planning

Plan-Execute-Observe Loop

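def plan_execute_observe(goal, llm, tools, max_iterations=5):
    """Main agent loop with planning, execution, and observation.

    generate_plan, needs_replan, execute_step, goal_achieved, and
    synthesize_answer are application-specific helpers; generate_plan is
    expected to return a dict with a "steps" list.
    """
    plan = None
    observations = []
    
    for i in range(max_iterations):
        # Plan (or replan) if there is no plan, the plan is used up,
        # or the latest observations call for a change of course
        if plan is None or not plan["steps"] or needs_replan(observations):
            plan = generate_plan(goal, observations, llm, tools)
            if not plan["steps"]:
                break  # the planner has nothing left to try
        
        # Execute next step
        current_step = plan["steps"][0]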
        try:
            result = execute_step(current_step, tools)
            observations.append({"step": current_step, "result": result, "success": True})
            plan["steps"] = plan["steps"][1:]
            
            # Check if goal achieved
            if goal_achieved(goal, observations):
                return synthesize_answer(goal, observations, llm)
        except Exception as e:
            observations.append({"step": current_step, "error": str(e), "success": False})
            plan = None  # Trigger replanning
    
    return "Could not achieve goal within iteration limit"
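To watch replanning happen without an LLM, this sketch drives the same plan, execute and observe cycle with a toy planner that avoids tools that have already failed. The tool names, the always-failing `primary_search` and the planner's rule are hypothetical. The toy goal counts as achieved after one successful search.

```python
from typing import Callable, Dict, List


def primary_search(query: str) -> str:
    raise TimeoutError("primary search timed out")  # simulated outage


def backup_search(query: str) -> str:
    return f"results for {query!r}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "primary_search": primary_search,
    "backup_search": backup_search,
}


def make_plan(query: str, failed_tools: List[str]) -> List[str]:
    """Toy planner: prefer primary_search unless it has already failed."""
    for name in ["primary_search", "backup_search"]:
        if name not in failed_tools:
            return [name]
    return []


def run(query: str, max_iterations: int = 5) -> Dict[str, object]:
    failed: List[str] = []
    log: List[str] = []
    plan: List[str] = []
    for _ in range(max_iterations):
        if not plan:
            plan = make_plan(query, failed)  # plan, or replan after a failure
            if not plan:
                break  # no untried tools remain
        step = plan.pop(0)
        try:
            result = TOOLS[step](query)
            log.append(f"ok: {step}")
            return {"answer": result, "log": log}  # toy goal: one successful search
        except Exception as e:
            log.append(f"failed: {step} ({e})")  # observe the failure
            failed.append(step)
            plan = []  # discard the rest of the plan and replan
    return {"answer": None, "log": log}


outcome = run("population of France")
print(outcome["log"])
print(outcome["answer"])
```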

Practice Exercises

Chain-of-Thought Planning: Implement a CoT planner for a research task. How does the plan quality compare to direct execution?
ReAct Implementation: Build a ReAct agent with web search and calculator tools. Test on questions requiring multiple reasoning steps.
Tree-of-Thought: Implement ToT for a puzzle-solving task (e.g., the Game of 24). What branching factor and search depth are needed for good performance?
Adaptive Planning: Test your planner on tasks where initial plans fail. How quickly does the agent recover with replanning?

Key Takeaways

Summary: Planning and Reasoning in Agents

Task decomposition breaks complex goals into manageable steps
Chain-of-Thought planning generates linear step-by-step plans
HTN planning hierarchically decomposes tasks to primitive actions
ReAct interleaves reasoning with action for grounded decisions
Tree-of-Thought explores multiple reasoning paths in parallel
Goal-conditioned planning generates plans to reach specific goals
Adaptive planning replans when initial plans fail
Plan-execute-observe is a common core loop for agents

What to Learn Next

-> Chain-of-Thought Reasoning: Teaching models to reason step by step.

-> LLM Agent Frameworks: Building autonomous agents with LLMs.

-> Tool Use and Function Calling: Teaching LLMs to use external tools.

-> Memory Systems for Agents: Long-term and short-term memory for agents.

-> Reinforcement Learning: Training agents through reward signals.

-> Multi-Agent Systems: Coordinating multiple agents.
