
Chain-of-Thought Reasoning — Making LLMs Think Step by Step

Chain-of-thought prompting enables LLMs to solve complex problems by decomposing them into intermediate reasoning steps. This guide covers CoT variants, self-consistency, tree-of-thought methods, and practical applications.

Zero-Shot CoT — "Let's think step by step" unlocks reasoning capabilities
Self-Consistency — Sample multiple reasoning paths and majority-vote the answer
Tree-of-Thought — Explore branching reasoning strategies for harder problems

Thinking step by step is not just for humans anymore.


CoT Variants

Zero-Shot CoT

Append "Let's think step by step." after the question. The prompt stops at the trigger phrase, and the model generates the reasoning itself:

prompt = """Q: A juggler can juggle 16 balls. Half of the balls are golf balls,
and half of the golf balls are blue. How many blue golf balls are there?
A: Let's think step by step."""

# A typical model continuation:
# 1. Total balls = 16
# 2. Golf balls = 16 / 2 = 8
# 3. Blue golf balls = 8 / 2 = 4
# The answer is 4.
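Kojima et al. run zero-shot CoT as two model calls: one that produces the reasoning and one that extracts the answer from it. The sketch below follows that two-stage structure under stated assumptions: `llm` is a hypothetical callable that maps a prompt string to a completion (swap in any real client), `fake_llm` is a canned stand-in, and the number-extraction regex is our own simplification.

```python
import re
from typing import Callable

ANSWER_TRIGGER = "Therefore, the answer (arabic numerals) is"

def zero_shot_cot(question: str, llm: Callable[[str], str]) -> str:
    """Two-stage zero-shot CoT: elicit reasoning, then ask for the answer alone."""
    # Stage 1 (reasoning extraction): question plus the step-by-step trigger
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = llm(reasoning_prompt)
    # Stage 2 (answer extraction): append the generated reasoning and an answer trigger
    answer_prompt = f"{reasoning_prompt} {reasoning}\n{ANSWER_TRIGGER}"
    raw_answer = llm(answer_prompt)
    # Assumed cleanup rule: keep the first number in the completion
    match = re.search(r"-?\d+(?:\.\d+)?", raw_answer)
    return match.group(0) if match else raw_answer.strip()

# Hypothetical stand-in for a real model call, returning canned completions
def fake_llm(prompt: str) -> str:
    if prompt.endswith(ANSWER_TRIGGER):
        return " 4."
    return "16 balls in total. Golf balls = 16 / 2 = 8. Blue golf balls = 8 / 2 = 4."

question = ("A juggler can juggle 16 balls. Half of the balls are golf balls, "
            "and half of the golf balls are blue. How many blue golf balls are there?")
print(zero_shot_cot(question, fake_llm))  # 4
```

Separating the two stages keeps answer parsing independent of how the model phrases its reasoning.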

Few-Shot CoT

Prepend a few worked examples whose answers spell out the reasoning, then pose the new question in the same format:

prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11.

Q: The school cafeteria ordered 42 apples for the lunches. They used 6 for Monday's lunches. 
Then they bought 4 more cases of 3 apples each. How many apples do they have now?
A: The cafeteria started with 42 apples. They used 6, leaving 42 - 6 = 36. 
They bought 4 cases of 3 = 12 apples. 36 + 12 = 48. The answer is 48.

Q: {question}
A: Let's think step by step."""
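Keeping the exemplars as data and assembling the prompt in code makes it easy to swap, add, or reorder demonstrations. A minimal sketch, assuming the two exemplars above; `EXEMPLARS` and `build_few_shot_cot_prompt` are hypothetical names:

```python
from typing import List, Tuple

# Worked exemplars: each answer spells out its reasoning and ends with "The answer is N."
EXEMPLARS: List[Tuple[str, str]] = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11.",
    ),
    (
        "The school cafeteria ordered 42 apples for the lunches. They used 6 for Monday's lunches. "
        "Then they bought 4 more cases of 3 apples each. How many apples do they have now?",
        "The cafeteria started with 42 apples. They used 6, leaving 42 - 6 = 36. "
        "They bought 4 cases of 3 = 12 apples. 36 + 12 = 48. The answer is 48.",
    ),
]

def build_few_shot_cot_prompt(question: str, exemplars: List[Tuple[str, str]] = EXEMPLARS) -> str:
    """Concatenate the worked examples, then pose the new question in the same Q/A format."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    blocks.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(blocks)

print(build_few_shot_cot_prompt(
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
))
```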

Self-Consistency

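Sample several reasoning paths for the same question (decoding at a nonzero temperature so the paths differ), extract each path's final answer, and return the answer that appears most often.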
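A model-agnostic sketch of the vote, assuming each completion ends with "The answer is N." The `sample` argument is a hypothetical callable returning one temperature-sampled completion per call; here a cycle of canned completions stands in for the model, including one path with an arithmetic slip that gets outvoted.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Dict, Optional, Tuple

ANSWER_RE = re.compile(r"The answer is:?\s*\$?(-?[\d,]*\.?\d+)")

def extract_answer(completion: str) -> Optional[str]:
    """Pull the number after 'The answer is'; None if the path never states one."""
    match = ANSWER_RE.search(completion)
    return match.group(1).replace(",", "") if match else None

def self_consistency(
    question: str,
    sample: Callable[[str], str],  # hypothetical: one temperature-sampled completion per call
    n_samples: int = 10,
) -> Tuple[Optional[str], Dict[str, int]]:
    prompt = f"Q: {question}\nA: Let's think step by step.\n"
    answers = [extract_answer(sample(prompt)) for _ in range(n_samples)]
    votes = Counter(a for a in answers if a is not None)
    if not votes:  # no path produced a parsable answer
        return None, {}
    return votes.most_common(1)[0][0], dict(votes)

# Canned completions standing in for sampled model output; the third path makes an arithmetic slip
canned = itertools.cycle([
    "60 * 2.5 = 150 and 80 * 1.5 = 120, so 150 + 120 = 270. The answer is 270.",
    "First leg: 150 miles. Second leg: 120 miles. The answer is 270 miles.",
    "60 * 2.5 = 150 and 80 * 1.5 = 100, so the total is 250. The answer is 250.",
])
answer, votes = self_consistency(
    "If a train travels at 60 mph for 2.5 hours, then at 80 mph for 1.5 hours, what is the total distance?",
    lambda prompt: next(canned),
    n_samples=3,
)
print(answer, votes)  # 270 {'270': 2, '250': 1}
```

Voting on the extracted answer rather than on the full text is what lets differently worded but equivalent paths agree.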

Tree-of-Thought (ToT)
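Tree-of-thought search keeps several partial reasoning chains alive at once: at each depth it proposes candidate next thoughts for every chain, scores them, and keeps only the best few (a breadth-first search with a beam). In a real system, `propose` and `evaluate` would be LLM prompts; in this sketch they are plain functions on a toy task we chose for illustration (reach 10 from 1 using "+3" and "*2"), and every name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class State:
    value: int                                      # current partial result
    steps: List[str] = field(default_factory=list)  # the "thoughts" taken so far

def tree_of_thought_bfs(
    root: State,
    propose: Callable[[State], List[State]],   # stand-in for an LLM "propose next thoughts" prompt
    evaluate: Callable[[State], float],        # stand-in for an LLM "rate this state" prompt
    is_goal: Callable[[State], bool],
    beam_width: int = 2,
    max_depth: int = 4,
) -> Optional[State]:
    frontier = [root]
    for _ in range(max_depth):
        # Expand every state in the frontier into candidate next states
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        # Keep only the best-scoring partial solutions (breadth-first with a beam)
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return None

TARGET = 10

def propose(state: State) -> List[State]:
    return [
        State(state.value + 3, state.steps + ["+3"]),
        State(state.value * 2, state.steps + ["*2"]),
    ]

def evaluate(state: State) -> float:
    return -abs(TARGET - state.value)  # closer to the target scores higher

def is_goal(state: State) -> bool:
    return state.value == TARGET

result = tree_of_thought_bfs(State(1), propose, evaluate, is_goal, beam_width=2)
print(result.steps)  # ['+3', '+3', '+3']
```

With `beam_width=1` the search keeps a single chain, much like one CoT path, and follows 1 → 4 → 8 → 11 → 14 without reaching 10 in four steps, so it returns `None`. Keeping a second, lower-scoring branch (7) is what finds the solution.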

When CoT Helps

Tasks Where CoT Excels

Task Type	Example	CoT Improvement
Arithmetic	Multi-step math	+30-40%
Logic	Syllogisms	+20-30%
Commonsense	Physical reasoning	+15-25%
Symbolic	Variable tracking	+25-35%
Multi-hop QA	Reading comprehension	+10-20%

Tasks Where CoT Hurts

Simple factual recall: CoT adds tokens and latency for answers that need no intermediate steps
Classification: CoT can lead to overthinking
Creative writing: CoT constrains creativity
Translation: CoT is not helpful for direct mapping

Mathematical Foundation

CoT as Search

CoT can be viewed as search over the space of reasoning sequences: the model must find a chain of intermediate steps that leads to a correct answer.
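One way to write this down (the notation is ours, not taken from a specific paper): let $q$ be the question, $r = (r_1, \dots, r_T)$ the chain of reasoning steps, $a$ the final answer, and $p_\theta$ the language model. The model scores a chain step by step, and the probability of an answer marginalizes over all chains. A single CoT completion commits to one chain found by approximate left-to-right search (greedy or sampled decoding) and reads the answer off that chain; tree-of-thought replaces this with explicit search over partial chains.

$$
p_\theta(r \mid q) = \prod_{t=1}^{T} p_\theta(r_t \mid q, r_{<t}),
\qquad
p_\theta(a \mid q) = \sum_{r} p_\theta(r \mid q)\, p_\theta(a \mid q, r)
$$

$$
\hat{r} \approx \arg\max_{r}\, p_\theta(r \mid q),
\qquad
\hat{a} = \arg\max_{a}\, p_\theta(a \mid q, \hat{r})
$$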

Self-Consistency as Ensemble

Self-consistency can be viewed as an ensemble over reasoning paths: each sampled chain acts as one ensemble member and casts one vote for its final answer.
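With $q$ the question, $r$ a reasoning chain, $a$ its answer, and $p_\theta$ the model (our notation), self-consistency draws $N$ chains with their answers and takes a plurality vote. When the samples come from $p_\theta$ itself (temperature 1), each answer's vote share is an unbiased Monte Carlo estimate of the marginal $p_\theta(a \mid q)$, so the vote approximates choosing the answer that is most probable after summing over all reasoning paths. At other temperatures it estimates the answer distribution of the temperature-adjusted sampler rather than of $p_\theta$ exactly.

$$
(r^{(i)}, a^{(i)}) \sim p_\theta(r, a \mid q), \qquad i = 1, \dots, N
$$

$$
\hat{a} = \arg\max_{a} \sum_{i=1}^{N} \mathbb{1}\!\left[a^{(i)} = a\right]
\;\approx\; \arg\max_{a} \sum_{r} p_\theta(r \mid q)\, p_\theta(a \mid q, r)
$$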

Implementation

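import re
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # gated on the Hugging Face Hub: accept the license first
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# fp16 weights; device_map="auto" (requires the accelerate package) places the model on GPU if available
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map="auto")

# First number after "The answer is", allowing "$", thousands separators, and decimals like 2.5
ANSWER_RE = re.compile(r"The answer is:?\s*\$?(-?[\d,]*\.?\d+)")

def extract_answer(response):
    match = ANSWER_RE.search(response)
    return match.group(1).replace(",", "") if match else None

def generate_cot(question, n_samples=5, temperature=0.7):
    prompt = f"Q: {question}\nA: Let's think step by step.\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # tokenize once
    outputs = model.generate(
        **inputs, max_new_tokens=512,
        do_sample=True, temperature=temperature,
        num_return_sequences=n_samples,       # all reasoning paths in one batched call
        pad_token_id=tokenizer.eos_token_id,  # Llama has no pad token; avoids a warning
    )
    prompt_len = inputs["input_ids"].shape[-1]
    answers = []
    for output in outputs:
        # Decode only the newly generated tokens, not the echoed prompt
        response = tokenizer.decode(output[prompt_len:], skip_special_tokens=True)
        answer = extract_answer(response)
        if answer is not None:
            answers.append(answer)

    # Majority vote; return None if no sampled path stated a final answer
    counts = Counter(answers)
    if not counts:
        return None, {}
    return counts.most_common(1)[0][0], dict(counts)

question = "If a train travels at 60 mph for 2.5 hours, then at 80 mph for 1.5 hours, what is the total distance?"
answer, distribution = generate_cot(question)
print(f"Answer: {answer}")
print(f"Distribution: {distribution}")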

Practice Exercises

Empirical: Compare standard zero-shot, standard few-shot, and CoT prompting on 10 arithmetic problems. Measure the accuracy of each.
Self-Consistency: Implement self-consistency with 3, 5, 10, and 20 samples. At what point does accuracy plateau?
ToT: Implement a simple tree-of-thought system for a planning problem. Compare with standard CoT.
Analysis: Identify 5 problems where CoT helps and 5 where it hurts. What pattern emerges?

Advanced CoT Methods

Auto-CoT

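Auto-CoT builds chain-of-thought demonstrations automatically: it clusters the questions, picks a representative question from each cluster, and generates that question's reasoning chain with Zero-Shot CoT. This removes the need to hand-write CoT examples, and drawing one demonstration per cluster keeps the set diverse.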
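A minimal sketch of the selection step, assuming question embeddings are already computed (the 2-D vectors below are toy stand-ins for sentence embeddings) and that `zero_shot_cot` is a hypothetical callable returning a model-written rationale. It uses a plain k-means and picks the question nearest each cluster centre; the extra filtering heuristics a production version would apply are omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

def kmeans(points: List[Vector], k: int, iters: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means. Centroids start at the first k points, which keeps the demo deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(coords) / len(members) for coords in zip(*members)]
    return labels, centroids

def auto_cot_demos(
    questions: List[str],
    embeddings: List[Vector],
    k: int,
    zero_shot_cot: Callable[[str], str],  # hypothetical: returns a model-written rationale
) -> str:
    labels, centroids = kmeans(embeddings, k)
    demos = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if not members:
            continue
        # Representative = the question closest to its cluster centre
        rep = min(members, key=lambda i: math.dist(embeddings[i], centroids[c]))
        rationale = zero_shot_cot(questions[rep])
        demos.append(f"Q: {questions[rep]}\nA: Let's think step by step. {rationale}")
    return "\n\n".join(demos)

questions = [
    "Roger has 5 tennis balls and buys 6 more. How many does he have?",
    "All cats are mammals and Tom is a cat. Is Tom a mammal?",
    "A baker made 24 muffins and sold 9. How many are left?",
    "No birds are fish and a robin is a bird. Is a robin a fish?",
]
embeddings = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.2, 4.9)]  # toy stand-ins for sentence embeddings
prompt_prefix = auto_cot_demos(questions, embeddings, k=2,
                               zero_shot_cot=lambda q: "<model-generated reasoning>")
print(prompt_prefix)
```

The output holds one arithmetic and one logic demonstration, which is the point of clustering: a prefix drawn from a single cluster would show the model only one kind of problem.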

Program-of-Thought

Instead of reasoning in natural language, the model writes a program (typically Python) that solves the problem, and an interpreter executes it to produce the answer. This combines the reasoning capabilities of LLMs with the precision of program execution: the arithmetic, which LLMs often get wrong, is done by the interpreter.
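A minimal sketch of the execution step, assuming the prompt tells the model to store its result in a variable named `ans` (a convention we assume here). The `generated_program` string is a hypothetical model output for the train question from the Implementation section.

```python
def run_program_of_thought(program: str, result_var: str = "ans") -> object:
    """Execute model-written Python and return the value it stores in `result_var`.

    Stripping __builtins__ only blocks the most obvious calls; it is NOT a sandbox.
    Run untrusted model output in an isolated process with time and resource limits.
    """
    namespace: dict = {}
    exec(program, {"__builtins__": {}}, namespace)
    return namespace[result_var]

# Hypothetical model output for the train question
generated_program = """
speed_1, hours_1 = 60, 2.5
speed_2, hours_2 = 80, 1.5
ans = speed_1 * hours_1 + speed_2 * hours_2
"""

print(run_program_of_thought(generated_program))  # 270.0
```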

Graph-of-Thought

Extends tree-of-thought by allowing reasoning paths to merge and form a graph structure. This enables sharing of intermediate results across different reasoning branches, improving efficiency and solution quality.
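A minimal sketch of the merge idea, loosely modeled on the sorting task used to evaluate Graph of Thoughts: split a list into chunks, solve each chunk in its own branch, then aggregate two branches into one node with two parents, which a tree cannot represent. Plain Python (`sorted`, `heapq.merge`) stands in for the LLM calls; the `Thought`, `split`, `solve`, and `aggregate` names are hypothetical, and scoring and refinement of thoughts are omitted.

```python
import heapq
from dataclasses import dataclass, field
from typing import List

@dataclass
class Thought:
    content: List[int]
    parents: List["Thought"] = field(default_factory=list)  # edges back to source thoughts

def split(thought: Thought, parts: int) -> List[Thought]:
    """Branch one thought into several sub-problems."""
    n = len(thought.content)
    size = -(-n // parts)  # ceiling division
    return [Thought(thought.content[i:i + size], [thought]) for i in range(0, n, size)]

def solve(thought: Thought) -> Thought:
    # Stand-in for an LLM call that sorts a short list
    return Thought(sorted(thought.content), [thought])

def aggregate(a: Thought, b: Thought) -> Thought:
    # Merge two partial solutions into ONE node with TWO parents (graph, not tree)
    return Thought(list(heapq.merge(a.content, b.content)), [a, b])

root = Thought([7, 3, 9, 1, 8, 2, 6, 4])
left, right = (solve(t) for t in split(root, 2))
final = aggregate(left, right)
print(final.content)  # [1, 2, 3, 4, 6, 7, 8, 9]
```

Because each branch only has to handle half the list, the per-call task stays small, and the aggregation node reuses both intermediate results instead of redoing the work.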

Evaluating CoT Quality

When evaluating CoT, assess both the reasoning process and the final answer. Key evaluation criteria include:

Logical coherence of the reasoning steps
Faithfulness to the provided context or problem
Completeness of the reasoning chain
Accuracy of the final answer
Conciseness of the reasoning process
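Most of these criteria need human or model-based judgment, but part of them can be checked mechanically. The sketch below is a heuristic of our own, not a standard metric: it re-computes every step written as `x op y = z` and compares the stated final answer with a gold value. It says nothing about faithfulness, completeness, or conciseness.

```python
import math
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
NUM = r"-?\d+(?:\.\d+)?"
STEP_RE = re.compile(rf"({NUM})\s*([+\-*/])\s*({NUM})\s*=\s*({NUM})")
ANSWER_RE = re.compile(rf"The answer is:?\s*({NUM})")

def step_is_valid(a: str, op: str, b: str, c: str) -> bool:
    try:
        # Loose tolerance so rounded decimals such as "10 / 3 = 3.33" still pass
        return math.isclose(OPS[op](float(a), float(b)), float(c), rel_tol=1e-2)
    except ZeroDivisionError:
        return False

def check_cot(chain: str, gold: float) -> dict:
    steps = STEP_RE.findall(chain)
    errors = [f"{a} {op} {b} = {c}" for a, op, b, c in steps if not step_is_valid(a, op, b, c)]
    match = ANSWER_RE.search(chain)
    return {
        "steps_checked": len(steps),   # only "x op y = z" steps are checked
        "arithmetic_errors": errors,   # steps whose stated result is wrong
        "answer_correct": match is not None and math.isclose(float(match.group(1)), gold),
    }

good = "Golf balls = 16 / 2 = 8. Blue golf balls = 8 / 2 = 4. The answer is 4."
bad = "Golf balls = 16 / 2 = 8. Blue golf balls = 8 / 2 = 3. The answer is 3."
print(check_cot(good, 4))
print(check_cot(bad, 4))
```

Reporting step errors separately from final-answer accuracy distinguishes a chain that reached the right answer by luck from one whose every step holds.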

What to Learn Next

-> In-Context Learning: teaching LLMs new tasks without training, purely through prompts.

-> Prompt Engineering: getting the most out of language models through effective input design.

-> RAG System Design: building production-ready retrieval systems for grounded generation.

-> Retrieval-Augmented Generation: combining LLMs with external knowledge for accurate, cited answers.

-> LLM Agent Frameworks: building autonomous agents that reason, plan, and act.

-> Building Production LLM Apps: from prototype to production, deploying LLMs at scale.
