In-Context Learning — Teaching LLMs New Tasks Without Training

In-context learning (ICL) is one of the most remarkable emergent capabilities of LLMs: the ability to learn new tasks from examples provided in the prompt, with no weight updates. This guide explores what ICL is, why it works, and how to use it effectively.

No Training Required: Adapt models to new tasks purely through prompts
Bayesian Inference: One leading hypothesis is that ICL implicitly performs posterior inference over task hypotheses
Production Deployment: Balance example count, latency, and cost for real-world use

The most elegant learning happens without changing a single weight.

How ICL Works

The Bayesian Inference Hypothesis

Under this hypothesis, the model implicitly performs Bayesian inference over latent task hypotheses: each in-context example updates its beliefs about which task is being performed, and the prediction for the query is made under the resulting posterior.
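As a toy illustration of this view (not a claim about what happens inside a real transformer), the sketch below keeps a prior over a few candidate tasks and updates it with each demonstration. The candidate tasks, the noise level, and the function name `posterior_over_tasks` are all hypothetical choices made for this example.

```python
# Toy illustration of posterior inference over task hypotheses.
# The candidate tasks and the noise level are hypothetical choices for this sketch.

def posterior_over_tasks(examples, tasks, prior, noise=0.1):
    """Return p(task | examples) via Bayes' rule.

    Each task is a function x -> y. An example (x, y) has likelihood
    1 - noise if the task reproduces y, and noise otherwise.
    """
    unnormalized = {}
    for name, fn in tasks.items():
        likelihood = 1.0
        for x, y in examples:
            likelihood *= (1.0 - noise) if fn(x) == y else noise
        unnormalized[name] = prior[name] * likelihood
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


tasks = {
    "uppercase": str.upper,
    "reverse": lambda s: s[::-1],
    "identity": lambda s: s,
}
prior = {name: 1.0 / len(tasks) for name in tasks}

demos = [("abc", "ABC"), ("dog", "DOG")]
for k in range(len(demos) + 1):
    post = posterior_over_tasks(demos[:k], tasks, prior)
    print(k, {name: round(p, 3) for name, p in post.items()})
```

With zero demonstrations the posterior equals the prior. Each additional consistent demonstration moves probability mass toward the "uppercase" task, which is the qualitative behaviour the hypothesis attributes to ICL.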

The Task Vector Hypothesis

The Grokking Hypothesis

Impact of Example Ordering

The order of in-context examples can significantly affect performance: the same examples presented in a different order can give noticeably different accuracy.

Ordering Strategies

Random ordering: Baseline, moderate performance
Similarity-based: Place the examples most similar to the query last, closest to the query (best average performance)
Label-balanced: Equal representation of each class
Difficulty-based: Easy examples first, hard examples last
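Two of the strategies above can be sketched in a few lines. This is a minimal illustration, assuming Jaccard word overlap as a stand-in similarity measure; a real system would more likely use sentence embeddings. The helper names `order_most_similar_last` and `interleave_by_label` are made up for this example.

```python
# Sketch of two ordering strategies. Jaccard word overlap is a stand-in
# similarity (an assumption); a real system would likely use embeddings.
from collections import defaultdict
from itertools import zip_longest


def jaccard(a, b):
    """Word-overlap similarity in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def order_most_similar_last(query, examples):
    """Sort ascending by similarity so the closest example sits next to the query."""
    return sorted(examples, key=lambda ex: jaccard(query, ex[0]))


def interleave_by_label(examples):
    """Round-robin over labels so no class dominates the end of the prompt."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)
    rounds = zip_longest(*by_label.values())
    return [ex for group in rounds for ex in group if ex is not None]


examples = [
    ("the plot was dull", "Negative"),
    ("great acting and a great plot", "Positive"),
    ("the service was slow", "Negative"),
    ("lovely music", "Positive"),
]
query = "the plot was great"
ordered = order_most_similar_last(query, examples)
balanced = interleave_by_label(examples)
print([text for text, _ in ordered])
print([label for _, label in balanced])
```

Both functions return a reordered copy of the list, so the result can be passed directly to a prompt builder.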

Impact of Example Selection

Selection Strategies

Random: No selection bias, but may include irrelevant examples
Top-k retrieval: Select k most similar examples from a pool
Diverse selection: Balance similarity with diversity
Label-aware: Ensure balanced label distribution in selected examples
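The difference between plain top-k retrieval and diversity-aware selection is easiest to see on a tiny pool. The sketch below uses a greedy maximal-marginal-relevance (MMR) rule as one possible way to balance similarity and diversity. The 2-D unit vectors are made-up stand-ins for sentence embeddings, and the `trade_off` value is an arbitrary assumption.

```python
# Sketch of top-k and diversity-aware (MMR-style) selection over a pool.
# The 2-D vectors are made-up stand-ins for sentence embeddings.
import math


def unit(degrees):
    """Unit vector at the given angle; used only to build toy embeddings."""
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, pool, k):
    """Indices of the k pool items most similar to the query."""
    ranked = sorted(range(len(pool)), key=lambda i: cosine(query_vec, pool[i]), reverse=True)
    return ranked[:k]


def mmr_select(query_vec, pool, k, trade_off=0.5):
    """Greedy MMR: reward similarity to the query, penalize similarity
    to items that were already selected."""
    selected = []
    candidates = list(range(len(pool)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cosine(pool[i], pool[j]) for j in selected), default=0.0)
            return trade_off * cosine(query_vec, pool[i]) - (1 - trade_off) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Items 0 and 1 are near-duplicates; item 2 is relevant but different; item 3 is unrelated.
pool = [unit(10), unit(12), unit(-40), unit(90)]
query_vec = unit(0)
print(top_k(query_vec, pool, 2))
print(mmr_select(query_vec, pool, 2))
```

On this pool, top-k returns the two near-duplicates, while the MMR rule skips the second duplicate in favour of a relevant but different item. A label-aware variant could apply the same selection separately within each class.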

ICL vs Fine-tuning

Aspect	ICL	Fine-tuning
Data required	2-32 examples	100-10,000+ examples
Compute	Forward pass only	Gradient updates
Task switching	Change prompt	Retrain model
Performance	80-95% of fine-tuning	100% (baseline)
Latency	Higher (longer prompts)	Lower (shorter prompts)
Knowledge access	Full pre-trained knowledge	May forget pre-trained knowledge

Practical ICL Implementation

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Few-shot examples
examples = [
    ("This movie was fantastic!", "Positive"),
    ("Terrible waste of time.", "Negative"),
    ("The food was okay.", "Neutral"),
]

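def build_icl_prompt(test_input, examples):
    """Build a few-shot prompt: instruction, labeled examples, then the query."""
    prompt = "Classify the sentiment of each review.\n\n"
    for text, label in examples:
        prompt += f"Review: \"{text}\" -> {label}\n"
    # No trailing space after "->": a trailing space is usually tokenized on its
    # own, which would not match how " Positive" etc. appear in the examples.
    prompt += f"Review: \"{test_input}\" ->"
    return prompt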

test = "Absolutely loved every minute of it!"
prompt = build_icl_prompt(test, examples)

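inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding via do_sample=False gives deterministic output;
# temperature=0.0 is not a valid sampling temperature.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
# Decode only the newly generated tokens and keep the first line as the label.
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
response = response.splitlines()[0] if response else ""
print(f"Prediction: {response}")  # Expected: Positive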

Practice Exercises

Empirical: Test ICL on a 3-class classification task with 1, 2, 4, 8, and 16 examples. Plot accuracy vs number of examples.
Ordering: Compare random, similarity-based, and reverse ordering of examples. Which is most robust?
Selection: Implement a retrieval-based example selector using sentence embeddings. Compare with random selection.
Theory: Explain why ICL works well for decoder-only models but is far less effective for encoder-only models like BERT.

Theoretical Foundations

ICL as Gradient Descent

Recent research suggests that transformer attention can implicitly perform gradient descent during the forward pass. In simplified settings such as in-context linear regression, the attention computation over the examples matches the update a gradient step would make, so the model's internal representations change without any explicit parameter update.
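A simplified version of this argument can be checked numerically. The sketch below assumes weights initialized at zero, a squared-error loss, and softmax-free ("linear") attention with the in-context inputs as keys and their targets as values. The data and learning rate are made up, and real transformers are considerably more complex than this setting.

```python
# One gradient-descent step on in-context linear regression, compared with
# a softmax-free ("linear") attention readout. Data and learning rate are made up.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gd_step_prediction(xs, ys, x_query, lr):
    """Start from w = 0, take one GD step on 0.5 * sum((w.x - y)^2), then predict."""
    dim = len(x_query)
    w = [0.0] * dim
    # Gradient at w: sum_i (w.x_i - y_i) * x_i
    grad = [sum((dot(w, x) - y) * x[d] for x, y in zip(xs, ys)) for d in range(dim)]
    w = [w_d - lr * g_d for w_d, g_d in zip(w, grad)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Keys = x_i, values = y_i, query = x_query; scores are raw dot products."""
    return lr * sum(dot(x, x_query) * y for x, y in zip(xs, ys))


xs = [(1.0, 2.0), (0.5, -1.0), (3.0, 0.0)]
ys = [3.0, -0.5, 6.0]
x_query = (2.0, 1.0)

pred_gd = gd_step_prediction(xs, ys, x_query, lr=0.05)
pred_attn = linear_attention_prediction(xs, ys, x_query, lr=0.05)
print(pred_gd, pred_attn)
```

The two predictions agree because, starting from zero weights, one gradient step gives w = lr * sum_i y_i x_i, so the prediction w . x_query equals lr * sum_i y_i (x_i . x_query), which is exactly the attention-style weighted sum of the targets.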

Mechanistic Interpretability of ICL

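Mechanistic interpretability studies have identified specific circuits associated with ICL. The best known is the induction head circuit, in which two attention heads in different layers work together: a previous-token head records what preceded each token, and an induction head uses that record to find an earlier occurrence of the current token and copy the token that followed it. This pattern completion ([A][B] ... [A] -> [B]) emerges during training and appears to be an important mechanism behind few-shot learning.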
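The input/output behaviour attributed to this circuit can be mimicked with a plain lookup. The function below is only a behavioural toy with a hypothetical name, `induction_predict`; it contains no attention and says nothing about how the heads are implemented.

```python
# Toy version of the induction-head behaviour: [A][B] ... [A] -> predict [B].
# This mimics the input/output pattern only; it is not a transformer.

def induction_predict(tokens):
    """Find the most recent earlier occurrence of the last token and
    return the token that followed it, or None if there is no match."""
    current = tokens[-1]
    # Scan backwards over earlier positions for a matching token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy what came next last time
    return None


tokens = ["Ada", "Lovelace", "wrote", "notes", ".", "Ada"]
print(induction_predict(tokens))
```

Because the rule matches on a single token, it completes repeated sequences but cannot by itself generalize to inputs it has not seen earlier in the context.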

Limitations of ICL

Context window limits: ICL is constrained by the maximum sequence length. Long contexts increase latency and cost.
Distribution shift: ICL performance degrades when test examples differ significantly from the pre-training distribution.
Instability: Small changes in example ordering or selection can cause large performance swings.
Limited complexity: ICL struggles with tasks requiring deep reasoning or memorization of complex patterns.

Advanced ICL Techniques

Retrieval-Augmented ICL

Instead of selecting examples randomly, use a retrieval system to find the most relevant in-context examples for each query. This combines the benefits of RAG with ICL, improving accuracy on diverse inputs.
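A minimal end-to-end sketch of this idea follows: retrieve demonstrations for each query, then build the prompt from them. Bag-of-words cosine similarity is used purely as a stand-in for a real embedding model, and the pool, helper names, and prompt wording are assumptions made for illustration.

```python
# Sketch of retrieval-augmented ICL: retrieve demonstrations per query, then
# build the prompt. Bag-of-words cosine is a stand-in for a real embedding model.
import math
from collections import Counter


def bow_cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve(query, pool, k):
    """Return the k (text, label) pairs most similar to the query, most similar last."""
    ranked = sorted(pool, key=lambda ex: bow_cosine(query, ex[0]))
    return ranked[max(len(ranked) - k, 0):]


def build_prompt(query, demos):
    lines = ["Classify the sentiment of each review.", ""]
    lines += [f'Review: "{text}" -> {label}' for text, label in demos]
    lines.append(f'Review: "{query}" ->')
    return "\n".join(lines)


pool = [
    ("the battery died after a week", "Negative"),
    ("the screen looks bright and sharp", "Positive"),
    ("the soup was cold", "Negative"),
    ("the battery lasts all day", "Positive"),
]
query = "battery life is great"
demos = retrieve(query, pool, k=2)
print(build_prompt(query, demos))
```

The retrieval step runs once per query, so its cost adds to request latency. In practice the pool embeddings would typically be precomputed and stored in an index.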

Learned Prompting

Rather than selecting natural examples, learn continuous prompt embeddings that maximize task performance. This is the basis of prefix tuning and prompt tuning methods, which bridge the gap between ICL and fine-tuning: the prompt parameters are trained with gradient updates while the model weights stay frozen.

Task-Aware ICL

Analyze the task structure and design ICL prompts that explicitly communicate the task type, input format, and output format. This reduces ambiguity and improves consistency across diverse inputs.
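One way to make the task explicit is a prompt builder that always states the task type, input format, and output format before the demonstrations. The field names and wording below are illustrative assumptions, not a standard template.

```python
# Sketch of a prompt builder that states task type, input format, and output
# format explicitly. The field names and wording are illustrative assumptions.

def build_task_aware_prompt(task, input_desc, labels, examples, query):
    header = [
        f"Task: {task}",
        f"Input format: {input_desc}",
        f"Output format: exactly one of {', '.join(labels)}",
        "",
    ]
    shots = [f"Input: {text}\nOutput: {label}\n" for text, label in examples]
    return "\n".join(header + shots + [f"Input: {query}\nOutput:"])


prompt = build_task_aware_prompt(
    task="Sentiment classification",
    input_desc="a single product or movie review in English",
    labels=["Positive", "Negative", "Neutral"],
    examples=[
        ("This movie was fantastic!", "Positive"),
        ("Terrible waste of time.", "Negative"),
    ],
    query="The food was okay.",
)
print(prompt)
```

Listing the allowed labels in the header also makes it easier to validate the model's answer afterwards, since anything outside that set can be treated as a failure.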

ICL in Production

When deploying ICL in production, consider:

Latency: Longer prompts mean slower inference. Balance example count with speed requirements.
Cost: API calls are priced by token count. Fewer, more relevant examples reduce cost.
Consistency: Use deterministic example selection for reproducible outputs.
Fallback: Have a fine-tuned model as fallback when ICL performance is insufficient.
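The considerations above can be combined into a few small guards. This is a sketch under stated assumptions: the whitespace-based token count is a crude stand-in for the deployed model's tokenizer, hash-based ordering is just one way to make selection reproducible, and `icl_model` and `fallback_model` are hypothetical callables that map a query to a label string.

```python
# Sketch of production guards: fit demonstrations into a token budget, pick them
# deterministically, and fall back when the answer is not a known label.
# The whitespace token count and the model callables are stand-in assumptions.
import hashlib


def approx_tokens(text):
    """Crude token estimate; swap in the real tokenizer for the deployed model."""
    return len(text.split())


def stable_key(query, example_text):
    """Deterministic sort key: the same query and pool always give the same order."""
    return hashlib.sha256(f"{query}\x00{example_text}".encode("utf-8")).hexdigest()


def select_within_budget(query, pool, max_prompt_tokens):
    """Add examples in a reproducible order, skipping any that would exceed the budget."""
    used = approx_tokens(query)
    chosen = []
    for text, label in sorted(pool, key=lambda ex: stable_key(query, ex[0])):
        cost = approx_tokens(text) + approx_tokens(label)
        if used + cost > max_prompt_tokens:
            continue
        chosen.append((text, label))
        used += cost
    return chosen, used


def classify(query, icl_model, fallback_model, valid_labels):
    """Use the ICL answer if it is a valid label, otherwise call the fallback."""
    answer = icl_model(query).strip()
    return answer if answer in valid_labels else fallback_model(query)


pool = [
    ("great phone", "Positive"),
    ("battery died fast", "Negative"),
    ("it works", "Neutral"),
]
chosen, used = select_within_budget("is this good", pool, max_prompt_tokens=9)
print(len(chosen), used)

labels = {"Positive", "Negative", "Neutral"}
print(classify("is this good", lambda q: "I am not sure", lambda q: "Neutral", labels))
```

In a real deployment the budget would be derived from the model's context window and the per-token price, and the fallback could be the fine-tuned model mentioned above.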

What to Learn Next

-> Chain-of-Thought Reasoning: Making LLMs think step by step for complex reasoning problems.

-> Prompt Engineering: Getting the most out of language models through effective input design.

-> RAG System Design: Building production-ready retrieval systems for grounded generation.

-> Retrieval-Augmented Generation: Combining LLMs with external knowledge for accurate, cited answers.

-> LLM Agent Frameworks: Building autonomous agents that reason, plan, and act.

-> Building Production LLM Apps: From prototype to production, deploying LLMs at scale.
