Instruction Tuning
Instruction tuning is the process of fine-tuning a language model on (instruction, response) pairs to improve its ability to follow human instructions. It bridges the gap between pre-training on raw text and aligning with user intent.
What is Instruction Tuning?
Fine-tuning a pre-trained language model on datasets of structured instructions paired with desired responses, teaching the model to generalize to unseen instructions and follow them accurately.
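The key insight: pre-training teaches the model to predict the next token of arbitrary text, whereas instruction tuning reuses that same next-token objective on curated (instruction, response) pairs, so that the most likely continuation of an instruction becomes a response that carries it out.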
Mathematical Formulation
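Instruction Tuning Loss
$$\mathcal{L}_{\text{IT}}(\theta) = -\sum_{t=1}^{T} \log P_\theta\left(y_t \mid x, y_{<t}\right)$$
Here,
- $\theta$ = model parameters
- $x$ = instruction (prompt) tokens
- $y = (y_1, \dots, y_T)$ = response tokens
- $y_{<t}$ = response tokens before position $t$
- $T$ = number of response tokens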
Unlike pre-training, instruction tuning typically computes loss only on the response tokens, not the instruction tokens. The instruction still conditions every prediction, but the model is not trained to reproduce it.
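To make the masking concrete, here is a minimal sketch in plain Python (no deep learning framework). It assumes we already have the probability the model assigned to each correct token; the function name `masked_nll` and the numbers are illustrative. It averages over the supervised positions, which is a common implementation choice, rather than summing.

```python
import math

IGNORE_INDEX = -100  # label value marking positions excluded from the loss


def masked_nll(token_probs, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label is not ignored.

    token_probs[i] is the probability the model gave to the correct token at
    position i; labels[i] is the target id, or ignore_index for prompt tokens.
    """
    kept = [p for p, lab in zip(token_probs, labels) if lab != ignore_index]
    if not kept:
        raise ValueError("no response tokens to supervise")
    return -sum(math.log(p) for p in kept) / len(kept)


# Three instruction positions (masked) followed by two response positions.
probs = [0.10, 0.20, 0.05, 0.50, 0.25]
labels = [-100, -100, -100, 42, 7]

loss = masked_nll(probs, labels)
print(round(loss, 4))  # 1.0397, i.e. (ln 2 + ln 4) / 2
```

The low probabilities at the three instruction positions have no effect on the result: only the two response positions are scored.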
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments


def compute_instruction_loss(model, batch, tokenizer):
    """Compute loss only on response tokens."""
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]
    labels = batch["labels"]

    # Instruction positions in `labels` are set to -100, the index that the
    # Hugging Face cross-entropy loss ignores, so only response tokens
    # contribute. The model shifts labels internally for next-token prediction.
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        labels=labels,
    )
    return outputs.loss
def prepare_instruction_data(
    instruction: str,
    response: str,
    tokenizer,
    max_length: int = 512,
) -> dict:
    """Prepare an instruction-response pair with instruction tokens masked."""
    # Format: Instruction: {instruction} \n Response: {response} <eos>
    instruction_text = f"Instruction: {instruction}\nResponse:"
    eos = tokenizer.eos_token or ""
    full_text = f"{instruction_text} {response}{eos}"

    full_encoded = tokenizer(full_text, truncation=True, max_length=max_length)
    instruction_encoded = tokenizer(instruction_text, truncation=True, max_length=max_length)

    # Create labels: mask instruction tokens with -100 so the loss ignores them.
    # This assumes the tokenized instruction is an exact prefix of the tokenized
    # full text, i.e. the tokenizer appends no special tokens to the instruction
    # and does not merge tokens across the "Response:" boundary.
    labels = list(full_encoded["input_ids"])
    instruction_length = min(len(instruction_encoded["input_ids"]), len(labels))
    labels[:instruction_length] = [-100] * instruction_length

    full_encoded["labels"] = labels
    return full_encoded
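Masking by prompt length is only correct when the prompt's token ids are an exact prefix of the full sequence's token ids. The sketch below is one way to check that assumption before masking; `mask_prompt_labels` is a hypothetical helper and the token ids are made up for illustration.

```python
def mask_prompt_labels(prompt_ids, full_ids, ignore_index=-100):
    """Return labels for full_ids with the prompt positions masked.

    Raises ValueError when prompt_ids is not an exact prefix of full_ids,
    which is the case where masking by length would cover the wrong tokens.
    """
    n = len(prompt_ids)
    if list(full_ids[:n]) != list(prompt_ids):
        raise ValueError("prompt tokens are not a prefix of the full sequence")
    return [ignore_index] * n + list(full_ids[n:])


# Hypothetical token ids: the prompt is a clean prefix of the full sequence.
labels = mask_prompt_labels([5, 8, 13], [5, 8, 13, 21, 34, 2])
print(labels)  # [-100, -100, -100, 21, 34, 2]
```

A mismatch usually means the tokenizer appended a special token to the prompt or merged characters across the prompt/response boundary; failing loudly is safer than silently training on part of the instruction.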
FLAN (Fine-tuned LAnguage Net)
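FLAN demonstrated the power of multi-task instruction tuning across a broad mix of NLP tasks:
FLAN (Wei et al., 2022) fine-tuned a 137B-parameter pre-trained language model (LaMDA-PT) on 62 NLP datasets converted to instruction format; the later Flan-PaLM work (Chung et al., 2022) applied the recipe to PaLM with a much larger task collection. The key finding: instruction tuning on diverse tasks improves zero-shot performance on unseen tasks.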
FLAN Task Formats
| Task Type | Format Example |
|---|---|
| Classification | "Is the following sentence positive or negative? {text}" |
| Summarization | "Summarize the following article: {article}" |
| QA | "Based on the context, answer: {question}\nContext: {context}" |
| Translation | "Translate the following from English to French: {text}" |
flan_task_templates = {
    "sentiment": "Is the following review positive or negative? Review: {input}\nAnswer:",
    "summarization": "Summarize the following text in one sentence: {input}\nSummary:",
    "question_answering": "Based on the following passage, answer the question.\nPassage: {passage}\nQuestion: {input}\nAnswer:",
    "translation": "Translate the following sentence to French: {input}\nTranslation:",
    "classification": "Classify the following text into one of these categories: {labels}\nText: {input}\nCategory:",
    "reasoning": "Answer the following question step by step.\nQuestion: {input}\nSolution:",
}
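Templates like these are filled with `str.format`. The sketch below shows one way to render a prompt and to fail early when a placeholder is missing. `TEMPLATES`, `required_fields`, and `render_prompt` are illustrative names, and only two of the templates are repeated here to keep the example short.

```python
import string

TEMPLATES = {
    "sentiment": "Is the following review positive or negative? Review: {input}\nAnswer:",
    "question_answering": (
        "Based on the following passage, answer the question.\n"
        "Passage: {passage}\nQuestion: {input}\nAnswer:"
    ),
}


def required_fields(template):
    """Names of the placeholders a template expects."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def render_prompt(task, **fields):
    """Fill a task template, failing loudly when a placeholder is missing."""
    template = TEMPLATES[task]
    missing = required_fields(template) - set(fields)
    if missing:
        raise KeyError(f"missing fields for {task!r}: {sorted(missing)}")
    return template.format(**fields)


prompt = render_prompt("sentiment", input="Great battery life.")
print(prompt)
```

Checking placeholders up front matters for templates such as `question_answering`, which needs both `passage` and `input`.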
InstructGPT
InstructGPT (Ouyang et al., 2022) established the three-stage training paradigm:
Stage 1: Supervised Fine-Tuning (SFT)
Train on human-written demonstrations:
def train_sft_stage(model, dataset, tokenizer, epochs=3):
    """Stage 1: Supervised Fine-Tuning on human demonstrations."""
    training_args = TrainingArguments(
        output_dir="./sft_model",
        num_train_epochs=epochs,
        per_device_train_batch_size=8,
        learning_rate=2e-5,
        weight_decay=0.01,
        warmup_ratio=0.1,
        fp16=True,  # mixed precision; requires a CUDA GPU
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset,  # tokenized examples that include a "labels" field
        tokenizer=tokenizer,  # recent transformers releases prefer `processing_class`
    )
    trainer.train()
    return trainer.model
Stage 2: Reward Model Training
Train reward model on human preferences:
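import torch.nn.functional as F


class RewardModelTrainer:
    def __init__(self, reward_model, tokenizer):
        self.model = reward_model
        self.tokenizer = tokenizer

    def compute_reward_loss(self, chosen, rejected):
        """Train reward model to rank chosen > rejected.

        `chosen` and `rejected` are assumed to be tokenized batches (dicts with
        `input_ids` and `attention_mask`), and the model a sequence classifier
        with a single output logit per example.
        """
        chosen_rewards = self.model(**chosen).logits.squeeze(-1)
        rejected_rewards = self.model(**rejected).logits.squeeze(-1)
        # Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected).
        # logsigmoid is the numerically stable form of log(sigmoid(x)).
        loss = -F.logsigmoid(chosen_rewards - rejected_rewards)
        return loss.mean()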
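A few worked values help build intuition for the pairwise ranking loss. This is a plain-Python sketch for scalar rewards; `pairwise_ranking_loss` is an illustrative name and the reward values are made up.

```python
import math


def pairwise_ranking_loss(chosen_reward, rejected_reward):
    """-log(sigmoid(chosen - rejected)), computed as softplus of the negated margin."""
    margin = chosen_reward - rejected_reward
    # softplus(-margin) = log(1 + exp(-margin)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


tie = pairwise_ranking_loss(1.0, 1.0)      # equal scores
correct = pairwise_ranking_loss(3.0, 0.0)  # chosen scored higher
wrong = pairwise_ranking_loss(0.0, 3.0)    # rejected scored higher

print(round(tie, 4), round(correct, 4), round(wrong, 4))  # 0.6931 0.0486 3.0486
```

The loss depends only on the difference between the two rewards: it is ln 2 at a tie, approaches zero as the chosen response pulls ahead, and grows roughly linearly when the ranking is wrong.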
Stage 3: PPO Optimization
Optimize policy against reward model with KL penalty:
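InstructGPT PPO Objective
$$\text{objective}(\phi) = \mathbb{E}_{(x, y) \sim D_{\pi_\phi^{\text{RL}}}} \left[ r_\theta(x, y) - \beta \log \frac{\pi_\phi^{\text{RL}}(y \mid x)}{\pi^{\text{SFT}}(y \mid x)} \right]$$
Here,
- $\pi_\phi^{\text{RL}}$ = policy being optimized
- $\pi^{\text{SFT}}$ = SFT model from Stage 1, used as a frozen reference
- $r_\theta(x, y)$ = reward model score for response $y$ to prompt $x$
- $\beta$ = KL penalty coefficient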
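The quantity inside the expectation can be computed per sampled response. The sketch below assumes we have per-token log-probabilities from both the policy and the SFT reference; since the log-probability of a sequence is the sum of its token log-probabilities, the log-ratio is the sum of per-token differences. The function name, the default `beta`, and all numbers are illustrative.

```python
def kl_penalized_reward(reward, logprobs_policy, logprobs_sft, beta=0.1):
    """Reward-model score minus beta times the summed per-token log-ratio.

    logprobs_policy[i] and logprobs_sft[i] are the log-probabilities each
    model assigned to the i-th token of the sampled response.
    """
    log_ratio = sum(p - s for p, s in zip(logprobs_policy, logprobs_sft))
    return reward - beta * log_ratio


# The policy assigns higher probability to this response than the SFT model,
# so part of the reward is taken back by the penalty.
value = kl_penalized_reward(
    reward=1.5,
    logprobs_policy=[-0.5, -1.0],
    logprobs_sft=[-1.0, -1.5],
)
print(round(value, 4))  # 1.4
```

When the policy and the reference agree on every token, the penalty vanishes and the value equals the raw reward; the further the policy drifts toward responses the reference finds unlikely, the more reward it gives up.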
Dataset Construction
Self-Instruct
Self-Instruct generates instruction-following data using the model itself:
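import random
from typing import Dict, List


class SelfInstructGenerator:
    def __init__(self, model, tokenizer, seed_tasks: List[str]):
        self.model = model
        self.tokenizer = tokenizer
        self.seed_tasks = seed_tasks

    def _generate(self, prompt: str, max_tokens: int = 64) -> str:
        """Sample a continuation (assumes a Hugging Face causal LM and tokenizer)."""
        inputs = self.tokenizer(prompt, return_tensors="pt")
        output_ids = self.model.generate(**inputs, max_new_tokens=max_tokens, do_sample=True)
        # Drop the prompt tokens so only the newly generated text is returned
        new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    def generate_instructions(self, num_instructions: int = 1000) -> List[Dict]:
        """Make `num_instructions` attempts and keep the pairs that pass validation."""
        generated = []
        for _ in range(num_instructions):
            # Sample seed tasks for context
            context = random.sample(self.seed_tasks, k=3)
            examples = "\n".join(f"- {t}" for t in context)
            prompt = (
                "Here are some example instructions:\n"
                f"{examples}\n"
                "Generate a new, different instruction that is similar in complexity:"
            )
            instruction = self._generate(prompt)

            # Generate response
            response_prompt = f"Instruction: {instruction}\nResponse:"
            response = self._generate(response_prompt, max_tokens=256)

            if self._is_valid(instruction, response):
                generated.append({
                    "instruction": instruction,
                    "response": response,
                })
        return generated

    def _is_valid(self, instruction: str, response: str) -> bool:
        """Validate generated data with cheap heuristics."""
        # Check length
        if len(instruction) < 10 or len(response) < 20:
            return False
        # Check for repetition (the response merely echoes the instruction)
        if instruction == response:
            return False
        # Crude coherence proxy: require at least one complete sentence
        if response.count(".") < 1:
            return False
        return True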
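The validity checks above do not catch near-duplicate instructions, which a model sampling from its own outputs tends to produce. The sketch below adds a similarity filter over the growing pool. It uses the standard library's `difflib.SequenceMatcher` on word tokens as a stand-in similarity measure (a swap chosen to avoid extra dependencies); `is_novel` and the 0.7 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def is_novel(candidate, pool, threshold=0.7):
    """True when candidate stays below the similarity threshold against every pooled item."""
    cand = candidate.lower().split()
    for existing in pool:
        ratio = SequenceMatcher(None, cand, existing.lower().split()).ratio()
        if ratio >= threshold:
            return False
    return True


pool = ["Summarize the following article in one sentence."]
near_copy = "Summarize the following article in one paragraph."
different = "Write a haiku about autumn rain."

print(is_novel(near_copy, pool))  # False: 6 of 7 words match in order
print(is_novel(different, pool))  # True: no words in common
```

In a generation loop, a candidate would be appended to `pool` only when `is_novel` returns `True`, so later candidates are compared against everything accepted so far.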
Evol-Instruct
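Evol-Instruct progressively makes instructions more complex:
Evol-Instruct (Xu et al., 2023) creates diverse, high-quality instruction data by iteratively "evolving" seed instructions. In-depth evolving makes an instruction harder by adding constraints, deepening, concretizing, increasing reasoning steps, or complicating the input; in-breadth evolving generates a new instruction on a related topic to widen coverage. The code below uses a simplified set of four rewrite prompts.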
from typing import Dict, List


class EvolInstructGenerator:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.evolution_prompts = {
            "deepen": "Add more constraints or requirements to this instruction: {instruction}",
            "widen": "Add more context or background information to this instruction: {instruction}",
            "concretize": "Make this instruction more specific and concrete: {instruction}",
            "reasoning": "Modify this instruction to require multi-step reasoning: {instruction}",
        }

    def _generate(self, prompt: str, max_tokens: int = 128) -> str:
        """Sample a continuation (assumes a Hugging Face causal LM and tokenizer)."""
        inputs = self.tokenizer(prompt, return_tensors="pt")
        output_ids = self.model.generate(**inputs, max_new_tokens=max_tokens, do_sample=True)
        new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    def evolve(self, instruction: str, evolution_type: str) -> str:
        """Evolve instruction using specified type."""
        prompt = self.evolution_prompts[evolution_type].format(instruction=instruction)
        evolved = self._generate(prompt)
        return evolved

    def generate_dataset(
        self,
        seed_instructions: List[str],
        num_evolutions: int = 3,
        evolution_factor: int = 2,
    ) -> List[Dict]:
        """Generate evolved instruction dataset."""
        dataset = []
        for seed in seed_instructions:
            current = [seed]
            for level in range(num_evolutions):
                next_level = []
                for instr in current:
                    for evo_type in self.evolution_prompts:
                        evolved = self.evolve(instr, evo_type)
                        next_level.append(evolved)
                        dataset.append({
                            "instruction": evolved,
                            "evolution_type": evo_type,
                            "level": level,
                        })
                # Only the first `evolution_factor` candidates seed the next round
                current = next_level[:evolution_factor]
        return dataset
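To see how many examples this branching scheme produces without calling a model, the sketch below mirrors the loop structure of `generate_dataset` but replaces the model rewrite with a string tag. `evolve_with_stub` is a hypothetical helper written for this illustration.

```python
def evolve_with_stub(seed, evolution_types, num_evolutions=3, evolution_factor=2):
    """Mirror the branching of generate_dataset using a tagging stub instead of a model."""
    dataset, current = [], [seed]
    for level in range(num_evolutions):
        next_level = []
        for instr in current:
            for evo_type in evolution_types:
                evolved = f"{instr} [{evo_type}]"  # stand-in for a model rewrite
                next_level.append(evolved)
                dataset.append(
                    {"instruction": evolved, "evolution_type": evo_type, "level": level}
                )
        # Only the first `evolution_factor` candidates seed the next round.
        current = next_level[:evolution_factor]
    return dataset


rows = evolve_with_stub("Sort a list.", ["deepen", "widen", "concretize", "reasoning"])
per_level = [sum(r["level"] == lvl for r in rows) for lvl in range(3)]
print(per_level, len(rows))  # [4, 8, 8] 20
```

With four evolution types and `evolution_factor=2`, each seed yields 4 + 8 + 8 = 20 examples. Because the slice keeps the first candidates in dictionary order, later rounds always build on the "deepen" and "widen" rewrites; shuffling `next_level` before slicing is one way to avoid that bias.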
Multi-Task vs Single-Task Tuning
Multi-Task Instruction Tuning
from datasets import load_dataset

multi_task_dataset = {
    "classification": load_dataset("glue", "sst2"),
    "summarization": load_dataset("cnn_dailymail", "3.0.0"),
    "qa": load_dataset("squad"),
    "translation": load_dataset("wmt14", "fr-en"),
    "code": load_dataset("code_x_glue_tc_nl_code_search_adv"),
}


# Combine all tasks with instruction format
def format_multi_task_dataset(datasets, task_formats):
    """Flatten several task datasets into one list of instruction records.

    `task_formats` maps each task name to a function that turns a raw example
    into an (instruction, response) pair.
    """
    combined = []
    for task_name, dataset in datasets.items():
        format_fn = task_formats[task_name]
        for example in dataset["train"]:
            instruction, response = format_fn(example)
            combined.append({
                "instruction": instruction,
                "response": response,
                "task": task_name,
            })
    return combined
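Concatenating every training split lets the largest dataset dominate the mixture. One common remedy is examples-proportional mixing with a per-task cap: each task is sampled in proportion to its size, but sizes above the cap are treated as equal to the cap. The sketch below is a minimal version; the function name, the cap value, and the dataset sizes are illustrative.

```python
def mixing_weights(dataset_sizes, cap=3000):
    """Sampling probability per task, proportional to min(size, cap)."""
    capped = {task: min(size, cap) for task, size in dataset_sizes.items()}
    total = sum(capped.values())
    return {task: n / total for task, n in capped.items()}


# Illustrative sizes: two large datasets and one small one.
sizes = {"classification": 67_000, "summarization": 287_000, "qa": 1_000}
weights = mixing_weights(sizes)

for task, w in weights.items():
    print(f"{task}: {w:.3f}")
```

Here the two large tasks are both capped at 3,000 and each receives 3/7 of the samples, while the small task keeps its natural share of 1/7 instead of being drowned out.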
Single-Task Instruction Tuning
Focus on one domain for specialized performance:
single_task_configs = {
    "medical": {
        "dataset": "medical_dialog",
        "instruction_format": "Based on the patient's symptoms, provide a diagnosis: {symptoms}",
        "response_format": "Diagnosis: {diagnosis}\nReasoning: {reasoning}",
    },
    "legal": {
        "dataset": "legal_contract_qa",
        "instruction_format": "Analyze the following legal clause: {clause}\nQuestion: {question}",
        "response_format": "Analysis: {analysis}\nAnswer: {answer}",
    },
    "code": {
        "dataset": "code_instruct",
        "instruction_format": "Write code to: {task_description}\nLanguage: {language}",
        "response_format": "```{language}\n{code}\n```\nExplanation: {explanation}",
    },
}
Multi-task instruction tuning improves generalization across tasks, while single-task tuning excels in domain-specific applications. Choose based on your use case: multi-task for general assistants, single-task for specialized domains.
Training Best Practices
instruction_training_config = {
    "learning_rate": 2e-5,
    "batch_size": 32,
    "epochs": 3,
    "warmup_ratio": 0.03,
    "weight_decay": 0.1,
    "max_seq_length": 512,
    "gradient_accumulation_steps": 4,
    "fp16": True,
    "logging_steps": 100,
    "save_strategy": "epoch",
    "evaluation_strategy": "epoch",
}
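Several of these settings interact: the batch size and gradient accumulation together determine the effective batch, which in turn fixes the number of optimizer steps and how many of them fall in the warmup phase. The sketch below works through that arithmetic. It is an approximation (training frameworks may round slightly differently), and the dataset size of 52,000 examples is a made-up figure.

```python
import math


def schedule_summary(num_examples, batch_size, grad_accum, epochs, warmup_ratio, num_devices=1):
    """Derive approximate optimizer-step counts from batch and warmup settings."""
    effective_batch = batch_size * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


summary = schedule_summary(
    num_examples=52_000,  # hypothetical dataset size
    batch_size=32,
    grad_accum=4,
    epochs=3,
    warmup_ratio=0.03,
)
print(summary)
# {'effective_batch': 128, 'steps_per_epoch': 407, 'total_steps': 1221, 'warmup_steps': 37}
```

With these numbers the learning rate warms up over only 37 steps, which is worth knowing before comparing runs that use a different accumulation setting.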
Common Pitfalls
| Pitfall | Symptom | Solution |
|---|---|---|
| Overfitting | Train loss ↓, eval loss ↑ | Early stopping, dropout |
| Underfitting | Both losses plateau high | Increase epochs, reduce regularization |
| Catastrophic forgetting | Good on target, bad on general | Lower learning rate, mix general data |
| Data contamination | High eval accuracy, poor real-world performance | Deduplicate training data against eval sets; use held-out test sets |
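Early stopping, listed above as a remedy for overfitting, can be expressed as a small rule over the evaluation-loss history: stop once the loss has failed to improve for a set number of consecutive evaluations. The sketch below is a minimal version; `early_stop_epoch`, the patience value, and the loss history are illustrative.

```python
def early_stop_epoch(eval_losses, patience=2, min_delta=0.0):
    """Index of the epoch at which training would stop, or None if it never triggers."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(eval_losses):
        if loss < best - min_delta:
            best, bad_epochs = loss, 0  # new best: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Eval loss improves until epoch 2, then rises: the overfitting pattern.
history = [1.20, 0.95, 0.90, 0.93, 0.97, 1.05]
stop_at = early_stop_epoch(history)
print(stop_at)  # 4
```

Training stops at epoch 4, after two consecutive evaluations without improvement; the checkpoint worth keeping is the one from epoch 2, where the evaluation loss was lowest.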
Summary
- Instruction tuning trains LLMs on (instruction, response) pairs to follow human intent
- Loss is computed only on response tokens, not instruction tokens
- FLAN showed multi-task instruction tuning improves zero-shot generalization
- InstructGPT established the SFT → Reward Model → PPO training paradigm
- Self-Instruct generates data using the model itself
- Evol-Instruct progressively increases instruction complexity
- Multi-task tuning improves generalization; single-task tuning excels in specialized domains
Practice Exercises
- Instruction Dataset: Create an instruction dataset with 100 examples covering 5 different task types.
- SFT Training: Fine-tune a small language model (e.g., GPT-2) on instruction data. Compare before and after on held-out instructions.
- Self-Instruct: Implement Self-Instruct to generate 500 instruction-response pairs. Evaluate quality.
- Evol-Instruct: Use Evol-Instruct to create a dataset with increasing complexity levels. Analyze difficulty distribution.
- Multi-Task Evaluation: Train on 3 tasks and evaluate zero-shot on 5 unseen tasks. Compare with single-task training.