LoRA and PEFT
What is LoRA?
Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) method. It freezes the pretrained weights and injects a pair of small, trainable rank-decomposition matrices into selected transformer layers, so only a small fraction of the model's parameters is trained.
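In symbols, the idea can be sketched as follows. The notation is an assumption introduced here, not taken from the text above: W_0 is the frozen weight, d_in and d_out are the layer dimensions, r is the rank, and alpha is the scaling hyperparameter.

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad B \in \mathbb{R}^{d_{out} \times r},\quad
A \in \mathbb{R}^{r \times d_{in}},\quad
r \ll \min(d_{in}, d_{out})
```

Only A and B are trained, so the trainable parameter count drops from d_in * d_out to r * (d_in + d_out). For d_in = d_out = 768 and r = 8, that is 12,288 parameters instead of 589,824.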
How LoRA Works
import torch
import torch.nn as nn


class LoRALayer(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.rank = rank
        self.alpha = alpha

        # Original frozen weight (stands in for a pretrained layer)
        self.W = nn.Linear(in_features, out_features, bias=False)
        self.W.weight.requires_grad = False

        # LoRA trainable matrices: A projects down to `rank`, B projects back up
        self.A = nn.Linear(in_features, rank, bias=False)
        self.B = nn.Linear(rank, out_features, bias=False)
        # Zero-initialize B so the layer initially reproduces the frozen output
        nn.init.zeros_(self.B.weight)

        # Scaling factor
        self.scaling = alpha / rank

    def forward(self, x):
        # Original output + scaled LoRA output
        return self.W(x) + self.B(self.A(x)) * self.scaling


# Usage
layer = LoRALayer(768, 768, rank=8)
print(f"Original params: {768 * 768:,}")
print(f"LoRA params: {768 * 8 * 2:,}")
print(f"Reduction: {(1 - (768*8*2)/(768*768))*100:.1f}%")
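Because the LoRA branch is linear, the two small matrices can be folded back into the frozen weight after training, so inference costs the same as the original layer. The following is a minimal sketch of that merge on toy matrices. It uses plain Python lists so it runs without PyTorch, and the helper names (`matmul`, `matvec`, `merge_lora`) are hypothetical, chosen only for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, x):
    """Compute M @ x for a vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def merge_lora(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) as a new matrix."""
    scaling = alpha / rank
    delta = matmul(B, A)  # (out x rank) @ (rank x in) -> (out x in)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Toy sizes: in_features=3, out_features=2, rank=1
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # frozen weight, out x in
A = [[1.0, 2.0, 3.0]]                    # down-projection, rank x in
B = [[0.5], [-1.0]]                      # up-projection, out x rank
alpha, rank = 2, 1
x = [1.0, 1.0, 1.0]

# Unmerged path: W x + scaling * B (A x), as in a LoRA forward pass
lora_out = matvec(B, matvec(A, x))
unmerged = [wx + (alpha / rank) * d for wx, d in zip(matvec(W, x), lora_out)]

# Merged path: a single matrix-vector product with the folded weight
merged = matvec(merge_lora(W, A, B, alpha, rank), x)

print(unmerged, merged)
```

Both paths give the same output, which is why a merged adapter adds no inference latency.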
Using PEFT Library
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM


def setup_lora(model_name, rank=8, alpha=16):
    model = AutoModelForCausalLM.from_pretrained(model_name)

    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=rank,
        lora_alpha=alpha,
        lora_dropout=0.1,
        # Adapt all four attention projections in every decoder layer
        target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    )

    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()
    return peft_model


# Example usage
model = setup_lora("meta-llama/Llama-2-7b-hf", rank=16)
# Expected output for this config (exact formatting varies by PEFT version):
# trainable params: 16,777,216 || all params: 6,755,192,832 || trainable%: 0.2484
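The trainable-parameter count that `print_trainable_parameters()` reports can be sanity-checked by hand. The sketch below assumes the Llama-2-7B shape (hidden size 4096, 32 decoder layers, square attention projections). These dimensions and the helper name `lora_trainable_params` are assumptions for illustration and are not read from the model.

```python
def lora_trainable_params(hidden_size, num_layers, num_target_modules, rank):
    """Count LoRA parameters when every targeted projection is hidden_size x hidden_size."""
    # A is (rank x in) and B is (out x rank), with in == out == hidden_size
    per_module = rank * (hidden_size + hidden_size)
    return per_module * num_target_modules * num_layers


# Assumed Llama-2-7B shape: hidden size 4096, 32 decoder layers
four_proj_r16 = lora_trainable_params(4096, 32, num_target_modules=4, rank=16)
two_proj_r8 = lora_trainable_params(4096, 32, num_target_modules=2, rank=8)

print(f"q/k/v/o projections, rank 16: {four_proj_r16:,}")
print(f"q/v projections only, rank 8: {two_proj_r8:,}")
```

Under these assumptions, rank 16 on all four attention projections gives 16,777,216 trainable parameters, while rank 8 on `q_proj` and `v_proj` alone gives 4,194,304. The count scales linearly with both the rank and the number of targeted modules.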
LoRA Variants
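| Variant | Description |
|---|---|
| QLoRA | Trains LoRA adapters on top of a frozen, 4-bit quantized base model |
| AdaLoRA | Adaptively allocates the rank budget across weight matrices during training |
| DoRA | Decomposes each weight into magnitude and direction and applies LoRA to the direction |
| LoRA+ | Uses different learning rates for A and B (a higher rate for B) |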
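To give a rough sense of why quantizing the base model (the QLoRA row) matters, the sketch below estimates the memory needed just to store the base weights at 16 bits versus 4 bits per parameter. The parameter count is the Llama-2-7B base size implied by the example above. The estimate ignores quantization constants, activations, gradients, and optimizer state, so treat it as a lower bound rather than a measurement.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for storing weights only, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


BASE_PARAMS = 6_738_415_616  # assumed Llama-2-7B base parameter count

fp16_gib = weight_memory_gib(BASE_PARAMS, 16)
int4_gib = weight_memory_gib(BASE_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f" 4-bit weights: {int4_gib:.2f} GiB")
```

Under these assumptions the weights shrink by a factor of four, from roughly 12.5 GiB to roughly 3.1 GiB, while the LoRA adapters themselves stay in higher precision.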
Training with LoRA
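from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./lora-output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    fp16=True,  # mixed-precision training
    save_steps=500,
    logging_steps=100,
)

# `model` is the PEFT-wrapped model returned by setup_lora() above.
# `train_dataset` and `eval_dataset` are assumed to be tokenized datasets
# prepared elsewhere; they are not defined in this section.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()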
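To choose sensible values for `save_steps` and `logging_steps`, it helps to estimate how many optimizer steps a run will take. The sketch below does that arithmetic for the settings above. The dataset size of 10,000 examples is a made-up figure, the helper `training_schedule` is hypothetical, and the estimate assumes a single device with no gradient accumulation.

```python
import math


def training_schedule(num_examples, per_device_batch_size, num_epochs,
                      save_steps, logging_steps, num_devices=1):
    """Estimate optimizer steps, checkpoints, and log events (no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    total_steps = steps_per_epoch * num_epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "checkpoints": total_steps // save_steps,
        "log_events": total_steps // logging_steps,
    }


# Hypothetical dataset size; the other values mirror the TrainingArguments above
schedule = training_schedule(
    num_examples=10_000,
    per_device_batch_size=4,
    num_epochs=3,
    save_steps=500,
    logging_steps=100,
)
print(schedule)
```

With these numbers the run takes 7,500 steps and writes 15 checkpoints. If disk space is a concern, raising `save_steps` is a simple way to reduce that number.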
Summary
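LoRA and the PEFT library make it possible to fine-tune large models by training only a small set of adapter weights while the base model stays frozen. This lowers memory and compute requirements and makes model customization accessible to more developers.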
Next: We'll explore model quantization techniques.