LoRA and PEFT

What is LoRA?

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) method. It freezes the pretrained weights and injects a pair of small, trainable low-rank matrices into selected transformer layers, so only a small fraction of the model's parameters (often well under 1%) is trained.
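Written out as a formula, the idea looks roughly like this. The symbols are a notational choice for illustration (W_0 for the frozen weight, d_in and d_out for the layer's input and output sizes, r for the rank, alpha for the scaling constant), not names taken from any particular library:

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad
W_0 \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}},\quad
B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\text{in}}},\quad
r \ll \min(d_{\text{in}}, d_{\text{out}})

\text{trainable parameters: } r\,(d_{\text{in}} + d_{\text{out}})
\quad\text{instead of}\quad d_{\text{in}}\, d_{\text{out}}
```

Only A and B receive gradients; W_0 stays fixed throughout training.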

How LoRA Works

import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.rank = rank
        self.alpha = alpha

        # Original frozen weight
        self.W = nn.Linear(in_features, out_features, bias=False)
        self.W.weight.requires_grad = False

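        # LoRA trainable matrices: A projects down to `rank`, B projects back up
        self.A = nn.Linear(in_features, rank, bias=False)
        self.B = nn.Linear(rank, out_features, bias=False)
        # Zero-initialize B so the adapter contributes nothing at the start
        # and the layer initially behaves exactly like the frozen base weight
        nn.init.zeros_(self.B.weight)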

        # Scaling factor
        self.scaling = alpha / rank

    def forward(self, x):
        # Original output + LoRA output
        return self.W(x) + self.B(self.A(x)) * self.scaling

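# Usage: count frozen and trainable parameters directly from the layer
layer = LoRALayer(768, 768, rank=8)
frozen = sum(p.numel() for p in layer.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Original params: {frozen:,}")   # 589,824
print(f"LoRA params: {trainable:,}")    # 12,288
print(f"Reduction: {(1 - trainable / frozen) * 100:.1f}%")  # 97.9%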
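Because the adapter output is just a scaled matrix product added to the base output, the update can be folded back into the base weight after training, giving a single matrix with no extra inference cost. The sketch below checks this on a toy example small enough to verify by hand. It uses plain Python lists rather than PyTorch, and the numbers and the helper names `matvec` and `matmul` are made up for illustration:

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


# Toy values chosen for hand-checking (hypothetical, not from a real model)
W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight, shape (out=2, in=2)
A = [[1.0, 0.0]]              # down-projection, shape (rank=1, in=2)
B = [[2.0], [1.0]]            # up-projection, shape (out=2, rank=1)
alpha, rank = 2, 1
scaling = alpha / rank
x = [1.0, 1.0]

# Adapter form: W x + scaling * B (A x)
base_out = matvec(W, x)
lora_out = matvec(B, matvec(A, x))
adapter_out = [b + scaling * l for b, l in zip(base_out, lora_out)]

# Merged form: (W + scaling * B A) x
BA = matmul(B, A)
W_merged = [
    [w + scaling * d for w, d in zip(w_row, d_row)]
    for w_row, d_row in zip(W, BA)
]
merged_out = matvec(W_merged, x)

print(adapter_out)  # [7.0, 9.0]
print(merged_out)   # [7.0, 9.0]
```

Both forms give the same output, which is why a trained adapter can either be kept separate (small file, swappable per task) or merged into the base weights for deployment.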

Using the PEFT Library

from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM

def setup_lora(model_name, rank=8, alpha=16):
    model = AutoModelForCausalLM.from_pretrained(model_name)

    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=rank,
        lora_alpha=alpha,
        lora_dropout=0.1,
        target_modules=["q_proj", "v_proj", "k_proj", "o_proj"]
    )

    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()

    return peft_model

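# Example usage
peft_model = setup_lora("meta-llama/Llama-2-7b-hf", rank=16)
# Expected output for rank=16 on the four attention projections
# (exact formatting varies by PEFT version):
# trainable params: 16,777,216 || all params: 6,755,192,832 || trainable%: 0.2484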
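The trainable-parameter count that PEFT prints can be checked by hand, which is a useful sanity test when changing the rank or `target_modules`. The sketch below assumes Llama-2-7B's shapes (hidden size 4096, 32 decoder layers, square 4096 x 4096 attention projections) and a base size of 6,738,415,616 parameters. The helper `lora_trainable_params` is hypothetical and not part of PEFT:

```python
def lora_trainable_params(hidden_size, num_layers, num_target_modules, rank):
    """Count LoRA parameters when every targeted projection is square."""
    # Each adapted module adds A (rank x in) and B (out x rank)
    per_module = rank * (hidden_size + hidden_size)
    return per_module * num_target_modules * num_layers


BASE_PARAMS = 6_738_415_616  # assumed Llama-2-7B base parameter count

# rank=16 on q_proj, k_proj, v_proj and o_proj
trainable = lora_trainable_params(4096, 32, num_target_modules=4, rank=16)
total = BASE_PARAMS + trainable
print(f"trainable params: {trainable:,}")            # 16,777,216
print(f"all params: {total:,}")                      # 6,755,192,832
print(f"trainable%: {100 * trainable / total:.4f}")  # 0.2484

# For comparison: rank=8 on q_proj and v_proj only
small = lora_trainable_params(4096, 32, num_target_modules=2, rank=8)
print(f"trainable params: {small:,}")                # 4,194,304
```

Doubling the rank or the number of targeted modules doubles the adapter size, so the two settings together account for the fourfold difference between the configurations.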

LoRA Variants

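Variant	Description
QLoRA	LoRA adapters trained on top of a frozen, 4-bit quantized base model
AdaLoRA	Adaptive rank allocation: the rank budget is distributed across weight matrices by importance
DoRA	Weight-decomposed LoRA: splits each weight into magnitude and direction and applies LoRA to the direction
LoRA+	Different learning rates for A and B (a higher rate for B)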
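To see why quantizing the base model matters for QLoRA, a rough back-of-the-envelope estimate of weight memory helps. The sketch below counts weights only and assumes a 7B-class model with 6,738,415,616 parameters. It ignores activations, optimizer state, the adapters themselves, and the small overhead of quantization constants, so treat the numbers as a lower bound. The helper name `weight_memory_gib` is made up for illustration:

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 6_738_415_616  # assumed 7B-class base model size

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)
print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {int4_gib:.1f} GiB")
```

The 4-bit base takes a quarter of the 16-bit footprint, which is what lets the frozen model fit on a smaller GPU while the LoRA matrices are trained in higher precision.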

Training with LoRA

from transformers import TrainingArguments, Trainer

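training_args = TrainingArguments(
    output_dir="./lora-output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,  # LoRA is commonly trained with a higher rate than full fine-tuning
    fp16=True,           # mixed precision; needs a GPU (bf16=True is an alternative where supported)
    save_steps=500,      # write a checkpoint every 500 optimizer steps
    logging_steps=100,   # log the training loss every 100 steps
)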

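# peft_model is the model returned by setup_lora() above; train_dataset and
# eval_dataset are tokenized datasets you have prepared beforehand
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)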

trainer.train()
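After training, the usual next step is to save just the adapter and, for deployment, optionally fold it back into the base model. The sketch below shows one way to do that with PEFT's `save_pretrained`, `PeftModel.from_pretrained` and `merge_and_unload`. The function names and the `./lora-adapter` directory are hypothetical choices for this example:

```python
def save_adapter(peft_model, adapter_dir="./lora-adapter"):
    """Write only the LoRA adapter weights and config, not the base model."""
    peft_model.save_pretrained(adapter_dir)
    return adapter_dir


def load_merged_model(base_model_name, adapter_dir="./lora-adapter"):
    """Reload the base model, attach the saved adapter, and merge it in."""
    # Imported inside the function so the heavy libraries load only when needed
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(base_model_name)
    model = PeftModel.from_pretrained(base, adapter_dir)
    # merge_and_unload() folds the LoRA update into the base weights and
    # returns a plain transformers model without adapter wrappers
    return model.merge_and_unload()
```

A typical call would be `save_adapter(peft_model)` right after `trainer.train()`, then `load_merged_model("meta-llama/Llama-2-7b-hf")` in the serving code. The saved adapter is small compared with the base model, so several task-specific adapters can share one copy of the base weights.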

Summary

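LoRA trains a small set of low-rank adapter weights while the base model stays frozen, and the PEFT library applies it to Hugging Face models in a few lines of code. Together they make fine-tuning large models feasible on modest hardware, which puts model customization within reach of more developers.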

Next: We'll explore model quantization techniques.
