LLM Capstone

LLM Capstone Project — Building Production-Ready AI Applications

This capstone project guides you through building an end-to-end LLM application, from conception to deployment. You'll apply everything learned throughout the course to create a real-world system.

Project Design — Requirements gathering and architecture
Implementation — Building the complete system
Deployment — Production deployment and monitoring

The best way to learn is by doing.

LLM Capstone Project

This capstone project challenges you to build a complete LLM application that solves a real-world problem. You'll go through the full software development lifecycle, from requirements gathering to production deployment.

DfCapstone Project

A capstone project integrates knowledge and skills from throughout the curriculum to solve a complex, real-world problem. It demonstrates mastery of LLM concepts, engineering best practices, and system design.

Project Selection

Choosing a Project

Select a project that:

Solves a real problem: Addresses a genuine need
Is achievable: Can be completed in the time frame
Demonstrates skills: Showcases LLM knowledge
Has data available: Has accessible training/evaluation data
Can be deployed: Can run in a production environment

Project Ideas

Category	Project	Complexity
Education	Adaptive tutoring system	High
Healthcare	Medical document summarizer	High
Finance	Financial news analyzer	Medium
Legal	Contract analysis tool	High
Customer Service	Multi-channel support bot	Medium
Content	Automated content pipeline	Medium
Research	Literature review assistant	Medium
Productivity	Meeting summarizer	Low-Medium

Project Selection Matrix

Score projects on:

Impact (1-5): How many people benefit?
Feasibility (1-5): How achievable is it?
Learning (1-5): How much will you learn?
Interest (1-5): How interested are you?

Select the project with the highest total score.

Project Design Phase

Requirements Gathering

DfRequirements Gathering

Requirements gathering is the process of understanding what the system should do, who will use it, and what constraints exist. For LLM projects, this includes functional requirements, performance requirements, and ethical considerations.

Requirements template:

## Functional Requirements
- [ ] Core feature 1: Description
- [ ] Core feature 2: Description
- [ ] User interaction model: Chat/API/Batch
- [ ] Output format: Text/JSON/Structured

## Non-Functional Requirements
- [ ] Latency: < X seconds
- [ ] Throughput: X requests/minute
- [ ] Accuracy: X% on evaluation set
- [ ] Availability: X% uptime

## Constraints
- [ ] Budget: $X
- [ ] Timeline: X weeks
- [ ] Team size: X people
- [ ] Technology stack: Specific tools/frameworks

## Ethical Considerations
- [ ] Bias mitigation strategies
- [ ] Privacy requirements
- [ ] Safety measures
- [ ] Transparency needs

Architecture Design

DfLLM System Architecture

LLM system architecture defines the components, their interactions, data flows, and technology choices for an LLM application.

Typical architecture components:

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                     User Interface                          │
├─────────────────────────────────────────────────────────────┤
│                     API Gateway                             │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │   Prompt     │  │   Retrieval │  │   Post-     │         │
│  │   Engine     │  │   System    │  │   Processor │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
├─────────────────────────────────────────────────────────────┤
│                     LLM Engine                              │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │   Model      │  │   Cache     │  │   Monitor   │         │
│  │   Server     │  │   System    │  │   System    │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
└─────────────────────────────────────────────────────────────┘

Technology Selection

Technology Selection Score

S = \\sum_{i=1}^{n} w_i \\cdot s_i

Here,

$w_i$ =Weight for criterion i
$s_i$ =Score for technology on criterion i
$n$ =Number of criteria

Criteria:

Performance: Latency, throughput
Scalability: Ability to handle growth
Cost: Infrastructure and operational costs
Community: Support and ecosystem
Team expertise: Existing knowledge

Implementation Phase

Project Structure

Architecture Diagram

llm-capstone/
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── llm.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── retrieval.py
│   │   └── generation.py
│   ├── api/
│   │   ├── __init__.py
│   │   └── endpoints.py
│   └── utils/
│       ├── __init__.py
│       └── helpers.py
├── tests/
│   ├── unit/
│   ├── integration/
│   └── evaluation/
├── data/
│   ├── raw/
│   ├── processed/
│   └── evaluation/
├── notebooks/
├── docs/
├── docker/
├── requirements.txt
├── Dockerfile
└── README.md

Configuration Management

from pydantic import BaseModel
from typing import Optional
from enum import Enum

class Environment(str, Enum):
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"

class LLMConfig(BaseModel):
    model_name: str = "meta-llama/Llama-3-8B-Instruct"
    temperature: float = 0.7
    max_tokens: int = 512
    top_p: float = 0.9
    
class RAGConfig(BaseModel):
    chunk_size: int = 1000
    chunk_overlap: int = 200
    retrieval_top_k: int = 5
    
class AppConfig(BaseModel):
    environment: Environment = Environment.DEVELOPMENT
    llm: LLMConfig = LLMConfig()
    rag: RAGConfig = RAGConfig()
    debug: bool = False

Model Implementation

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from typing import List, Dict, Any

class LLMEngine:
    def __init__(self, config: LLMConfig):
        self.config = config
        self.tokenizer = AutoTokenizer.from_pretrained(config.model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            config.model_name,
            torch_dtype=torch.float16,
            device_map="auto"
        )
    
    def generate(self, prompt: str, **kwargs) -> str:
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        
        generation_kwargs = {
            "max_new_tokens": self.config.max_tokens,
            "temperature": self.config.temperature,
            "top_p": self.config.top_p,
            "do_sample": self.config.temperature > 0
        }
        generation_kwargs.update(kwargs)
        
        outputs = self.model.generate(**inputs, **generation_kwargs)
        response = self.tokenizer.decode(
            outputs[0][inputs.shape[-1]:], 
            skip_special_tokens=True
        )
        return response
    
    def generate_with_history(
        self, 
        messages: List[Dict[str, str]], 
        **kwargs
    ) -> str:
        prompt = self.tokenizer.apply_chat_template(
            messages, 
            tokenize=False, 
            add_generation_prompt=True
        )
        return self.generate(prompt, **kwargs)

Retrieval System

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from typing import List, Dict, Any

class RetrievalSystem:
    def __init__(self, config: RAGConfig):
        self.config = config
        self.embeddings = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2"
        )
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=config.chunk_size,
            chunk_overlap=config.chunk_overlap
        )
        self.vectorstore = None
    
    def index_documents(self, documents: List[Dict[str, Any]]):
        texts = [doc["content"] for doc in documents]
        metadatas = [doc["metadata"] for doc in documents]
        
        splits = self.text_splitter.create_documents(texts, metadatas)
        self.vectorstore = Chroma.from_documents(
            splits, 
            self.embeddings
        )
    
    def retrieve(self, query: str) -> List[Dict[str, Any]]:
        if self.vectorstore is None:
            return []
        
        results = self.vectorstore.similarity_search_with_score(
            query, 
            k=self.config.retrieval_top_k
        )
        
        return [
            {
                "content": doc.page_content,
                "metadata": doc.metadata,
                "score": score
            }
            for doc, score in results
        ]

API Implementation

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import uvicorn

app = FastAPI(title="LLM Application API")

class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: Optional[int] = 512
    temperature: Optional[float] = 0.7

class GenerateResponse(BaseModel):
    text: str
    tokens_used: int
    latency_ms: float

@app.post("/generate", response_model=GenerateResponse)
async def generate_text(request: GenerateRequest):
    try:
        start_time = time.time()
        
        # Generate response
        response = llm_engine.generate(
            request.prompt,
            max_new_tokens=request.max_tokens,
            temperature=request.temperature
        )
        
        latency = (time.time() - start_time) * 1000
        
        return GenerateResponse(
            text=response,
            tokens_used=len(tokenizer.encode(response)),
            latency_ms=latency
        )
    
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/rag/generate")
async def rag_generate(request: GenerateRequest):
    # Retrieve relevant documents
    documents = retrieval_system.retrieve(request.prompt)
    
    # Construct prompt with context
    context = "\n\n".join([doc["content"] for doc in documents])
    prompt = f"""Context: {context}

Question: {request.prompt}

Answer:"""
    
    # Generate response
    response = llm_engine.generate(prompt)
    
    return {
        "answer": response,
        "sources": documents
    }

Evaluation Phase

Evaluation Framework

DfProject Evaluation

Project evaluation systematically assesses the system against requirements using quantitative metrics, qualitative assessment, and user testing.

Evaluation dimensions:

Functional: Does it do what it should?
Performance: Does it meet latency/throughput requirements?
Quality: Are outputs accurate and useful?
Usability: Is it easy to use?
Robustness: Does it handle edge cases?

Evaluation Metrics

class ProjectEvaluator:
    def __init__(self, test_cases: List[Dict]):
        self.test_cases = test_cases
        self.results = []
    
    def evaluate_functional(self, system):
        correct = 0
        for test_case in self.test_cases:
            output = system.process(test_case["input"])
            if self.check_output(output, test_case["expected"]):
                correct += 1
        
        return correct / len(self.test_cases)
    
    def evaluate_performance(self, system, num_requests=100):
        latencies = []
        for _ in range(num_requests):
            start = time.time()
            system.process("Test input")
            latencies.append(time.time() - start)
        
        return {
            "avg_latency": sum(latencies) / len(latencies),
            "p95_latency": sorted(latencies)[int(0.95 * len(latencies))],
            "p99_latency": sorted(latencies)[int(0.99 * len(latencies))]
        }
    
    def evaluate_quality(self, system, evaluator_llm):
        scores = []
        for test_case in self.test_cases:
            output = system.process(test_case["input"])
            score = evaluator_llm.evaluate(
                test_case["input"],
                output,
                test_case.get("criteria", [])
            )
            scores.append(score)
        
        return sum(scores) / len(scores)

User Testing

DfUser Testing

User testing involves real users interacting with the system to gather feedback on usability, satisfaction, and real-world effectiveness.

User testing protocol:

Recruit users: Find representative users
Define tasks: Create realistic usage scenarios
Observe usage: Watch how users interact
Gather feedback: Collect qualitative and quantitative feedback
Iterate: Improve based on findings

Deployment Phase

Deployment Architecture

DfProduction Deployment

Production deployment involves making the system available to users with appropriate infrastructure, monitoring, and operational procedures.

Deployment options:

Cloud: AWS, GCP, Azure
On-premises: Self-hosted infrastructure
Edge: Local deployment
Hybrid: Combination of cloud and edge

Containerization

# Dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/
COPY models/ ./models/

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "src.api.endpoints:app", "--host", "0.0.0.0", "--port", "8000"]

Monitoring and Observability

DfLLM Observability

LLM observability includes logging, monitoring, and tracing to understand system behavior, detect issues, and optimize performance.

Key metrics:

Latency: Response time percentiles
Throughput: Requests per second
Error rate: Failed requests percentage
Quality: Output quality scores
Cost: Token usage and cost per request

Continuous Improvement

Continuous Improvement Cycle

P_{t+1} = P_t + \\alpha \\cdot \\text{Feedback}_t

Here,

$P_t$ =Performance at time t
$\alpha$ =Learning rate
$\text{Feedback}_t$ =User feedback at time t

Improvement cycle:

Monitor: Collect metrics and feedback
Analyze: Identify issues and opportunities
Plan: Prioritize improvements
Implement: Make changes
Verify: Test and validate changes

Project Documentation

README Template

# LLM Capstone Project: [Project Name]

## Overview
Brief description of the project and its goals.

## Architecture
High-level architecture diagram and component descriptions.

## Setup
Instructions for setting up the development environment.

## Usage
Examples of how to use the system.

## Evaluation
Results of evaluation against requirements.

## Deployment
Instructions for deploying to production.

## Future Work
Potential improvements and extensions.

Model Card

Model Card for Capstone Project

Model Card:

Model: Llama-3-8B-Instruct
Task: [Your specific task]
Training data: [Dataset description]
Evaluation results: [Metrics]
Limitations: [Known limitations]
Ethical considerations: [Bias, safety]

Best Practices

Project Management

Agile methodology: Use sprints and iterative development
Version control: Use Git for code and documentation
Regular reviews: Demo progress regularly
Risk management: Identify and mitigate risks early

Code Quality

Code review: Review all code before merging
Testing: Maintain high test coverage
Documentation: Document code and decisions
Refactoring: Continuously improve code quality

Learning

Reflection: Reflect on what you learned
Knowledge sharing: Share findings with others
Portfolio: Document the project for your portfolio
Presentation: Prepare a presentation of your work

Start with a minimal viable product (MVP) and iterate. It's better to have a working simple system than an incomplete complex one.

Practice Exercises

Project Planning: Create a detailed project plan for your capstone project, including milestones, risks, and mitigation strategies.
Architecture Design: Design the system architecture for your project, including component diagrams and data flows.
Implementation Sprint: Implement the core functionality of your project in a time-boxed sprint.
Evaluation: Conduct a comprehensive evaluation of your system against requirements.

Key Takeaways:

Capstone projects integrate all course knowledge into a real-world application
Start with clear requirements and architecture design
Implement iteratively, testing at each stage
Deploy with proper monitoring and observability
Document everything for portfolio and future reference

What to Learn Next

-> LLM Research Paper Guide Key papers, reading guides, and research methodology for LLMs.

-> LLM Glossary Comprehensive glossary of LLM terms and concepts.

-> LLM Tool Ecosystem Overview of HuggingFace, LangChain, LlamaIndex, and other tools.

-> LLM Best Practices Best practices for common LLM tasks and applications.

-> LLM Roadmap Learning roadmap, skill progression, and career paths in LLMs.

-> Back to LLM Overview Return to the beginning of the LLM course.

LLM Capstone Project

LLM Capstone Project — Building Production-Ready AI Applications

LLM Capstone Project

DfCapstone Project

Project Selection

Choosing a Project

Project Ideas

Project Selection Matrix

Project Design Phase

Requirements Gathering

DfRequirements Gathering

Architecture Design

DfLLM System Architecture

Technology Selection

Technology Selection Score

Implementation Phase

Project Structure

Configuration Management

Model Implementation

Retrieval System

API Implementation

Evaluation Phase

Evaluation Framework

DfProject Evaluation

Evaluation Metrics

User Testing

DfUser Testing

Deployment Phase

Deployment Architecture

DfProduction Deployment

Containerization

Monitoring and Observability

DfLLM Observability

Continuous Improvement

Continuous Improvement Cycle

Project Documentation

README Template

Model Card

Model Card for Capstone Project

Best Practices

Project Management

Code Quality

Learning

Practice Exercises

What to Learn Next

Need Expert LLM Help?