LLM Capstone Project — Building Production-Ready AI Applications

This capstone project guides you through building an end-to-end LLM application, from conception to deployment. You'll apply everything learned throughout the course to create a real-world system.

  • Project Design — Requirements gathering and architecture
  • Implementation — Building the complete system
  • Deployment — Production deployment and monitoring

The best way to learn is by doing.

LLM Capstone Project

This capstone project challenges you to build a complete LLM application that solves a real-world problem. You'll go through the full software development lifecycle, from requirements gathering to production deployment.

Definition: Capstone Project

A capstone project integrates knowledge and skills from throughout the curriculum to solve a complex, real-world problem. It demonstrates mastery of LLM concepts, engineering best practices, and system design.

Project Selection

Choosing a Project

Select a project that:

  1. Solves a real problem: Addresses a genuine need
  2. Is achievable: Can be completed in the time frame
  3. Demonstrates skills: Showcases LLM knowledge
  4. Has data available: Has accessible training/evaluation data
  5. Can be deployed: Can run in a production environment

Project Ideas

Category          Project                       Complexity
Education         Adaptive tutoring system      High
Healthcare        Medical document summarizer   High
Finance           Financial news analyzer       Medium
Legal             Contract analysis tool        High
Customer Service  Multi-channel support bot     Medium
Content           Automated content pipeline    Medium
Research          Literature review assistant   Medium
Productivity      Meeting summarizer            Low-Medium

Project Selection Matrix

Score projects on:

  • Impact (1-5): How many people benefit?
  • Feasibility (1-5): How achievable is it?
  • Learning (1-5): How much will you learn?
  • Interest (1-5): How interested are you?

Select the project with the highest total score.
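
The selection matrix can be sketched in a few lines of Python. This is a minimal illustration, assuming equal weight for the four dimensions as the text describes; the `Candidate` class, the `pick_project` helper, and the example scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    impact: int       # 1-5
    feasibility: int  # 1-5
    learning: int     # 1-5
    interest: int     # 1-5

    def total(self) -> int:
        return self.impact + self.feasibility + self.learning + self.interest


def pick_project(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest total score (the first one wins ties)."""
    for c in candidates:
        for value in (c.impact, c.feasibility, c.learning, c.interest):
            if not 1 <= value <= 5:
                raise ValueError(f"{c.name}: scores must be between 1 and 5")
    return max(candidates, key=Candidate.total)


# Made-up scores for two ideas from the table above
candidates = [
    Candidate("Meeting summarizer", impact=3, feasibility=5, learning=3, interest=4),
    Candidate("Contract analysis tool", impact=4, feasibility=2, learning=5, interest=3),
]
best = pick_project(candidates)
print(best.name, best.total())
```

When two projects tie, or the totals are close, it may be worth looking at feasibility first, since the capstone has a fixed time frame.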

Project Design Phase

Requirements Gathering

Definition: Requirements Gathering

Requirements gathering is the process of understanding what the system should do, who will use it, and what constraints exist. For LLM projects, this includes functional requirements, performance requirements, and ethical considerations.

Requirements template:

## Functional Requirements
- [ ] Core feature 1: Description
- [ ] Core feature 2: Description
- [ ] User interaction model: Chat/API/Batch
- [ ] Output format: Text/JSON/Structured

## Non-Functional Requirements
- [ ] Latency: < X seconds
- [ ] Throughput: X requests/minute
- [ ] Accuracy: X% on evaluation set
- [ ] Availability: X% uptime

## Constraints
- [ ] Budget: $X
- [ ] Timeline: X weeks
- [ ] Team size: X people
- [ ] Technology stack: Specific tools/frameworks

## Ethical Considerations
- [ ] Bias mitigation strategies
- [ ] Privacy requirements
- [ ] Safety measures
- [ ] Transparency needs
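
Once the X placeholders in the non-functional section are filled in, they can be checked automatically. The sketch below is one possible way to do that; the `NonFunctionalTargets` class, the `unmet_requirements` helper, the dictionary keys, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonFunctionalTargets:
    max_latency_s: float
    min_requests_per_minute: float
    min_accuracy: float      # fraction in [0, 1]
    min_availability: float  # fraction in [0, 1]


def unmet_requirements(targets: NonFunctionalTargets, measured: dict[str, float]) -> list[str]:
    """Return the names of the non-functional requirements that the measurements miss."""
    failures = []
    if measured["latency_s"] > targets.max_latency_s:
        failures.append("latency")
    if measured["requests_per_minute"] < targets.min_requests_per_minute:
        failures.append("throughput")
    if measured["accuracy"] < targets.min_accuracy:
        failures.append("accuracy")
    if measured["availability"] < targets.min_availability:
        failures.append("availability")
    return failures


# Example targets and measurements (illustrative values only)
targets = NonFunctionalTargets(
    max_latency_s=2.0,
    min_requests_per_minute=60,
    min_accuracy=0.85,
    min_availability=0.99,
)
measured = {
    "latency_s": 1.4,
    "requests_per_minute": 45,
    "accuracy": 0.88,
    "availability": 0.995,
}
print(unmet_requirements(targets, measured))
```

Writing the targets down as data early makes it easier to reuse them later in the evaluation phase.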

Architecture Design

Definition: LLM System Architecture

LLM system architecture defines the components, their interactions, data flows, and technology choices for an LLM application.

Typical architecture components:

Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│                     User Interface                          │
├─────────────────────────────────────────────────────────────┤
│                     API Gateway                             │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Prompt    │  │  Retrieval  │  │   Post-     │          │
│  │   Engine    │  │   System    │  │  Processor  │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
├─────────────────────────────────────────────────────────────┤
│                     LLM Engine                              │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Model     │  │   Cache     │  │   Monitor   │          │
│  │   Server    │  │   System    │  │   System    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘

Technology Selection

Technology Selection Score

$$S = \sum_{i=1}^{n} w_i \cdot s_i$$

Here,

  • $w_i$ = weight for criterion $i$
  • $s_i$ = score of the technology on criterion $i$
  • $n$ = number of criteria

Criteria:

  1. Performance: Latency, throughput
  2. Scalability: Ability to handle growth
  3. Cost: Infrastructure and operational costs
  4. Community: Support and ecosystem
  5. Team expertise: Existing knowledge
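
As a worked example, the weighted sum S over the five criteria above can be computed as follows. The weights, the 1-5 scores, and the names `option_a` and `option_b` are made up for illustration; the only assumption beyond the formula is that both dictionaries use the same criterion names.

```python
def weighted_score(weights: dict[str, float], scores: dict[str, float]) -> float:
    """Compute S = sum of w_i * s_i over all criteria."""
    if weights.keys() != scores.keys():
        raise ValueError("weights and scores must cover the same criteria")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


# Hypothetical weights (summing to 1) and 1-5 scores for two candidate technologies
weights = {
    "performance": 0.30,
    "scalability": 0.20,
    "cost": 0.20,
    "community": 0.15,
    "team_expertise": 0.15,
}
option_a = {"performance": 4, "scalability": 4, "cost": 2, "community": 5, "team_expertise": 3}
option_b = {"performance": 3, "scalability": 3, "cost": 5, "community": 3, "team_expertise": 5}

score_a = weighted_score(weights, option_a)
score_b = weighted_score(weights, option_b)
print(round(score_a, 2), round(score_b, 2))
```

With these numbers the second option scores slightly higher despite weaker raw performance, because cost and team expertise carry a combined weight of 0.35. Changing the weights changes the ranking, so it helps to agree on them before scoring.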

Implementation Phase

Project Structure

llm-capstone/
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── llm.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── retrieval.py
│   │   └── generation.py
│   ├── api/
│   │   ├── __init__.py
│   │   └── endpoints.py
│   └── utils/
│       ├── __init__.py
│       └── helpers.py
├── tests/
│   ├── unit/
│   ├── integration/
│   └── evaluation/
├── data/
│   ├── raw/
│   ├── processed/
│   └── evaluation/
├── notebooks/
├── docs/
├── docker/
├── requirements.txt
├── Dockerfile
└── README.md

Configuration Management

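from enum import Enum

from pydantic import BaseModel, Field


class Environment(str, Enum):
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"


class LLMConfig(BaseModel):
    # Hugging Face model ID; Llama 3 is a gated model, so access must be requested first
    model_name: str = "meta-llama/Meta-Llama-3-8B-Instruct"
    temperature: float = 0.7   # 0 selects greedy decoding in LLMEngine
    max_tokens: int = 512      # upper bound on newly generated tokens
    top_p: float = 0.9         # nucleus sampling threshold


class RAGConfig(BaseModel):
    chunk_size: int = 1000     # maximum characters per chunk
    chunk_overlap: int = 200   # characters shared between consecutive chunks
    retrieval_top_k: int = 5   # number of chunks returned per query


class AppConfig(BaseModel):
    environment: Environment = Environment.DEVELOPMENT
    # default_factory builds a fresh sub-config for each AppConfig instance
    llm: LLMConfig = Field(default_factory=LLMConfig)
    rag: RAGConfig = Field(default_factory=RAGConfig)
    debug: bool = False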

Model Implementation

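from typing import Dict, List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from src.config import LLMConfig  # assumes the config classes above live in src/config.py


class LLMEngine:
    def __init__(self, config: LLMConfig):
        self.config = config
        self.tokenizer = AutoTokenizer.from_pretrained(config.model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            config.model_name,
            torch_dtype=torch.float16,  # half precision to reduce GPU memory use
            device_map="auto",          # requires the accelerate package
        )

    def generate(self, prompt: str, add_special_tokens: bool = True, **kwargs) -> str:
        inputs = self.tokenizer(
            prompt, return_tensors="pt", add_special_tokens=add_special_tokens
        ).to(self.model.device)

        generation_kwargs = {
            "max_new_tokens": self.config.max_tokens,
            "temperature": self.config.temperature,
            "top_p": self.config.top_p,
        }
        generation_kwargs.update(kwargs)

        # Choose sampling or greedy decoding after per-call overrides are applied
        temperature = generation_kwargs.get("temperature") or 0.0
        if temperature > 0:
            generation_kwargs["do_sample"] = True
        else:
            # Greedy decoding: drop the sampling-only parameters
            generation_kwargs["do_sample"] = False
            generation_kwargs.pop("temperature", None)
            generation_kwargs.pop("top_p", None)

        outputs = self.model.generate(**inputs, **generation_kwargs)

        # Decode only the newly generated tokens, skipping the prompt
        prompt_length = inputs["input_ids"].shape[-1]
        return self.tokenizer.decode(
            outputs[0][prompt_length:],
            skip_special_tokens=True,
        )

    def generate_with_history(
        self,
        messages: List[Dict[str, str]],
        **kwargs,
    ) -> str:
        prompt = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
        )
        # The chat template already inserts special tokens, so do not add them again
        return self.generate(prompt, add_special_tokens=False, **kwargs)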

Retrieval System

from typing import Any, Dict, List

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

from src.config import RAGConfig  # assumes the config classes above live in src/config.py


class RetrievalSystem:
    def __init__(self, config: RAGConfig):
        self.config = config
        self.embeddings = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2"
        )
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=config.chunk_size,
            chunk_overlap=config.chunk_overlap,
        )
        self.vectorstore = None  # built by index_documents()

    def index_documents(self, documents: List[Dict[str, Any]]) -> None:
        """Split documents into chunks and index them.

        Each document is a dict with a "content" string and a "metadata" dict.
        """
        texts = [doc["content"] for doc in documents]
        metadatas = [doc["metadata"] for doc in documents]

        # Every chunk inherits the metadata of the document it came from
        splits = self.text_splitter.create_documents(texts, metadatas)
        self.vectorstore = Chroma.from_documents(
            splits,
            self.embeddings,
        )

    def retrieve(self, query: str) -> List[Dict[str, Any]]:
        # Nothing has been indexed yet
        if self.vectorstore is None:
            return []

        results = self.vectorstore.similarity_search_with_score(
            query,
            k=self.config.retrieval_top_k,
        )

        # Chroma returns a distance as the score, so lower values mean closer matches
        return [
            {
                "content": doc.page_content,
                "metadata": doc.metadata,
                "score": float(score),
            }
            for doc, score in results
        ]
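
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window splitter. It is only a sketch: `chunk_text` is a hypothetical helper, and `RecursiveCharacterTextSplitter` behaves differently because it prefers to break on separators such as paragraphs and sentences rather than at fixed character offsets.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample_chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(sample_chunks)
```

The overlap means that a sentence cut at a chunk boundary still appears intact in one of the two neighboring chunks, at the cost of indexing some text twice.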

API Implementation

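import time

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Module paths follow the project structure above and are assumptions
from src.config import AppConfig
from src.models.llm import LLMEngine
from src.services.retrieval import RetrievalSystem

app = FastAPI(title="LLM Application API")

# Load the model and the retrieval system once at startup, not per request
config = AppConfig()
llm_engine = LLMEngine(config.llm)
retrieval_system = RetrievalSystem(config.rag)


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 512
    temperature: float = 0.7


class GenerateResponse(BaseModel):
    text: str
    tokens_used: int
    latency_ms: float


# Plain `def` endpoints run in FastAPI's thread pool, so the blocking
# generate() call does not stall the event loop.
@app.post("/generate", response_model=GenerateResponse)
def generate_text(request: GenerateRequest):
    try:
        start_time = time.perf_counter()

        # Generate response
        response = llm_engine.generate(
            request.prompt,
            max_new_tokens=request.max_tokens,
            temperature=request.temperature,
        )

        latency_ms = (time.perf_counter() - start_time) * 1000

        return GenerateResponse(
            text=response,
            # Counts generated tokens only; prompt tokens are not included
            tokens_used=len(
                llm_engine.tokenizer.encode(response, add_special_tokens=False)
            ),
            latency_ms=latency_ms,
        )

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.post("/rag/generate")
def rag_generate(request: GenerateRequest):
    try:
        # Retrieve relevant documents
        documents = retrieval_system.retrieve(request.prompt)

        # Construct prompt with context
        context = "\n\n".join(doc["content"] for doc in documents)
        prompt = f"""Context: {context}

Question: {request.prompt}

Answer:"""

        # Generate response with the same per-request settings as /generate
        response = llm_engine.generate(
            prompt,
            max_new_tokens=request.max_tokens,
            temperature=request.temperature,
        )

        return {
            "answer": response,
            "sources": documents,
        }

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))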
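
The prompt construction in the `/rag/generate` endpoint concatenates every retrieved chunk, which can exceed the model's context window when chunks are long. One possible refinement, sketched below, numbers the sources and stops adding them once a character budget is reached. `build_rag_prompt`, `max_context_chars`, and the instruction wording are hypothetical; a production system would more likely budget in tokens than in characters.

```python
def build_rag_prompt(question: str, documents: list[dict], max_context_chars: int = 4000) -> str:
    """Assemble a prompt from retrieved chunks, keeping the context under a size budget."""
    parts = []
    used = 0
    for index, doc in enumerate(documents, start=1):
        entry = f"[{index}] {doc['content'].strip()}"
        if used + len(entry) > max_context_chars:
            break  # stop at the first chunk that would exceed the budget
        parts.append(entry)
        used += len(entry)

    context = "\n\n".join(parts) if parts else "(no relevant documents found)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


example_docs = [
    {"content": "Paris is the capital of France."},
    {"content": "x" * 5000},  # too long for the budget used below
]
example_prompt = build_rag_prompt("What is the capital of France?", example_docs, max_context_chars=100)
print(example_prompt)
```

Numbering the sources also lets the answer cite them by index, which pairs naturally with the `sources` list the endpoint already returns.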

Evaluation Phase

Evaluation Framework

Definition: Project Evaluation

Project evaluation systematically assesses the system against requirements using quantitative metrics, qualitative assessment, and user testing.

Evaluation dimensions:

  1. Functional: Does it do what it should?
  2. Performance: Does it meet latency/throughput requirements?
  3. Quality: Are outputs accurate and useful?
  4. Usability: Is it easy to use?
  5. Robustness: Does it handle edge cases?

Evaluation Metrics

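import time
from typing import Any, Dict, List


class ProjectEvaluator:
    """Runs test cases against any object that exposes a process(input) method."""

    def __init__(self, test_cases: List[Dict[str, Any]]):
        if not test_cases:
            raise ValueError("test_cases must not be empty")
        self.test_cases = test_cases
        self.results = []

    def check_output(self, output: Any, expected: Any) -> bool:
        # Exact match after trimming whitespace; replace with a task-specific check
        return str(output).strip() == str(expected).strip()

    def evaluate_functional(self, system) -> float:
        """Return the fraction of test cases whose output matches the expected value."""
        correct = 0
        for test_case in self.test_cases:
            output = system.process(test_case["input"])
            if self.check_output(output, test_case["expected"]):
                correct += 1

        return correct / len(self.test_cases)

    def evaluate_performance(self, system, num_requests: int = 100) -> Dict[str, float]:
        """Measure sequential request latency in seconds."""
        if num_requests < 1:
            raise ValueError("num_requests must be at least 1")

        latencies = []
        for _ in range(num_requests):
            start = time.perf_counter()
            system.process("Test input")
            latencies.append(time.perf_counter() - start)

        latencies.sort()  # sort once and reuse for both percentiles
        last = len(latencies) - 1
        return {
            "avg_latency": sum(latencies) / len(latencies),
            "p95_latency": latencies[min(last, int(0.95 * len(latencies)))],
            "p99_latency": latencies[min(last, int(0.99 * len(latencies)))],
        }

    def evaluate_quality(self, system, evaluator_llm) -> float:
        """Return the mean score assigned by an LLM judge.

        evaluator_llm is any object with an evaluate(input, output, criteria)
        method that returns a number.
        """
        scores = []
        for test_case in self.test_cases:
            output = system.process(test_case["input"])
            score = evaluator_llm.evaluate(
                test_case["input"],
                output,
                test_case.get("criteria", []),
            )
            scores.append(score)

        return sum(scores) / len(scores)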
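
The percentile lookups in `evaluate_performance` index directly into the sorted list. For reference, here is a small sketch of the nearest-rank definition of a percentile, which is one common convention and can differ by one position from the indexing used above. The `percentile` helper is hypothetical and uses only the standard library.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")

    ordered = sorted(values)
    # 1-based rank; multiplying before dividing avoids rounding surprises such as 0.55 * 100
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


sample_latencies = [float(ms) for ms in range(1, 101)]  # 1.0, 2.0, ..., 100.0 (made-up data)
print(percentile(sample_latencies, 95), percentile(sample_latencies, 99))
```

Whichever convention you pick, report it alongside the numbers, and note that tail percentiles from only 100 requests rest on very few samples.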

User Testing

Definition: User Testing

User testing involves real users interacting with the system to gather feedback on usability, satisfaction, and real-world effectiveness.

User testing protocol:

  1. Recruit users: Find representative users
  2. Define tasks: Create realistic usage scenarios
  3. Observe usage: Watch how users interact
  4. Gather feedback: Collect qualitative and quantitative feedback
  5. Iterate: Improve based on findings

Deployment Phase

Deployment Architecture

Definition: Production Deployment

Production deployment involves making the system available to users with appropriate infrastructure, monitoring, and operational procedures.

Deployment options:

  1. Cloud: AWS, GCP, Azure
  2. On-premises: Self-hosted infrastructure
  3. Edge: Local deployment
  4. Hybrid: Combination of cloud and edge

Containerization

# Dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/
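# Model weights are not copied into the image: the project root has no models/
# directory, and LLMEngine downloads the model by name at startup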

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "src.api.endpoints:app", "--host", "0.0.0.0", "--port", "8000"]

Monitoring and Observability

Definition: LLM Observability

LLM observability includes logging, monitoring, and tracing to understand system behavior, detect issues, and optimize performance.

Key metrics:

  1. Latency: Response time percentiles
  2. Throughput: Requests per second
  3. Error rate: Failed requests percentage
  4. Quality: Output quality scores
  5. Cost: Token usage and cost per request
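
Most of these metrics can be derived from a simple per-request log. The sketch below keeps records in memory for one reporting window; `RequestRecord`, `MetricsWindow`, and the price per 1,000 tokens are hypothetical, and quality scores are left out because they need a separate evaluation step. A real deployment would typically export these values to a monitoring system rather than compute them in process.

```python
from dataclasses import dataclass, field


@dataclass
class RequestRecord:
    latency_s: float
    tokens: int
    ok: bool


@dataclass
class MetricsWindow:
    """Collects per-request records for one reporting window."""
    cost_per_1k_tokens: float  # hypothetical price; set it from your provider or GPU cost
    records: list[RequestRecord] = field(default_factory=list)

    def record(self, latency_s: float, tokens: int, ok: bool) -> None:
        self.records.append(RequestRecord(latency_s, tokens, ok))

    def summary(self, window_s: float) -> dict[str, float]:
        """Summarize the window; returns an empty dict when there is nothing to report."""
        n = len(self.records)
        if n == 0 or window_s <= 0:
            return {}
        latencies = sorted(r.latency_s for r in self.records)
        failed = sum(1 for r in self.records if not r.ok)
        total_tokens = sum(r.tokens for r in self.records)
        return {
            "throughput_rps": n / window_s,
            "error_rate": failed / n,
            "p95_latency_s": latencies[min(n - 1, int(0.95 * n))],
            "cost_per_request": total_tokens / 1000 * self.cost_per_1k_tokens / n,
        }


window = MetricsWindow(cost_per_1k_tokens=0.002)
window.record(latency_s=0.5, tokens=1000, ok=True)
window.record(latency_s=1.5, tokens=1000, ok=True)
window.record(latency_s=1.0, tokens=2000, ok=False)
window.record(latency_s=2.0, tokens=0, ok=True)
stats = window.summary(window_s=2.0)
print(stats)
```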

Continuous Improvement

Continuous Improvement Cycle

$$P_{t+1} = P_t + \alpha \cdot \text{Feedback}_t$$

Here,

  • $P_t$ = performance at time $t$
  • $\alpha$ = learning rate
  • $\text{Feedback}_t$ = user feedback at time $t$

Improvement cycle:

  1. Monitor: Collect metrics and feedback
  2. Analyze: Identify issues and opportunities
  3. Plan: Prioritize improvements
  4. Implement: Make changes
  5. Verify: Test and validate changes
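
The update rule above is a conceptual model rather than something you would compute directly, but a few iterations show how the learning rate dampens each round of feedback. The sketch assumes feedback can be expressed as a signed number on the same scale as performance; `simulate_improvement` and all values are illustrative.

```python
def simulate_improvement(p0: float, feedback: list[float], alpha: float) -> list[float]:
    """Apply P_{t+1} = P_t + alpha * Feedback_t for each feedback value in order."""
    history = [p0]
    for signal in feedback:
        history.append(history[-1] + alpha * signal)
    return history


# Start at 0.70, then apply three rounds of feedback with alpha = 0.5
trajectory = simulate_improvement(p0=0.70, feedback=[0.10, -0.04, 0.06], alpha=0.5)
print([round(p, 2) for p in trajectory])
```

A negative feedback value, as in the second round, models a change that made things worse. A smaller alpha makes the system react more slowly to both good and bad signals.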

Project Documentation

README Template

# LLM Capstone Project: [Project Name]

## Overview
Brief description of the project and its goals.

## Architecture
High-level architecture diagram and component descriptions.

## Setup
Instructions for setting up the development environment.

## Usage
Examples of how to use the system.

## Evaluation
Results of evaluation against requirements.

## Deployment
Instructions for deploying to production.

## Future Work
Potential improvements and extensions.

Model Card

Model Card for Capstone Project


  • Model: Llama-3-8B-Instruct
  • Task: [Your specific task]
  • Training data: [Dataset description]
  • Evaluation results: [Metrics]
  • Limitations: [Known limitations]
  • Ethical considerations: [Bias, safety]

Best Practices

Project Management

  1. Agile methodology: Use sprints and iterative development
  2. Version control: Use Git for code and documentation
  3. Regular reviews: Demo progress regularly
  4. Risk management: Identify and mitigate risks early

Code Quality

  1. Code review: Review all code before merging
  2. Testing: Maintain high test coverage
  3. Documentation: Document code and decisions
  4. Refactoring: Continuously improve code quality

Learning

  1. Reflection: Reflect on what you learned
  2. Knowledge sharing: Share findings with others
  3. Portfolio: Document the project for your portfolio
  4. Presentation: Prepare a presentation of your work

Start with a minimum viable product (MVP) and iterate. A simple system that works is better than a complex one that is incomplete.

Practice Exercises

  1. Project Planning: Create a detailed project plan for your capstone project, including milestones, risks, and mitigation strategies.

  2. Architecture Design: Design the system architecture for your project, including component diagrams and data flows.

  3. Implementation Sprint: Implement the core functionality of your project in a time-boxed sprint.

  4. Evaluation: Conduct a comprehensive evaluation of your system against requirements.

Key Takeaways:

  • Capstone projects integrate all course knowledge into a real-world application
  • Start with clear requirements and architecture design
  • Implement iteratively, testing at each stage
  • Deploy with proper monitoring and observability
  • Document everything for portfolio and future reference

What to Learn Next

-> LLM Research Paper Guide Key papers, reading guides, and research methodology for LLMs.

-> LLM Glossary Comprehensive glossary of LLM terms and concepts.

-> LLM Tool Ecosystem Overview of HuggingFace, LangChain, LlamaIndex, and other tools.

-> LLM Best Practices Best practices for common LLM tasks and applications.

-> LLM Roadmap Learning roadmap, skill progression, and career paths in LLMs.

-> Back to LLM Overview Return to the beginning of the LLM course.
