LLM Capstone
LLM Capstone Project ā Building Production-Ready AI Applications
This capstone project guides you through building an end-to-end LLM application, from conception to deployment. You'll apply everything learned throughout the course to create a real-world system.
- Project Design ā Requirements gathering and architecture
- Implementation ā Building the complete system
- Deployment ā Production deployment and monitoring
The best way to learn is by doing.
LLM Capstone Project
This capstone project challenges you to build a complete LLM application that solves a real-world problem. You'll go through the full software development lifecycle, from requirements gathering to production deployment.
DfCapstone Project
A capstone project integrates knowledge and skills from throughout the curriculum to solve a complex, real-world problem. It demonstrates mastery of LLM concepts, engineering best practices, and system design.
Project Selection
Choosing a Project
Select a project that:
- Solves a real problem: Addresses a genuine need
- Is achievable: Can be completed in the time frame
- Demonstrates skills: Showcases LLM knowledge
- Has data available: Has accessible training/evaluation data
- Can be deployed: Can run in a production environment
Project Ideas
| Category | Project | Complexity |
|---|---|---|
| Education | Adaptive tutoring system | High |
| Healthcare | Medical document summarizer | High |
| Finance | Financial news analyzer | Medium |
| Legal | Contract analysis tool | High |
| Customer Service | Multi-channel support bot | Medium |
| Content | Automated content pipeline | Medium |
| Research | Literature review assistant | Medium |
| Productivity | Meeting summarizer | Low-Medium |
Project Selection Matrix
Score projects on:
- Impact (1-5): How many people benefit?
- Feasibility (1-5): How achievable is it?
- Learning (1-5): How much will you learn?
- Interest (1-5): How interested are you?
Select the project with the highest total score.
Project Design Phase
Requirements Gathering
DfRequirements Gathering
Requirements gathering is the process of understanding what the system should do, who will use it, and what constraints exist. For LLM projects, this includes functional requirements, performance requirements, and ethical considerations.
Requirements template:
## Functional Requirements
- [ ] Core feature 1: Description
- [ ] Core feature 2: Description
- [ ] User interaction model: Chat/API/Batch
- [ ] Output format: Text/JSON/Structured
## Non-Functional Requirements
- [ ] Latency: < X seconds
- [ ] Throughput: X requests/minute
- [ ] Accuracy: X% on evaluation set
- [ ] Availability: X% uptime
## Constraints
- [ ] Budget: $X
- [ ] Timeline: X weeks
- [ ] Team size: X people
- [ ] Technology stack: Specific tools/frameworks
## Ethical Considerations
- [ ] Bias mitigation strategies
- [ ] Privacy requirements
- [ ] Safety measures
- [ ] Transparency needs
Architecture Design
DfLLM System Architecture
LLM system architecture defines the components, their interactions, data flows, and technology choices for an LLM application.
Typical architecture components:
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā User Interface ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¤
ā API Gateway ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¤
ā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
ā ā Prompt ā ā Retrieval ā ā Post- ā ā
ā ā Engine ā ā System ā ā Processor ā ā
ā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¤
ā LLM Engine ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¤
ā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
ā ā Model ā ā Cache ā ā Monitor ā ā
ā ā Server ā ā System ā ā System ā ā
ā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Technology Selection
Technology Selection Score
Here,
- =Weight for criterion i
- =Score for technology on criterion i
- =Number of criteria
Criteria:
- Performance: Latency, throughput
- Scalability: Ability to handle growth
- Cost: Infrastructure and operational costs
- Community: Support and ecosystem
- Team expertise: Existing knowledge
Implementation Phase
Project Structure
llm-capstone/
āāā src/
ā āāā __init__.py
ā āāā config.py
ā āāā models/
ā ā āāā __init__.py
ā ā āāā llm.py
ā āāā services/
ā ā āāā __init__.py
ā ā āāā retrieval.py
ā ā āāā generation.py
ā āāā api/
ā ā āāā __init__.py
ā ā āāā endpoints.py
ā āāā utils/
ā āāā __init__.py
ā āāā helpers.py
āāā tests/
ā āāā unit/
ā āāā integration/
ā āāā evaluation/
āāā data/
ā āāā raw/
ā āāā processed/
ā āāā evaluation/
āāā notebooks/
āāā docs/
āāā docker/
āāā requirements.txt
āāā Dockerfile
āāā README.md
Configuration Management
from pydantic import BaseModel
from typing import Optional
from enum import Enum
class Environment(str, Enum):
DEVELOPMENT = "development"
STAGING = "staging"
PRODUCTION = "production"
class LLMConfig(BaseModel):
model_name: str = "meta-llama/Llama-3-8B-Instruct"
temperature: float = 0.7
max_tokens: int = 512
top_p: float = 0.9
class RAGConfig(BaseModel):
chunk_size: int = 1000
chunk_overlap: int = 200
retrieval_top_k: int = 5
class AppConfig(BaseModel):
environment: Environment = Environment.DEVELOPMENT
llm: LLMConfig = LLMConfig()
rag: RAGConfig = RAGConfig()
debug: bool = False
Model Implementation
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from typing import List, Dict, Any
class LLMEngine:
def __init__(self, config: LLMConfig):
self.config = config
self.tokenizer = AutoTokenizer.from_pretrained(config.model_name)
self.model = AutoModelForCausalLM.from_pretrained(
config.model_name,
torch_dtype=torch.float16,
device_map="auto"
)
def generate(self, prompt: str, **kwargs) -> str:
inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
generation_kwargs = {
"max_new_tokens": self.config.max_tokens,
"temperature": self.config.temperature,
"top_p": self.config.top_p,
"do_sample": self.config.temperature > 0
}
generation_kwargs.update(kwargs)
outputs = self.model.generate(**inputs, **generation_kwargs)
response = self.tokenizer.decode(
outputs[0][inputs.shape[-1]:],
skip_special_tokens=True
)
return response
def generate_with_history(
self,
messages: List[Dict[str, str]],
**kwargs
) -> str:
prompt = self.tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
return self.generate(prompt, **kwargs)
Retrieval System
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from typing import List, Dict, Any
class RetrievalSystem:
def __init__(self, config: RAGConfig):
self.config = config
self.embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-MiniLM-L6-v2"
)
self.text_splitter = RecursiveCharacterTextSplitter(
chunk_size=config.chunk_size,
chunk_overlap=config.chunk_overlap
)
self.vectorstore = None
def index_documents(self, documents: List[Dict[str, Any]]):
texts = [doc["content"] for doc in documents]
metadatas = [doc["metadata"] for doc in documents]
splits = self.text_splitter.create_documents(texts, metadatas)
self.vectorstore = Chroma.from_documents(
splits,
self.embeddings
)
def retrieve(self, query: str) -> List[Dict[str, Any]]:
if self.vectorstore is None:
return []
results = self.vectorstore.similarity_search_with_score(
query,
k=self.config.retrieval_top_k
)
return [
{
"content": doc.page_content,
"metadata": doc.metadata,
"score": score
}
for doc, score in results
]
API Implementation
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import uvicorn
app = FastAPI(title="LLM Application API")
class GenerateRequest(BaseModel):
prompt: str
max_tokens: Optional[int] = 512
temperature: Optional[float] = 0.7
class GenerateResponse(BaseModel):
text: str
tokens_used: int
latency_ms: float
@app.post("/generate", response_model=GenerateResponse)
async def generate_text(request: GenerateRequest):
try:
start_time = time.time()
# Generate response
response = llm_engine.generate(
request.prompt,
max_new_tokens=request.max_tokens,
temperature=request.temperature
)
latency = (time.time() - start_time) * 1000
return GenerateResponse(
text=response,
tokens_used=len(tokenizer.encode(response)),
latency_ms=latency
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.post("/rag/generate")
async def rag_generate(request: GenerateRequest):
# Retrieve relevant documents
documents = retrieval_system.retrieve(request.prompt)
# Construct prompt with context
context = "\n\n".join([doc["content"] for doc in documents])
prompt = f"""Context: {context}
Question: {request.prompt}
Answer:"""
# Generate response
response = llm_engine.generate(prompt)
return {
"answer": response,
"sources": documents
}
Evaluation Phase
Evaluation Framework
DfProject Evaluation
Project evaluation systematically assesses the system against requirements using quantitative metrics, qualitative assessment, and user testing.
Evaluation dimensions:
- Functional: Does it do what it should?
- Performance: Does it meet latency/throughput requirements?
- Quality: Are outputs accurate and useful?
- Usability: Is it easy to use?
- Robustness: Does it handle edge cases?
Evaluation Metrics
class ProjectEvaluator:
def __init__(self, test_cases: List[Dict]):
self.test_cases = test_cases
self.results = []
def evaluate_functional(self, system):
correct = 0
for test_case in self.test_cases:
output = system.process(test_case["input"])
if self.check_output(output, test_case["expected"]):
correct += 1
return correct / len(self.test_cases)
def evaluate_performance(self, system, num_requests=100):
latencies = []
for _ in range(num_requests):
start = time.time()
system.process("Test input")
latencies.append(time.time() - start)
return {
"avg_latency": sum(latencies) / len(latencies),
"p95_latency": sorted(latencies)[int(0.95 * len(latencies))],
"p99_latency": sorted(latencies)[int(0.99 * len(latencies))]
}
def evaluate_quality(self, system, evaluator_llm):
scores = []
for test_case in self.test_cases:
output = system.process(test_case["input"])
score = evaluator_llm.evaluate(
test_case["input"],
output,
test_case.get("criteria", [])
)
scores.append(score)
return sum(scores) / len(scores)
User Testing
DfUser Testing
User testing involves real users interacting with the system to gather feedback on usability, satisfaction, and real-world effectiveness.
User testing protocol:
- Recruit users: Find representative users
- Define tasks: Create realistic usage scenarios
- Observe usage: Watch how users interact
- Gather feedback: Collect qualitative and quantitative feedback
- Iterate: Improve based on findings
Deployment Phase
Deployment Architecture
DfProduction Deployment
Production deployment involves making the system available to users with appropriate infrastructure, monitoring, and operational procedures.
Deployment options:
- Cloud: AWS, GCP, Azure
- On-premises: Self-hosted infrastructure
- Edge: Local deployment
- Hybrid: Combination of cloud and edge
Containerization
# Dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY src/ ./src/
COPY models/ ./models/
# Expose port
EXPOSE 8000
# Run application
CMD ["uvicorn", "src.api.endpoints:app", "--host", "0.0.0.0", "--port", "8000"]
Monitoring and Observability
DfLLM Observability
LLM observability includes logging, monitoring, and tracing to understand system behavior, detect issues, and optimize performance.
Key metrics:
- Latency: Response time percentiles
- Throughput: Requests per second
- Error rate: Failed requests percentage
- Quality: Output quality scores
- Cost: Token usage and cost per request
Continuous Improvement
Continuous Improvement Cycle
Here,
- =Performance at time t
- =Learning rate
- =User feedback at time t
Improvement cycle:
- Monitor: Collect metrics and feedback
- Analyze: Identify issues and opportunities
- Plan: Prioritize improvements
- Implement: Make changes
- Verify: Test and validate changes
Project Documentation
README Template
# LLM Capstone Project: [Project Name]
## Overview
Brief description of the project and its goals.
## Architecture
High-level architecture diagram and component descriptions.
## Setup
Instructions for setting up the development environment.
## Usage
Examples of how to use the system.
## Evaluation
Results of evaluation against requirements.
## Deployment
Instructions for deploying to production.
## Future Work
Potential improvements and extensions.
Model Card
Model Card for Capstone Project
Model Card:
- Model: Llama-3-8B-Instruct
- Task: [Your specific task]
- Training data: [Dataset description]
- Evaluation results: [Metrics]
- Limitations: [Known limitations]
- Ethical considerations: [Bias, safety]
Best Practices
Project Management
- Agile methodology: Use sprints and iterative development
- Version control: Use Git for code and documentation
- Regular reviews: Demo progress regularly
- Risk management: Identify and mitigate risks early
Code Quality
- Code review: Review all code before merging
- Testing: Maintain high test coverage
- Documentation: Document code and decisions
- Refactoring: Continuously improve code quality
Learning
- Reflection: Reflect on what you learned
- Knowledge sharing: Share findings with others
- Portfolio: Document the project for your portfolio
- Presentation: Prepare a presentation of your work
Start with a minimal viable product (MVP) and iterate. It's better to have a working simple system than an incomplete complex one.
Practice Exercises
-
Project Planning: Create a detailed project plan for your capstone project, including milestones, risks, and mitigation strategies.
-
Architecture Design: Design the system architecture for your project, including component diagrams and data flows.
-
Implementation Sprint: Implement the core functionality of your project in a time-boxed sprint.
-
Evaluation: Conduct a comprehensive evaluation of your system against requirements.
Key Takeaways:
- Capstone projects integrate all course knowledge into a real-world application
- Start with clear requirements and architecture design
- Implement iteratively, testing at each stage
- Deploy with proper monitoring and observability
- Document everything for portfolio and future reference
What to Learn Next
-> LLM Research Paper Guide Key papers, reading guides, and research methodology for LLMs.
-> LLM Glossary Comprehensive glossary of LLM terms and concepts.
-> LLM Tool Ecosystem Overview of HuggingFace, LangChain, LlamaIndex, and other tools.
-> LLM Best Practices Best practices for common LLM tasks and applications.
-> LLM Roadmap Learning roadmap, skill progression, and career paths in LLMs.
-> Back to LLM Overview Return to the beginning of the LLM course.