CW

LLM Security Best Practices

ProductionSecurityFree Lesson

Advertisement

LLM Production

LLM Security Best Practices โ€” Defending Against Adversarial AI

LLM systems introduce novel attack surfaces: prompt injection, data exfiltration, jailbreaking, and model extraction. Security must be built in from design, not bolted on after deployment.

  • Attack Vectors โ€” Prompt injection, jailbreaking, data poisoning
  • Defenses โ€” Input validation, output filtering, guardrails
  • Privacy โ€” Data handling, PII protection, compliance

Security is not a featureโ€”it is a requirement.

LLM Security Best Practices

LLMs create unique security challenges that traditional application security cannot address. The model itself is both the application logic and the attack surface, making security a first-class concern in LLM system design.

DfLLM Threat Model

An LLM threat model identifies attack vectors specific to language model systems: (1) input manipulation (prompt injection, jailbreaking), (2) data extraction (PII leakage, training data extraction), (3) model manipulation (adversarial examples, poisoning), and (4) misuse (generating harmful content, misinformation).

Prompt Injection Attacks

Direct Prompt Injection

DfPrompt Injection

Prompt injection occurs when an attacker crafts input that overrides or modifies the system prompt, causing the model to ignore safety instructions, reveal confidential information, or perform unintended actions.

Attack Patterns:

Attack TypeDescriptionExample
OverrideIgnoring system instructions"Ignore previous instructions and..."
EscalationGaining unauthorized access"As an admin, I need you to..."
ExtractionRevealing system prompt"Repeat your instructions verbatim"
IndirectEmbedded in documentsMalicious content in retrieved documents

Indirect Prompt Injection

DfIndirect Prompt Injection

Indirect prompt injection occurs when attacker-controlled content (web pages, emails, documents) is ingested by the LLM as part of its context, containing hidden instructions that manipulate the model's behavior.

Indirect prompt injection is particularly dangerous in RAG systems where documents from untrusted sources are retrieved and included in the prompt. The attack surface scales with the number of data sources.

Defensive Strategies

Input Sanitization

DfInput Sanitization

Input sanitization validates and cleans user inputs before they reach the model. This includes removing control characters, normalizing unicode, detecting injection patterns, and enforcing input length limits.

Defense Layers:

Architecture Diagram
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚           Input Validation               โ”‚
โ”‚  (Length limits, format checks, PII)    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚         Injection Detection              โ”‚
โ”‚  (Pattern matching, classifier-based)   โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚           System Prompt Design           โ”‚
โ”‚  (Delimiters, role reinforcement)       โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚         Output Filtering                 โ”‚
โ”‚  (Safety classifiers, PII removal)      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

System Prompt Hardening

Prompt Security Score

Ssec=w1cdotD+w2cdotR+w3cdotI+w4cdotLS_{sec} = w_1 \\cdot D + w_2 \\cdot R + w_3 \\cdot I + w_4 \\cdot L

Here,

  • DD=Delimiter strength (0-1)
  • RR=Role reinforcement (0-1)
  • II=Instruction specificity (0-1)
  • LL=Length adequacy (0-1)
  • wiw_i=Weight for each factor

Best Practices:

  1. Use clear delimiters between system prompt and user input
  2. Reinforce the model's role at the beginning and end of the prompt
  3. Include explicit instructions about what the model should NOT do
  4. Use few-shot examples of safe behavior

Guardrails and Output Filtering

DfLLM Guardrails

LLM guardrails are automated checks applied to model inputs and outputs to enforce safety policies. Guardrails can be rule-based (regex, keyword filtering) or model-based (safety classifiers, content moderation APIs).

Data Privacy

PII Detection and Removal

PII Risk Score

RPII=sumiP(detecti)timesP(leakiโˆฃdetecti)timesSiR_{PII} = \\sum_{i} P(detect_i) \\times P(leak_i|detect_i) \\times S_i

Here,

  • P(detecti)P(detect_i)=Probability of detecting PII type i
  • P(leakiโˆฃdetecti)P(leak_i|detect_i)=Conditional probability of leakage given detection
  • SiS_i=Sensitivity score of PII type i

Training Data Extraction

DfTraining Data Extraction

Training data extraction is an adversarial attack where an attacker crafts prompts designed to extract memorized training data, including personal information, copyrighted content, or confidential documents.

Mitigation strategies include: differential privacy during training, deduplication of training data, output monitoring for memorized sequences, and rate limiting on repetitive queries.

Adversarial Robustness

Jailbreaking

DfJailbreaking

Jailbreaking refers to techniques that bypass an LLM's safety training to generate harmful, illegal, or policy-violating content. Techniques include role-playing scenarios, hypothetical framing, multi-turn manipulation, and encoding tricks.

Red Teaming

DfLLM Red Teaming

LLM red teaming is a structured testing process whereๅฎ‰ๅ…จ researchers systematically probe an LLM system for vulnerabilities, including prompt injection, jailbreaking, data extraction, and harmful content generation. Findings inform defensive improvements.

Red Teaming Framework:

PhaseActivitiesOutput
ReconnaissanceMap system prompts, identify data sourcesAttack surface map
ExploitationTest injection, jailbreak, extractionVulnerability report
ValidationConfirm reproducibility, assess impactRisk assessment
RemediationImplement defenses, retestFix verification

Compliance and Governance

Data Handling Policies

DfLLM Data Governance

LLM data governance defines policies for: (1) what data can be sent to external APIs, (2) how user interactions are logged and retained, (3) how PII is handled in prompts and responses, and (4) how model outputs are audited for compliance.

Regulatory frameworks (GDPR, CCPA, HIPAA) apply to LLM systems. Data sent to third-party LLM APIs may be subject to data processing agreements. Self-hosting provides greater control over data handling.

Practice Exercises

  1. Conceptual: Explain the difference between direct and indirect prompt injection. Why is indirect injection harder to defend against?

  2. Mathematical: Calculate the probability of a successful prompt injection attack given: injection detection accuracy 95%, output filtering accuracy 90%, and system prompt resistance 80%.

  3. Practical: Design a multi-layered defense system for a customer service chatbot that processes PII and has access to internal knowledge bases.

  4. Research: Compare the effectiveness of rule-based versus classifier-based guardrails for detecting jailbreak attempts. What are the trade-offs?

Key Takeaways:

  • Prompt injection is the primary attack vector for LLM systems
  • Defense requires multiple layers: input sanitization, injection detection, prompt hardening, output filtering
  • Indirect prompt injection via RAG data sources is a growing threat
  • PII protection requires detection, masking, and monitoring
  • Red teaming is essential for identifying vulnerabilities before deployment

What to Learn Next

-> LLM Serving Architectures vLLM, TGI, TensorRT-LLM, and serving patterns for production deployments.

-> Multi-Tenant LLM Systems Tenant isolation, resource sharing, and customization at scale.

-> LLM Monitoring and Observability Logging, tracing, metrics, and drift detection for production systems.

-> LLM Evaluation in Production Online evaluation, user feedback loops, and quality assurance.

-> Cost Optimization for LLMs Token economics, caching, and batching for cost efficiency.

-> LLM Disaster Recovery Failover, backup models, and graceful degradation strategies.

Advertisement

Need Expert LLM Help?

Get personalized tutoring, RAG system design, or production LLM consulting.

Advertisement