LLM Usage

Prompt Engineering — Getting the Most Out of Language Models

Prompt engineering is the art and science of designing effective inputs to LLMs. This guide covers prompting techniques, sampling strategies, and systematic best practices for reliable, high-quality outputs.

Zero-Shot to Few-Shot — Progressive techniques for task framing
Chain-of-Thought — Elicit step-by-step reasoning for complex problems
Systematic Testing — Treat prompts like code with version control

The quality of the output is determined by the quality of the input.

Prompt Engineering

Prompt engineering is the art and science of designing effective inputs to LLMs. This tutorial covers prompting techniques, sampling strategies, and systematic best practices.

Prompting Techniques

Zero-Shot Prompting

The model performs the task based solely on the instruction, with no examples provided. This relies on the model's pre-trained knowledge and generalization ability.

Few-Shot Prompting

Provide demonstrations of the desired input-output mapping. The model learns the pattern from examples and applies it to new inputs. Typically 2-8 examples are sufficient.

Chain-of-Thought Prompting

Instruct the model to show its reasoning process step by step. This dramatically improves performance on arithmetic, logic, and multi-step problems.

Tree-of-Thought Prompting

Explore multiple reasoning branches, evaluate each, and select the best one. Useful for complex planning and decision-making tasks.

System Prompts

System prompts set the model's behavior, role, and constraints. They are prepended to the conversation and guide all subsequent interactions.

Temperature and Sampling

Temperature Scaling

T = 0: Greedy decoding (deterministic)
T = 0.7: Moderate creativity (recommended)
T = 1.0: Sample from model distribution
T > 1.0: High randomness (creative tasks)

Top-k Sampling

Top-p (Nucleus) Sampling

Top-p dynamically adjusts the candidate set size based on the probability distribution.

Sampling Parameter Guide

Task	Temperature	Top-p	Top-k
Code generation	0.0-0.2	0.9	-
Factual QA	0.0-0.3	0.9	-
Creative writing	0.7-1.0	0.9-0.95	50
Brainstorming	0.8-1.2	0.95	100
Translation	0.0-0.3	0.9	-

Sampling Implementation

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
inputs = tokenizer("The future of AI is", return_tensors="pt")

output = model.generate(
    **inputs, max_new_tokens=100,
    temperature=0.7, top_p=0.9, do_sample=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Structured Output Prompting

Guide the model to produce structured outputs like JSON, tables, or specific formats. Use explicit format instructions and delimiters to ensure consistent output.

Best Practices

Do:

Be specific and clear about the task
Provide relevant context and constraints
Use delimiters to separate input from instructions
Specify the desired output format
Test with diverse inputs

Do Not:

Assume the model knows your implicit context
Use ambiguous instructions
Overload prompts with too many tasks
Ignore the model output length limits
Use sarcastic or ironic instructions

Practice Exercises

Design a prompt that extracts structured data from unstructured text. Test with 5 different inputs.
Compare zero-shot vs few-shot performance on a classification task. How many examples are needed for peak performance?
Experiment with temperature settings from 0.0 to 1.5. At what point does output quality degrade?
Design a system prompt for a code review assistant. Test it on 3 different code snippets.

Advanced Prompting Techniques

Prompt Chaining

Break complex tasks into smaller prompts executed sequentially. The output of one prompt becomes the input to the next. This improves reliability for multi-step tasks.

Prompt Caching

For repeated queries, cache the system prompt computation to reduce latency and cost. Many LLM APIs now support prompt caching natively.

Meta-Prompting

Use the LLM to generate or optimize prompts. Ask the model to improve your prompt based on desired behavior. This creates a feedback loop for prompt optimization.

Constrained Generation

Use techniques like JSON mode, grammar-constrained decoding, or regex filtering to ensure outputs conform to specific formats. Libraries like Outlines and guidance make this easy.

Prompt Optimization

DSPy Framework

DSPy provides a programmatic approach to prompt optimization. Instead of manually crafting prompts, you define the task signature and let the framework optimize the prompt automatically using teleprompters.

Automatic Prompt Engineering

Use LLMs to generate and evaluate multiple prompt variants. Select the best-performing prompt based on a validation set. This is more systematic than manual prompt engineering.

Prompt Security

Injection Attacks

Prompt injection occurs when user input manipulates the system prompt to override intended behavior. Always validate and sanitize user inputs. Use delimiter-based defenses and instruction hierarchy.

Jailbreak Prevention

Jailbreaking attempts to bypass safety guardrails. Defense strategies include: system prompt hardening, output filtering, content classifiers, and red-team testing.

Prompt Testing and Debugging

Systematic prompt testing is essential for production deployments. Create a test suite with diverse inputs, edge cases, and adversarial examples. Track metrics like accuracy, latency, and cost across prompt versions.

Prompt Version Control

Treat prompts like code. Use version control, review processes, and A/B testing. Document the rationale behind each prompt change and track performance metrics over time.

Common Prompt Pitfalls

Overly specific prompts that do not generalize to new inputs.
Implicit assumptions that the model cannot satisfy.
Contradictory instructions that confuse the model.
Missing edge case handling for unexpected inputs.
Ignoring token limits and truncation behavior.

What to Learn Next

-> In-Context Learning Teaching LLMs new tasks without training—purely through prompts.

-> Chain-of-Thought Reasoning Making LLMs think step by step for complex reasoning problems.

-> RAG System Design Building production-ready retrieval systems for grounded generation.

-> Retrieval-Augmented Generation Combining LLMs with external knowledge for accurate, cited answers.

-> LLM Agent Frameworks Building autonomous agents that reason, plan, and act.

-> Building Production LLM Apps From prototype to production: deploying LLMs at scale.

Prompt Engineering

Prompt Engineering — Getting the Most Out of Language Models

Prompt Engineering

Prompting Techniques

Zero-Shot Prompting

Few-Shot Prompting

Chain-of-Thought Prompting

Tree-of-Thought Prompting

System Prompts

Temperature and Sampling

Temperature Scaling

Top-k Sampling

Top-p (Nucleus) Sampling

Sampling Parameter Guide

Sampling Implementation

Structured Output Prompting

Best Practices

Do:

Do Not:

Practice Exercises

Advanced Prompting Techniques

Prompt Chaining

Prompt Caching

Meta-Prompting

Constrained Generation

Prompt Optimization

DSPy Framework

Automatic Prompt Engineering

Prompt Security

Injection Attacks

Jailbreak Prevention

Prompt Testing and Debugging

Prompt Version Control

Common Prompt Pitfalls

What to Learn Next

Need Expert LLM Help?