Constitutional AI
What is Constitutional AI?
Constitutional AI (CAI) is a method for training AI systems to be helpful and harmless without requiring a human label for every response. Instead, a set of written principles (a "constitution") guides the model as it critiques, revises, and ranks its own outputs.
The CAI Process
Phase 1: Supervised Learning from Self-Critique and Revision (SL-CAI)
```python
# Example principles; each one is phrased as a comparison instruction.
constitutional_principles = [
    "Choose the response that is most helpful and harmless.",
    "Choose the response that is most accurate and truthful.",
    "Choose the response that is most respectful and kind.",
    "Choose the response that avoids harmful content.",
    "Choose the response that maintains user privacy.",
]


def generate_critique(model, prompt, response, principles):
    """Ask the model to critique its own response against the principles.

    `model` is assumed to expose a `generate(text) -> str` method.
    """
    principles_text = "\n".join(principles)
    critique_prompt = f"""Given the following interaction:
Human: {prompt}
Assistant: {response}
Critique this response based on these principles:
{principles_text}
What issues do you see?"""
    return model.generate(critique_prompt)


def generate_revision(model, prompt, response, critique):
    """Ask the model to rewrite the response so it addresses the critique."""
    revision_prompt = f"""Given the original response and critique:
Human: {prompt}
Assistant: {response}
Critique: {critique}
Please revise the response to address these issues:"""
    return model.generate(revision_prompt)
```
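The critique and revision steps above can be chained into a loop whose final output becomes a supervised fine-tuning target. The following is a minimal sketch under stated assumptions, not a definitive implementation: `EchoModel` is a hypothetical stand-in for a real model with a `generate(prompt)` method, `critique_and_revise` is an illustrative helper name, and sampling one principle per round is an assumption made for this example.

```python
import random


class EchoModel:
    """Hypothetical stub standing in for a real language model (assumption)."""

    def __init__(self):
        self.calls = 0

    def generate(self, prompt):
        # A real model would return generated text; the stub returns a counter.
        self.calls += 1
        return f"[model output {self.calls}]"


def critique_and_revise(model, prompt, principles, rounds=2, seed=0):
    """Run `rounds` critique/revision passes and return one training example."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    response = model.generate(f"Human: {prompt}\nAssistant:")
    for _ in range(rounds):
        # Assumption: critique against a single sampled principle per round.
        principle = rng.choice(principles)
        critique = model.generate(
            f"Human: {prompt}\nAssistant: {response}\n"
            f"Critique this response based on this principle:\n{principle}"
        )
        response = model.generate(
            f"Human: {prompt}\nAssistant: {response}\n"
            f"Critique: {critique}\n"
            "Please revise the response to address these issues:"
        )
    # The last revision is the target the model is later fine-tuned on.
    return {"prompt": prompt, "completion": response}
```

Each call makes one initial generation plus two generations per round, so the cost of building the dataset grows linearly with `rounds`.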
Phase 2: Reinforcement Learning from AI Feedback (RL-CAI)
```python
def ai_preference_learning(model, prompt, response_a, response_b, principles):
    """Ask the model which of two responses better follows the principles."""
    principles_text = "\n".join(principles)
    preference_prompt = f"""Consider these principles:
{principles_text}
Given the prompt: {prompt}
Response A: {response_a}
Response B: {response_b}
Which response better aligns with the principles? Explain why."""
    return model.generate(preference_prompt)
```
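The comparison above returns free text, so it has to be converted into a label before it can serve as preference data. One simple option is sketched below. It assumes the preference prompt is extended to request a final line of the form `Preferred: A` or `Preferred: B`; that output format and the helper names `parse_preference` and `to_preference_pair` are hypothetical, introduced only for illustration.

```python
import re

# Assumed verdict format: a line containing only "Preferred: A" or "Preferred: B".
_VERDICT = re.compile(r"^Preferred:\s*([AB])\s*$", re.MULTILINE)


def parse_preference(judgment):
    """Return "A" or "B" from the judgment text, or None if no verdict is found."""
    matches = _VERDICT.findall(judgment)
    # If the model states several verdicts, trust the last one.
    return matches[-1] if matches else None


def to_preference_pair(prompt, response_a, response_b, judgment):
    """Build a chosen/rejected record, or None when the judgment is unusable."""
    choice = parse_preference(judgment)
    if choice is None:
        return None  # skip ambiguous comparisons instead of guessing
    if choice == "A":
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

Records in this chosen/rejected shape are the usual input for training a preference model, whose scores can then act as the reward signal during reinforcement learning.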
Example Constitution
```python
constitution = {
    "helpfulness": [
        "Be as helpful as possible while remaining harmless.",
        "Provide accurate and relevant information.",
        "Complete tasks thoroughly.",
    ],
    "harmlessness": [
        "Do not assist with illegal or harmful activities.",
        "Avoid content that could cause physical or psychological harm.",
        "Protect user privacy and confidentiality.",
    ],
    "honesty": [
        "Do not make up information or pretend to know things you don't.",
        "Acknowledge uncertainty when appropriate.",
        "Be transparent about limitations.",
    ],
}
```
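The prompt-building functions earlier in this section take a flat list of principles, while the constitution above is grouped by category. A small sketch of how the two could be bridged follows. The helper names `flatten_constitution` and `sample_principle` are hypothetical, and the dictionary here is an abbreviated copy so that the block runs on its own.

```python
import random

# Abbreviated copy of the example constitution (one principle per category).
constitution = {
    "helpfulness": ["Provide accurate and relevant information."],
    "harmlessness": ["Do not assist with illegal or harmful activities."],
    "honesty": ["Acknowledge uncertainty when appropriate."],
}


def flatten_constitution(constitution):
    """Return every principle as one flat list, preserving category order."""
    return [p for principles in constitution.values() for p in principles]


def sample_principle(constitution, rng):
    """Pick one principle uniformly at random across all categories."""
    return rng.choice(flatten_constitution(constitution))


all_principles = flatten_constitution(constitution)
one_principle = sample_principle(constitution, random.Random(0))
```

Keeping the categories in the stored constitution while flattening only at the point of use leaves the document easy to audit and edit by topic.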
Benefits of CAI
| Benefit | Description |
|---|---|
| Scalability | Can generate training data without constant human feedback |
| Consistency | Principles apply uniformly across interactions |
| Transparency | Constitution is explicit and auditable |
| Adaptability | Principles can be updated as needs change |
Summary
Constitutional AI provides a principled approach to AI alignment that scales beyond what human feedback alone can supply, while keeping the guiding principles transparent and consistently applied.
Next: We'll explore parameter-efficient fine-tuning with LoRA.