Text-to-Image Generation
Diffusion Models
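Diffusion models generate images by learning to reverse a gradual noising process. During training, Gaussian noise is added to images step by step and a network learns to predict that noise. At inference time, the model starts from pure random noise and iteratively denoises it into a coherent image, with a text embedding of the prompt guiding each step.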
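To make the noising side concrete, here is a minimal sketch of the forward (noise-adding) step in plain Python. It assumes a DDPM-style linear beta schedule; the function names `make_alpha_bars` and `add_noise`, the schedule endpoints, and the three-value toy "image" are illustrative choices, not something specified above.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, eps ~ N(0, 1)."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [
        math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
        for x, e in zip(x0, eps)
    ]
    return x_t, eps


alpha_bars = make_alpha_bars()
rng = random.Random(0)
pixels = [0.5, -0.25, 1.0]  # a toy "image" with three values

slightly_noisy, _ = add_noise(pixels, 10, alpha_bars, rng)   # mostly signal
mostly_noise, _ = add_noise(pixels, 999, alpha_bars, rng)    # almost pure noise
```

The cumulative product shrinks toward zero as `t` grows, so late timesteps keep almost none of the original signal. The denoising network is trained to predict `eps` from `x_t`, which is what makes the process reversible at inference time.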
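```python
from diffusers import StableDiffusionPipeline
import torch


def generate_image(prompt, model_name="stabilityai/stable-diffusion-2-1"):
    # Load the pretrained pipeline in half precision to reduce GPU memory use.
    pipe = StableDiffusionPipeline.from_pretrained(
        model_name,
        torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

    image = pipe(
        prompt,
        num_inference_steps=50,  # number of denoising iterations
        guidance_scale=7.5       # how strongly to follow the prompt
    ).images[0]
    return image
```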
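The `guidance_scale` argument controls classifier-free guidance. As a rough sketch of the idea (not the pipeline's actual code): at each step the model predicts noise twice, once with the prompt and once with an empty prompt, and the two predictions are combined. The function name `apply_guidance` is hypothetical, and plain lists stand in for tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale=7.5):
    """Combine unconditional and text-conditioned noise predictions.

    guidance_scale = 1.0 returns the text-conditioned prediction unchanged;
    larger values push the result further in the direction of the prompt.
    """
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_text)
    ]


uncond = [0.0, 1.0]   # prediction for an empty prompt (toy values)
cond = [1.0, 3.0]     # prediction for the actual prompt (toy values)
guided = apply_guidance(uncond, cond, guidance_scale=2.0)  # [2.0, 5.0]
```

Higher scales tend to follow the prompt more literally at the cost of variety, which is why values around 7 to 8 are a common starting point.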
Conditioning Mechanisms
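```python
import torch


class TextConditionedDiffusion:
    # Assumes a Hugging Face CLIP-style tokenizer and text encoder.
    def __init__(self, unet, tokenizer, text_encoder, noise_scheduler):
        self.unet = unet
        self.tokenizer = tokenizer  # passed separately: the encoder has no tokenizer attribute
        self.text_encoder = text_encoder
        self.noise_scheduler = noise_scheduler

    def encode_text(self, prompt):
        text_input = self.tokenizer(
            prompt,
            padding="max_length",
            max_length=77,       # CLIP context length
            truncation=True,     # cut prompts longer than max_length
            return_tensors="pt"
        )
        with torch.no_grad():
            # The first output holds the per-token hidden states used for cross-attention.
            text_embeddings = self.text_encoder(text_input.input_ids)[0]
        return text_embeddings

    def denoise_step(self, latent, timestep, text_embeddings):
        # The U-Net predicts the noise in the latent, conditioned on the text.
        noise_pred = self.unet(
            latent,
            timestep,
            encoder_hidden_states=text_embeddings
        ).sample
        return noise_pred
```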
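A noise prediction like the one returned by `denoise_step` still has to be turned into a less noisy latent. The sketch below uses a deterministic DDIM-style update as one possible rule; the text does not say which scheduler is used, so treat that choice as an assumption. `ddim_step` is a hypothetical helper, the signal levels are made up, and a perfect "oracle" prediction stands in for a trained U-Net.

```python
import math


def ddim_step(x_t, noise_pred, a_bar_t, a_bar_prev):
    """One deterministic DDIM-style update (eta = 0) on a list of floats."""
    # Estimate the clean sample implied by the current noise prediction.
    x0_pred = [
        (x - math.sqrt(1.0 - a_bar_t) * e) / math.sqrt(a_bar_t)
        for x, e in zip(x_t, noise_pred)
    ]
    # Re-noise that estimate to the lower noise level of the previous timestep.
    return [
        math.sqrt(a_bar_prev) * x0 + math.sqrt(1.0 - a_bar_prev) * e
        for x0, e in zip(x0_pred, noise_pred)
    ]


# Toy check with a perfect ("oracle") noise prediction.
x0 = [0.5, -1.0]           # the clean sample we hope to recover
eps = [0.3, -0.7]          # the noise that was mixed in
a_bars = [0.9, 0.5, 0.1]   # cumulative signal levels, least noisy first (made up)

# Start from the noisiest level, then walk back one level at a time.
x = [
    math.sqrt(a_bars[-1]) * v + math.sqrt(1.0 - a_bars[-1]) * e
    for v, e in zip(x0, eps)
]
for i in reversed(range(len(a_bars))):
    a_prev = a_bars[i - 1] if i > 0 else 1.0  # 1.0 means no noise left
    x = ddim_step(x, eps, a_bars[i], a_prev)
```

With an exact noise prediction the loop returns the original `x0` up to floating-point error. A real model only approximates the noise, which is why many small steps are used instead of one large jump.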
Image Generation Models
| Model | Type | Features |
|---|---|---|
| Stable Diffusion | Diffusion | Open source |
| DALL-E 3 | Diffusion | High quality |
| Midjourney | Proprietary (architecture undisclosed) | Aesthetic focus |
| Imagen | Diffusion | Google; distinct from Google's autoregressive Parti model |
Prompt Engineering for Images
```python
# Effective image prompts
prompts = {
    "photorealistic": "professional photograph, 8k, detailed, sharp focus",
    "artistic": "oil painting, masterpiece, detailed brushstrokes",
    "cinematic": "cinematic lighting, dramatic, film grain",
    "digital_art": "digital art, concept art, trending on artstation"
}


def create_image_prompt(base_prompt, style="photorealistic"):
    # Append the keywords for the chosen style to the base prompt.
    if style not in prompts:
        raise ValueError(f"Unknown style {style!r}; choose from {sorted(prompts)}")
    return f"{base_prompt}, {prompts[style]}, high quality"
```
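As a usage sketch, the snippet below repeats a compact version of the helper so that it runs on its own, then builds one prompt per style for the same subject. The subject text and the name `STYLE_SUFFIXES` are arbitrary examples.

```python
# Compact copy of the style table so this snippet is self-contained.
STYLE_SUFFIXES = {
    "photorealistic": "professional photograph, 8k, detailed, sharp focus",
    "cinematic": "cinematic lighting, dramatic, film grain",
}


def create_image_prompt(base_prompt, style="photorealistic"):
    return f"{base_prompt}, {STYLE_SUFFIXES[style]}, high quality"


subject = "a red fox in snow"
variants = {
    style: create_image_prompt(subject, style)
    for style in STYLE_SUFFIXES
}
for style, prompt in variants.items():
    print(f"{style}: {prompt}")
```

Generating the same subject under several styles side by side is a simple way to see how much of the result comes from the style keywords as opposed to the subject description.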
Summary
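Text-to-image systems pair a text encoder with a diffusion model that iteratively denoises random noise into an image. Understanding how text conditioning, the guidance scale, and prompt wording shape the output is essential for building effective image generation systems.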
Next: We'll explore text-to-video generation.