Environmental Impact of LLMs

Training and running LLMs has significant environmental costs. Understanding and mitigating these costs is essential for sustainable AI development.

Carbon Footprint — Training emissions, inference costs, lifecycle analysis
Energy Efficiency — Hardware optimization, algorithmic improvements
Sustainable AI — Renewable energy, model efficiency, responsible deployment

We do not inherit the earth from our ancestors—we borrow it from our children.

Definition: AI Carbon Footprint

The carbon footprint of an LLM includes: (1) embodied carbon from hardware manufacturing, (2) training energy consumption, (3) inference energy consumption over the model's lifetime, and (4) cooling infrastructure energy. The total lifecycle emissions are the sum of these components.

Training Carbon Emissions

Estimating Training Emissions

Training Carbon Emissions

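$$C_{\text{train}} = \frac{\text{TTC}}{\eta_{\text{hw}}} \times PUE \times CI_{\text{grid}}$$

Here,

$TTC$ = Total Training Compute (FLOPs)
$\eta_{\text{hw}}$ = Delivered hardware efficiency (FLOPs per kWh of IT energy), so $TTC / \eta_{\text{hw}}$ is the IT energy in kWh
$PUE$ = Power Usage Effectiveness (typically 1.1-1.5)
$CI_{\text{grid}}$ = Carbon Intensity of grid electricity (gCO2/kWh)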
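
Because carbon intensity is expressed per kWh, a compute budget in FLOPs has to be converted to energy before the formula can be applied. The following is a minimal sketch of that conversion, not a measurement tool. The function names are hypothetical, and the hardware figures (312 TFLOP/s peak and 400 W per GPU, roughly an NVIDIA A100, at an assumed 40% utilization) as well as the PUE and grid intensity are illustrative assumptions.

```python
JOULES_PER_KWH = 3.6e6


def training_energy_kwh(total_flops, peak_flops_per_s, utilization, gpu_power_w):
    """IT energy in kWh needed to execute total_flops on the assumed hardware."""
    # GPU-seconds needed at the delivered (not peak) throughput
    gpu_seconds = total_flops / (peak_flops_per_s * utilization)
    return gpu_seconds * gpu_power_w / JOULES_PER_KWH


def training_emissions_tonnes(it_energy_kwh, pue, ci_g_per_kwh):
    """Training emissions in tonnes CO2: IT energy x PUE x grid carbon intensity."""
    grams = it_energy_kwh * pue * ci_g_per_kwh
    return grams / 1e6  # grams -> tonnes


# Assumed inputs: 3.14e23 FLOPs on A100-class GPUs at 40% utilization
energy = training_energy_kwh(3.14e23, 312e12, 0.4, 400.0)
tonnes = training_emissions_tonnes(energy, pue=1.2, ci_g_per_kwh=300.0)
print(f"IT energy: {energy / 1000:.0f} MWh, emissions: {tonnes:.0f} t CO2")
```

The result is very sensitive to the assumed utilization and hardware generation, so published figures for real training runs can differ from this estimate by several times.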

Empirical Estimates

Model	Training Compute (FLOPs)	Energy (MWh)	CO₂ (tonnes)	Equivalent (cars/year at ~4.6 t CO₂ per car)
GPT-3 175B	3.14 × 10²³	~1,287	~552	~120 cars/year
LLaMA 2 70B	~8.4 × 10²³	~690	~300	~65 cars/year
GPT-4 (est.)	2.1 × 10²⁵	~8,500	~3,600	~780 cars/year
Gemini Ultra (est.)	~5 × 10²⁵	~20,000	~8,500	~1,850 cars/year
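
As a sanity check on the table, the energy and CO₂ columns can be divided to recover the effective carbon intensity each row implies. The sketch below does only that arithmetic; the helper name is hypothetical and the numbers are copied from the table rather than independently sourced.

```python
# (model, energy in MWh, CO2 in tonnes) copied from the table above
ROWS = [
    ("GPT-3 175B", 1287, 552),
    ("LLaMA 2 70B", 690, 300),
    ("GPT-4 (est.)", 8500, 3600),
    ("Gemini Ultra", 20000, 8500),
]


def implied_intensity_g_per_kwh(energy_mwh, co2_tonnes):
    """Effective gCO2/kWh implied by a row (any PUE overhead is folded in)."""
    # tonnes -> grams is x1e6 and MWh -> kWh is x1e3, so the ratio scales by 1e3
    return co2_tonnes / energy_mwh * 1000.0


for model, energy_mwh, co2_tonnes in ROWS:
    intensity = implied_intensity_g_per_kwh(energy_mwh, co2_tonnes)
    print(f"{model}: {intensity:.0f} gCO2/kWh")
```

All four rows work out to roughly 420-435 gCO2/kWh, which suggests the CO₂ column was derived from the energy column with one near-constant intensity rather than from site-specific data.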

The carbon intensity of electricity varies widely by location: roughly 50 gCO2/kWh in France (mostly nuclear), ~300 gCO2/kWh for the US average, and ~600 gCO2/kWh in China (coal-heavy). By these figures, training on a grid as clean as France's cuts emissions about 6x relative to the US average and about 12x relative to China.

Embodied Carbon

The carbon cost of AI goes beyond electricity—hardware manufacturing has significant embodied emissions:

Definition: Embodied Carbon

Embodied carbon refers to the total greenhouse gas emissions produced during the manufacturing, transportation, and disposal of hardware. A single GPU (e.g., NVIDIA H100) has an estimated embodied carbon of 150-300 kg CO2e, which is amortized over its 3-5 year lifespan.

Component	Embodied Carbon	Lifespan	Annual Carbon
NVIDIA H100 GPU	200 kg CO2e	4 years	50 kg/year
Server rack	500 kg CO2e	5 years	100 kg/year
Data center building	10,000+ tonnes	20 years	500+ tonnes/year
Cooling system	2,000+ tonnes	15 years	133+ tonnes/year
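
The "Annual Carbon" column is straight-line amortization: embodied carbon divided by lifespan. The sketch below reproduces that and extends it to one hypothetical use, attributing a share of GPU embodied carbon to a single training run. The function names, the 1,000-GPU cluster, and the 30-day duration are assumptions for illustration; the 200 kg and 4-year defaults come from the table.

```python
def annual_embodied_kg(embodied_kg, lifespan_years):
    """Straight-line amortization of embodied carbon over the hardware lifespan."""
    return embodied_kg / lifespan_years


def run_embodied_kg(n_gpus, run_days, embodied_kg_per_gpu=200.0, lifespan_years=4.0):
    """Embodied carbon attributed to one run, by the share of GPU lifetime it uses."""
    share_of_life = run_days / (lifespan_years * 365.0)
    return n_gpus * embodied_kg_per_gpu * share_of_life


print(annual_embodied_kg(200.0, 4.0))       # per-GPU annual figure from the table
print(round(run_embodied_kg(1000, 30), 1))  # hypothetical 1,000-GPU, 30-day run
```

This attributes carbon by wall-clock time only. A fuller accounting would also allocate the server, building, and cooling rows, and would account for idle time between jobs.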

Training vs Inference Emissions

Lifetime Emissions Ratio

$$R_{\text{lifetime}} = \frac{C_{\text{train}}}{C_{\text{inference}} \times N_{\text{queries}} \times T_{\text{avg}}}$$

Here,

$C_{\text{train}}$ = One-time training emissions
$C_{\text{inference}}$ = Emissions per query per second
$N_{\text{queries}}$ = Total queries over model lifetime
$T_{\text{avg}}$ = Average response length (seconds)

For popular models, inference typically dominates lifetime emissions (70-90%) because models serve millions of queries daily for months or years.
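
The ratio of training to inference emissions translates directly into inference's share of the total: share = 1 / (1 + R). The sketch below shows the arithmetic with hypothetical inputs; the per-query-second emissions, query count, and response time are invented for illustration, and only the 552 t training figure comes from the table earlier in this lesson.

```python
def lifetime_ratio(c_train_g, c_inference_g_per_query_s, n_queries, t_avg_s):
    """Training emissions divided by total lifetime inference emissions."""
    inference_total_g = c_inference_g_per_query_s * n_queries * t_avg_s
    return c_train_g / inference_total_g


def inference_share(ratio):
    """Fraction of (training + inference) emissions that is due to inference."""
    return 1.0 / (1.0 + ratio)


# Hypothetical deployment: 552 t training, 10 billion queries of 5 s each,
# 0.05 gCO2 emitted per query-second of serving
r = lifetime_ratio(552e6, 0.05, 10e9, 5.0)
print(f"R = {r:.4f}, inference share = {inference_share(r):.1%}")
```

Under this relation, an inference share of 70% corresponds to a ratio of about 0.43 and a share of 90% to about 0.11.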

Inference Carbon Footprint

Per-Query Carbon Footprint

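$$C_{\text{query}} = \frac{E_{\text{GPU}} \times CI_{\text{grid}}}{Q_{\text{throughput}}}$$

Here,

$E_{\text{GPU}}$ = GPU power draw (W, i.e. joules per second)
$CI_{\text{grid}}$ = Grid carbon intensity (gCO2/kWh)
$Q_{\text{throughput}}$ = Queries per GPU per second

With these units, $E_{\text{GPU}} / Q_{\text{throughput}}$ is in joules per query, so divide the result by $3.6 \times 10^{6}$ J/kWh to obtain gCO2 per query.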
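
The per-query formula mixes watts with a per-kWh carbon intensity, so a joules-to-kWh conversion is needed to get grams of CO2. The sketch below makes that step explicit. The function name is hypothetical, and the inputs (a 700 W GPU, 2 queries per second, 300 gCO2/kWh) are assumed values for illustration.

```python
JOULES_PER_KWH = 3.6e6


def grams_co2_per_query(gpu_power_w, ci_g_per_kwh, queries_per_gpu_s):
    """gCO2 per query from GPU power, grid intensity, and per-GPU throughput."""
    joules_per_query = gpu_power_w / queries_per_gpu_s  # W / (queries/s) = J/query
    kwh_per_query = joules_per_query / JOULES_PER_KWH
    return kwh_per_query * ci_g_per_kwh


g = grams_co2_per_query(gpu_power_w=700.0, ci_g_per_kwh=300.0, queries_per_gpu_s=2.0)
print(f"{g:.4f} gCO2 per query")
```

Doubling throughput at the same power draw halves the per-query footprint, which is why batching and serving efficiency matter so much for inference emissions. This sketch ignores PUE and CPU, memory, and networking power.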

Inference Efficiency Comparison

Model	Parameters	Queries/GPU/hour	Energy/1M tokens
GPT-3.5	~20B (est.)	~500	~0.3 kWh
LLaMA 2 70B	70B	~100	~1.2 kWh
GPT-4	~1.8T (est.)	~5	~12 kWh
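Claude 3 Opus	~2T (est.)	~3	~18 kWh

Parameter counts marked (est.) are unofficial estimates: OpenAI and Anthropic have not published parameter counts for these models, so the throughput and energy columns are best read as order-of-magnitude illustrations.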
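
To relate the "Energy/1M tokens" column to a single request, scale by the number of tokens in the query. The sketch below assumes a 500-token query (an illustrative value) and uses the table's figures as given; the helper name is hypothetical.

```python
def kwh_per_query(kwh_per_million_tokens, tokens_per_query):
    """Energy for one query, given energy per million tokens."""
    return kwh_per_million_tokens * tokens_per_query / 1_000_000


# Energy per 1M tokens, copied from the table above
TABLE_KWH_PER_M_TOKENS = {
    "GPT-3.5": 0.3,
    "LLaMA 2 70B": 1.2,
    "GPT-4": 12.0,
    "Claude 3 Opus": 18.0,
}

for model, kwh_per_m in TABLE_KWH_PER_M_TOKENS.items():
    wh = kwh_per_query(kwh_per_m, tokens_per_query=500) * 1000  # kWh -> Wh
    print(f"{model}: {wh:.2f} Wh per 500-token query")
```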

Energy Efficiency Improvements

Model-Level Efficiency

Energy-Performance Tradeoff

$$\text{Efficiency} = \frac{\text{Performance}(\delta)}{\text{Energy}(\delta)}$$

Here,

$\text{Performance}(\delta)$ = Model accuracy/capability
$\text{Energy}(\delta)$ = Total energy for training + inference
$\delta$ = Model size/compute budget

Key efficiency strategies:

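Quantization: 4-bit weights cut weight memory traffic by ~4x vs FP16, which can reduce inference energy by up to a similar factor on memory-bound workloads
Pruning: Sparse models (e.g., 50% sparsity) reduce energy roughly in proportion to the sparsity, provided the hardware can skip the zeroed weights
Knowledge distillation: Smaller models capture most of the capability at a fraction of the energy
Mixture of Experts: Activates only the relevant subnetworks (experts) per token, so compute scales with active rather than total parameters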
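
A rough way to see where quantization savings come from is to compute how many bytes of weights must be stored and moved at each precision. The sketch below does this for a 70B-parameter model. Treating weight bytes as a proxy for energy is an assumption that holds best for memory-bound decoding; actual savings depend on kernels, hardware support, and activation precision. The helper name is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 70e9  # a 70B-parameter model
fp16_gb = weight_memory_gb(N_PARAMS, 16)

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = weight_memory_gb(N_PARAMS, bits)
    print(f"{name}: {gb:.0f} GB of weights ({fp16_gb / gb:.0f}x smaller than FP16)")
```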

Hardware-Level Efficiency

Technology	Energy Reduction	Maturity
GPU optimization (H100 vs A100)	~2x	Production
Sparse tensor cores	~1.5x	Production
Analog AI chips	~10x	Research
Optical computing	~100x	Early research

The most impactful actions for reducing LLM environmental impact are often not training smaller models, but rather: (1) serving more queries per GPU (throughput optimization), (2) powering data centers with renewable energy, and (3) caching aggressively to avoid redundant inference.
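
Caching, the third item above, can be illustrated with an exact-match response cache. This is a minimal sketch using Python's standard `functools.lru_cache`; `run_model` is a hypothetical stand-in for a real inference call, and the request list is invented for illustration.

```python
from functools import lru_cache

model_calls = 0  # counts how often the expensive path actually runs


def run_model(prompt: str) -> str:
    """Hypothetical stand-in for a real (energy-hungry) inference call."""
    global model_calls
    model_calls += 1
    return prompt.upper()


@lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    """Return a cached response for repeated prompts instead of re-running the model."""
    return run_model(prompt)


requests = ["what is pue?", "what is wue?", "what is pue?", "what is pue?"]
responses = [cached_generate(p) for p in requests]
info = cached_generate.cache_info()
print(f"model calls: {model_calls}, cache hits: {info.hits}")
```

Exact-match caching only helps when identical prompts recur and a deterministic (or reusable) answer is acceptable. Production systems may instead cache shared prompt prefixes or use similarity-based lookups, which this sketch does not attempt.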

Water Consumption

AI data centers also consume significant water for cooling:

Definition: Water Usage Effectiveness (WUE)

WUE measures the liters of water consumed per kWh of IT energy. Modern data centers have a WUE of 0.5-2.0 L/kWh. A large AI training run may consume millions of liters of water for cooling.
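
Since WUE is liters per kWh of IT energy, the cooling water for a job is the product of the two. The sketch below applies the 0.5-2.0 L/kWh range to a training run of about 1,287 MWh (the GPT-3 energy estimate used earlier in this lesson); the function name is hypothetical.

```python
def cooling_water_liters(it_energy_kwh, wue_l_per_kwh):
    """Water consumed for cooling: IT energy (kWh) x WUE (L/kWh)."""
    return it_energy_kwh * wue_l_per_kwh


IT_ENERGY_KWH = 1_287_000  # ~1,287 MWh expressed in kWh

for wue in (0.5, 1.0, 2.0):
    liters = cooling_water_liters(IT_ENERGY_KWH, wue)
    print(f"WUE {wue}: {liters:,.0f} L")
```

For this run the range is roughly 0.6 to 2.6 million liters of on-site cooling water. Water used to generate the electricity itself is not covered by WUE and is excluded here.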

Facility	WUE (L/kWh)	Annual Water	Notes
Google (average)	0.5	4.3B gallons	Includes all data centers
Microsoft (average)	0.6	3.5B gallons	Water-stressed regions
Typical AI data center	1.0-1.5	Varies	Air-cooled
Liquid-cooled facility	0.1-0.3	Minimal	Emerging technology

Water consumption is often overlooked in AI environmental assessments. In water-stressed regions, AI data center water use can compete with agricultural and domestic needs.

Sustainable AI Practices

Carbon-Aware Computing

Carbon-Aware Scheduling

$$t^* = \arg\min_{t \in [T_{\text{start}}, T_{\text{end}}]} CI_{\text{grid}}(t) \times E_{\text{compute}}(t)$$

Here,

$t^*$ = Optimal start time for compute job
$CI_{\text{grid}}(t)$ = Grid carbon intensity at time t
$E_{\text{compute}}(t)$ = Energy consumption at time t

Scheduling compute jobs when the grid is cleanest can reduce emissions by 30-50% without changing the compute itself.
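
One simple way to apply this idea to a job that runs for several hours is to slide a window over an hourly carbon-intensity forecast and pick the start hour with the lowest total. The sketch below assumes constant power draw over the job, so minimizing the summed intensity minimizes emissions. The function name and the forecast values are hypothetical; a real system would pull forecasts from a grid data provider.

```python
def best_start_hour(ci_forecast, duration_h):
    """Return (start_index, summed_intensity) of the cleanest contiguous window."""
    if duration_h <= 0 or duration_h > len(ci_forecast):
        raise ValueError("duration must fit inside the forecast horizon")
    window = sum(ci_forecast[:duration_h])
    best_start, best_sum = 0, window
    for start in range(1, len(ci_forecast) - duration_h + 1):
        # Slide the window one hour: add the new hour, drop the oldest
        window += ci_forecast[start + duration_h - 1] - ci_forecast[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum


# Hypothetical hourly forecast in gCO2/kWh for the next 8 hours
forecast = [400, 350, 200, 150, 180, 300, 450, 500]
start, total = best_start_hour(forecast, duration_h=3)
immediate = sum(forecast[:3])
print(f"start at hour {start}; saving vs. starting now: {1 - total / immediate:.0%}")
```

With this made-up forecast, delaying the job by two hours lowers emissions by about 44% relative to starting immediately. The cost is added latency, which makes this approach better suited to deferrable batch work than to interactive inference.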

Reporting Standards

Metric	Description	Standard
PUE	Power Usage Effectiveness	ISO/IEC 30134-2
CUE	Carbon Usage Effectiveness	ISO/IEC 30134-8
WUE	Water Usage Effectiveness	ISO/IEC 30134-9
SCI	Software Carbon Intensity (carbon per functional unit of software)	Green Software Foundation (ISO/IEC 21031)
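
PUE, CUE, and WUE share the same denominator, IT equipment energy, which makes them easy to compute together from facility-level totals. The sketch below uses the standard ratio definitions with hypothetical annual figures; the function names and input numbers are illustrative assumptions.

```python
def pue(total_facility_kwh, it_kwh):
    """Power Usage Effectiveness: total facility energy / IT energy (dimensionless)."""
    return total_facility_kwh / it_kwh


def cue(total_co2_kg, it_kwh):
    """Carbon Usage Effectiveness: kg CO2 emitted per kWh of IT energy."""
    return total_co2_kg / it_kwh


def wue(water_liters, it_kwh):
    """Water Usage Effectiveness: liters of water per kWh of IT energy."""
    return water_liters / it_kwh


# Hypothetical annual totals for one facility
IT_KWH = 1_000_000
print("PUE:", pue(1_200_000, IT_KWH))
print("CUE:", cue(360_000, IT_KWH), "kg CO2/kWh")
print("WUE:", wue(1_000_000, IT_KWH), "L/kWh")
```

In this example the CUE of 0.36 kg CO2/kWh equals the PUE of 1.2 multiplied by a grid factor of 0.3 kg CO2/kWh, which is the usual relationship when all facility energy comes from the grid.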

Practice Exercises

Conceptual: Explain why inference emissions typically dominate training emissions for production LLMs. Under what circumstances would training emissions dominate?
Mathematical: If a 70B parameter model is served 1M queries/day at 500 tokens average length, and each query uses 0.001 kWh of energy, compute the annual inference carbon footprint assuming a grid intensity of 200 gCO2/kWh.
Practical: Compare the energy consumption of running inference on the same model using FP16, INT8, and INT4 quantization. What is the energy savings at each level?
Research: Design a carbon-aware scheduling system for LLM inference. What are the tradeoffs between latency and carbon footprint?

Key Takeaways:

Training emissions depend on compute, PUE, and grid carbon intensity
Inference typically dominates lifetime emissions (70-90%) for popular models
Quantization, pruning, and distillation each reduce energy, roughly in proportion to the compute and memory traffic they remove
Carbon-aware scheduling can reduce emissions by 30-50%
Reporting standards (PUE, CUE, WUE) enable transparent environmental accounting

What to Learn Next

-> Model Compression Pipeline End-to-end compression: quantization + pruning + distillation.

-> Quantization Techniques Deep dive into quantization for energy efficiency.

-> Future of LLMs Trends toward more efficient AI systems.

-> Hardware-Aware LLM Design Co-designing models and hardware for efficiency.

-> Copyright and Legal Issues Legal frameworks governing AI sustainability.

-> LLM Optimization for Mobile Edge deployment for energy-efficient inference.
