Introduction
Optimizers adjust model parameters to minimize the loss function through gradient-based updates.
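To make "gradient-based updates" concrete, here is a minimal sketch of plain gradient descent in pure Python. The loss L(w) = (w - 3)^2 and the helper name `gradient_descent` are illustrative assumptions, not part of Keras.

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        # Core update rule: move opposite to the gradient, scaled by the learning rate.
        w = w - learning_rate * grad_fn(w)
    return w

# Hypothetical loss: L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

With these settings `w_final` ends up very close to the minimizer w = 3. The built-in optimizers below refine this basic update with momentum, per-parameter scaling, or weight decay.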
Built-in Optimizers
# Assumes `model` is an already-built Keras model
from tensorflow.keras import optimizers
# Adam (most common)
model.compile(optimizer='adam', loss='mse')
# SGD
model.compile(optimizer='sgd', loss='mse')
# AdamW (decoupled weight decay); needs a recent TensorFlow release (2.11 or later)
model.compile(optimizer=optimizers.AdamW(weight_decay=0.01), loss='mse')
Optimizer Parameters
# Adam with custom settings
adam = optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07
)
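To show what `beta_1`, `beta_2`, and `epsilon` control, here is a sketch of a single Adam update for one scalar parameter, written in pure Python. It follows the textbook form of the algorithm; the function name `adam_step` is hypothetical, and Keras's own implementation may differ in details such as where epsilon is applied.

```python
import math

def adam_step(w, grad, m, v, t, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One textbook Adam update for a scalar parameter (t is the 1-based step count)."""
    # beta_1 sets how slowly the running mean of gradients forgets old values.
    m = beta_1 * m + (1 - beta_1) * grad
    # beta_2 does the same for the running mean of squared gradients.
    v = beta_2 * v + (1 - beta_2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta_1 ** t)
    v_hat = v / (1 - beta_2 ** t)
    # epsilon keeps the division stable when v_hat is close to zero.
    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v

# First step from w = 1.0 with gradient 4.0 and zero-initialized moments.
w, m, v = adam_step(w=1.0, grad=4.0, m=0.0, v=0.0, t=1)
```

On the first step the update size is roughly `learning_rate` regardless of the gradient's magnitude, which is one reason Adam is less sensitive to gradient scale than plain SGD.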
# SGD with momentum
sgd = optimizers.SGD(
    learning_rate=0.01,
    momentum=0.9,
    nesterov=True
)
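The effect of `momentum` and `nesterov` can be sketched for a single scalar parameter as follows. This mirrors the update rule described in the Keras SGD documentation, but the function name `sgd_momentum_step` is an assumption made for illustration.

```python
def sgd_momentum_step(w, grad, velocity, learning_rate=0.01,
                      momentum=0.9, nesterov=False):
    """One SGD-with-momentum update for a scalar parameter."""
    # The velocity accumulates a decaying sum of past gradient steps.
    velocity = momentum * velocity - learning_rate * grad
    if nesterov:
        # Nesterov variant: look ahead along the updated velocity.
        w = w + momentum * velocity - learning_rate * grad
    else:
        w = w + velocity
    return w, velocity

w_plain, vel_plain = sgd_momentum_step(1.0, 2.0, 0.0)
w_nesterov, vel_nesterov = sgd_momentum_step(1.0, 2.0, 0.0, nesterov=True)
```

Both variants keep the same velocity; the Nesterov variant simply takes a slightly larger step in the direction the velocity already points.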
# RMSprop
rmsprop = optimizers.RMSprop(
    learning_rate=0.001,
    rho=0.9,
    momentum=0.0
)
Learning Rate Scheduling
# Step decay: ExponentialDecay with staircase=True halves the rate every 10,000 steps
lr_schedule = optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=10000,
    decay_rate=0.5,
    staircase=True
)
# Exponential decay: lr = 0.1 * 0.95 ** (step / 10000), decaying smoothly
lr_schedule = optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=10000,
    decay_rate=0.95
)
model.compile(optimizer=optimizers.Adam(learning_rate=lr_schedule), loss='mse')
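The learning rate an exponential decay schedule produces at a given step can be reproduced by hand. The sketch below implements the formula from the Keras `ExponentialDecay` documentation in pure Python; the function name `exponential_decay` is hypothetical and only meant for checking values.

```python
import math

def exponential_decay(step, initial_learning_rate=0.1, decay_steps=10000,
                      decay_rate=0.95, staircase=False):
    """Learning rate at `step` under exponential decay."""
    exponent = step / decay_steps
    if staircase:
        # Staircase mode holds the rate constant within each decay_steps interval.
        exponent = math.floor(exponent)
    return initial_learning_rate * decay_rate ** exponent

lr_start = exponential_decay(0)
lr_after_10k = exponential_decay(10000)
lr_stepwise = exponential_decay(9999, staircase=True)
```

Without `staircase` the rate shrinks a little on every step; with `staircase=True` it stays at the initial value until a full `decay_steps` interval has passed.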
Gradient Clipping
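# Clip by norm: rescale each weight's gradient so its L2 norm is at most 1.0
optimizer = optimizers.Adam(clipnorm=1.0)
# Clip by value: clamp every gradient element to the range [-0.5, 0.5]
optimizer = optimizers.Adam(clipvalue=0.5)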
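The two clipping modes behave differently: clipping by norm preserves the gradient's direction, while clipping by value can change it. A pure-Python sketch on a single gradient vector, with hypothetical helper names `clip_by_norm` and `clip_by_value`:

```python
import math

def clip_by_norm(grad, clipnorm):
    """Rescale grad so its L2 norm is at most clipnorm (direction unchanged)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clipnorm:
        return list(grad)
    scale = clipnorm / norm
    return [g * scale for g in grad]

def clip_by_value(grad, clipvalue):
    """Clamp each element of grad to the range [-clipvalue, clipvalue]."""
    return [max(-clipvalue, min(clipvalue, g)) for g in grad]

by_norm = clip_by_norm([3.0, 4.0], clipnorm=1.0)
by_value = clip_by_value([3.0, -4.0, 0.2], clipvalue=0.5)
```

Here `by_norm` keeps the 3:4 ratio between components while shrinking the vector to unit length, whereas `by_value` flattens both large components to the same magnitude.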
Practice Problems
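- Compare Adam and SGD on the same model and dataset
- Tune the learning rate and observe how the loss curve changes
- Implement a learning rate schedule and pass it to an optimizer
- Use gradient clipping (clipnorm and clipvalue) during training
- Add weight decay with AdamW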