Optimizers



Introduction

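Optimizers adjust a model's parameters (its weights) to minimize the loss function. At each training step, the optimizer uses the gradient of the loss with respect to each weight to decide the direction and size of that weight's update.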
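To make "gradient-based updates" concrete, here is a minimal sketch of plain gradient descent on a hypothetical one-parameter loss, f(w) = (w - 3)², whose minimum sits at w = 3. The loss, the helper names, the learning rate, and the step count are illustrative assumptions; a Keras optimizer applies the same kind of step to every weight tensor in a model, using gradients computed by TensorFlow.

```python
def loss_grad(w):
    """Gradient of the toy loss f(w) = (w - 3)**2, i.e. f'(w) = 2 * (w - 3)."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * f'(w)."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * loss_grad(w)
    return w


w_final = gradient_descent(w0=0.0)
print(f"w after 100 steps: {w_final:.6f}")
```

Each step multiplies the distance to the minimum by (1 - 2 × 0.1) = 0.8, so w approaches 3 geometrically. For this loss, a learning rate above 1.0 would make that factor's magnitude exceed 1 and the iterates would diverge.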

Built-in Optimizers

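from tensorflow import keras
from tensorflow.keras import optimizers

# A small placeholder model so the compile calls below have something to compile
model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])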

# Adam (most common)
model.compile(optimizer='adam', loss='mse')

# SGD
model.compile(optimizer=optimizers.SGD(), loss='mse')  # the string alias 'sgd' also works

# AdamW (decoupled weight decay)
model.compile(optimizer=optimizers.AdamW(weight_decay=0.01), loss='mse')
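"Decoupled weight decay" means shrinking the weights in a separate step instead of adding an L2 penalty to the gradient. The difference shows up with adaptive optimizers, which divide the gradient by a running scale. The sketch below compares one update on a single weight, using Adam's first step as the adaptive update (on step 1, Adam's bias-corrected update reduces to lr · g / (|g| + ε)). The helper names and numbers are hypothetical, and the sketch assumes the decay term is scaled by the learning rate.

```python
def adaptive_step(grad, lr, eps=1e-8):
    """Adam's first (bias-corrected) update, which reduces to lr * g / (|g| + eps)."""
    return lr * grad / (abs(grad) + eps)


def l2_coupled_update(w, grad, lr, weight_decay):
    """L2 penalty folded into the gradient, so it is normalized along with it."""
    return w - adaptive_step(grad + weight_decay * w, lr)


def decoupled_update(w, grad, lr, weight_decay):
    """AdamW-style: adaptive step on the raw gradient, then a separate shrink of w."""
    return w - adaptive_step(grad, lr) - lr * weight_decay * w


w, grad, lr, wd = 1.0, 0.5, 0.01, 0.01
print("L2 in gradient:", l2_coupled_update(w, grad, lr, wd))
print("Decoupled:     ", decoupled_update(w, grad, lr, wd))
```

With the L2 term inside the gradient, the normalization cancels most of its effect and w ends near 0.99, the same as with no decay at all; the decoupled version removes an extra lr × wd × w = 0.0001 and ends near 0.9899.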

Optimizer Parameters

# Adam with every hyperparameter spelled out (these values are the Keras defaults)
adam = optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07
)
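To show what beta_1, beta_2, and epsilon control, below is a minimal pure-Python sketch of the Adam update rule for a single scalar weight, following Algorithm 1 of Kingma and Ba's paper. It is not the Keras source: the Keras documentation notes that its epsilon corresponds to the paper's "epsilon hat", so the two differ slightly when epsilon is not negligible. The function name and the toy gradient are hypothetical.

```python
import math


def adam(grad_fn, w, learning_rate=0.001, beta_1=0.9, beta_2=0.999,
         epsilon=1e-7, steps=1000):
    """Minimize a 1-D function with the Adam update rule (Kingma & Ba, Algorithm 1)."""
    m = 0.0  # first-moment estimate: running mean of gradients
    v = 0.0  # second-moment estimate: running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g * g
        # Bias correction: m and v start at zero, so early estimates are too small
        m_hat = m / (1 - beta_1 ** t)
        v_hat = v / (1 - beta_2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w


def grad(w):
    return 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)**2


print(adam(grad, w=0.0, learning_rate=0.1, steps=1))     # about 0.1
print(adam(grad, w=0.0, learning_rate=0.1, steps=2000))  # close to 3
```

Because the update divides by the root of the second-moment estimate, the step size is roughly learning_rate no matter how large the raw gradient is: here the gradient at w = 0 is -6, yet the first step moves w by only about 0.1.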

# SGD with momentum
sgd = optimizers.SGD(
    learning_rate=0.01,
    momentum=0.9,
    nesterov=True
)
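The momentum and nesterov arguments change how each step is built from the gradient. The sketch below follows the update equations given in the Keras SGD documentation, applied to a single scalar weight; the function name, the toy gradient, and the step count are hypothetical.

```python
def sgd_momentum(grad_fn, w, learning_rate=0.01, momentum=0.9,
                 nesterov=False, steps=500):
    """Minimize a 1-D function with SGD plus (optionally Nesterov) momentum."""
    velocity = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        velocity = momentum * velocity - learning_rate * g
        if nesterov:
            # Nesterov: step along the updated velocity plus a fresh gradient term
            w = w + momentum * velocity - learning_rate * g
        else:
            # Classical momentum: step along the accumulated velocity
            w = w + velocity
    return w


def grad(w):
    return 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)**2


print(sgd_momentum(grad, w=0.0))                 # classical momentum
print(sgd_momentum(grad, w=0.0, nesterov=True))  # Nesterov momentum
```

Starting from w = 0, the first classical step moves w to 0.06, while the first Nesterov step moves it to 0.114, because Nesterov adds the momentum-scaled velocity on top of the raw gradient step.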

# RMSprop
rmsprop = optimizers.RMSprop(
    learning_rate=0.001,
    rho=0.9,
    momentum=0.0
)
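RMSprop divides each gradient by a running root-mean-square of recent gradients; rho sets how quickly that average forgets old values. Below is a minimal scalar sketch of the plain variant (momentum=0.0, not centered). Implementations differ slightly in where epsilon enters, and this sketch simply adds it to the square root; the function name and toy gradient are hypothetical.

```python
import math


def rmsprop(grad_fn, w, learning_rate=0.001, rho=0.9, epsilon=1e-7, steps=1000):
    """Minimize a 1-D function with the basic RMSprop update."""
    mean_square = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = grad_fn(w)
        mean_square = rho * mean_square + (1 - rho) * g * g
        # Scale the step by the root of the recent squared-gradient average
        w -= learning_rate * g / (math.sqrt(mean_square) + epsilon)
    return w


def grad(w):
    return 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)**2


print(rmsprop(grad, w=0.0, learning_rate=0.01))
```

Because mean_square is at least (1 - rho) · g², every step here is at most learning_rate / sqrt(1 - rho), about 0.032, so w ends up within roughly that distance of 3 instead of taking ever-smaller steps as plain gradient descent would.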

Learning Rate Scheduling

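# Step decay: with staircase=True, ExponentialDecay halves the rate once every 10,000 steps
lr_schedule = optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=10000,
    decay_rate=0.5,
    staircase=True
)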
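Step decay keeps the learning rate constant for a block of training steps and then multiplies it by a fixed factor. Here is a minimal sketch of that rule as a plain function, using the numbers from the block above (start at 0.1, halve every 10,000 steps); the function name is hypothetical. Keras evaluates learning-rate schedules per optimizer step, that is per batch, not per epoch.

```python
def step_decay(step, initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.5):
    """Learning rate after `step` optimizer steps under step (staircase) decay."""
    # Integer division: the exponent only increases once every decay_steps steps
    return initial_learning_rate * decay_rate ** (step // decay_steps)


for step in (0, 9999, 10000, 25000):
    print(step, step_decay(step))
```

The rate stays at 0.1 through step 9999, drops to 0.05 at step 10000, and is 0.025 at step 25000.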

# Exponential decay
lr_schedule = optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=10000,
    decay_rate=0.95
)

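# 0.1 is a typical starting rate for SGD; Adam is usually started lower (its default is 0.001)
model.compile(optimizer=optimizers.Adam(learning_rate=lr_schedule), loss='mse')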
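The exponential schedule lowers the rate a little at every step instead of in jumps. Keras documents the ExponentialDecay formula as initial_learning_rate * decay_rate ** (step / decay_steps), with integer division when staircase=True. The plain-Python version below (function name hypothetical) evaluates the smooth form with the block's numbers.

```python
def exponential_decay(step, initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.95):
    """Learning rate after `step` optimizer steps under smooth exponential decay."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)


for step in (0, 5000, 10000, 100000):
    print(step, round(exponential_decay(step), 5))
```

After 100,000 steps the rate is 0.1 × 0.95^10 ≈ 0.060, so this schedule decays far more gently than halving the rate every 10,000 steps.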

Gradient Clipping

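# Clip by norm: rescale each weight's gradient so its L2 norm is at most 1.0
optimizer = optimizers.Adam(clipnorm=1.0)

# Clip by value: clamp every gradient element to the range [-0.5, 0.5]
optimizer = optimizers.Adam(clipvalue=0.5)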
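The two clipping modes treat a gradient differently: clipping by norm rescales the whole vector and keeps its direction, while clipping by value clamps each element independently and can change the direction. Below is a minimal pure-Python sketch on a single gradient vector; the helper names are hypothetical. In Keras, clipnorm is applied to each weight's gradient separately, and global_clipnorm is the option that clips all gradients by their combined norm.

```python
import math


def clip_by_norm(grad, clipnorm):
    """Rescale a gradient vector so its L2 norm is at most clipnorm (direction kept)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clipnorm:
        return list(grad)
    scale = clipnorm / norm
    return [g * scale for g in grad]


def clip_by_value(grad, clipvalue):
    """Clamp each gradient element into [-clipvalue, clipvalue] (direction may change)."""
    return [max(-clipvalue, min(clipvalue, g)) for g in grad]


grad = [3.0, 4.0]  # L2 norm 5.0
print(clip_by_norm(grad, 1.0))
print(clip_by_value(grad, 0.5))
```

Clipping [3.0, 4.0] by norm gives roughly [0.6, 0.8], which still points the same way; clipping by value gives [0.5, 0.5], which points along the diagonal instead.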

Practice Problems

  1. Compare Adam vs SGD
  2. Tune learning rate
  3. Implement learning rate schedule
  4. Use gradient clipping
  5. Add weight decay
