Introduction
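Reinforcement learning (RL) trains an agent to make sequential decisions by interacting with an environment. At each step the agent observes a state, chooses an action, and receives a reward. Over time it learns a policy that maximizes the cumulative reward it collects.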
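The quantity the agent maximizes is usually the expected discounted return G = r_0 + γ·r_1 + γ²·r_2 + ..., where γ in [0, 1] is the discount factor (the role played by `gamma` in the Q-learning code). The helper below is a minimal sketch, and its name `discounted_return` is hypothetical. It computes the return backwards using G_t = r_t + γ·G_{t+1}:

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of rewards weighted by gamma**t, where t is the time step."""
    g = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

A smaller γ makes the agent weight near-term rewards more heavily. With γ close to 1, distant rewards count almost as much as immediate ones.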
OpenAI Gym
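```python
import gymnasium as gym  # maintained successor of OpenAI Gym; gym>=0.26 has the same API

env = gym.make("CartPole-v1")
for episode in range(10):
    state, info = env.reset()  # reset() returns (observation, info)
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()  # Random action
        # step() returns (observation, reward, terminated, truncated, info)
        state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated  # episode ends on failure or on the time limit
        total_reward += reward
    print(f"Episode {episode}: Total reward = {total_reward}")
env.close()
```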
Q-Learning
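```python
import gymnasium as gym  # or `import gym` with gym>=0.26
import numpy as np

# Tabular Q-learning needs a discrete state space, e.g. FrozenLake-v1
env = gym.make("FrozenLake-v1")
q_table = np.zeros((env.observation_space.n, env.action_space.n))
alpha = 0.1    # Learning rate
gamma = 0.95   # Discount factor
epsilon = 0.1  # Exploration rate

for episode in range(5000):
    state, info = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        new_state, reward, terminated, truncated, info = env.step(action)
        # Update Q-value; a terminal next state contributes no future value
        future = 0.0 if terminated else np.max(q_table[new_state])
        q_table[state, action] = q_table[state, action] + alpha * (
            reward + gamma * future - q_table[state, action]
        )
        state = new_state
        done = terminated or truncated
```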
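The update step in the code is the standard Q-learning rule:

- α is the learning rate.
- γ is the discount factor.
- s' is the next state.
- The bracketed term is the temporal-difference (TD) error.

When s' is terminal, the max term is conventionally taken as 0. The second line uses illustrative values only, not values produced by the code: Q(s, a) = 0.5, r = 1, max Q(s', a') = 2, α = 0.1, γ = 0.95.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
Q(s, a) \leftarrow 0.5 + 0.1 \left[ 1 + 0.95 \cdot 2 - 0.5 \right] = 0.5 + 0.1 \cdot 2.4 = 0.74
```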
Practice Problems
- Run environments from OpenAI Gym
- Implement the Q-learning algorithm
- Create a custom environment
- Train an agent to solve a task
- Visualize learning progress
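The last three problems can be tried together without installing anything. The sketch below is a toy built on stated assumptions. `CorridorEnv`, `train_q_learning`, and `moving_average` are hypothetical names, and the corridor task is invented for illustration. The environment only mimics the Gymnasium-style `reset()`/`step()` return values. A real custom environment would subclass `gymnasium.Env` and declare `observation_space` and `action_space`.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a 1-D corridor of `length` cells.

    The agent starts in cell 0; action 0 moves left, action 1 moves right.
    Reaching the last cell ends the episode with reward 1.0. Return values
    mimic the Gymnasium reset()/step() convention, but this is not a gym.Env.
    """

    def __init__(self, length=6, max_steps=50):
        self.n_states = length
        self.n_actions = 2
        self.max_steps = max_steps

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos, {}

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp to the corridor: moving left from cell 0 stays in cell 0
        self.pos = min(max(self.pos + move, 0), self.n_states - 1)
        self.steps += 1
        terminated = self.pos == self.n_states - 1
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def train_q_learning(env, episodes=300, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns the Q-table and the total reward of each episode."""
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    episode_rewards = []
    for _ in range(episodes):
        state, _ = env.reset()
        total, done = 0.0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)  # explore
            else:
                # Exploit, breaking ties between equal Q-values at random
                best = max(q[state])
                action = rng.choice([a for a, v in enumerate(q[state]) if v == best])
            new_state, reward, terminated, truncated, _ = env.step(action)
            future = 0.0 if terminated else max(q[new_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state, total = new_state, total + reward
            done = terminated or truncated
        episode_rewards.append(total)
    return q, episode_rewards


def moving_average(values, window=20):
    """Mean of the last `window` values at each position (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


env = CorridorEnv()
q, rewards = train_q_learning(env)
curve = moving_average(rewards)
greedy = [max(range(env.n_actions), key=lambda a: q[s][a]) for s in range(env.n_states - 1)]
print(f"Smoothed reward: start {curve[0]:.2f}, end {curve[-1]:.2f}")
print("Greedy action in each non-goal cell (1 = right):", greedy)
```

To visualize progress, pass `curve` to a plotting library such as matplotlib. As the greedy policy learns to move right, the smoothed reward should trend toward 1.0. Random tie-breaking matters here because the Q-table starts at all zeros. `np.argmax` or a plain `max` over actions would otherwise always pick the first action.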