Reinforcement Learning

Introduction

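Reinforcement learning trains an agent to make sequential decisions: the agent acts in an environment, receives a reward for each action, and learns a policy that maximizes the total reward it collects over time.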
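
To make the role of rewards concrete, the quantity an agent usually tries to maximize is the discounted sum of the rewards in an episode. The sketch below computes that sum for a list of rewards. The function name `discounted_return` and the example numbers are illustrative only and do not come from any library.

```python
def discounted_return(rewards, gamma=0.95):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode."""
    total = 0.0
    # Work backwards so each step needs one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three steps of reward 1.0 with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

A gamma close to 1 makes the agent value distant rewards almost as much as immediate ones, while a small gamma makes it short-sighted.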

OpenAI Gym

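import gym

env = gym.make("CartPole-v1")

for episode in range(10):
    # gym >= 0.26: reset() returns (observation, info).
    # Older versions return only the observation.
    state, info = env.reset()
    total_reward = 0
    done = False

    while not done:
        action = env.action_space.sample()  # Random action
        # gym >= 0.26: step() returns five values; older versions return
        # (state, reward, done, info).
        state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        total_reward += reward

    print(f"Episode {episode}: Total reward = {total_reward}")

env.close()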
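
Gym changed the return values of `reset()` and `step()` in version 0.26, so a loop written for one API fails on the other with an unpacking error. One possible workaround, sketched below, is to normalize both shapes to the classic `(state, reward, done, info)` form. The helper names `unpack_reset` and `unpack_step` are hypothetical and are not part of Gym.

```python
def unpack_reset(result):
    """Return just the observation from env.reset(), old or new API."""
    # gym >= 0.26 returns (observation, info_dict); older versions return
    # the observation alone. This check is a heuristic and assumes the
    # observation is not itself a 2-tuple ending in a dict.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result


def unpack_step(result):
    """Return (state, reward, done, info) from env.step(), old or new API."""
    if len(result) == 5:
        # New API: the episode is over if it terminated or was truncated.
        state, reward, terminated, truncated, info = result
        return state, reward, terminated or truncated, info
    # Old API already has the four-value shape.
    return result
```

In a loop, these would be used as `state = unpack_reset(env.reset())` and `state, reward, done, info = unpack_step(env.step(action))`.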

Q-Learning

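import gym
import numpy as np

# A Q-table needs discrete states and actions, so this example assumes
# FrozenLake-v1 rather than CartPole-v1, whose observations are continuous.
env = gym.make("FrozenLake-v1")
state_space = env.observation_space.n
action_space = env.action_space.n

q_table = np.zeros((state_space, action_space))
alpha = 0.1  # Learning rate
gamma = 0.95  # Discount factor
epsilon = 0.1  # Exploration rate

for episode in range(1000):
    state, info = env.reset()  # gym >= 0.26 API
    done = False

    while not done:
        # Epsilon-greedy action selection
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(q_table[state])

        new_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated

        # Update Q-value
        q_table[state, action] = q_table[state, action] + alpha * (
            reward + gamma * np.max(q_table[new_state]) - q_table[state, action]
        )
        state = new_state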
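
To see the update rule above learn something without installing Gym, here is a minimal sketch that applies it to a tiny corridor environment. `ChainEnv` and `train` are hypothetical names invented for this illustration, the Q-table is a plain list of lists instead of a NumPy array so that only the standard library is needed, and ties between equal Q-values are broken randomly (an addition to plain epsilon-greedy, so the untrained agent does not always pick action 0).

```python
import random


class ChainEnv:
    """Hypothetical corridor: start in cell 0, reward 1.0 on reaching the last cell."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.n_states = length
        self.n_actions = 2  # 0 = left, 1 = right

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


def train(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q_table = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            row = q_table[state]
            # Explore with probability epsilon, or when all values in the row tie.
            if rng.random() < epsilon or max(row) == min(row):
                action = rng.randrange(env.n_actions)
            else:
                action = row.index(max(row))
            new_state, reward, done = env.step(action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q_table[new_state])
            row[action] += alpha * (target - row[action])
            state = new_state
    return q_table


if __name__ == "__main__":
    for state, row in enumerate(train(ChainEnv())):
        print(state, [round(value, 3) for value in row])
```

After training, the value of moving right should exceed the value of moving left in every non-terminal cell, because each step to the left delays the reward and the discount factor shrinks it.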

Practice Problems

Run several environments from OpenAI Gym with a random agent
Implement the Q-learning algorithm
Create a custom environment
Train an agent to solve a task
Visualize the agent's learning progress
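
As a possible starting point for the last exercise: per-episode rewards are usually too noisy to read directly, so a common approach is to smooth them before plotting. The helper below is a sketch under that assumption. `moving_average` is an illustrative name, and the plotting step itself (for example with matplotlib) is left out.

```python
def moving_average(values, window=10):
    """Average each value with up to `window - 1` values before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        # Drop the value that just fell out of the window.
        if i >= window:
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


if __name__ == "__main__":
    episode_rewards = [1, 2, 3, 4]
    print(moving_average(episode_rewards, window=2))  # [1.0, 1.5, 2.5, 3.5]
```

Plotting the smoothed list against the episode index gives a curve that rises as the agent improves.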
