Python Web Scraping — BeautifulSoup & Beyond

Web scraping extracts data from websites. It is used for data collection, price monitoring, and research.

Learning Objectives

Parse HTML with BeautifulSoup
Handle pagination and dynamic content
Respect robots.txt and rate limits
Store scraped data efficiently

BeautifulSoup Basics

from bs4 import BeautifulSoup
import requests

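response = requests.get('https://example.com', timeout=10)
response.raise_for_status()  # Fail early on HTTP errors (4xx/5xx)
soup = BeautifulSoup(response.text, 'html.parser')

# Find elements
heading = soup.find('h1')  # find() returns None when nothing matches
title = heading.get_text(strip=True) if heading else None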
links = soup.find_all('a')
for link in links:
    print(link.get('href'), link.text)

# CSS selectors
items = soup.select('div.product > h2.title')
prices = soup.select('.price')
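
To try these calls without touching a live site, the same lookups can be run against an inline HTML string. The markup below is invented for illustration and is not taken from any real page; it is only a small sketch of how find, find_all, and select behave, including the cases where nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<html><body>
  <h1>Shop</h1>
  <div class="product">
    <h2 class="title">Widget</h2>
    <span class="price">$9.99</span>
    <a href="/widget">Details</a>
  </div>
  <div class="product">
    <h2 class="title">Gadget</h2>
    <span class="price">$19.50</span>
    <a>No link target</a>
  </div>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns None when nothing matches, so guard before reading text.
heading = soup.find("h1")
title = heading.get_text(strip=True) if heading else None
missing = soup.find("h3")

# select() always returns a list, which is empty when nothing matches.
names = [h2.get_text(strip=True) for h2 in soup.select("div.product > h2.title")]
prices = [p.get_text(strip=True) for p in soup.select(".price")]

# .get("href") returns None for anchors that have no href attribute.
hrefs = [a.get("href") for a in soup.find_all("a")]

print(title, names, prices, hrefs, missing)
```

Because find can return None while select returns an empty list, code that chains `.text` directly onto find is the more likely of the two to raise an AttributeError on pages with unexpected markup.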

Structured Scraping

def scrape_products(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Stop on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')

    products = []
    for card in soup.select('.product-card'):
        product = {
            'name': card.select_one('.title').text.strip(),
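            # Trim whitespace first, then drop the currency symbol and thousands separators
            'price': float(card.select_one('.price').text.strip().lstrip('$').replace(',', '')),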
            'rating': float(card.select_one('.rating').text),
            'url': card.select_one('a')['href']
        }
        products.append(product)

    return products
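
Real product cards often lack a field or use relative links, in which case the function above would raise an exception. The following is a minimal sketch of a more defensive variant, run on invented sample markup. The helper names (text_or_none, to_float, parse_card), the sample HTML, and the base URL are all assumptions made for this example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def to_float(text):
    """Convert text such as '$1,299.00' to a float, or None if not numeric."""
    if text is None:
        return None
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_card(card, base_url):
    """Build one product dict, tolerating missing elements."""
    link = card.select_one("a[href]")
    return {
        "name": text_or_none(card, ".title"),
        "price": to_float(text_or_none(card, ".price")),
        "rating": to_float(text_or_none(card, ".rating")),
        # Resolve relative links against the page URL.
        "url": urljoin(base_url, link["href"]) if link else None,
    }


# Hypothetical markup: the second card has no rating, no link, and no numeric price.
SAMPLE = """
<div class="product-card">
  <span class="title"> Widget </span>
  <span class="price">$1,299.00</span>
  <span class="rating">4.5</span>
  <a href="/items/widget">View</a>
</div>
<div class="product-card">
  <span class="title">Gadget</span>
  <span class="price">Call for price</span>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
products = [parse_card(c, "https://example.com/shop") for c in soup.select(".product-card")]
print(products)
```

Returning None for missing fields keeps one malformed card from aborting the whole page. Whether to keep or discard such partial records is a choice left to the caller.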

Handling Pagination

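import time

def parse_product(card):
    # Helper assumed by this section: turn one product element into a dict.
    # The fields mirror those used in scrape_products; adjust selectors to the site.
    return {
        'name': card.select_one('.title').text.strip(),
        'price': float(card.select_one('.price').text.strip().lstrip('$').replace(',', '')),
        'url': card.select_one('a')['href']
    }

def scrape_all_pages(base_url):
    all_products = []
    page = 1

    while True:
        # params builds the query string even if base_url already has one
        response = requests.get(base_url, params={'page': page}, timeout=10)
        if response.status_code != 200:
            break  # Stop on missing or failing pages
        soup = BeautifulSoup(response.text, 'html.parser')

        products = soup.select('.product')
        if not products:
            break  # An empty page means we are past the last one

        for product in products:
            all_products.append(parse_product(product))

        page += 1
        time.sleep(1)  # Be polite

    return all_products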
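
Being polite also covers the robots.txt objective listed at the top. The standard library's urllib.robotparser can answer whether a URL may be fetched and whether the site requests a crawl delay. The sketch below parses an invented robots.txt string so that it runs offline; the rules, the user agent name, and the URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-scraper"  # assumed name for our scraper

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() applies the rules that match our user agent.
allowed = parser.can_fetch(USER_AGENT, "https://example.com/products?page=1")
blocked = parser.can_fetch(USER_AGENT, "https://example.com/private/data")

# crawl_delay() returns None when the file sets no delay; fall back to 1 second.
delay = parser.crawl_delay(USER_AGENT) or 1

print(allowed, blocked, delay)
```

Against a live site, the usual pattern is to call `parser.set_url(...)` with the site's robots.txt address followed by `parser.read()` instead of `parse()`, then pass `delay` to `time.sleep` between requests.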

Key Takeaways

Use BeautifulSoup for HTML parsing
Always add delays between requests
Check robots.txt before scraping
Use CSS selectors for precise element targeting
Store data as JSON or CSV for easy analysis
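
The last takeaway can be sketched with the standard library alone. The records below are invented sample data in the same shape that scrape_products returns. The function names (save_json, save_csv) and the file names are assumptions for this example, and the files go to a temporary directory.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical scraped records, shaped like the dicts built earlier.
products = [
    {"name": "Widget", "price": 9.99, "rating": 4.5, "url": "https://example.com/widget"},
    {"name": "Gadget", "price": 19.5, "rating": 4.0, "url": "https://example.com/gadget"},
]


def save_json(rows, path):
    """Write the records as a JSON array, preserving numeric types."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)


def save_csv(rows, path):
    """Write the records as CSV with a header row."""
    fieldnames = ["name", "price", "rating", "url"]
    # newline="" stops the csv module from writing blank lines on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


out_dir = Path(tempfile.mkdtemp())
json_path = out_dir / "products.json"
csv_path = out_dir / "products.csv"

save_json(products, json_path)
save_csv(products, csv_path)
print(json_path, csv_path)
```

JSON keeps numbers as numbers and handles nested fields, while CSV opens directly in spreadsheets but reads every value back as a string.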
