Python Web Scraping — BeautifulSoup & Beyond

Web scraping extracts data from websites programmatically: a script downloads a page and parses its HTML for the values you need. Common uses include data collection, price monitoring, and research.

Learning Objectives

  • Parse HTML with BeautifulSoup
  • Handle pagination and dynamic content
  • Respect robots.txt and rate limits
  • Store scraped data efficiently

BeautifulSoup Basics

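import requests
from bs4 import BeautifulSoup

# Download the page; the timeout keeps a stalled server from hanging the script
response = requests.get('https://example.com', timeout=10)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
soup = BeautifulSoup(response.text, 'html.parser')

# Find elements (find() returns None when nothing matches)
heading = soup.find('h1')
title = heading.text if heading else None
links = soup.find_all('a')
for link in links:
    print(link.get('href'), link.text)

# CSS selectors
items = soup.select('div.product > h2.title')
prices = soup.select('.price')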
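
The same calls can be tried offline against an inline HTML string, which makes the results easy to inspect. The sketch below is illustrative only: the markup, class names, and variable names are made up for the example and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
HTML = """
<html><body>
  <h1>Shop</h1>
  <div class="product"><h2 class="title">Lamp</h2><span class="price">$19.99</span></div>
  <div class="product"><h2 class="title">Desk</h2><span class="price">$120.00</span></div>
  <a href="/about">About</a>
  <a>No link target</a>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first match, or None when nothing matches.
heading = soup.find("h1")
title = heading.get_text(strip=True) if heading else None
missing = soup.find("h3")

# select() takes a CSS selector and always returns a list.
titles = [h2.get_text(strip=True) for h2 in soup.select("div.product > h2.title")]
prices = [p.get_text(strip=True) for p in soup.select(".price")]

# href=True keeps only the anchors that actually carry an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(title, titles, prices, hrefs)
```

Checking for None before reading `.text` matters in practice, because real pages often omit elements that a scraper expects.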

Structured Scraping

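def scrape_products(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    products = []
    for card in soup.select('.product-card'):
        # select_one() returns None if a card lacks the element, so this
        # assumes every card contains all four
        product = {
            'name': card.select_one('.title').text.strip(),
            # Strip surrounding whitespace first, then the leading '$'
            'price': float(card.select_one('.price').text.strip().lstrip('$')),
            'rating': float(card.select_one('.rating').text),
            'url': card.select_one('a')['href']
        }
        products.append(product)

    return products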
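
Converting price text with `float()` directly is fragile: a thousands separator or a label such as "Sold out" raises `ValueError`. One possible way to harden that step is a small helper like the one below. `parse_price` is a hypothetical name, and the sketch assumes US-style formatting (comma for thousands, dot for decimals).

```python
import re

def parse_price(text):
    """Return the first number found in text as a float, or None if there is none.

    Assumes US-style formatting: ',' groups thousands and '.' marks decimals.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

print(parse_price(" $1,299.50 "))  # 1299.5
print(parse_price("Sold out"))     # None
```

Returning None instead of raising lets the caller decide whether to skip the product or record the missing price.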

Handling Pagination

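import time

# parse_product() is not defined in this lesson; it is assumed to be a helper
# that turns one product element into a dict, like the loop body of
# scrape_products() above.
def scrape_all_pages(base_url):
    all_products = []
    page = 1

    while True:
        url = f"{base_url}?page={page}"
        response = requests.get(url, timeout=10)
        if not response.ok:
            break  # Stop on an error response, e.g. a 404 past the last page
        soup = BeautifulSoup(response.text, 'html.parser')

        products = soup.select('.product')
        if not products:
            break  # An empty page means there is nothing left to scrape

        for product in products:
            all_products.append(parse_product(product))

        page += 1
        time.sleep(1)  # Be polite

    return all_products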
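
The one-second sleep covers rate limiting; the other half of polite scraping is checking robots.txt before the loop starts. The standard library's `urllib.robotparser` handles this. The sketch below parses an inline, made-up robots.txt so it runs offline; the user agent string and the rules are assumptions for illustration. Against a live site you would call `set_url()` and `read()` on the parser instead of `parse()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""

USER_AGENT = "tutorial-scraper"  # assumed name for this scraper

def build_parser(robots_text):
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

allowed = parser.can_fetch(USER_AGENT, "https://example.com/products?page=1")
blocked = parser.can_fetch(USER_AGENT, "https://example.com/admin/users")
delay = parser.crawl_delay(USER_AGENT)  # None when the file sets no Crawl-delay

print(allowed, blocked, delay)  # True False 2
```

If `crawl_delay()` returns a number, it is a reasonable value to pass to `time.sleep()` in place of the fixed one second.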

Key Takeaways

  1. Use BeautifulSoup for HTML parsing
  2. Always add delays between requests
  3. Check robots.txt before scraping
  4. Use CSS selectors for precise element targeting
  5. Store data as JSON or CSV for easy analysis
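
Takeaway 5 can be sketched with the standard library alone. The example below assumes records shaped like the dictionaries that `scrape_products()` builds; the sample data and the helper names `save_json` and `save_csv` are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical records in the shape scrape_products() returns.
products = [
    {"name": "Lamp", "price": 19.99, "rating": 4.5, "url": "/products/lamp"},
    {"name": "Desk", "price": 120.0, "rating": 4.0, "url": "/products/desk"},
]

def save_json(records, path):
    """Write records as a JSON array; numbers keep their types."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")

def save_csv(records, path):
    """Write records as CSV with one column per key of the first record."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)

# Round-trip through a temporary directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "products.json"
    csv_path = Path(tmp) / "products.csv"
    save_json(products, json_path)
    save_csv(products, csv_path)

    loaded = json.loads(json_path.read_text(encoding="utf-8"))
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

print(loaded[0]["price"], rows[0]["price"])
```

JSON preserves numeric types, while every CSV value comes back as a string, so CSV data needs converting again when it is loaded for analysis.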
