Python Regular Expressions — Pattern Matching Mastery

Regular expressions describe search patterns in text. They are essential for validation, extraction, and text transformation.

Learning Objectives

Use the re module for pattern matching
Master quantifiers, groups, and anchors
Apply lookahead and lookbehind assertions
Solve real-world text processing problems

re Module Basics

import re

text = "The price is $42.50 and $19.99"

# Find all numbers
numbers = re.findall(r'\d+\.?\d*', text)
print(numbers)  # ['42.50', '19.99']

# Match and extract
match = re.search(r'\$(\d+\.?\d*)', text)
if match:
    print(match.group(0))  # $42.50 (full match)
    print(match.group(1))  # 42.50  (first group)
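
When the same pattern is used more than once, or when match positions matter, a compiled pattern with finditer is often more convenient than repeated module-level calls. The following is a minimal sketch; the names PRICE and price_text are illustrative and not part of the original example.

```python
import re

# Illustrative names (assumptions): PRICE, price_text.
PRICE = re.compile(r'\$(\d+\.?\d*)')
price_text = "The price is $42.50 and $19.99"

# finditer yields one match object per hit, so both the captured
# group and the position of the match are available.
prices = [(m.group(1), m.start()) for m in PRICE.finditer(price_text)]
print(prices)  # [('42.50', 13), ('19.99', 24)]

# re.match only tries the very start of the string, while
# re.search scans forward until it finds a match.
at_start = re.match(r'\d+', price_text)
print(at_start)  # None, because the string starts with "The"

first = PRICE.search(price_text)
print(first.group(0))  # $42.50
```

Compiling is optional, since the re module caches recently used patterns, but it gives the pattern a name and keeps call sites short.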

Pattern Syntax

# Quantifiers
r'a+'       # One or more 'a'
r'a*'       # Zero or more 'a'
r'a?'       # Zero or one 'a'
r'a{3}'     # Exactly 3 'a's
r'a{2,4}'   # 2 to 4 'a's

# Character classes
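r'\d'       # Digit ([0-9] with re.ASCII; any Unicode decimal digit by default)
r'\w'       # Word character ([a-zA-Z0-9_] with re.ASCII; Unicode letters, digits, _ by default)
r'\s'       # Whitespace (space, tab, newline, ...)
r'[aeiou]'  # Any one lowercase vowel
r'[^aeiou]' # Any one character except a lowercase vowel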

# Anchors
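r'^Hello'   # 'Hello' at the start of the string (or of a line with re.MULTILINE)
r'world$'   # 'world' at the end of the string (or of a line with re.MULTILINE)
r'\bword\b' # 'word' as a whole word, bounded on both sides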

# Groups
r'(abc)'           # Capturing group
r'(?:abc)'         # Non-capturing group
r'(?P<name>abc)'   # Named group
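
The syntax elements above can be combined in one pattern. The sketch below parses a made-up log line with anchors, quantifiers, and named groups, then contrasts capturing and non-capturing groups. The log format and the names log_line and log_pattern are assumptions for illustration only.

```python
import re

# Hypothetical log line and field names, used only for illustration.
log_line = "2024-03-15 ERROR disk full"
log_pattern = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<message>.+)$'
)

m = log_pattern.match(log_line)
fields = m.groupdict() if m else {}
print(fields)  # {'date': '2024-03-15', 'level': 'ERROR', 'message': 'disk full'}

# With a capturing group, findall returns the group's text
# (the last repetition); with a non-capturing group it returns
# the whole match.
captured = re.findall(r'(ab)+', "ababab ab")
whole = re.findall(r'(?:ab)+', "ababab ab")
print(captured)  # ['ab', 'ab']
print(whole)     # ['ababab', 'ab']
```

Named groups keep the extraction code readable when a pattern grows, because fields are looked up by name instead of by position.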

Common Patterns

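# Email validation (simplified: checks the overall shape, not every valid address)
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
# fullmatch also rejects a trailing newline, which '$' alone would allow
bool(re.fullmatch(email_pattern, 'user@example.com'))  # True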

# Phone number (US)
phone_pattern = r'(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'

# URL extraction
url_pattern = r'https?://(?:www\.)?[\w.-]+\.[a-zA-Z]{2,}(?:/[\w./?%&=-]*)?'
urls = re.findall(url_pattern, "Visit https://example.com or http://test.org/path")
# ['https://example.com', 'http://test.org/path']

# Date formats (YYYY-MM-DD or YYYY/MM/DD; checks the shape only, not calendar validity)
date_pattern = r'\d{4}[-/]\d{2}[-/]\d{2}'
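
The phone and date patterns describe the shape of the text, not whether the value is meaningful. A rough sketch of how they might be wrapped for validation follows; the helper names is_us_phone and is_real_date are hypothetical, and the calendar check relies on datetime.strptime from the standard library.

```python
import re
from datetime import datetime

phone_pattern = r'(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
date_pattern = r'\d{4}[-/]\d{2}[-/]\d{2}'

def is_us_phone(candidate):
    """Hypothetical helper: True if the whole string looks like a US number."""
    return re.fullmatch(phone_pattern, candidate) is not None

def is_real_date(candidate):
    """Hypothetical helper: shape check by regex, calendar check by datetime."""
    if not re.fullmatch(date_pattern, candidate):
        return False
    try:
        # Normalize the separator so a single format string is enough.
        datetime.strptime(candidate.replace('/', '-'), '%Y-%m-%d')
    except ValueError:
        return False
    return True

print(is_us_phone("(555) 123-4567"))   # True
print(is_us_phone("555-1234"))         # False
print(is_real_date("2024/02/29"))      # True
print(is_real_date("2024-02-30"))      # False
```

Using fullmatch rather than search matters here: search would accept any longer string that merely contains something phone-shaped.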

Lookahead and Lookbehind

# Positive lookahead: followed by X
r'\d+(?= dollars)'   # 42 in "42 dollars"

# Negative lookahead: NOT followed by X
# The \b stops the engine from backtracking to "4" in "42 dollars"
r'\d+\b(?! dollars)' # 42 in "42 euros", no match in "42 dollars"

# Positive lookbehind: preceded by X
r'(?<=\$)\d+'        # 42 in "$42"

# Negative lookbehind: NOT preceded by X
# Excluding \d as well keeps the "2" in "$42" from matching on its own
r'(?<![$\d])\d+'     # 42 in "42 dollars", no match in "$42"
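
These assertions can be checked directly with re.findall. The sketch below also shows why unguarded negative forms such as \d+(?! dollars) and (?<!\$)\d+ are risky: the engine can satisfy them by matching only part of a number. The sample strings are made up for illustration.

```python
import re

# Positive forms behave as expected.
pos_ahead = re.findall(r'\d+(?= dollars)', "42 dollars, 7 euros")
pos_behind = re.findall(r'(?<=\$)\d+', "$42 and 17")
print(pos_ahead)   # ['42']
print(pos_behind)  # ['42']

# Unguarded negative forms can match a fragment of the number:
# "4" is not followed by " dollars", and "2" is not preceded by "$".
naive_ahead = re.findall(r'\d+(?! dollars)', "42 dollars")
naive_behind = re.findall(r'(?<!\$)\d+', "$42")
print(naive_ahead)   # ['4']
print(naive_behind)  # ['2']

# Guarded forms: \b forces the whole number to be consumed, and
# adding \d to the lookbehind rejects starting mid-number.
safe_ahead = re.findall(r'\d+\b(?! dollars)', "42 dollars, 7 euros")
safe_behind = re.findall(r'(?<![$\d])\d+', "$42 and 17")
print(safe_ahead)   # ['7']
print(safe_behind)  # ['17']
```

Lookarounds consume no characters, so they can be placed next to a quantified token without changing what ends up in the match.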

Substitution

text = "Hello World, hello Python"

# Replace all occurrences
result = re.sub(r'hello', 'Hi', text, flags=re.IGNORECASE)
# "Hi World, Hi Python"

# Replace with function
def double_number(match):
    return str(int(match.group()) * 2)

result = re.sub(r'\d+', double_number, "I have 3 cats and 5 dogs")
# "I have 6 cats and 10 dogs"
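
Replacement strings can also refer back to captured groups, which covers many reformatting tasks without a callback function. A short sketch, with a made-up input string and illustrative variable names:

```python
import re

# Illustrative input (assumption).
iso_text = "Released 2024-03-15, patched 2024-04-02"

# Numbered backreferences: \1, \2, \3 refer to the capturing groups.
us_style = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', iso_text)
print(us_style)  # Released 03/15/2024, patched 04/02/2024

# Named backreferences use \g<name> in the replacement.
dotted = re.sub(
    r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})',
    r'\g<day>.\g<month>.\g<year>',
    iso_text,
)
print(dotted)  # Released 15.03.2024, patched 02.04.2024

# count limits how many matches are replaced; subn also reports
# how many replacements were made.
first_only = re.sub(r'\d{4}', 'YYYY', iso_text, count=1)
masked, replaced = re.subn(r'\d{4}', 'YYYY', iso_text)
print(first_only)  # Released YYYY-03-15, patched 2024-04-02
print(replaced)    # 2
```

The replacement is written as a raw string for the same reason patterns are: otherwise \1 would be read as an escape sequence by Python before re sees it.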

Key Takeaways

Use raw strings r'...' for regex patterns
\d digits, \w word chars, \s whitespace
Groups () capture matched text
Lookahead/lookbehind for context-sensitive matching
re.sub() for pattern-based replacement
