Python Regular Expressions — Pattern Matching Mastery

Regular expressions describe search patterns in text. They are essential for validation, extraction, and text transformation.

Learning Objectives

  • Use the re module for pattern matching
  • Master quantifiers, groups, and anchors
  • Apply lookahead and lookbehind assertions
  • Solve real-world text processing problems

re Module Basics

import re

text = "The price is $42.50 and $19.99"

# Find all numbers
numbers = re.findall(r'\d+\.?\d*', text)
print(numbers)  # ['42.50', '19.99']

# Match and extract
match = re.search(r'\$(\d+\.?\d*)', text)
if match:
    print(match.group(0))  # $42.50 (full match)
    print(match.group(1))  # 42.50  (first group)
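
The functions in re differ mainly in where they look and what they return. The sketch below reuses the sample text from above to compare re.match, re.search, and re.finditer. The variable names (no_match, first, prices) are illustrative only.

```python
import re

text = "The price is $42.50 and $19.99"

# re.match only tries at the very start of the string, so it finds nothing here.
no_match = re.match(r'\$\d+', text)
print(no_match)                      # None

# re.search scans forward and stops at the first hit.
first = re.search(r'\$\d+', text)
print(first.group(), first.span())   # $42 (13, 16)

# re.finditer yields a match object per hit, so positions are available too.
prices = [(m.group(1), m.start()) for m in re.finditer(r'\$(\d+\.\d{2})', text)]
print(prices)                        # [('42.50', 13), ('19.99', 24)]
```

When a pattern contains a capturing group, re.findall returns the group contents instead of the full match, which is why finditer is often the safer choice when both are needed.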

Pattern Syntax

# Quantifiers
r'a+'       # One or more 'a'
r'a*'       # Zero or more 'a'
r'a?'       # Zero or one 'a'
r'a{3}'     # Exactly 3 'a's
r'a{2,4}'   # 2 to 4 'a's

# Character classes
r'\d'       # Digit ([0-9]; any Unicode digit unless re.ASCII is set)
r'\w'       # Word character ([a-zA-Z0-9_]; Unicode-aware unless re.ASCII is set)
r'\s'       # Whitespace
r'[aeiou]'  # Any vowel
r'[^aeiou]' # Not a vowel

# Anchors
r'^Hello'   # Starts with
r'world$'   # Ends with
r'\bword\b' # Word boundary

# Groups
r'(abc)'           # Capturing group
r'(?:abc)'         # Non-capturing group
r'(?P<name>abc)'   # Named group
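
To see several of these pieces working together, here is a minimal sketch that parses one log line with anchors, quantifiers, and named groups. The log line and the group names (date, level, message) are invented for illustration.

```python
import re

# Hypothetical log line, made up for this example.
line = "2024-03-15 ERROR disk full"

pattern = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2})\s+'   # anchored at the start, then a named group
    r'(?P<level>[A-Z]+)\s+'              # one or more uppercase letters
    r'(?P<message>.*)$'                  # everything up to the end of the line
)

m = pattern.match(line)
if m:
    print(m.group('level'))   # ERROR
    print(m.groupdict())      # {'date': '2024-03-15', 'level': 'ERROR', 'message': 'disk full'}

# Word boundaries keep 'cat' from matching inside 'concat' or 'cats'.
whole_words = re.findall(r'\bcat\b', "cat concat cats cat.")
print(whole_words)            # ['cat', 'cat']

# A non-capturing group applies a quantifier without creating a capture.
repeats = re.findall(r'(?:ab)+', "ababab ab")
print(repeats)                # ['ababab', 'ab']
```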

Common Patterns

# Email validation
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
bool(re.match(email_pattern, 'user@example.com'))  # True

# Phone number (US)
phone_pattern = r'(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'

# URL extraction
url_pattern = r'https?://(?:www\.)?[\w.-]+\.[a-zA-Z]{2,}(?:/[\w./?%&=-]*)?'
urls = re.findall(url_pattern, "Visit https://example.com or http://test.org/path")
# ['https://example.com', 'http://test.org/path']

# Dates shaped like YYYY-MM-DD or YYYY/MM/DD (checks the shape only, not the ranges)
date_pattern = r'\d{4}[-/]\d{2}[-/]\d{2}'
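
Two practical caveats apply to these patterns. First, '$' also matches just before a trailing newline, so re.fullmatch is usually a safer fit for validation than re.match. Second, the date pattern accepts impossible dates such as 2024-13-45. The sketch below shows one way to handle both. find_dates is a hypothetical helper, not part of the re module.

```python
import re
from datetime import datetime

email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
date_pattern = r'\d{4}[-/]\d{2}[-/]\d{2}'

# '$' matches before a trailing newline, so re.match lets this input through.
loose = bool(re.match(email_pattern, 'user@example.com\n'))
# re.fullmatch requires the entire string to match.
strict = bool(re.fullmatch(email_pattern, 'user@example.com\n'))
print(loose, strict)   # True False


def find_dates(text):
    """Return date objects for matches that are real calendar dates (hypothetical helper)."""
    found = []
    for raw in re.findall(date_pattern, text):
        normalized = raw.replace('/', '-')
        try:
            found.append(datetime.strptime(normalized, '%Y-%m-%d').date())
        except ValueError:
            pass  # right shape, impossible date
    return found


dates = find_dates("Due 2024-03-15, not 2024/13/45")
print(dates)           # [datetime.date(2024, 3, 15)]
```

The regex narrows the text down to candidates, and datetime.strptime then decides whether each candidate is a real date.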

Lookahead and Lookbehind

# Positive lookahead: followed by X
r'\d+(?= dollars)'   # 42 in "42 dollars"

# Negative lookahead: NOT followed by X
r'\d+(?! dollars)'   # 42 in "42 euros"; beware: backtracking still matches 4 in "42 dollars"

# Positive lookbehind: preceded by X
r'(?<=\$)\d+'        # 42 in "$42"

# Negative lookbehind: NOT preceded by X
r'(?<![$\d])\d+'     # 42 in "42 dollars", nothing in "$42" (without \d, the 2 in "$42" would match)
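
Negative assertions interact with backtracking in ways that are easy to miss: \d+ can give up a digit, or start one digit later, to satisfy the assertion. The sketch below runs the four kinds of lookaround on a made-up sample string and shows one possible way to tighten the negative forms.

```python
import re

# Sample string invented for this illustration.
text = "42 dollars, 17 euros, $99"

ahead = re.findall(r'\d+(?= dollars)', text)
behind = re.findall(r'(?<=\$)\d+', text)
print(ahead, behind)             # ['42'] ['99']

# Naive negative lookahead: \d+ backtracks to '4', which is followed by '2'.
naive_ahead = re.findall(r'\d+(?! dollars)', text)
print(naive_ahead)               # ['4', '17', '99']

# Also forbidding a following digit stops the partial match.
strict_ahead = re.findall(r'\d+(?!\d| dollars)', text)
print(strict_ahead)              # ['17', '99']

# Naive negative lookbehind: the second '9' of '$99' is preceded by '9', not '$'.
naive_behind = re.findall(r'(?<!\$)\d+', text)
print(naive_behind)              # ['42', '17', '9']

# Also forbidding a preceding digit excludes the whole number.
strict_behind = re.findall(r'(?<![$\d])\d+', text)
print(strict_behind)             # ['42', '17']
```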

Substitution

text = "Hello World, hello Python"

# Replace all occurrences
result = re.sub(r'hello', 'Hi', text, flags=re.IGNORECASE)
# "Hi World, Hi Python"

# Replace with function
def double_number(match):
    return str(int(match.group()) * 2)

result = re.sub(r'\d+', double_number, "I have 3 cats and 5 dogs")
# "I have 6 cats and 10 dogs"
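
Replacement strings can also refer back to captured groups, which covers many cases without a callback function. A short sketch with invented sample strings:

```python
import re

# Reorder captured groups with numbered backreferences.
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', "Released 2024-03-15")
print(swapped)    # Released 15/03/2024

# Named groups are referenced with \g<name>.
renamed = re.sub(r'(?P<first>\w+) (?P<last>\w+)', r'\g<last>, \g<first>', "Ada Lovelace")
print(renamed)    # Lovelace, Ada

# count limits how many matches are replaced.
limited = re.sub(r'\d+', '#', "1 2 3", count=1)
print(limited)    # # 2 3

# re.subn also reports how many replacements were made.
collapsed = re.subn(r'\s+', ' ', "a   b \t c")
print(collapsed)  # ('a b c', 2)
```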

Key Takeaways

  1. Use raw strings r'...' for regex patterns
  2. \d digits, \w word chars, \s whitespace
  3. Groups () capture matched text
  4. Lookahead/lookbehind for context-sensitive matching
  5. re.sub() for pattern-based replacement
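
The first takeaway is easiest to see with \b. A small sketch of what goes wrong without the r prefix:

```python
import re

# Without the r prefix, Python turns '\b' into a backspace character (\x08)
# before the regex engine ever sees it.
plain = '\bcat\b'
raw = r'\bcat\b'

print(len(plain), len(raw))            # 5 7

plain_hits = re.findall(plain, "a cat sat")
raw_hits = re.findall(raw, "a cat sat")
print(plain_hits)                      # []
print(raw_hits)                        # ['cat']
```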
