Handling Missing Data

Introduction

Real-world data often contains missing values, which pandas represents as NaN, None, or NaT (for datetimes). Pandas provides tools to detect, remove, and fill them.

Detecting Missing Data

df.isnull()        # Boolean mask: True where a value is missing
df.notnull()       # Boolean mask: True where a value is present
df.isnull().sum()  # Number of missing values per column

# Select the rows where "column" is missing
df[df["column"].isnull()]
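As a minimal sketch of the detection calls above, the following builds a small DataFrame and inspects its gaps. The data and the column names (name, age, city) are made up for illustration and are not part of the lesson.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "age": [34.0, np.nan, 29.0, np.nan],
    "city": ["Oslo", "Rome", "Lima", "Kyiv"],
})

# Count of missing values in each column.
missing_per_column = df.isnull().sum()

# Fraction of missing values in each column (mean of a boolean mask).
missing_share = df.isnull().mean()

# Rows where one specific column is missing.
rows_missing_age = df[df["age"].isnull()]

# Rows where at least one column is missing.
rows_with_any_missing = df[df.isnull().any(axis=1)]

print(missing_per_column)
print(rows_with_any_missing)
```

Taking the mean of the boolean mask is a convenient way to compare columns of a large table, since raw counts are harder to judge without the row total.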

Removing Missing Data

df.dropna()                  # Drop rows that contain any NA
df.dropna(how="all")         # Drop rows only if every value is NA
df.dropna(thresh=2)          # Keep rows with at least 2 non-NA values
df.dropna(subset=["col1", "col2"])  # Drop rows with NA in col1 or col2
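To see how the dropna options differ, here is a small sketch on made-up data in which each option keeps a different set of rows. The values are assumptions chosen for illustration; col1 and col2 follow the snippet above, and col3 is an extra hypothetical column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row has a different pattern of missing values.
df = pd.DataFrame({
    "col1": [1.0, np.nan, np.nan, 4.0, 5.0],
    "col2": [10.0, np.nan, np.nan, np.nan, 50.0],
    "col3": ["a", "b", None, "d", None],
})

# Keep only fully populated rows.
complete_rows = df.dropna()

# Drop a row only when every value in it is missing.
not_all_missing = df.dropna(how="all")

# Keep rows that have at least two non-missing values.
at_least_two = df.dropna(thresh=2)

# Only look at col1 and col2 when deciding what to drop.
key_columns_present = df.dropna(subset=["col1", "col2"])

for label, result in [
    ("dropna()", complete_rows),
    ('how="all"', not_all_missing),
    ("thresh=2", at_least_two),
    ("subset", key_columns_present),
]:
    print(label, list(result.index))
```

Each call returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it.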

Filling Missing Data

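# Each call returns a new object; assign the result to keep it
df.fillna(0)                          # Fill with 0
df["col"].fillna(df["col"].mean())    # Fill with the column mean
df.ffill()                            # Forward fill (fillna(method="ffill") is deprecated)
df.bfill()                            # Backward fill (fillna(method="bfill") is deprecated)
df.interpolate()                      # Linear interpolation by default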
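The fill strategies above can be compared side by side on one small series. This is a sketch on made-up readings; the column name temp is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps; "temp" is an illustrative column name.
df = pd.DataFrame({"temp": [1.0, np.nan, np.nan, 4.0, np.nan]})

filled_zero = df["temp"].fillna(0)

# mean() skips NaN, so this fills with (1.0 + 4.0) / 2.
filled_mean = df["temp"].fillna(df["temp"].mean())

# Carry the last known value forward, or the next known value backward.
forward = df["temp"].ffill()
backward = df["temp"].bfill()

# Linear interpolation between the known neighbors.
linear = df["temp"].interpolate()

comparison = pd.DataFrame({
    "original": df["temp"],
    "zero": filled_zero,
    "mean": filled_mean,
    "ffill": forward,
    "bfill": backward,
    "linear": linear,
})
print(comparison)
```

Note that a backward fill cannot fill a trailing gap, because there is no later value to copy, so the last entry of backward is still NaN. Which strategy fits depends on the data: forward fill suits readings that hold until the next update, while interpolation suits values that change smoothly.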

Practice Problems

  1. Identify the columns with the most missing values
  2. Fill missing ages with the median age
  3. Forward fill gaps in a time series
  4. Drop rows that are missing values in critical columns
  5. Choose an imputation strategy for each column based on its type and distribution
