Introduction
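Pandas provides tools for cleaning, transforming, and analyzing tabular data. This section covers the everyday DataFrame operations: adding and modifying columns, filtering rows, sorting, and handling duplicates.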
Adding and Modifying Columns
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
# Add new column
df["c"] = df["a"] + df["b"]
# Conditional column
df["large"] = df["a"] > 1
# Apply a function to each element (for simple arithmetic, df["a"] * 2 is faster)
df["double"] = df["a"].apply(lambda x: x * 2)
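The same three columns can also be built in one step without modifying the original frame. The following is a minimal sketch, assuming a small example frame named `sample` (a name chosen here for illustration); it uses `DataFrame.assign`, which returns a new DataFrame, and a vectorized multiplication in place of `apply`.

```python
import pandas as pd

# Hypothetical example frame, mirroring the one used above
sample = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# assign returns a new DataFrame; sample itself is left unchanged
with_columns = sample.assign(
    c=lambda d: d["a"] + d["b"],   # computed column
    large=lambda d: d["a"] > 1,    # boolean (conditional) column
    double=lambda d: d["a"] * 2,   # vectorized equivalent of apply(lambda x: x * 2)
)

print(with_columns)
```

Vectorized expressions such as `d["a"] * 2` are usually preferable to `apply` for plain arithmetic; `apply` is most useful when the per-element logic cannot be expressed with column operations.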
Filtering
# Boolean mask
df[df["a"] > 1]
# Multiple conditions
df[(df["a"] > 1) & (df["b"] < 6)]
# Query method
df.query("a > 1 and b < 6")
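Filters are easier to read and reuse when the mask is stored in a variable. The sketch below, again on a hypothetical frame named `sample`, shows that the boolean mask and `query` forms select the same rows, and adds the other common combinators: `|` for OR, `~` for NOT, and `isin` for membership tests.

```python
import pandas as pd

# Hypothetical example frame
sample = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Each condition needs its own parentheses because & and | bind
# more tightly than comparison operators
mask = (sample["a"] > 1) & (sample["b"] < 6)

by_mask = sample[mask]
by_query = sample.query("a > 1 and b < 6")   # same rows as by_mask

either = sample[(sample["a"] == 1) | (sample["b"] == 6)]   # OR
inverted = sample[~mask]                                   # NOT
in_set = sample[sample["a"].isin([1, 3])]                  # membership

print(by_mask)
```

Python's `and`, `or`, and `not` do not work element-wise on Series, which is why the mask form uses `&`, `|`, and `~`; inside a `query` string the word forms are accepted.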
Sorting
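df.sort_values("a") # Ascending (default); returns a new DataFrame
df.sort_values("a", ascending=False) # Descending
df.sort_values(["a", "b"], ascending=[True, False]) # a ascending, ties broken by b descending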
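A multi-column sort only shows its effect when the first key has ties, which the example frame above lacks. As a small illustration, assuming a hypothetical frame `sample` with a repeated value in `a`, the sketch below also shows that sorted rows keep their original index labels unless the index is reset.

```python
import pandas as pd

# Hypothetical frame with a tie in column a
sample = pd.DataFrame({"a": [2, 1, 2], "b": [5, 9, 7]})

# a ascending; within equal a, b descending
ordered = sample.sort_values(["a", "b"], ascending=[True, False])

# Rows keep their original labels after sorting;
# reset_index(drop=True) renumbers them from 0
tidy = ordered.reset_index(drop=True)

print(ordered)
print(tidy)
```

`sort_values` returns a new DataFrame, so `sample` keeps its original row order unless the result is assigned back.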
Handling Duplicates
df.drop_duplicates() # Remove duplicate rows
df.drop_duplicates(subset=["a"]) # Keep the first row for each value of a
df.duplicated().sum() # Count rows that repeat an earlier row
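Which occurrence survives deduplication is controlled by the `keep` parameter. The following sketch uses a hypothetical frame `sample` that actually contains repeats, since the example frame above has none, and compares the default `keep="first"` with `keep="last"` and `keep=False`.

```python
import pandas as pd

# Hypothetical frame: rows 0 and 1 are identical, and a repeats in rows 2 and 3
sample = pd.DataFrame({"a": [1, 1, 2, 2], "b": [4, 4, 5, 6]})

n_repeats = int(sample.duplicated().sum())    # rows identical to an earlier row
unique_rows = sample.drop_duplicates()        # drops only the fully repeated row

first_per_a = sample.drop_duplicates(subset=["a"])               # keep="first" is the default
last_per_a = sample.drop_duplicates(subset=["a"], keep="last")   # keep the last occurrence

# keep=False flags every member of a duplicate group, not just the repeats
all_involved = sample.duplicated(keep=False)

print(first_per_a)
print(last_per_a)
```

Sorting before calling `drop_duplicates` is a common way to decide which row counts as "first", for example sorting by a timestamp so that the most recent record is kept.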
Practice Problems
- Filter DataFrame by multiple criteria
- Add computed columns
- Sort by multiple columns
- Remove duplicates while choosing which occurrence to keep
- Use apply with custom functions