Introduction
Pandas provides multiple ways to combine DataFrames based on their relationships.
Merge
# SQL-style join
left = pd.DataFrame({"key": ["A", "B", "C"], "value": [1, 2, 3]})
right = pd.DataFrame({"key": ["A", "B", "D"], "value2": [10, 20, 30]})
pd.merge(left, right, on="key")
pd.merge(left, right, on="key", how="left")
pd.merge(left, right, on="key", how="outer")
pd.merge(left, right, on="key", how="inner")
Concat
# Stack DataFrames
pd.concat([df1, df2]) # Vertical
pd.concat([df1, df2], axis=1) # Horizontal
# Ignore index
pd.concat([df1, df2], ignore_index=True)
# Add only missing columns
pd.concat([df1, df2], join="inner")
Join
# Index-based join
left.set_index("key").join(right.set_index("key"))
Practice Problems
- Merge customers and orders DataFrames
- Concatenate monthly sales data
- Perform left join to keep all customers
- Combine DataFrames with different columns
- Handle duplicate keys in merge