Introduction
pandas offers several ways to select data from a DataFrame: loc selects by label, iloc selects by integer position, and boolean indexing selects the rows that satisfy a condition. Knowing which one applies in a given situation makes data access both correct and efficient, and these selection techniques underpin most other data manipulation.
Key Concepts
- loc: Label-based selection using row and column names
- iloc: Integer position-based selection
- Boolean indexing: Filter using conditions
- at/iat: Fast access to a single value by label (at) or by position (iat)
- Query method: SQL-like query string syntax
- Multi-index selection: Handling hierarchical indexes
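The difference between label-based and position-based selection is easiest to see when the index labels are integers that do not match the row positions. The following is a minimal sketch under that assumption; the frame `demo` and its values are hypothetical and not part of the example used later.

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions.
demo = pd.DataFrame({"value": ["x", "y", "z"]}, index=[10, 20, 30])

by_label = demo.loc[10, "value"]   # row labelled 10
by_position = demo.iloc[0, 0]      # row at position 0 (the same row here)

label_slice = demo.loc[10:20]      # label slices include both endpoints
position_slice = demo.iloc[0:1]    # position slices exclude the stop

# Boolean indexing keeps the rows where the mask is True.
mask = demo["value"] != "y"
kept = demo[mask]
```

Note that `demo.loc[0]` would raise a KeyError here, because no row carries the label 0.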
Python Implementation
import pandas as pd
import numpy as np
df = pd.DataFrame({
"name": ["Alice", "Bob", "Charlie", "Diana"],
"age": [25, 30, 35, 40],
"score": [85, 90, 78, 92],
"city": ["NYC", "LA", "NYC", "SF"]
}, index=["a", "b", "c", "d"])
# loc - label-based selection
row_a = df.loc["a"] # Single row by label
subset_loc = df.loc["a":"c"] # Slice by labels (end label "c" is included)
cell_value = df.loc["a", "name"] # Single cell
# iloc - position-based selection
first_row = df.iloc[0] # First row by position
subset_iloc = df.iloc[0:2] # Slice by position (stop position 2 is excluded)
cell_pos = df.iloc[0, 1] # Single cell by position
# Boolean indexing
over_30 = df[df["age"] > 30] # Simple condition
# Combine conditions with & or |, wrapping each one in parentheses
high_scorers = df[(df["score"] > 80) & (df["age"] < 35)]
# isin for filtering
nyc_la = df[df["city"].isin(["NYC", "LA"])]
# Query method
result = df.query("age > 30 and score > 80")
# Using at/iat for fast scalar access
scalar_at = df.at["a", "name"] # Label-based
scalar_iat = df.iat[0, 0] # Position-based
# Select columns
names = df.loc[:, "name"] # All rows, name column
names_iloc = df.iloc[:, 0] # First column
# Filter with string contains
filtered = df[df["name"].str.contains("li")] # Case-sensitive match: Alice, Charlie
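# Using where for conditional replacement
# Apply it to the numeric column only: comparing the string columns with 80 raises a TypeError
replaced = df["score"].where(df["score"] > 80, "Fail") # Keep scores above 80, else "Fail"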
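Multi-index selection is listed among the key concepts but not shown above. Here is a minimal sketch, assuming a hypothetical two-level index built from city and name; the frame `mi` is illustrative and separate from `df`.

```python
import pandas as pd

# Hypothetical two-level index: (city, name).
mi = pd.DataFrame(
    {"score": [85, 78, 90, 92]},
    index=pd.MultiIndex.from_tuples(
        [("NYC", "Alice"), ("NYC", "Charlie"), ("LA", "Bob"), ("SF", "Diana")],
        names=["city", "name"],
    ),
).sort_index()  # a sorted index keeps label slicing well-defined

nyc_rows = mi.loc["NYC"]                         # all rows under the outer label
alice_score = mi.loc[("NYC", "Alice"), "score"]  # one cell via a full key tuple
bob_rows = mi.xs("Bob", level="name")            # cross-section on the inner level
```

Selecting on the outer level with `loc` drops that level from the result, and `xs` does the same for the level it selects on by default.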
When to Use
- Extracting specific rows or columns
- Filtering data based on conditions
- Selecting feature and target columns for machine learning
- Performance-critical scalar access
- Working with multi-index DataFrames
- Building dynamic data pipelines
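For dynamic pipelines, the filter conditions are often not known until run time. One possible approach, sketched here, is to build a boolean mask per condition and combine them. The helper `filter_rows` and the `conditions` mapping are hypothetical names introduced for illustration, not pandas APIs.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 40],
    "city": ["NYC", "LA", "NYC", "SF"],
}, index=["a", "b", "c", "d"])


def filter_rows(frame, conditions):
    """Hypothetical helper: AND together one boolean mask per column condition."""
    masks = [predicate(frame[column]) for column, predicate in conditions.items()]
    # Start from an all-True mask so an empty mapping selects every row.
    all_true = pd.Series(True, index=frame.index)
    combined = reduce(operator.and_, masks, all_true)
    return frame.loc[combined]


# Conditions assembled at run time, e.g. from user input or configuration.
conditions = {
    "age": lambda s: s >= 30,
    "city": lambda s: s.isin(["NYC", "LA"]),
}
selected = filter_rows(df, conditions)
```

Keeping each condition as a function of a column makes it straightforward to add or remove filters without rewriting the selection expression.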
Key Takeaways
- Use loc for label-based selection and iloc for position-based
- Boolean indexing is powerful for filtering with complex conditions
- at and iat provide the fastest scalar access
- Query method offers readable SQL-like filtering syntax
- Replace chained selections such as df[mask]["col"] with a single df.loc[mask, "col"]; one indexing step is more efficient and avoids assigning to a temporary copy
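The last takeaway can be illustrated with a short sketch that compares a chained selection with the equivalent single loc call. It uses a reduced version of the example frame, and the name `updated` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [25, 30, 35, 40], "score": [85, 90, 78, 92]},
    index=["a", "b", "c", "d"],
)

# Chained form: two separate indexing steps.
chained = df[df["age"] > 30]["score"]

# Single-step form: rows and column selected in one loc call.
single = df.loc[df["age"] > 30, "score"]

# For assignment, use the single-step form so the write targets the frame
# itself rather than an intermediate object.
updated = df.copy()
updated.loc[updated["age"] > 30, "score"] = 0
```

Both forms return the same values when reading; the difference matters most when assigning, where the chained form may write to a temporary copy and leave the original unchanged.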