dplyr Filter



Introduction

The filter() function in dplyr subsets a data frame by rows: it keeps each row for which the condition evaluates to TRUE and drops rows where it is FALSE or NA. Columns are left unchanged.

Basic Filtering

library(dplyr)

df <- tibble(
  name = c("Alice", "Bob", "Charlie", "David"),
  age = c(25, 30, 35, 40),
  score = c(85, 90, 78, 92)
)

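# Single condition: keeps Charlie and David
filter(df, age > 30)

# Multiple conditions (AND): both must be TRUE, keeps Bob and David
filter(df, age > 25 & score > 80)

# Multiple conditions (OR): either may be TRUE, keeps Alice and David
filter(df, age < 30 | age > 35)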
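
As a small sketch building on the same data, the AND case can also be written by passing the conditions as separate arguments, and a condition can be negated with `!`. The result names (`and_comma`, `and_amp`, `and_pipe`, `not_young`) are only illustrative.

```r
library(dplyr)

df <- tibble(
  name = c("Alice", "Bob", "Charlie", "David"),
  age = c(25, 30, 35, 40),
  score = c(85, 90, 78, 92)
)

# Conditions passed as separate arguments are combined with AND
and_comma <- filter(df, age > 25, score > 80)

# Equivalent form using & explicitly
and_amp <- filter(df, age > 25 & score > 80)

# The same filter written with the pipe
and_pipe <- df %>% filter(age > 25, score > 80)

# Negate a condition with !: keeps everyone who is not under 30
not_young <- filter(df, !(age < 30))
```

The comma form tends to read more cleanly when there are several AND conditions. `|` still has to be written explicitly for OR.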

Comparison Operators

df <- tibble(x = 1:10)

filter(df, x == 5)       # Equal
filter(df, x != 5)       # Not equal
filter(df, x > 5)        # Greater than
filter(df, x >= 5)       # Greater or equal
filter(df, x < 5)        # Less than
filter(df, x <= 5)       # Less or equal
filter(df, x %in% c(1, 2, 3))  # Is one of the listed values
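
Two dplyr helpers are often useful alongside these operators: `between()` for inclusive ranges and `near()` for comparing floating-point numbers, where `==` can fail because of rounding. The sketch below uses a made-up column `y` purely for illustration.

```r
library(dplyr)

df <- tibble(x = 1:10)

# between() is inclusive on both ends: same as x >= 3 & x <= 6
mid_range <- filter(df, between(x, 3, 6))

# Hypothetical floating-point column: sqrt(2)^2 is not exactly 2
dbl <- tibble(y = c(sqrt(2)^2, 2.5))

# Exact equality misses the value because of rounding error
exact <- filter(dbl, y == 2)

# near() compares within a small tolerance and finds it
approx <- filter(dbl, near(y, 2))
```

Here `exact` comes back empty while `approx` keeps the first row, so `near()` is usually the safer choice when the column holds computed doubles.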

String Filtering

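library(dplyr)
library(stringr)  # provides str_starts() and str_detect()

df <- tibble(name = c("Alice", "Bob", "Charlie"))

# String matching
filter(df, str_starts(name, "A"))        # Starts with "A"
filter(df, str_detect(name, "li"))       # Contains "li"
filter(df, name %in% c("Alice", "Bob"))  # Exact match against a set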
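
A few variations on the string filters above, as a sketch: case-insensitive matching through stringr's `regex()` helper, excluding matches with `negate = TRUE`, and matching on a suffix with `str_ends()`. The result names are illustrative only.

```r
library(dplyr)
library(stringr)

df <- tibble(name = c("Alice", "Bob", "Charlie"))

# Case-insensitive: names starting with "a" or "A"
starts_a <- filter(df, str_detect(name, regex("^a", ignore_case = TRUE)))

# Names that do NOT contain "li"
no_li <- filter(df, str_detect(name, "li", negate = TRUE))

# Names ending in "e"
ends_e <- filter(df, str_ends(name, "e"))
```

Note that stringr patterns are regular expressions by default, so characters such as `.` or `(` need escaping, or can be wrapped in `fixed()` to match literally.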

NA Handling

df <- tibble(
  x = c(1, 2, NA, 4, NA)
)

# Drop rows where x is NA
filter(df, !is.na(x))

# Keep only rows where x is NA
filter(df, is.na(x))
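
One consequence worth seeing in code: a comparison against NA evaluates to NA, and filter() drops those rows without warning. The sketch below shows that behavior and one way to keep the missing rows when that is what you want.

```r
library(dplyr)

df <- tibble(x = c(1, 2, NA, 4, NA))

# NA > 1 is NA, so both NA rows are dropped along with x = 1
gt1 <- filter(df, x > 1)

# Keep the NA rows explicitly by adding an is.na() check
gt1_or_na <- filter(df, x > 1 | is.na(x))
```

`gt1` contains only 2 and 4, while `gt1_or_na` also retains the two missing values.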

Summary

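filter() keeps the rows that satisfy your conditions. Build conditions from comparison operators and %in%, combine them with & (AND) and | (OR), and use is.na() to handle missing values explicitly.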
