R Vectors — The Fundamental Data Structure
Learning Objectives
By the end of this tutorial, you will be able to:
- Create vectors using c(), :, seq(), and rep()
- Index vectors by position, name, and logical conditions
- Apply vectorized operations and understand recycling rules
- Work with named vectors as lightweight dictionaries
- Master vector sorting, filtering, and aggregation
- Understand vectorization vs. loops in R
What Is a Vector?
A vector is R's most fundamental data structure — an ordered collection of elements of the same type. R is vectorized: operations work on entire vectors at once, without explicit loops.
# A simple vector
x <- c(1, 2, 3, 4, 5)
print(x)
# [1] 1 2 3 4 5
# Check structure
str(x)
# num [1:5] 1 2 3 4 5
length(x) # [1] 5
class(x) # [1] "numeric"
Creating Vectors
Using c() — Combine
# Numeric
numbers <- c(1, 2, 3, 4, 5)
# Character
fruits <- c("apple", "banana", "cherry")
# Logical
flags <- c(TRUE, FALSE, TRUE, TRUE)
# Mixed types: R coerces every element to the most flexible type
mixed <- c(1, "two", TRUE)
# [1] "1" "two" "TRUE" (all become character)
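The coercion order can be checked directly. The sketch below assumes the usual hierarchy for atomic vectors (logical, then integer, then double, then character) and uses typeof() to inspect what c() produces; the variable names are illustrative only.

```r
# c() promotes all elements to the most flexible type present
lgl_int <- c(TRUE, 2L)     # logical + integer
int_dbl <- c(2L, 3.5)      # integer + double
dbl_chr <- c(3.5, "a")     # double + character

typeof(lgl_int)  # "integer"
typeof(int_dbl)  # "double"
typeof(dbl_chr)  # "character"

# Logical-to-number coercion makes counting easy:
# TRUE counts as 1 and FALSE as 0
sum(c(TRUE, FALSE, TRUE))  # 2
```

Because of this rule, a single stray string in a numeric vector silently turns every element into text, which is worth checking for after importing data.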
Using : — Sequences
1:10
# [1] 1 2 3 4 5 6 7 8 9 10
10:1
# [1] 10 9 8 7 6 5 4 3 2 1
# Decimal steps need seq(); the : operator always steps by 1 (or -1)
seq(0, 1, by = 0.25)
# [1] 0.00 0.25 0.50 0.75 1.00
Using seq() — Custom Sequences
# seq(from, to, by)
seq(1, 10, by = 2)
# [1] 1 3 5 7 9
# seq(from, to, length.out)
seq(0, 1, length.out = 5)
# [1] 0.00 0.25 0.50 0.75 1.00
# seq(from, to, along.with)
x <- c(10, 20, 30, 40, 50)
seq(0, 1, along.with = x)
# [1] 0.00 0.25 0.50 0.75 1.00
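Sequences are often used as loop indices. As a hedged aside, the sketch below contrasts 1:length(x) with base R's seq_along() and seq_len() on an empty vector; the variable name `empty` is made up for the illustration.

```r
empty <- numeric(0)

# 1:length(x) counts DOWN from 1 to 0 when x is empty
1:length(empty)    # 1 0

# seq_along() and seq_len() return a zero-length sequence instead
seq_along(empty)   # integer(0)
seq_len(0)         # integer(0)

# For a non-empty vector the approaches agree
x <- c(10, 20, 30)
identical(1:length(x), seq_along(x))  # TRUE
```

A loop written over seq_along(x) therefore runs zero times on empty input, whereas one written over 1:length(x) would run twice with invalid indices.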
Using rep() — Repetition
# Repeat a single value
rep(0, 5)
# [1] 0 0 0 0 0
# Repeat a vector
rep(c(1, 2), 3)
# [1] 1 2 1 2 1 2
# Repeat each element
rep(c(1, 2), each = 3)
# [1] 1 1 1 2 2 2
# Repeat with different times
rep(c("a", "b"), times = c(3, 1))
# [1] "a" "a" "a" "b"
Special Vectors
# Empty vector
numeric(0) # numeric(0)
character(0) # character(0)
# Filled vectors
numeric(5) # [1] 0 0 0 0 0
character(3) # [1] "" "" ""
logical(4) # [1] FALSE FALSE FALSE FALSE
# Random vectors
rnorm(5) # 5 random normal values
runif(5) # 5 random uniform values
sample(1:10, 5) # 5 random samples from 1:10
# NULL — empty
NULL
length(NULL) # [1] 0
Indexing Vectors
Position Indexing
x <- c(10, 20, 30, 40, 50)
# Single element
x[1] # [1] 10
x[3] # [1] 30
x[length(x)] # [1] 50 (last element)
# Negative index = exclude
x[-1] # [1] 20 30 40 50
x[-3] # [1] 10 20 40 50
# Multiple elements
x[c(1, 3, 5)] # [1] 10 30 50
x[c(1, 1, 1)] # [1] 10 10 10
# Range
x[2:4] # [1] 20 30 40
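A few edge cases of position indexing are worth seeing once. The following is a small sketch, not an exhaustive list; `mixed_ok` is a hypothetical helper variable that records whether mixing index signs succeeds.

```r
x <- c(10, 20, 30, 40, 50)

# Indexing past the end returns NA rather than raising an error
x[7]            # NA

# Exclude several positions at once
x[-c(1, 2)]     # 30 40 50

# Drop the last element without hard-coding its position
x[-length(x)]   # 10 20 30 40

# Mixing positive and negative indices is an error
mixed_ok <- tryCatch({ x[c(-1, 2)]; TRUE }, error = function(e) FALSE)
mixed_ok        # FALSE
```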
Named Indexing
x <- c(a = 10, b = 20, c = 30, d = 40)
# By name (the result keeps its names)
x["b"]
# b
# 20
x[c("a", "c")]
# a c
# 10 30
# names() function
names(x) # [1] "a" "b" "c" "d"
names(x) <- c("first", "second", "third", "fourth")
x
# first second third fourth
# 10 20 30 40
Logical Indexing
x <- c(10, 20, 30, 40, 50)
# Logical vector — same length as x
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
# [1] 10 30 50
# From conditions
x > 25
# [1] FALSE FALSE TRUE TRUE TRUE
x[x > 25]
# [1] 30 40 50
# Combined conditions
x[x > 20 & x < 50]
# [1] 30 40
# which() — indices where TRUE
which(x > 25)
# [1] 3 4 5
# %in% — membership
x[x %in% c(20, 40)]
# [1] 20 40
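Logical indexing and which() behave differently when the condition contains missing values. The sketch below uses a small made-up vector to show the difference.

```r
x <- c(10, NA, 30, 40)

# A comparison with NA yields NA, not FALSE
x > 25                 # FALSE NA TRUE TRUE

# Logical indexing keeps an NA for every NA in the condition
x[x > 25]              # NA 30 40

# which() returns only the positions that are TRUE, so NAs drop out
x[which(x > 25)]       # 30 40

# The same effect with an explicit is.na() check
x[!is.na(x) & x > 25]  # 30 40
```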
Vectorized Operations
R operations work element-wise on vectors:
x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)
# Arithmetic
x + y # [1] 11 22 33 44 55
x * y # [1] 10 40 90 160 250
x^2 # [1] 1 4 9 16 25
# Comparison
x > 3 # [1] FALSE FALSE FALSE TRUE TRUE
x == y # [1] FALSE FALSE FALSE FALSE FALSE
# Logical
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
# [1] TRUE FALSE FALSE
# Math functions
sqrt(x) # [1] 1.000000 1.414214 1.732051 2.000000 2.236068
log(x) # [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
exp(x) # [1] 2.718282 7.389056 20.085537 54.598150 148.413159
abs(-5) # [1] 5
Recycling Rule
When vectors have different lengths, R repeats the shorter one:
# Scalar recycling
x <- c(1, 2, 3, 4, 5)
x + 10
# [1] 11 12 13 14 15
# Unequal-length recycling (longer length is a multiple of the shorter)
c(1, 2, 3, 4) + c(10, 20)
# [1] 11 22 13 24 (10, 20 recycled as 10, 20, 10, 20)
# Warning if the longer length is not a multiple of the shorter
c(1, 2, 3) + c(10, 20)
# Warning: longer object length is not a multiple of shorter object length
# [1] 11 22 13
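Recycling is not limited to arithmetic. As an illustration with made-up variable names, the sketch below shows it applied to a logical index, a multiplication, and paste0().

```r
x <- 1:6

# A length-2 logical index is recycled to length 6: keeps positions 1, 3, 5
odd_positions <- x[c(TRUE, FALSE)]
odd_positions   # 1 3 5

# A length-2 multiplier is recycled across six values
scaled <- x * c(1, 10)
scaled          # 1 20 3 40 5 60

# paste0() recycles too: one prefix, many suffixes
labels <- paste0("id_", 1:3)
labels          # "id_1" "id_2" "id_3"
```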
Named Vectors
Named vectors work like dictionaries or hash maps:
# Create named vector
scores <- c(Alice = 95, Bob = 87, Charlie = 92)
scores
# Alice Bob Charlie
# 95 87 92
# Access by name (the result keeps its names)
scores["Bob"]
# Bob
# 87
scores[c("Alice", "Charlie")]
# Alice Charlie
# 95 92
# Add names after creation
x <- c(10, 20, 30)
names(x) <- c("a", "b", "c")
# Set names dynamically
x <- 1:5
names(x) <- paste0("item", 1:5)
x
# item1 item2 item3 item4 item5
# 1 2 3 4 5
# Operations preserve names
scores * 2
# Alice Bob Charlie
# 190 174 184
# Convert to list for more flexibility
as.list(scores)
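The dictionary analogy is easiest to see in a lookup. The sketch below is one possible pattern, with a hypothetical `country_names` table and `codes` vector invented for the example.

```r
# Hypothetical lookup table: country code -> country name
country_names <- c(US = "United States", DE = "Germany", JP = "Japan")

# Keys to translate; repeats and unknown keys are allowed
codes <- c("DE", "US", "DE", "FR")

# Index by a character vector to look up every key at once;
# unname() drops the keys from the result
looked_up <- unname(country_names[codes])
looked_up
# "Germany" "United States" "Germany" NA

# Check whether a key exists before relying on it
"FR" %in% names(country_names)  # FALSE
```

Unknown keys come back as NA rather than raising an error, so a membership check with %in% is a reasonable guard.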
Useful Vector Functions
Summary Statistics
x <- c(10, 20, 30, 40, 50)
sum(x) # [1] 150
mean(x) # [1] 30
median(x) # [1] 30
min(x) # [1] 10
max(x) # [1] 50
range(x) # [1] 10 50
sd(x) # [1] 15.81139
var(x) # [1] 250
prod(x) # [1] 1.2e+07 (i.e. 12,000,000)
# Summary gives multiple stats
summary(x)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 10.00 20.00 30.00 30.00 40.00 50.00
# With NA
x <- c(10, NA, 30, NA, 50)
sum(x) # [1] NA
sum(x, na.rm = TRUE) # [1] 90
mean(x, na.rm = TRUE) # [1] 30
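Beyond na.rm = TRUE, it often helps to locate and count the missing values first. A minimal sketch using base R's is.na(), with `clean` as an illustrative name:

```r
x <- c(10, NA, 30, NA, 50)

# is.na() flags missing values; summing the flags counts them
sum(is.na(x))    # 2

# The mean of the flags is the proportion missing
mean(is.na(x))   # 0.4

# Drop the NAs explicitly, then summarise
clean <- x[!is.na(x)]
clean            # 10 30 50
max(clean)       # 50

# NA == NA is itself NA, so always test with is.na()
NA == NA         # NA
```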
Sorting
x <- c(30, 10, 40, 20, 50)
# sort() — returns new vector
sort(x) # [1] 10 20 30 40 50
sort(x, decreasing = TRUE) # [1] 50 40 30 20 10
# order() — returns indices
order(x) # [1] 2 4 1 3 5
x[order(x)] # [1] 10 20 30 40 50
# names
scores <- c(Alice = 95, Bob = 87, Charlie = 92)
sort(scores) # Names preserved
# Bob Charlie Alice
# 87 92 95
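Because order() returns indices, the same permutation can reorder several parallel vectors, and extra arguments act as tie-breakers. The sketch below assumes two hypothetical vectors, `students` and `marks`, with one entry per student.

```r
students <- c("Diana", "Bob", "Charlie", "Alice")
marks    <- c(87, 87, 92, 95)

# Sort by marks descending, breaking ties by name ascending.
# Negating a numeric key reverses its order.
idx <- order(-marks, students)
idx              # 4 3 2 1

# Apply the same permutation to both vectors
students[idx]    # "Alice" "Charlie" "Bob" "Diana"
marks[idx]       # 95 92 87 87
```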
Filtering
x <- c(10, 20, 30, 40, 50)
# which() — get indices
which(x > 25) # [1] 3 4 5
# Filter by condition
x[x > 25] # [1] 30 40 50
# Filter by indices
x[c(1, 3, 5)] # [1] 10 30 50
# Filter with %in%
x[x %in% c(20, 40)] # [1] 20 40
# Filter with multiple conditions
x[x > 20 & x < 50] # [1] 30 40
x[x < 20 | x > 40] # [1] 10 50
Set Operations
a <- c(1, 2, 3, 4, 5)
b <- c(4, 5, 6, 7, 8)
union(a, b) # [1] 1 2 3 4 5 6 7 8
intersect(a, b) # [1] 4 5
setdiff(a, b) # [1] 1 2 3
setdiff(b, a) # [1] 6 7 8
# Check equality
setequal(a, b) # [1] FALSE
setequal(c(1,2), c(2,1)) # [1] TRUE
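One detail the examples above do not show is how the set functions treat repeated values. The sketch below, using made-up inputs, suggests they discard duplicates, while %in% filtering keeps them.

```r
a <- c(1, 1, 2, 3)
b <- c(3, 3, 4)

# Set functions return each value at most once
union(a, b)        # 1 2 3 4
intersect(a, b)    # 3
setdiff(a, b)      # 1 2

# %in% filtering keeps duplicates and the original order
a[a %in% c(1, 3)]  # 1 1 3
```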
Unique and Duplicates
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)
unique(x) # [1] 1 2 3 4
duplicated(x) # [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE
x[!duplicated(x)] # [1] 1 2 3 4
table(x) # Frequency table
# x
# 1 2 3 4
# 1 2 3 4
duplicated(c("a", "b", "a")) # [1] FALSE FALSE TRUE
Vectorization vs Loops
Vectorized operations are faster and more idiomatic in R:
# Slow: loop approach
x <- 1:1000000
result_loop <- numeric(length(x))
for (i in seq_along(x)) {
result_loop[i] <- x[i]^2
}
# Fast: vectorized approach
result_vec <- x^2
# Verify both approaches agree with all.equal()
all.equal(result_loop, result_vec) # [1] TRUE
# Benchmark
system.time(for (i in 1:1000000) x[i]^2)
# user system elapsed
# ...
system.time(x^2)
# user system elapsed
# ... (much faster)
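Loops with a condition or a running total usually have vectorized counterparts as well. The sketch below shows two of them, ifelse() and cumsum(), on a small made-up vector; it is one illustration rather than a complete catalogue.

```r
x <- c(-2, 5, 0, 8, -1)

# Loop version: label each value
labels_loop <- character(length(x))
for (i in seq_along(x)) {
  labels_loop[i] <- if (x[i] > 0) "positive" else "non-positive"
}

# Vectorized version with ifelse()
labels_vec <- ifelse(x > 0, "positive", "non-positive")
identical(labels_loop, labels_vec)  # TRUE

# Running total without a loop
cumsum(x)   # -2 3 3 11 10
```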
Practice Exercises
Exercise 1: Vector Calculator
Given x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100):
- Find all values greater than 50
- Calculate the sum of values between 30 and 70
- Count how many values are divisible by 3
- Find the two largest values
Solution
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# 1. Values greater than 50
x[x > 50]
# [1] 60 70 80 90 100
# 2. Sum of values between 30 and 70
sum(x[x >= 30 & x <= 70])
# [1] 250
# 3. Count divisible by 3
sum(x %% 3 == 0)
# [1] 3
# 4. Two largest values
sort(x, decreasing = TRUE)[1:2]
# [1] 100 90
Exercise 2: Named Vector Operations
Create a named vector of student grades, then:
- Find the average grade
- Find the student with the highest grade
- Add a new student
- Sort grades from highest to lowest
Solution
grades <- c(Alice = 95, Bob = 87, Charlie = 92, Diana = 98, Eve = 85)
# 1. Average grade
mean(grades)
# [1] 91.4
# 2. Student with highest grade
names(grades[which.max(grades)])
# [1] "Diana"
# 3. Add a new student
grades["Frank"] <- 88
grades
# 4. Sort grades
sort(grades, decreasing = TRUE)
# Diana Alice Charlie Frank Bob Eve
# 98 95 92 88 87 85
Exercise 3: Frequency Counter
Write a function that returns the frequency of each unique value in a vector.
Solution
freq_counter <- function(x) {
result <- table(x)
result[order(result, decreasing = TRUE)]
}
# Test
data <- c("a", "b", "a", "c", "b", "a", "d", "b", "a")
freq_counter(data)
# x
# a b c d
# 4 3 1 1
Key Takeaways
- Vectors are R's fundamental data structure: all data starts as vectors
- Use c() to create vectors, : and seq() for sequences, rep() for repetition
- Indexing starts at 1, not 0
- Logical indexing is powerful: x[x > 5] filters by condition
- Vectorization beats loops: operations work element-wise automatically
- Recycling: shorter vectors repeat to match longer ones
- Named vectors act as lightweight dictionaries
- which() gives indices; which.max() and which.min() give extreme indices
- Use na.rm = TRUE to handle missing values in summaries
Next: Learn about R Lists — collections of different types.