R Vectors — The Fundamental Data Structure

Learning Objectives

By the end of this tutorial, you will be able to:

Create vectors using c(), :, seq(), and rep()
Index vectors by position, name, and logical conditions
Apply vectorized operations and understand recycling rules
Work with named vectors as lightweight dictionaries
Master vector sorting, filtering, and aggregation
Understand vectorization vs. loops in R

What Is a Vector?

A vector is R's most fundamental data structure: an ordered collection of elements that all share one type (numeric, character, logical, and so on). R is vectorized, so most operations act on an entire vector at once, without an explicit loop.

# A simple vector
x <- c(1, 2, 3, 4, 5)
print(x)
# [1] 1 2 3 4 5

# Check structure
str(x)
#  num [1:5] 1 2 3 4 5

length(x)    # [1] 5
class(x)     # [1] "numeric"
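
To make the "same type" rule concrete, the short sketch below inspects a few vectors with base R functions only. The variable name `single` is illustrative and not part of the tutorial's own examples.

```r
# A single number is already a vector of length one
single <- 42
is.vector(single)        # TRUE
length(single)           # 1

# typeof() reports the storage type shared by all elements
typeof(c(1, 2, 3))       # "double"
typeof(1:3)              # "integer"
typeof(c("a", "b"))      # "character"
typeof(c(TRUE, FALSE))   # "logical"
```

Note that `class()` reports "numeric" for doubles, while `typeof()` distinguishes "double" from "integer".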

Creating Vectors

Using `c()` — Combine

# Numeric
numbers <- c(1, 2, 3, 4, 5)

# Character
fruits <- c("apple", "banana", "cherry")

# Logical
flags <- c(TRUE, FALSE, TRUE, TRUE)

# Mixed types: R coerces every element to the most flexible type present
mixed <- c(1, "two", TRUE)
mixed
# [1] "1"    "two"  "TRUE"  (all become character)
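
The coercion applied by `c()` follows a fixed ordering of types. The sketch below walks through that ordering with a few small cases; the name `converted` is illustrative only.

```r
# Coercion order: logical -> integer -> double -> character
typeof(c(TRUE, 2L))      # "integer"
typeof(c(2L, 3.5))       # "double"
typeof(c(3.5, "a"))      # "character"

# Logical values become 1 and 0 when combined with numbers
c(TRUE, FALSE, 5)        # [1] 1 0 5

# Explicit conversion: strings that are not numbers become NA
# (as.numeric() also emits a warning, suppressed here)
converted <- suppressWarnings(as.numeric(c("1", "two", "3")))
converted                # [1]  1 NA  3
```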

Using `:` — Sequences

1:10
# [1]  1  2  3  4  5  6  7  8  9 10

10:1
# [1] 10  9  8  7  6  5  4  3  2  1

# `:` always steps by 1; for decimal steps use seq()
seq(0, 1, by = 0.25)
# [1] 0.00 0.25 0.50 0.75 1.00

Using `seq()` — Custom Sequences

# seq(from, to, by)
seq(1, 10, by = 2)
# [1] 1 3 5 7 9

# seq(from, to, length.out)
seq(0, 1, length.out = 5)
# [1] 0.00 0.25 0.50 0.75 1.00

# seq(from, to, along.with)
x <- c(10, 20, 30, 40, 50)
seq(0, 1, along.with = x)
# [1] 0.00 0.25 0.50 0.75 1.00
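
Two related base R helpers, `seq_len()` and `seq_along()`, are worth knowing alongside `seq()`. The sketch below shows why they are often preferred over `1:length(x)` for generating index sequences; the names `empty` and `values` are illustrative.

```r
empty <- numeric(0)

# 1:length(x) counts down from 1 to 0 when the vector is empty
1:length(empty)      # [1] 1 0

# seq_along() and seq_len() return an empty sequence instead
seq_along(empty)     # integer(0)
seq_len(0)           # integer(0)

# For non-empty input they match 1:n
values <- c(10, 20, 30)
seq_along(values)    # [1] 1 2 3
seq_len(3)           # [1] 1 2 3
```

A loop written as `for (i in seq_along(x))` therefore runs zero times on an empty vector, which is usually the intended behavior.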

Using `rep()` — Repetition

# Repeat a single value
rep(0, 5)
# [1] 0 0 0 0 0

# Repeat a vector
rep(c(1, 2), 3)
# [1] 1 2 1 2 1 2

# Repeat each element
rep(c(1, 2), each = 3)
# [1] 1 1 1 2 2 2

# Repeat with different times
rep(c("a", "b"), times = c(3, 1))
# [1] "a" "a" "a" "b"

Special Vectors

# Empty vector
numeric(0)      # numeric(0)
character(0)    # character(0)

# Filled vectors
numeric(5)      # [1] 0 0 0 0 0
character(3)    # [1] "" "" ""
logical(4)      # [1] FALSE FALSE FALSE FALSE

# Random vectors
rnorm(5)        # 5 random normal values
runif(5)        # 5 random uniform values
sample(1:10, 5) # 5 random samples from 1:10

# NULL — empty
NULL
length(NULL)    # [1] 0

Indexing Vectors

Position Indexing

x <- c(10, 20, 30, 40, 50)

# Single element
x[1]        # [1] 10
x[3]        # [1] 30
x[length(x)] # [1] 50 (last element)

# Negative index = exclude
x[-1]       # [1] 20 30 40 50
x[-3]       # [1] 10 20 40 50

# Multiple elements
x[c(1, 3, 5)]   # [1] 10 30 50
x[c(1, 1, 1)]   # [1] 10 10 10

# Range
x[2:4]       # [1] 20 30 40
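
Position indexing has a few edge cases that the examples above do not cover. The following sketch, using an illustrative vector `v`, shows how base R handles them.

```r
v <- c(10, 20, 30, 40, 50)

v[7]          # [1] NA  (out of range gives NA, not an error)
v[0]          # numeric(0)  (index 0 selects nothing)
v[-(1:2)]     # [1] 30 40 50  (drop the first two elements)
head(v, 2)    # [1] 10 20
tail(v, 2)    # [1] 40 50

# Replace an element by position
v[2] <- 99
v             # [1] 10 99 30 40 50
```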

Named Indexing

x <- c(a = 10, b = 20, c = 30, d = 40)

# By name
x["b"]       # [1] 20
x[c("a", "c")]  # [1] 10 30

# names() function
names(x)     # [1] "a" "b" "c" "d"
names(x) <- c("first", "second", "third", "fourth")
x
# first second  third fourth
#    10     20     30     40

Logical Indexing

x <- c(10, 20, 30, 40, 50)

# Logical vector — same length as x
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
# [1] 10 30 50

# From conditions
x > 25
# [1] FALSE FALSE  TRUE  TRUE  TRUE

x[x > 25]
# [1] 30 40 50

# Combined conditions
x[x > 20 & x < 50]
# [1] 30 40

# which() — indices where TRUE
which(x > 25)
# [1] 3 4 5

# %in% — membership
x[x %in% c(20, 40)]
# [1] 20 40
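
Logical indexing behaves differently when the vector contains missing values. The sketch below, with a made-up vector `readings`, shows the difference between subsetting with a raw condition and subsetting through `which()`.

```r
readings <- c(12, NA, 30, 45, NA)

# The comparison propagates NA, so the subset contains NA entries
readings[readings > 20]                      # [1] NA 30 45 NA

# which() keeps only the positions that are TRUE
readings[which(readings > 20)]               # [1] 30 45

# Or make the NA check explicit
readings[!is.na(readings) & readings > 20]   # [1] 30 45
```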

Vectorized Operations

R operations work element-wise on vectors:

x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)

# Arithmetic
x + y        # [1] 11 22 33 44 55
x * y        # [1] 10 40 90 160 250
x^2          # [1]  1  4  9 16 25

# Comparison
x > 3        # [1] FALSE FALSE FALSE  TRUE  TRUE
x == y       # [1] FALSE FALSE FALSE FALSE FALSE

# Logical
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
# [1]  TRUE FALSE FALSE

# Math functions
sqrt(x)       # [1] 1.000000 1.414214 1.732051 2.000000 2.236068
log(x)        # [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
exp(x)        # [1]  2.718282  7.389056 20.085537 54.598150 148.413159
abs(-5)       # [1] 5

Recycling Rule

When vectors have different lengths, R repeats the shorter one:

# Scalar recycling
x <- c(1, 2, 3, 4, 5)
x + 10
# [1] 11 12 13 14 15

# Recycling when the longer length is a multiple of the shorter
c(1, 2, 3, 4) + c(10, 20)
# [1] 11 22 13 24  (10, 20 recycled as 10, 20, 10, 20)

# Warning when it is not a multiple
c(1, 2, 3) + c(10, 20)
# [1] 11 22 13
# Warning message:
# longer object length is not a multiple of shorter object length
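
Recycling can also be written out explicitly, which helps when reasoning about what R does behind the scenes. The sketch below uses `rep_len()` from base R to stretch the shorter vector by hand; the names `long`, `short`, and `stretched` are illustrative.

```r
long  <- c(1, 2, 3, 4, 5, 6)
short <- c(10, 20)

# What R does implicitly: repeat the shorter vector up to the longer length
stretched <- rep_len(short, length(long))
stretched                 # [1] 10 20 10 20 10 20

# The explicit and implicit forms give the same result
long + stretched          # [1] 11 22 13 24 15 26
identical(long + short, long + stretched)   # TRUE
```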

Named Vectors

Named vectors work like dictionaries or hash maps:

# Create named vector
scores <- c(Alice = 95, Bob = 87, Charlie = 92)
scores
#  Alice    Bob Charlie
#     95     87      92

# Access by name
scores["Bob"]       # [1] 87
scores[c("Alice", "Charlie")]  # [1] 95 92

# Add names after creation
x <- c(10, 20, 30)
names(x) <- c("a", "b", "c")

# Set names dynamically
x <- 1:5
names(x) <- paste0("item", 1:5)
x
# item1 item2 item3 item4 item5
#     1     2     3     4     5

# Operations preserve names
scores * 2
#  Alice    Bob Charlie
#    190     174     184

# Convert to list for more flexibility
as.list(scores)
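
One common use of the dictionary idea is translating codes into labels. The sketch below assumes a hypothetical lookup table `state_names`; neither the table nor the codes come from the tutorial.

```r
# Hypothetical lookup table: code -> full name
state_names <- c(NY = "New York", CA = "California", TX = "Texas")

codes <- c("CA", "TX", "CA", "NY")

# Index by a character vector to translate every code at once
unname(state_names[codes])
# [1] "California" "Texas"      "California" "New York"

# A missing key returns NA rather than raising an error
unname(state_names["ZZ"])       # [1] NA

# Check for a key before relying on it
"TX" %in% names(state_names)    # TRUE
```

Unlike a true hash map, a named vector allows duplicate names, and name lookup returns only the first match.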

Useful Vector Functions

Summary Statistics

x <- c(10, 20, 30, 40, 50)

sum(x)       # [1] 150
mean(x)      # [1] 30
median(x)    # [1] 30
min(x)       # [1] 10
max(x)       # [1] 50
range(x)     # [1] 10 50
sd(x)        # [1] 15.81139
var(x)       # [1] 250
prod(x)      # [1] 1.2e+07  (12,000,000)

# Summary gives multiple stats
summary(x)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   10.00   20.00   30.00   30.00   40.00   50.00

# With NA
x <- c(10, NA, 30, NA, 50)
sum(x)           # [1] NA
sum(x, na.rm = TRUE)  # [1] 90
mean(x, na.rm = TRUE) # [1] 30

Sorting

x <- c(30, 10, 40, 20, 50)

# sort() — returns new vector
sort(x)              # [1] 10 20 30 40 50
sort(x, decreasing = TRUE)  # [1] 50 40 30 20 10

# order() — returns indices
order(x)             # [1] 2 4 1 3 5
x[order(x)]          # [1] 10 20 30 40 50

# Sorting a named vector keeps each name attached to its value
scores <- c(Alice = 95, Bob = 87, Charlie = 92)
sort(scores)         # Names preserved
#  Bob Charlie  Alice
#   87      92      95
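
Because `order()` returns a permutation of indices, it can sort one vector by the values of another and accept several sort keys. The sketch below uses two made-up parallel vectors, `students` and `marks`.

```r
students <- c("Alice", "Diana", "Charlie", "Bob")
marks    <- c(95, 87, 92, 87)

# Two keys: marks descending (negated), then name ascending to break ties
idx <- order(-marks, students)
idx              # [1] 1 3 4 2

# Apply the same permutation to both vectors to keep them aligned
students[idx]    # [1] "Alice"   "Charlie" "Bob"     "Diana"
marks[idx]       # [1] 95 92 87 87
```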

Filtering

x <- c(10, 20, 30, 40, 50)

# which() — get indices
which(x > 25)        # [1] 3 4 5

# Filter by condition
x[x > 25]            # [1] 30 40 50

# Filter by indices
x[c(1, 3, 5)]        # [1] 10 30 50

# Filter with %in%
x[x %in% c(20, 40)]  # [1] 20 40

# Filter with multiple conditions
x[x > 20 & x < 50]   # [1] 30 40
x[x < 20 | x > 40]   # [1] 10 50

Set Operations

a <- c(1, 2, 3, 4, 5)
b <- c(4, 5, 6, 7, 8)

union(a, b)          # [1] 1 2 3 4 5 6 7 8
intersect(a, b)      # [1] 4 5
setdiff(a, b)        # [1] 1 2 3
setdiff(b, a)        # [1] 6 7 8

# Check equality
setequal(a, b)       # [1] FALSE
setequal(c(1,2), c(2,1))  # [1] TRUE
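
The set functions treat their inputs as sets, so duplicates are dropped from the result. A brief sketch with illustrative vectors `left` and `right`:

```r
left  <- c(1, 1, 2, 3)
right <- c(3, 3, 4)

union(left, right)        # [1] 1 2 3 4
intersect(left, right)    # [1] 3
setdiff(left, right)      # [1] 1 2

# Results follow order of first appearance; they are not sorted
union(c(3, 1), c(2, 1))   # [1] 3 1 2
```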

Unique and Duplicates

x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)

unique(x)            # [1] 1 2 3 4
duplicated(x)        # [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
x[!duplicated(x)]    # [1] 1 2 3 4

table(x)             # Frequency table
# x
# 1 2 3 4
# 1 2 3 4

duplicated(c("a", "b", "a"))  # [1] FALSE FALSE  TRUE

Vectorization vs Loops

Vectorized operations are faster and more idiomatic in R:

# Slow: loop approach
x <- 1:1000000
result_loop <- numeric(length(x))
for (i in seq_along(x)) {
  result_loop[i] <- x[i]^2
}

# Fast: vectorized approach
result_vec <- x^2

# Verify that both approaches give the same result
all.equal(result_loop, result_vec)  # [1] TRUE

# Benchmark
system.time(for (i in 1:1000000) x[i]^2)
#    user  system elapsed
#   ...

system.time(x^2)
#    user  system elapsed
#   ...  (much faster)
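
Many loops that branch or accumulate also have vectorized equivalents in base R. The sketch below shows three of them on a made-up vector `temps`; it illustrates the idea and is not a benchmark.

```r
temps <- c(12, 25, 31, 18, 28)

# Conditional logic without a loop
ifelse(temps > 24, "warm", "cool")
# [1] "cool" "warm" "warm" "cool" "warm"

# Running total without a loop
cumsum(temps)
# [1]  12  37  68  86 114

# Element-wise maximum against a floor value
pmax(temps, 20)
# [1] 20 25 31 20 28
```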

Practice Exercises

Exercise 1: Vector Calculator

Given x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100):

Find all values greater than 50
Calculate the sum of values between 30 and 70
Count how many values are divisible by 3
Find the two largest values

Solution

x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# 1. Values greater than 50
x[x > 50]
# [1]  60  70  80  90 100

# 2. Sum of values between 30 and 70
sum(x[x >= 30 & x <= 70])
# [1] 250

# 3. Count divisible by 3
sum(x %% 3 == 0)
# [1] 3

# 4. Two largest values
sort(x, decreasing = TRUE)[1:2]
# [1] 100  90

Exercise 2: Named Vector Operations

Create a named vector of student grades, then:

Find the average grade
Find the student with the highest grade
Add a new student
Sort grades from highest to lowest

Solution

grades <- c(Alice = 95, Bob = 87, Charlie = 92, Diana = 98, Eve = 85)

# 1. Average grade
mean(grades)
# [1] 91.4

# 2. Student with highest grade
names(grades[which.max(grades)])
# [1] "Diana"

# 3. Add a new student
grades["Frank"] <- 88
grades

# 4. Sort grades
sort(grades, decreasing = TRUE)
# Diana  Alice Charlie  Frank     Bob     Eve
#    98     95      92      88      87      85

Exercise 3: Frequency Counter

Write a function that returns the frequency of each unique value in a vector.

Solution

freq_counter <- function(x) {
  result <- table(x)
  result[order(result, decreasing = TRUE)]
}

# Test
data <- c("a", "b", "a", "c", "b", "a", "d", "b", "a")
freq_counter(data)
# x
# a b c d
# 4 3 1 1

Key Takeaways

Vectors are R's fundamental data structure; even a single number is a vector of length one
Use c() to create vectors, : and seq() for sequences, rep() for repetition
Indexing starts at 1, not 0
Logical indexing is powerful — x[x > 5] filters by condition
Vectorization beats loops — operations work element-wise automatically
Recycling: shorter vectors repeat to match longer ones, with a warning when the longer length is not a multiple of the shorter
Named vectors act as lightweight dictionaries
which() gives indices, which.max() and which.min() give extreme indices
Use na.rm = TRUE to handle missing values in summaries

Next: Learn about R Lists — collections of different types.
