R Vectors — The Fundamental Data Structure

Learning Objectives

By the end of this tutorial, you will be able to:

  • Create vectors using c(), :, seq(), and rep()
  • Index vectors by position, name, and logical conditions
  • Apply vectorized operations and understand recycling rules
  • Work with named vectors as lightweight dictionaries
  • Master vector sorting, filtering, and aggregation
  • Understand vectorization vs. loops in R

What Is a Vector?

A vector is R's most fundamental data structure: an ordered collection of elements that all share the same type. R is vectorized: most operations work on entire vectors at once, without explicit loops.

# A simple vector
x <- c(1, 2, 3, 4, 5)
print(x)
# [1] 1 2 3 4 5

# Check structure
str(x)
#  num [1:5] 1 2 3 4 5

length(x)    # [1] 5
class(x)     # [1] "numeric"
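R has no separate scalar type, so a lone value is simply a vector of length 1. The short sketch below checks this with `is.vector()` and `length()`, and uses `typeof()` to show the storage type that sits underneath `class()`. Note that `1:2` is stored as integer while `c(1, 2)` is stored as double.

```r
# A lone number is a numeric vector of length 1
n <- 42
is.vector(n)        # [1] TRUE
length(n)           # [1] 1

# typeof() reports the underlying storage type
typeof(c(1, 2))     # [1] "double"
typeof(1:2)         # [1] "integer"
typeof("a")         # [1] "character"
typeof(TRUE)        # [1] "logical"
```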

Creating Vectors

Using c() — Combine

# Numeric
numbers <- c(1, 2, 3, 4, 5)

# Character
fruits <- c("apple", "banana", "cherry")

# Logical
flags <- c(TRUE, FALSE, TRUE, TRUE)

# Mixed types: R coerces every element to the most flexible type present
mixed <- c(1, "two", TRUE)
mixed
# [1] "1"    "two"  "TRUE"  (all become character)
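The coercion above follows a fixed order: logical, then integer, then double, then character. This sketch shows each step and one explicit conversion with `as.numeric()`; `suppressWarnings()` is used here only to keep the output quiet, since R normally warns "NAs introduced by coercion" for values it cannot convert.

```r
# Coercion order: logical -> integer -> double -> character
typeof(c(TRUE, 2L))    # [1] "integer"  (TRUE becomes 1L)
c(TRUE, 2L)            # [1] 1 2
typeof(c(1L, 2.5))     # [1] "double"
c(2.5, "a")            # [1] "2.5" "a"

# Explicit conversion: values that cannot be converted become NA
nums <- suppressWarnings(as.numeric(c("1", "2", "x")))
nums                   # [1]  1  2 NA
```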

Using : — Sequences

1:10
# [1]  1  2  3  4  5  6  7  8  9 10

10:1
# [1] 10  9  8  7  6  5  4  3  2  1

# : always steps by 1; for decimal steps, use seq()
seq(0, 1, by = 0.25)
# [1] 0.00 0.25 0.50 0.75 1.00
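Two details of `:` are worth checking directly: it produces integers, and it counts down when the end is smaller than the start. The sketch below shows the common pitfall with `1:n` when `n` is 0, and `seq_len()` as the usual safe alternative.

```r
typeof(1:3)       # [1] "integer"
1.5:4             # [1] 1.5 2.5 3.5  (still steps by 1)

# Pitfall: when n is 0, 1:n counts DOWN and returns two elements
n <- 0
1:n               # [1] 1 0
seq_len(n)        # integer(0), the empty sequence usually intended
```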

Using seq() — Custom Sequences

# seq(from, to, by)
seq(1, 10, by = 2)
# [1] 1 3 5 7 9

# seq(from, to, length.out)
seq(0, 1, length.out = 5)
# [1] 0.00 0.25 0.50 0.75 1.00

# seq(from, to, along.with)
x <- c(10, 20, 30, 40, 50)
seq(0, 1, along.with = x)
# [1] 0.00 0.25 0.50 0.75 1.00
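`seq()` can also count down, provided the sign of `by` matches the direction (otherwise it stops with "wrong sign in 'by' argument"). The related `seq_along()` returns the positions of an existing vector and, unlike `1:length(x)`, stays empty for an empty vector.

```r
# A negative `by` counts down
seq(10, 1, by = -3)     # [1] 10  7  4  1

# seq_along(x) gives 1..length(x) and is safe for empty vectors
x <- c(10, 20, 30)
seq_along(x)            # [1] 1 2 3
seq_along(numeric(0))   # integer(0)
```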

Using rep() — Repetition

# Repeat a single value
rep(0, 5)
# [1] 0 0 0 0 0

# Repeat a vector
rep(c(1, 2), 3)
# [1] 1 2 1 2 1 2

# Repeat each element
rep(c(1, 2), each = 3)
# [1] 1 1 1 2 2 2

# Repeat with different times
rep(c("a", "b"), times = c(3, 1))
# [1] "a" "a" "a" "b"
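`rep()` has two more options that come up often: `length.out` cuts or extends the result to an exact length, and `each` and `times` can be combined (each element is repeated first, then the whole result). `rep_len()` is a simplified shortcut for the `length.out` case.

```r
# length.out fixes the exact length of the result
rep(c(1, 2, 3), length.out = 7)    # [1] 1 2 3 1 2 3 1

# each is applied first, then times repeats the whole result
rep(c(1, 2), each = 2, times = 2)  # [1] 1 1 2 2 1 1 2 2

# rep_len(x, n) is shorthand for rep(x, length.out = n)
rep_len(c("x", "y"), 5)            # [1] "x" "y" "x" "y" "x"
```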

Special Vectors

# Empty (zero-length) vectors
numeric(0)      # numeric(0)
character(0)    # character(0)

# Vectors of a given length, filled with default values
numeric(5)      # [1] 0 0 0 0 0
character(3)    # [1] "" "" ""
logical(4)      # [1] FALSE FALSE FALSE FALSE

# Random vectors
rnorm(5)        # 5 random normal values
runif(5)        # 5 random uniform values
sample(1:10, 5) # 5 random samples from 1:10

# NULL: the absence of any object (it has length 0)
NULL
length(NULL)    # [1] 0
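Random vectors change on every run; calling `set.seed()` with the same value before each draw makes results reproducible. The sketch also separates `NULL`, which means "no object", from a zero-length vector such as `numeric(0)`.

```r
# Same seed, same random draws
set.seed(42)
a <- sample(1:10, 5)
set.seed(42)
b <- sample(1:10, 5)
identical(a, b)          # [1] TRUE

# NULL is "no object"; numeric(0) is an empty numeric vector
is.null(NULL)            # [1] TRUE
is.null(numeric(0))      # [1] FALSE
c(NULL, 1, NULL)         # [1] 1  (NULL disappears when combined)
```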

Indexing Vectors

Position Indexing

x <- c(10, 20, 30, 40, 50)

# Single element
x[1]        # [1] 10
x[3]        # [1] 30
x[length(x)] # [1] 50 (last element)

# Negative index = exclude
x[-1]       # [1] 20 30 40 50
x[-3]       # [1] 10 20 40 50

# Multiple elements
x[c(1, 3, 5)]   # [1] 10 30 50
x[c(1, 1, 1)]   # [1] 10 10 10

# Range
x[2:4]       # [1] 20 30 40
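A few edge cases of position indexing tend to surprise newcomers. The sketch below shows that reading past the end returns `NA` rather than an error, that index 0 selects nothing, and that assigning past the end grows the vector and fills the gap with `NA`.

```r
x <- c(10, 20, 30, 40, 50)
x[10]          # [1] NA        (past the end gives NA, not an error)
x[0]           # numeric(0)
x[-(1:2)]      # [1] 30 40 50  (drop the first two)
# x[c(-1, 2)] is an error: positive and negative indices cannot be mixed

# Assigning past the end extends the vector, filling gaps with NA
y <- c(1, 2, 3)
y[5] <- 99
y              # [1]  1  2  3 NA 99
```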

Named Indexing

x <- c(a = 10, b = 20, c = 30, d = 40)

# By name (the result keeps its name)
x["b"]
#  b
# 20
x[c("a", "c")]
#  a  c
# 10 30

# names() function
names(x)     # [1] "a" "b" "c" "d"
names(x) <- c("first", "second", "third", "fourth")
x
# first second  third fourth
#    10     20     30     40
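Name-based indexing has a few companions worth knowing. In this sketch, a missing name returns `NA`, double brackets `[[ ]]` return the bare value without its name, `unname()` strips all names, and `setNames()` creates and names a vector in one call.

```r
x <- c(a = 10, b = 20, c = 30)
x["z"]          # a name that does not exist gives NA (printed as <NA> NA)
x[["b"]]        # [1] 20   ([[ ]] drops the name)
unname(x)       # [1] 10 20 30

# setNames() builds and names a vector in one step
setNames(1:3, c("p", "q", "r"))
# p q r
# 1 2 3
```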

Logical Indexing

x <- c(10, 20, 30, 40, 50)

# Logical vector — same length as x
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
# [1] 10 30 50

# From conditions
x > 25
# [1] FALSE FALSE  TRUE  TRUE  TRUE

x[x > 25]
# [1] 30 40 50

# Combined conditions
x[x > 20 & x < 50]
# [1] 30 40

# which() — indices where TRUE
which(x > 25)
# [1] 3 4 5

# %in% — membership
x[x %in% c(20, 40)]
# [1] 20 40
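Logical indexing behaves differently once `NA` values appear: a comparison with `NA` yields `NA`, and indexing with `NA` returns `NA`. The sketch below shows the effect and two common ways around it, plus `any()`, `all()`, and `sum()` applied to logical conditions.

```r
x <- c(10, NA, 30, 40)
x[x > 25]                   # [1] NA 30 40  (the NA condition returns NA)
x[which(x > 25)]            # [1] 30 40     (which() skips NA)
x[!is.na(x) & x > 25]       # [1] 30 40

any(x > 35, na.rm = TRUE)   # [1] TRUE
all(x > 5, na.rm = TRUE)    # [1] TRUE
sum(x > 25, na.rm = TRUE)   # [1] 2   (TRUE counts as 1)
```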

Vectorized Operations

Arithmetic, comparison, and logical operators, as well as most math functions, work element-wise on vectors:

x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)

# Arithmetic
x + y        # [1] 11 22 33 44 55
x * y        # [1] 10 40 90 160 250
x^2          # [1]  1  4  9 16 25

# Comparison
x > 3        # [1] FALSE FALSE FALSE  TRUE  TRUE
x == y       # [1] FALSE FALSE FALSE FALSE FALSE

# Logical
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
# [1]  TRUE FALSE FALSE

# Math functions
sqrt(x)       # [1] 1.000000 1.414214 1.732051 2.000000 2.236068
log(x)        # [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
exp(x)        # [1]  2.718282  7.389056 20.085537 54.598150 148.413159
abs(-5)       # [1] 5
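Beyond arithmetic, base R offers vectorized versions of common loop patterns. This sketch shows `ifelse()` as an element-wise if/else, `cumsum()` for running totals, and `pmax()` for an element-wise maximum.

```r
x <- c(1, 2, 3, 4, 5)

# ifelse() applies a condition to every element
ifelse(x %% 2 == 0, "even", "odd")
# [1] "odd"  "even" "odd"  "even" "odd"

cumsum(x)      # [1]  1  3  6 10 15   running total
pmax(x, 3)     # [1] 3 3 3 4 5        element-wise maximum with 3
```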

Recycling Rule

When vectors have different lengths, R repeats the shorter one:

# Scalar recycling
x <- c(1, 2, 3, 4, 5)
x + 10
# [1] 11 12 13 14 15

# Unequal lengths that are multiples recycle silently
c(1, 2, 3, 4) + c(10, 20)
# [1] 11 22 13 24  (10, 20, 10, 20 recycled)

# Warning if the longer length is not a multiple of the shorter
c(1, 2, 3) + c(10, 20)
# Warning: longer object length is not a multiple of shorter object length
# [1] 11 22 13  (10, 20, 10 recycled)
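Recycling also applies to logical indexes, where a short pattern is repeated along the vector. When recycling is intended in arithmetic, spelling it out with `rep_len()` makes the code easier to read; this sketch shows both.

```r
x <- 1:6
# A length-2 logical index is recycled: TRUE, FALSE, TRUE, FALSE, ...
x[c(TRUE, FALSE)]        # [1] 1 3 5

# Make intended recycling explicit with rep_len()
w <- rep_len(c(10, 20), length(x))
w                        # [1] 10 20 10 20 10 20
x * w                    # [1]  10  40  30  80  50 120
```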

Named Vectors

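Named vectors can serve as lightweight lookup tables, similar to dictionaries or hash maps. Unlike a true dictionary, names need not be unique; when a name repeats, indexing by that name returns the first match.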

# Create named vector
scores <- c(Alice = 95, Bob = 87, Charlie = 92)
scores
#   Alice     Bob Charlie
#      95      87      92

# Access by name (the name is kept in the result)
scores["Bob"]
# Bob
#  87
scores[c("Alice", "Charlie")]
#   Alice Charlie
#      95      92

# Add names after creation
x <- c(10, 20, 30)
names(x) <- c("a", "b", "c")

# Set names dynamically
x <- 1:5
names(x) <- paste0("item", 1:5)
x
# item1 item2 item3 item4 item5
#     1     2     3     4     5

# Operations preserve names
scores * 2
#   Alice     Bob Charlie
#     190     174     184

# as.list() turns each named element into a list entry (lists can mix types)
as.list(scores)
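A typical use of a named vector as a lookup table is recoding short codes into labels: indexing by a character vector of codes returns the matching values in order. The sketch also checks whether a key exists with `%in%` on `names()`; the `lookup` and `codes` values are made up for illustration.

```r
# A named vector as a lookup table for recoding values
lookup <- c(M = "Male", F = "Female", U = "Unknown")
codes <- c("F", "M", "M", "U", "F")
unname(lookup[codes])
# [1] "Female"  "Male"    "Male"    "Unknown" "Female"

# Check whether a key exists before using it
scores <- c(Alice = 95, Bob = 87, Charlie = 92)
"Bob" %in% names(scores)    # [1] TRUE
"Zoe" %in% names(scores)    # [1] FALSE
```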

Useful Vector Functions

Summary Statistics

x <- c(10, 20, 30, 40, 50)

sum(x)       # [1] 150
mean(x)      # [1] 30
median(x)    # [1] 30
min(x)       # [1] 10
max(x)       # [1] 50
range(x)     # [1] 10 50
sd(x)        # [1] 15.81139
var(x)       # [1] 250
prod(x)      # [1] 1.2e+07 (12 million, printed in scientific notation)

# Summary gives multiple stats
summary(x)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#      10      20      30      30      40      50

# With NA
x <- c(10, NA, 30, NA, 50)
sum(x)           # [1] NA
sum(x, na.rm = TRUE)  # [1] 90
mean(x, na.rm = TRUE) # [1] 30
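Before dropping missing values it often helps to count them. Because `is.na()` returns a logical vector, `sum()` gives the number of `NA`s and `mean()` their proportion. The sketch also shows `quantile()` for several percentiles at once and `weighted.mean()` for a weighted average.

```r
x <- c(10, NA, 30, NA, 50)
sum(is.na(x))        # [1] 2    number of missing values
mean(is.na(x))       # [1] 0.4  proportion missing

# quantile() returns several percentiles at once
quantile(c(10, 20, 30, 40, 50))
#  0%  25%  50%  75% 100%
#  10   20   30   40   50

# weighted.mean() for weighted averages
weighted.mean(c(1, 2, 3), w = c(1, 1, 2))   # [1] 2.25
```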

Sorting

x <- c(30, 10, 40, 20, 50)

# sort() returns a sorted copy; x itself is unchanged
sort(x)              # [1] 10 20 30 40 50
sort(x, decreasing = TRUE)  # [1] 50 40 30 20 10

# order() — returns indices
order(x)             # [1] 2 4 1 3 5
x[order(x)]          # [1] 10 20 30 40 50

# sort() keeps names
scores <- c(Alice = 95, Bob = 87, Charlie = 92)
sort(scores)
#     Bob Charlie   Alice
#      87      92      95
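A few more ordering tools round out `sort()` and `order()`. In this sketch, `rev()` reverses the current order, `rank()` gives each element's rank, `order()` takes a second key to break ties, and `na.last` controls what `sort()` does with `NA`. The `name` and `score` vectors are illustrative data.

```r
x <- c(30, 10, 40, 20, 50)
rev(x)          # [1] 50 20 40 10 30   reverse the current order
rank(x)         # [1] 3 1 4 2 5        each element's rank

# order() accepts several keys: score descending, then name ascending
name  <- c("Cy", "Al", "Bo", "Di")
score <- c(90, 85, 90, 85)
name[order(-score, name)]   # [1] "Bo" "Cy" "Al" "Di"

# sort() drops NA by default; na.last = TRUE keeps them at the end
sort(c(3, NA, 1))                  # [1] 1 3
sort(c(3, NA, 1), na.last = TRUE)  # [1]  1  3 NA
```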

Filtering

x <- c(10, 20, 30, 40, 50)

# which() — get indices
which(x > 25)        # [1] 3 4 5

# Filter by condition
x[x > 25]            # [1] 30 40 50

# Filter by indices
x[c(1, 3, 5)]        # [1] 10 30 50

# Filter with %in%
x[x %in% c(20, 40)]  # [1] 20 40

# Filter with multiple conditions
x[x > 20 & x < 50]   # [1] 30 40
x[x < 20 | x > 40]   # [1] 10 50
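Base R also has function-style filters. The sketch below uses `subset()`, which silently drops elements whose condition is `NA`, and `Filter()`, which applies a predicate function to each element; the anonymous predicate here is just an example.

```r
x <- c(10, NA, 30, 40, 50)

# subset() keeps elements where the condition is TRUE (NA is dropped)
subset(x, x > 25)                              # [1] 30 40 50

# Filter() keeps elements for which the function returns TRUE
Filter(function(v) !is.na(v) && v > 25, x)     # [1] 30 40 50
```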

Set Operations

a <- c(1, 2, 3, 4, 5)
b <- c(4, 5, 6, 7, 8)

union(a, b)          # [1] 1 2 3 4 5 6 7 8
intersect(a, b)      # [1] 4 5
setdiff(a, b)        # [1] 1 2 3
setdiff(b, a)        # [1] 6 7 8

# setequal(): same elements, ignoring order and duplicates
setequal(a, b)       # [1] FALSE
setequal(c(1,2), c(2,1))  # [1] TRUE
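The set functions treat their inputs as sets, so duplicates are removed from the result. Two related helpers: `is.element()` performs the same test as `%in%`, and `match()` returns where each value first appears in a lookup vector.

```r
# Duplicates are removed from set results
union(c(1, 1, 2), c(2, 3))             # [1] 1 2 3

# is.element(x, y) is the same test as x %in% y
is.element(3, c(1, 2, 3))              # [1] TRUE

# match() returns positions, or NA when a value is absent
match(c("b", "z"), c("a", "b", "c"))   # [1]  2 NA
```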

Unique and Duplicates

x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)

unique(x)            # [1] 1 2 3 4
duplicated(x)        # [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
x[!duplicated(x)]    # [1] 1 2 3 4

table(x)             # Frequency table
# x
# 1 2 3 4
# 1 2 3 4

duplicated(c("a", "b", "a"))  # [1] FALSE FALSE  TRUE
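Some related helpers: `anyDuplicated()` reports the position of the first duplicate (0 if there is none), `fromLast = TRUE` makes `duplicated()` keep the last occurrence instead of the first, and `which.max()` on a frequency table finds the most common value.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)
anyDuplicated(x)                  # [1] 3  index of the first duplicate

# fromLast = TRUE flags every occurrence except the last one
duplicated(x, fromLast = TRUE)
# [1] FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE

# Most frequent value (the mode) from a frequency table
counts <- table(x)
names(counts)[which.max(counts)]  # [1] "4"
```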

Vectorization vs Loops

Vectorized operations are faster and more idiomatic in R, because the element-by-element work runs in compiled C code rather than in the R interpreter:

# Slow: loop approach
x <- 1:1000000
result_loop <- numeric(length(x))
for (i in seq_along(x)) {
  result_loop[i] <- x[i]^2
}

# Fast: vectorized approach
result_vec <- x^2

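# all.equal() confirms both approaches give the same result
all.equal(result_loop, result_vec)  # [1] TRUE

# Benchmark (exact timings vary by machine)
system.time(for (i in seq_along(x)) result_loop[i] <- x[i]^2)
#    user  system elapsed
#   ...

system.time(x^2)
#    user  system elapsed
#   ...  (typically much faster than the loop)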
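When a loop is unavoidable, how the result vector is built matters. This sketch compares growing a vector with `c()` inside the loop (each iteration creates a new, longer vector), filling a vector preallocated with `numeric(n)`, and the vectorized form; `n = 10000` is an arbitrary size chosen so the example runs quickly.

```r
n <- 10000

# Growing: c() builds a new, longer vector on every iteration (slow)
grow <- c()
for (i in 1:n) grow <- c(grow, i^2)

# Preallocated: the vector is created once and filled in place
pre <- numeric(n)
for (i in seq_len(n)) pre[i] <- i^2

# Vectorized: one call, no R-level loop
vec <- (1:n)^2

identical(grow, pre)   # [1] TRUE
identical(pre, vec)    # [1] TRUE
```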

Practice Exercises

Exercise 1: Vector Calculator

Given x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100):

  1. Find all values greater than 50
  2. Calculate the sum of values between 30 and 70
  3. Count how many values are divisible by 3
  4. Find the two largest values

Solution

x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# 1. Values greater than 50
x[x > 50]
# [1]  60  70  80  90 100

# 2. Sum of values between 30 and 70
sum(x[x >= 30 & x <= 70])
# [1] 250

# 3. Count divisible by 3
sum(x %% 3 == 0)
# [1] 3

# 4. Two largest values
sort(x, decreasing = TRUE)[1:2]
# [1] 100  90
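For part 4, `head()` is a slightly safer alternative to `[1:2]`: if the vector has fewer than two elements, `[1:2]` pads the result with `NA`, while `head()` simply returns what is there. The `short` vector is a made-up edge case.

```r
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
head(sort(x, decreasing = TRUE), 2)       # [1] 100  90

# With fewer than two values, [1:2] pads with NA but head() does not
short <- c(5)
sort(short, decreasing = TRUE)[1:2]       # [1]  5 NA
head(sort(short, decreasing = TRUE), 2)   # [1] 5
```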

Exercise 2: Named Vector Operations

Create a named vector of student grades, then:

  1. Find the average grade
  2. Find the student with the highest grade
  3. Add a new student
  4. Sort grades from highest to lowest

Solution

grades <- c(Alice = 95, Bob = 87, Charlie = 92, Diana = 98, Eve = 85)

# 1. Average grade
mean(grades)
# [1] 91.4

# 2. Student with highest grade
names(grades[which.max(grades)])
# [1] "Diana"

# 3. Add a new student
grades["Frank"] <- 88
grades

# 4. Sort grades
sort(grades, decreasing = TRUE)
#   Diana   Alice Charlie   Frank     Bob     Eve
#      98      95      92      88      87      85
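For part 2, `which.max()` already returns a named index, so `names(which.max(grades))` is a shorter equivalent. One caveat: `which.max()` reports only the first maximum. The sketch shows a comparison with `max()` that finds every tied student; the `tied` vector is illustrative.

```r
grades <- c(Alice = 95, Bob = 87, Charlie = 92, Diana = 98, Eve = 85)
names(which.max(grades))          # [1] "Diana"

# which.max() reports only the first maximum; this finds all ties
tied <- c(A = 90, B = 95, C = 95)
names(tied)[tied == max(tied)]    # [1] "B" "C"
```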

Exercise 3: Frequency Counter

Write a function that returns the frequency of each unique value in a vector, sorted from most to least frequent.

Solution

freq_counter <- function(x) {
  result <- table(x)
  result[order(result, decreasing = TRUE)]
}

# Test
data <- c("a", "b", "a", "c", "b", "a", "d", "b", "a")
freq_counter(data)
# x
# a b c d
# 4 3 1 1
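The same counts can be produced without `table()`, which shows what the table is doing. This is a sketch; `freq_counter2` is a hypothetical name, and it returns a plain named integer vector rather than a table. Ties keep no guaranteed order in the comment below, so only the counts are shown.

```r
# Hypothetical alternative to freq_counter() built without table()
freq_counter2 <- function(x) {
  vals <- unique(x)
  # Count how many times each unique value occurs
  counts <- vapply(vals, function(v) sum(x == v), integer(1))
  # Name counts by value (vapply only does this itself for character input)
  names(counts) <- vals
  sort(counts, decreasing = TRUE)
}

data <- c("a", "b", "a", "c", "b", "a", "d", "b", "a")
res <- freq_counter2(data)
res   # a: 4, b: 3, then c and d with 1 each
```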

Key Takeaways

  • Vectors are R's fundamental data structure: even a single number is a vector of length 1
  • Use c() to create vectors, : and seq() for sequences, rep() for repetition
  • Indexing starts at 1, not 0
  • Logical indexing is powerful: x[x > 5] filters by condition
  • Vectorization beats loops: operations work element-wise automatically
  • Recycling: shorter vectors repeat to match longer ones, with a warning if the longer length is not a multiple of the shorter
  • Named vectors act as lightweight dictionaries
  • which() gives the indices of TRUE values; which.max() and which.min() give the index of the largest and smallest value
  • Use na.rm = TRUE to handle missing values in summaries

Next: Learn about R Lists, which can hold elements of different types.
