Introduction
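K-means clustering partitions n observations into k clusters. Each observation belongs to the cluster with the nearest centroid (the cluster mean), and the algorithm searches for the partition that minimizes the total within-cluster sum of squared Euclidean distances. It is an unsupervised learning algorithm: it uses no class labels, and k must be chosen in advance.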
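A classic way to search for such a partition is Lloyd's algorithm. It alternates an assignment step (each point goes to its nearest centroid) and an update step (each centroid moves to the mean of its points) until no assignment changes. The base-R sketch below is for illustration only. `lloyd_kmeans()` is a hypothetical helper, and R's built-in `kmeans()` uses the Hartigan-Wong algorithm by default rather than this loop. The `iris` measurements are an illustrative stand-in for real data.

```r
# Hypothetical helper: a minimal Lloyd's algorithm (not how stats::kmeans() is implemented)
lloyd_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Initialise the k centroids with randomly chosen observations
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from each point (row) to each centroid (column)
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # Converged: no observation changed cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: move each non-empty cluster's centroid to the mean of its points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers,
       tot.withinss = sum((x - centers[cluster, , drop = FALSE])^2))
}

# Usage on the built-in iris measurements (an illustrative stand-in for df)
set.seed(1)
iris_scaled <- scale(iris[, 1:4])
fit_lloyd <- lloyd_kmeans(iris_scaled, k = 3)
table(fit_lloyd$cluster)
fit_lloyd$tot.withinss
```

Lloyd's algorithm only reaches a local optimum, so the result depends on the random initial centroids. That is one reason to set a seed and to compare several starts.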
Implementing K-Means
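# Standardize each numeric column to mean 0 and standard deviation 1,
# so that variables on large scales do not dominate the Euclidean distances
df_scaled <- scale(df)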
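Standardizing matters because k-means relies on Euclidean distance, so a variable with a large numeric range would otherwise dominate the clustering. The check below uses R's built-in `USArrests` data as an illustrative stand-in for `df`. Its `Assault` column is on a much larger scale than `Murder`. The code confirms what `scale()` does by default: each column is centred to mean 0 and divided by its standard deviation. Note that `scale()` needs all-numeric columns.

```r
# USArrests ships with R; used here only as a stand-in for df
arrests_scaled <- scale(USArrests)

# Column means are now (numerically) 0 and standard deviations are 1
round(colMeans(arrests_scaled), 10)
apply(arrests_scaled, 2, sd)

# scale() records the centring and scaling values it used as attributes
attr(arrests_scaled, "scaled:center")
attr(arrests_scaled, "scaled:scale")
```

Because these attributes are stored, new observations can be put on the same scale with `scale(new_data, center = attr(arrests_scaled, "scaled:center"), scale = attr(arrests_scaled, "scaled:scale"))`.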
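# Fit k-means with k = 3; nstart = 25 tries 25 random sets of starting centers
# and keeps the solution with the lowest total within-cluster sum of squares
set.seed(42)
kmeans_model <- kmeans(df_scaled, centers = 3, nstart = 25)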
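`kmeans()` starts from randomly chosen centres and can converge to a local optimum, which is why a fixed seed and several starts are useful. This sketch again uses `USArrests` as a stand-in for `df`. It fits k = 3 from ten different single random starts and records `tot.withinss`, the objective k-means minimizes. Differing values across seeds would indicate different local optima. With `nstart = 25`, `kmeans()` returns the run with the lowest total within-cluster sum of squares.

```r
# USArrests is an illustrative stand-in for df
arrests_scaled <- scale(USArrests)

# Objective value reached from ten different single random starts
single_start_wss <- sapply(1:10, function(s) {
  set.seed(s)
  kmeans(arrests_scaled, centers = 3, nstart = 1)$tot.withinss
})
range(single_start_wss)

# Best of 25 random starts: kmeans() keeps the run with the lowest tot.withinss
set.seed(42)
multi_start <- kmeans(arrests_scaled, centers = 3, nstart = 25)
multi_start$tot.withinss
```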
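# Cluster assignment (1 to k) for each observation
kmeans_model$cluster
# Cluster centers: one row per cluster, in scaled units
kmeans_model$centers
# Within-cluster sum of squares, one value per cluster
kmeans_model$withinss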
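The fitted object holds more than the three components printed above. This sketch uses `USArrests` as a stand-in for `df` and checks how the sums of squares fit together. The per-cluster `withinss` values add up to `tot.withinss`, and `totss = tot.withinss + betweenss`, so `betweenss / totss` is the share of total variance explained by the clustering. It also converts the centres back to the original units using the attributes stored by `scale()`.

```r
# USArrests is an illustrative stand-in for df
arrests_scaled <- scale(USArrests)
set.seed(42)
km_arrests <- kmeans(arrests_scaled, centers = 3, nstart = 25)

# Per-cluster within-cluster SS; their sum is the k-means objective tot.withinss
km_arrests$withinss
sum(km_arrests$withinss)
km_arrests$tot.withinss

# Total SS splits into within- and between-cluster parts.
# For standardized data totss = p * (n - 1), here 4 * 49 = 196
km_arrests$totss
km_arrests$tot.withinss + km_arrests$betweenss

# Fraction of total variance explained by the clustering
km_arrests$betweenss / km_arrests$totss

# Cluster sizes, and centres converted back to the original units
km_arrests$size
centers_original <- t(apply(km_arrests$centers, 1, function(z) {
  z * attr(arrests_scaled, "scaled:scale") + attr(arrests_scaled, "scaled:center")
}))
centers_original
```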
Finding Optimal K
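# Elbow method: total within-cluster sum of squares for k = 1..10
# (tot.withinss is one number per fit; withinss would return one value per cluster)
set.seed(42)
wss <- sapply(1:10, function(k) {
  kmeans(df_scaled, centers = k, nstart = 25)$tot.withinss
})
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster sum of squares")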
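Reading the elbow off a plot is subjective. One way to make it more concrete is to tabulate the proportional drop in total within-cluster sum of squares gained by each additional cluster. The elbow is roughly where these gains level off. The sketch below uses `USArrests` as a stand-in for `df`, and it is a heuristic aid, not a formal test.

```r
# USArrests is an illustrative stand-in for df
arrests_scaled <- scale(USArrests)

# Total within-cluster SS for k = 1..10 (several starts per k for stability)
set.seed(42)
wss_demo <- sapply(1:10, function(k) {
  kmeans(arrests_scaled, centers = k, nstart = 25)$tot.withinss
})

# Fractional reduction in WSS when moving from k - 1 to k clusters
gain <- -diff(wss_demo) / head(wss_demo, -1)
data.frame(k = 2:10, wss = round(wss_demo[-1], 1), gain = round(gain, 3))
```

With k = 1 every observation is in one cluster, so the first value equals the total sum of squares of the standardized data.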
# Silhouette method: average silhouette width for each k (higher is better)
library(factoextra)
fviz_nbclust(df_scaled, kmeans, method = "silhouette")
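An observation's silhouette width compares its average distance to its own cluster with its average distance to the nearest other cluster. Values near 1 indicate a good fit; values near 0 or below indicate a borderline or possibly misassigned point. The sketch below computes the average silhouette width for k = 2 to 10 directly with `cluster::silhouette()`. The `cluster` package is a recommended package that ships with standard R installations. `USArrests` stands in for `df`. This approximates what the `fviz_nbclust()` plot shows, though factoextra's exact defaults may differ.

```r
library(cluster)  # recommended package, installed with R

# USArrests is an illustrative stand-in for df
arrests_scaled <- scale(USArrests)
arrests_dist <- dist(arrests_scaled)  # Euclidean distances between observations

# Average silhouette width for each k (undefined for k = 1)
ks <- 2:10
avg_sil <- sapply(ks, function(k) {
  set.seed(42)
  fit <- kmeans(arrests_scaled, centers = k, nstart = 25)
  sil <- silhouette(fit$cluster, arrests_dist)
  mean(sil[, "sil_width"])
})
names(avg_sil) <- ks
round(avg_sil, 3)

# Candidate k: the value with the highest average silhouette width
ks[which.max(avg_sil)]
```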
Visualization
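library(factoextra)
# Scatter plot of the clusters; with more than two variables, the data are
# projected onto the first two principal components
fviz_cluster(kmeans_model, data = df_scaled)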
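If factoextra is unavailable, a similar view can be sketched with base R's `prcomp()`. The example below uses `USArrests` as a stand-in for `df`. It plots each observation on the first two principal components, colours points by cluster, projects the cluster centres with the same rotation, and labels each axis with the share of variance that component explains.

```r
# USArrests is an illustrative stand-in for df
arrests_scaled <- scale(USArrests)
set.seed(42)
km_plot <- kmeans(arrests_scaled, centers = 3, nstart = 25)

# Principal components of the (already standardized) data
arrests_pca <- prcomp(arrests_scaled)
arrests_pcs <- arrests_pca$x[, 1:2]
# Percentage of total variance captured by each component, for the axis labels
var_share <- round(100 * arrests_pca$sdev^2 / sum(arrests_pca$sdev^2), 1)

plot(arrests_pcs, col = km_plot$cluster, pch = 19,
     xlab = paste0("PC1 (", var_share[1], "%)"),
     ylab = paste0("PC2 (", var_share[2], "%)"),
     main = "k-means clusters on the first two principal components")

# Cluster centres projected with the same rotation, drawn as crosses
center_pcs <- predict(arrests_pca, newdata = km_plot$centers)[, 1:2]
points(center_pcs, col = 1:3, pch = 4, cex = 2, lwd = 2)
```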
Summary
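K-means groups similar observations by minimizing the within-cluster sum of squares around k centroids. Scale the data before clustering, and use the elbow or silhouette method to choose k. Both are heuristics, so treat the suggested k as a starting point rather than a guaranteed optimum.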