Unsupervised Learning
teaching: 20
exercises: 0
questions:
- "What is unsupervised learning in a machine learning model?"
objectives:
- "Learn how to use K-means clustering in an ML model"
keypoints:
- "K-means clustering"
K-means clustering
Explanation of the K-means clustering method:
Given a set of data, we choose K=2 clusters for the data to be split into:
First, select 2 random centroids (denoted as a red and a blue X).
For each centroid, compute the distance to all the points. Each point is assigned to the centroid it is closest to, creating 2 groups.
Now recompute the new centroids of the 2 groups (using the mean value of all points in each group):
Compute the distances between the 2 new centroids and all the points. We obtain 2 new groups:
Repeat the last 2 steps until the centroids no longer change. The model has reached equilibrium (a minimal code sketch of these steps follows below):
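To make the procedure concrete, here is a minimal base-R sketch of the two alternating steps (assign each point to its nearest centroid, then recompute each centroid as its group mean). The random data and the fixed number of iterations are assumptions for illustration only; the lesson itself uses the built-in kmeans() function below.

```r
# Sketch of the K-means procedure with K = 2 on made-up 2-dimensional data
set.seed(1)
pts <- matrix(rnorm(100), ncol = 2)        # 50 points, 2 features
centroids <- pts[sample(nrow(pts), 2), ]   # step 1: pick 2 random centroids

for (i in 1:10) {
  # step 2: distance from every point to each centroid, then assign to the nearest
  d <- as.matrix(dist(rbind(centroids, pts)))[-(1:2), 1:2]
  cluster <- apply(d, 1, which.min)

  # step 3: recompute each centroid as the mean of its group
  new_centroids <- rbind(colMeans(pts[cluster == 1, , drop = FALSE]),
                         colMeans(pts[cluster == 2, , drop = FALSE]))

  # stop when the centroids no longer move (equilibrium)
  if (all(abs(new_centroids - centroids) < 1e-8)) break
  centroids <- new_centroids
}
```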
Example with K=3
Implementation
Run the following in the console:

```r
install.packages("factoextra")
```
Run the following R code:

```r
library(ggplot2)
library(factoextra)
library(purrr)

# Load the iris data and plot Sepal.Length against Petal.Width, coloured by species
data(iris)
ggplot(iris, aes(x = Sepal.Length, y = Petal.Width)) +
  geom_point(aes(color = Species))

# Fit K-means with K = 3 on Petal.Length and Petal.Width (columns 3 and 4),
# using 20 random starts for stability
set.seed(123)
km <- kmeans(iris[, 3:4], 3, nstart = 20)

# Compare the cluster assignments with the true species labels
table(km$cluster, iris$Species)

# Visualise the clusters
fviz_cluster(km, data = iris[, 3:4])
```
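For comparison with the species-coloured plot above, the same ggplot2 approach can colour the clustered variables by the fitted cluster instead. This is an optional sketch; fviz_cluster() already produces a similar visualisation.

```r
# Colour the Petal.Length vs Petal.Width scatter by the K-means cluster assignment
ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point(aes(color = factor(km$cluster)))
```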
How to find the optimal K value:
Elbow approach
Similar to the KNN method in supervised learning, for the K-means approach we can use the Elbow method to find the optimal K value.
The Elbow approach uses the Within-Cluster Sum of Squares (WSS) to measure the compactness of the clusters:
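For reference, the total WSS is the sum, over all clusters, of the squared distances between each point and its cluster centroid (this formula is added here as a reminder; it is the quantity the Elbow plot below reports):

$$\mathrm{WSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

where $C_k$ is the set of points assigned to cluster $k$ and $\mu_k$ is its centroid. WSS always decreases as K grows, so the optimal K is taken at the "elbow", where adding another cluster no longer gives a large drop.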
The optimal K value can be found from the Elbow plot using method = "wss":

```r
fviz_nbclust(iris[, 3:4], kmeans, method = "wss")
```
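The same curve can also be computed by hand, which shows what fviz_nbclust() is doing and why purrr was loaded earlier. This is only a sketch, assuming the same iris columns used above: map_dbl() loops over candidate K values and collects the total within-cluster sum of squares (tot.withinss) from each kmeans() fit.

```r
# Total WSS for K = 1..10, computed directly from kmeans()
set.seed(123)
wss <- map_dbl(1:10, function(k) {
  kmeans(iris[, 3:4], centers = k, nstart = 20)$tot.withinss
})

# Plot the Elbow curve: look for the K where the decrease levels off
plot(1:10, wss, type = "b", pch = 19,
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")
```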