Unsupervised Learning#

teaching: 20
exercises: 0
questions:

  • “What is unsupervised learning in machine learning?”

objectives:

  • “Learn how to use K-means clustering in an ML model”

keypoints:

  • “K-means”

K-means clustering#

Explanation of K-means clustering method:#

  • Given a data set, we choose to split it into K=2 clusters:

image

  • First, select 2 random centroids (marked with a red and a blue X):

image

  • Compute the distance from every point to each of the 2 centroids and assign each point to the nearer centroid. This creates 2 groups:

image

  • Now recompute the centroid of each group as the mean of all points in that group:

image

  • Compute the distance between the 2 new centroids and all the points, and reassign each point to the nearer centroid. This gives 2 new groups:

image

  • Repeat the last 2 steps until the centroids no longer change. The model has then reached equilibrium:

image
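The steps above can be sketched in a few lines of base R. This is only an illustrative sketch, not the implementation used by R's built-in `kmeans()`: the function name `simple_kmeans`, its `max_iter` argument, and the toy data `pts` are assumptions made up for this example.

```r
# Minimal K-means sketch; simple_kmeans is a hypothetical helper, not a library function
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Step 1: pick k random data points as the initial centroids
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Step 2: squared distance from every point to every centroid (n x k matrix)
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centroids[j, ])^2))
    # Assign each point to its nearest centroid
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop when no point changes group (equilibrium)
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Step 3: move each centroid to the mean of the points in its group
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centroids[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centroids, iterations = iter)
}

# Toy data (assumed for illustration): two well-separated groups of 20 points
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
fit <- simple_kmeans(pts, k = 2)
table(fit$cluster)
```

In practice the built-in `kmeans()` is preferable, since it supports several random starts (`nstart`) and more refined algorithms.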

Example with K=3#

image

image

Implementation#

  • Run the following in the R console to install the required package:

install.packages("factoextra")
  • Run the following R code:

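library(ggplot2)     # plotting
library(factoextra)  # fviz_cluster(), fviz_nbclust()
library(purrr)
data(iris)

# Plot the two petal measurements used for clustering, colored by true species
ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point(aes(color = Species))

# K-means with K = 3 on Petal.Length and Petal.Width (columns 3 and 4);
# nstart = 20 tries 20 random starts and keeps the best result
set.seed(123)
km <- kmeans(iris[, 3:4], centers = 3, nstart = 20)

# Compare cluster assignments with the true species, then plot the clusters
table(km$cluster, iris$Species)
fviz_cluster(km, data = iris[, 3:4])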

image
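Beyond the plot, the object returned by `kmeans()` can be inspected directly. The sketch below refits the same model so that it runs on its own; the "purity" figure at the end is an extra illustration added here, not part of the original lesson.

```r
data(iris)
set.seed(123)
km <- kmeans(iris[, 3:4], centers = 3, nstart = 20)

km$centers       # one row per cluster: mean Petal.Length and Petal.Width
km$size          # number of flowers in each cluster
km$tot.withinss  # total within-cluster sum of squares

# Share of flowers that belong to the most common species of their cluster
tab <- table(km$cluster, iris$Species)
purity <- sum(apply(tab, 1, max)) / sum(tab)
purity
```

Cluster numbers are arbitrary labels, so cluster 1 does not necessarily correspond to the first species; the table has to be read row by row.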

How to find optimal K values:#

Elbow approach#

  • As with the KNN method for supervised learning, we can use the Elbow approach to find the optimal value of K for K-means.

  • The Elbow approach uses the Within-Cluster Sum of Squares (WSS) to measure the compactness of the clusters: image
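For reference, WSS is commonly written as follows. The symbols are assumptions introduced here and may differ from those in the figure: $C_k$ is the set of points in cluster $k$, $\mu_k$ is its centroid, and $K$ is the number of clusters.

```latex
\mathrm{WSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```

A smaller WSS means the points lie closer to their centroids. WSS generally keeps decreasing as K grows, which is why we look for the "elbow" where the decrease slows down rather than for the minimum.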

The optimal K value can be read from the elbow of the WSS curve, which is plotted using method = "wss":

fviz_nbclust(iris[,3:4], kmeans, method = "wss")

image
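To see what the elbow plot is built from, the same curve can be computed by hand in base R. This is a rough sketch, not necessarily how `fviz_nbclust()` works internally: the range K = 1 to 10 and `nstart = 20` are choices made for this example.

```r
data(iris)
set.seed(123)

# Total within-cluster sum of squares for K = 1, ..., 10
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(iris[, 3:4], centers = k, nstart = 20)$tot.withinss
})

# The "elbow" is the K after which adding clusters barely reduces WSS
plot(k_values, wss, type = "b",
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")
```

Reading the elbow is a visual judgment, so it is worth comparing the suggested K with what is known about the data.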