Unsupervised Learning
teaching: 20
exercises: 0
questions:
- "What is unsupervised learning in a machine learning model?"
objectives:
- "Learn how to use K-means clustering in an ML model"
keypoints:
- "K-means clustering"
K-means clustering
Explanation of the K-means clustering method:
Given a set of data, we choose K=2 clusters for the data to be split into:
First, select 2 random centroids (denoted as a red and a blue X).
For each centroid, compute the distance to all the points. Each point is assigned to the centroid it is closest to, creating 2 groups.
Now recompute the new centroids of the 2 groups (using the mean value of all points in each group):
Compute the distances between the 2 new centroids and all the points. We obtain 2 new groups:
Repeat the last 2 steps until the centroids no longer change. The model has reached equilibrium (a minimal code sketch of these steps follows below):
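To make the procedure concrete, here is a minimal base-R sketch of the two alternating steps (assign each point to its nearest centroid, then recompute each centroid as its group mean). The random data and the fixed number of iterations are assumptions for illustration only; the lesson itself uses the built-in kmeans() function below.

```r
# Sketch of the K-means procedure with K = 2 on made-up 2-dimensional data
set.seed(1)
pts <- matrix(rnorm(100), ncol = 2)        # 50 points, 2 features
centroids <- pts[sample(nrow(pts), 2), ]   # step 1: pick 2 random centroids

for (i in 1:10) {
  # step 2: distance from every point to each centroid, then assign to the nearest
  d <- as.matrix(dist(rbind(centroids, pts)))[-(1:2), 1:2]
  cluster <- apply(d, 1, which.min)

  # step 3: recompute each centroid as the mean of its group
  new_centroids <- rbind(colMeans(pts[cluster == 1, , drop = FALSE]),
                         colMeans(pts[cluster == 2, , drop = FALSE]))

  # stop when the centroids no longer move (equilibrium)
  if (all(abs(new_centroids - centroids) < 1e-8)) break
  centroids <- new_centroids
}
```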
Example with K=3
Implementation
Run the following in the console:

```r
install.packages("factoextra")
```
Run the following R code:

```r
library(ggplot2)
library(factoextra)
library(purrr)

# Load the iris data and plot Sepal.Length against Petal.Width, coloured by species
data(iris)
ggplot(iris, aes(x = Sepal.Length, y = Petal.Width)) +
  geom_point(aes(color = Species))

# Fit K-means with K = 3 on Petal.Length and Petal.Width (columns 3 and 4),
# using 20 random starts for stability
set.seed(123)
km <- kmeans(iris[, 3:4], 3, nstart = 20)

# Compare the cluster assignments with the true species labels
table(km$cluster, iris$Species)

# Visualise the clusters
fviz_cluster(km, data = iris[, 3:4])
```
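For comparison with the species-coloured plot above, the same ggplot2 approach can colour the clustered variables by the fitted cluster instead. This is an optional sketch; fviz_cluster() already produces a similar visualisation.

```r
# Colour the Petal.Length vs Petal.Width scatter by the K-means cluster assignment
ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point(aes(color = factor(km$cluster)))
```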
How to find the optimal K value:
Elbow approach
Similar to the KNN method in supervised learning, for the K-means approach we can use the Elbow method to find the optimal K value.
The Elbow approach uses the Within-Cluster Sum of Squares (WSS) to measure the compactness of the clusters:
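For reference, the total WSS is the sum, over all clusters, of the squared distances between each point and its cluster centroid (this formula is added here as a reminder; it is the quantity the Elbow plot below reports):

$$\mathrm{WSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

where $C_k$ is the set of points assigned to cluster $k$ and $\mu_k$ is its centroid. WSS always decreases as K grows, so the optimal K is taken at the "elbow", where adding another cluster no longer gives a large drop.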
The optimal K value can be found from the Elbow plot using method = "wss":

```r
fviz_nbclust(iris[, 3:4], kmeans, method = "wss")
```
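The same curve can also be computed by hand, which shows what fviz_nbclust() is doing and why purrr was loaded earlier. This is only a sketch, assuming the same iris columns used above: map_dbl() loops over candidate K values and collects the total within-cluster sum of squares (tot.withinss) from each kmeans() fit.

```r
# Total WSS for K = 1..10, computed directly from kmeans()
set.seed(123)
wss <- map_dbl(1:10, function(k) {
  kmeans(iris[, 3:4], centers = k, nstart = 20)$tot.withinss
})

# Plot the Elbow curve: look for the K where the decrease levels off
plot(1:10, wss, type = "b", pch = 19,
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")
```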