
DBSCAN and OPTICS with R

City University of Hong Kong

This Jupyter notebook demonstrates how to cluster the iris2D data set using density-based methods. It is written in R and can be run live with an R kernel.

Setup

The following code loads the iris data set and creates iris2D from its first two attributes:

data("iris") # load the iris data set
x <- as.matrix(iris[,1:2]) # keep the first two input attributes: sepal length and sepal width
plot(x) # scatter plot of iris2D
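A quick check of the setup, using only base R. The matrix should have one row per flower, and its two columns should be the sepal measurements in the order length, then width:

```r
data("iris")                  # built-in iris data set
x <- as.matrix(iris[, 1:2])   # iris2D: the first two attributes of iris
dim(x)                        # expected: 150 2
colnames(x)                   # expected: "Sepal.Length" "Sepal.Width"
summary(x)                    # ranges of the two attributes (in cm)
```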

DBSCAN and OPTICS are implemented in the following package:

library(dbscan) # for DBSCAN and OPTICS
help(package="dbscan") # More information about the package

DBSCAN

DBSCAN is implemented by the function dbscan:

?dbscan
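The following minimal from-scratch sketch of the DBSCAN idea in base R makes the roles of $\varepsilon$ and minPts concrete. It is not the package's implementation. The helper `simple_dbscan` and the toy data are hypothetical. The sketch assumes that minPts counts the point itself, as documented for `dbscan()`. Because it computes all pairwise distances, it only suits small data sets.

```r
# Minimal DBSCAN sketch in base R, for illustration only (dbscan::dbscan is
# the implementation used in this notebook). simple_dbscan is hypothetical;
# minPts counts the point itself, as in dbscan::dbscan.
simple_dbscan <- function(x, eps, minPts) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- as.matrix(dist(x))                      # all pairwise Euclidean distances
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= minPts      # eps-neighborhood includes the point itself
  cluster <- integer(n)                        # 0 = noise (or not yet assigned)
  id <- 0L
  for (i in seq_len(n)) {
    if (!is_core[i] || cluster[i] != 0L) next  # clusters start only from unassigned core points
    id <- id + 1L
    cluster[i] <- id
    queue <- neighbors[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] != 0L) next               # already in a cluster
      cluster[j] <- id                         # core or border point joins the cluster
      if (is_core[j]) queue <- c(queue, neighbors[[j]])  # only core points expand the cluster
    }
  }
  cluster
}

# Toy data (hypothetical): two tight groups, one border point, one isolated point
toy <- rbind(c(0, 0), c(0, 0.1), c(0.1, 0), c(0.1, 0.1),  # group A
             c(0.28, 0),                                   # border point of group A
             c(5, 5), c(5, 5.1), c(5.1, 5), c(5.1, 5.1),  # group B
             c(10, 0))                                     # isolated point
cl <- simple_dbscan(toy, eps = 0.2, minPts = 3)
cl  # expected: 1 1 1 1 1 2 2 2 2 0
```

Core points start and grow clusters. A border point joins the first cluster that reaches it, and every other point keeps the noise label 0. This is the same labeling convention that `db$cluster` uses.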

To apply DBSCAN to the iris2D data set with $\varepsilon = 0.3$ and $\text{minPts} = 4$:

db <- dbscan(x, eps = .3, minPts = 4)
db
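The result depends on $\varepsilon$, so it can help to rerun `dbscan()` over a few values and compare. This sketch keeps minPts = 4 and tries $\varepsilon \in \{0.2, 0.3, 0.4\}$. The grid is an arbitrary choice for illustration. With minPts fixed, a larger $\varepsilon$ can only keep or enlarge the set of core points, so the number of noise points cannot increase.

```r
library(dbscan)
data("iris")
x <- as.matrix(iris[, 1:2])

eps_values <- c(0.2, 0.3, 0.4)  # illustrative grid of eps values
# Cluster labels for each eps, with minPts fixed at 4 as above
cl_list <- lapply(eps_values, function(e) dbscan(x, eps = e, minPts = 4)$cluster)
res <- data.frame(
  eps = eps_values,
  clusters = sapply(cl_list, max),                       # cluster IDs run from 1 to the number of clusters
  noise = sapply(cl_list, function(cl) sum(cl == 0L))    # label 0 marks noise
)
res
```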

To visualize the clustering solution, we can plot the points in different clusters with different colors:

pairs(x, col = db$cluster + 1L) # noise points have label 0 and are drawn in color 1 (black)
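`x` has only two columns, so `pairs()` draws the same scatter plot twice, once with the axes swapped. A single `plot()` with a distinct symbol for noise may be easier to read, followed by `hullplot()` from dbscan. The symbol choice (`pch = 4` for noise, `pch = 19` otherwise) is arbitrary.

```r
library(dbscan)
data("iris")
x <- as.matrix(iris[, 1:2])
db <- dbscan(x, eps = 0.3, minPts = 4)

# Crosses (pch = 4) for noise, filled circles (pch = 19) for clustered points
point_pch <- ifelse(db$cluster == 0L, 4, 19)
plot(x, col = db$cluster + 1L, pch = point_pch, main = "DBSCAN on iris2D")

# Draw a convex hull around each DBSCAN cluster
hullplot(x, db)
```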


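For each data point, we can also compute the local outlier factor (LOF). LOF compares the local reachability density of a point, which is derived from reachability distances to its nearest neighbors, with the densities of those neighbors. A value close to 1 means the point is about as dense as its neighborhood, and a value well above 1 marks a local outlier: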

lof_score <- lof(x, minPts = 5) # LOF score of each point
pairs(x, cex = lof_score) # plot the points with symbol size scaled by the LOF score
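The following base-R sketch follows the standard LOF definition with $k$ nearest neighbors, to show how the score is built from reachability distances. It is not how `dbscan::lof` is implemented. The helper `simple_lof` and the toy data are hypothetical. The sketch breaks distance ties arbitrarily and assumes no duplicate points, since duplicates would make reachability distances zero. `lof()` sets the neighborhood size through `minPts`, and the sketch's `k` is not guaranteed to map one-to-one onto it.

```r
# Minimal LOF sketch in base R, for illustration only (dbscan::lof is the
# implementation used in this notebook). simple_lof is hypothetical; it breaks
# distance ties arbitrarily and assumes there are no duplicate points.
simple_lof <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- as.matrix(dist(x))
  diag(d) <- Inf                               # a point is not its own neighbor
  # n x k matrix: row i holds the indices of the k nearest neighbors of point i
  nn <- do.call(rbind, lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)]))
  k_dist <- d[cbind(seq_len(n), nn[, k])]      # distance to the k-th nearest neighbor
  # reach-dist(p, o) = max(k-distance(o), d(p, o));
  # local reachability density = 1 / mean reachability distance to the neighbors
  lrd <- sapply(seq_len(n), function(i) {
    o <- nn[i, ]
    1 / mean(pmax(k_dist[o], d[i, o]))
  })
  # LOF(p) = mean lrd of p's neighbors divided by lrd(p)
  sapply(seq_len(n), function(i) mean(lrd[nn[i, ]]) / lrd[i])
}

# Toy data (hypothetical): corners of a unit square plus one distant point
toy <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(5, 4))
scores <- simple_lof(toy, k = 2)
round(scores, 2)  # expected: 1.00 1.00 1.00 1.00 5.33
```

The four corners are exactly as dense as their neighbors, so each scores 1. The distant point's reachability distances are about five times larger than its neighbors', so its LOF is well above 1.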

OPTICS

OPTICS is implemented by the function optics:

?optics

To apply OPTICS with $\varepsilon = 1$ and $\text{minPts} = 4$:

opt <- optics(x, eps=1, minPts = 4)
plot(opt)
opt
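The object returned by `optics()` stores the cluster ordering in `opt$order` and each point's reachability distance in `opt$reachdist`. The sketch below lists the first points of the ordering together with their reachability distances. These are the values the reachability plot shows from left to right.

```r
library(dbscan)
data("iris")
x <- as.matrix(iris[, 1:2])
opt <- optics(x, eps = 1, minPts = 4)

ord <- opt$order                 # OPTICS visits every point once: a permutation of 1..n
reach <- opt$reachdist[ord]      # reachability distances in OPTICS order
head(data.frame(point = ord, reachdist = reach), 10)
```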

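We can extract clusters by applying a threshold, say 0.3, to the reachability distance, that is, by cutting the reachability plot at that level. This gives a clustering similar to DBSCAN with $\varepsilon = 0.3$: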

opt <- extractDBSCAN(opt, eps_cl = .3)
plot(opt)
hullplot(x, opt) # convex hulls of the extracted clusters in the input space
opt
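The sketch below cross-tabulates the labels from `extractDBSCAN(opt, eps_cl = 0.3)` against `dbscan(x, eps = 0.3, minPts = 4)` to show how close the threshold extraction comes to DBSCAN. The two labelings may number the clusters differently and may treat some border points differently. A cross-tabulation is therefore more informative than comparing the labels directly.

```r
library(dbscan)
data("iris")
x <- as.matrix(iris[, 1:2])

opt <- optics(x, eps = 1, minPts = 4)
opt_cl <- extractDBSCAN(opt, eps_cl = 0.3)$cluster  # OPTICS cut at reachability 0.3
db_cl <- dbscan(x, eps = 0.3, minPts = 4)$cluster    # DBSCAN with eps = 0.3

# Rows: OPTICS labels, columns: DBSCAN labels (0 = noise in both)
agreement <- table(OPTICS = opt_cl, DBSCAN = db_cl)
agreement
```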