CS 458 Project 4 Solved

P4-1. Hierarchical Clustering Dendrogram

(a) Randomly generate the following data points:
import numpy as np
np.random.seed(0)
X1 = np.random.randn(50,2)+[2,2]
X2 = np.random.randn(50,2)+[6,10]
X3 = np.random.randn(50,2)+[10,2]
X = np.concatenate((X1,X2,X3))

(b) Use sklearn.cluster.AgglomerativeClustering to cluster the points generated in (a). Plot
a dendrogram for each linkage in {'ward', 'complete', 'average', 'single'}.
Instructions: Set distance_threshold=0 and n_clusters=None in AgglomerativeClustering so that
the full tree is computed. The default metric used to compute the linkage is 'euclidean', so
you do not need to change this parameter.
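
One possible way to approach (b) is sketched below. AgglomerativeClustering does not draw dendrograms itself, so this sketch assumes SciPy's dendrogram function is used for plotting and rebuilds a SciPy-style linkage matrix from the fitted model's children_ and distances_ attributes. The helper name linkage_matrix, the 2x2 subplot layout, and the no_labels=True setting are illustrative choices, not part of the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering

# Data from part (a)
np.random.seed(0)
X1 = np.random.randn(50, 2) + [2, 2]
X2 = np.random.randn(50, 2) + [6, 10]
X3 = np.random.randn(50, 2) + [10, 2]
X = np.concatenate((X1, X2, X3))


def linkage_matrix(model):
    """Build a SciPy linkage matrix from a fitted AgglomerativeClustering.

    Hypothetical helper: each row is [child_a, child_b, distance, size],
    where size is the number of original samples under the merged node.
    """
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, merge in enumerate(model.children_):
        count = 0
        for child in merge:
            if child < n_samples:
                count += 1  # leaf node: one original sample
            else:
                count += counts[child - n_samples]  # earlier merge
        counts[i] = count
    return np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)


linkages = ["ward", "complete", "average", "single"]
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
matrices = {}
for ax, method in zip(axes.ravel(), linkages):
    # distance_threshold=0 with n_clusters=None computes the full tree
    model = AgglomerativeClustering(
        distance_threshold=0, n_clusters=None, linkage=method
    ).fit(X)
    Z = linkage_matrix(model)
    matrices[method] = Z
    dendrogram(Z, ax=ax, no_labels=True)  # 150 leaf labels would be unreadable
    ax.set_title(f"linkage = {method}")
    ax.set_ylabel("merge distance")
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to view the four dendrograms side by side.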


P4-2. Clustering structured dataset

(a) Generate a swiss roll dataset:
from sklearn.datasets import make_swiss_roll
# Generate data (swiss roll dataset)
n_samples = 1500
noise = 0.05
X, _ = make_swiss_roll(n_samples, noise=noise)
# Make it thinner
X[:, 1] *= .5

(b) Use sklearn.cluster.AgglomerativeClustering to cluster the points generated in (a), with
the parameters n_clusters=6, connectivity=connectivity, linkage='ward', where connectivity is defined as:
from sklearn.neighbors import kneighbors_graph
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)

Plot the clustered data in a 3D figure and use different colors for different clusters in your figure.
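
A minimal sketch of (b), assuming matplotlib's 3D axes are used for the figure. The random_state=0 argument, the "tab10" colormap, the marker size, and the viewing angle are assumptions added for reproducibility and readability; they are not required by the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Data from part (a); random_state is an assumption for reproducibility
X, _ = make_swiss_roll(1500, noise=0.05, random_state=0)
X[:, 1] *= 0.5  # make the roll thinner

# k-nearest-neighbor graph restricts merges to neighboring points
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)
ward = AgglomerativeClustering(
    n_clusters=6, connectivity=connectivity, linkage="ward"
).fit(X)
labels = ward.labels_

# 3D scatter plot, one color per cluster label
fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(111, projection="3d")
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=labels, cmap="tab10", s=10)
ax.view_init(elev=7, azim=-80)
ax.set_title("Ward linkage with k-NN connectivity (6 clusters)")
```

Call plt.show() to display the figure. Dropping the connectivity argument and re-running gives a useful contrast, since the merges are then no longer restricted to neighboring points.
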
(c) Use sklearn.cluster.DBSCAN to cluster the points generated in (a). Plot the clustered data in
a 3D figure and use different colors for different clusters in your figure. Discuss and compare the
results of DBSCAN with the results in (b).

P4-3. Clustering the handwritten digits data

Use the hand-written digits dataset embedded in scikit-learn:
from sklearn import datasets
digits = datasets.load_digits()
(a) Use the following methods to cluster the data:
- K-Means (sklearn.cluster.KMeans)
- DBSCAN (sklearn.cluster.DBSCAN)

Optimize the parameters of these methods.
(b) Evaluate these methods based on the labels of the data and discuss which method gives you
the best results in terms of accuracy.
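
The assignment does not say how cluster labels should be turned into an accuracy, so the sketch below assumes a majority-vote mapping: each cluster is assigned the most common true digit among its members, and DBSCAN noise points (label -1) count as errors. The helper name majority_vote_accuracy, the candidate eps values, and min_samples=5 are all illustrative assumptions. The adjusted Rand index is printed alongside as a second, mapping-free score.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import adjusted_rand_score

digits = datasets.load_digits()
X, y = digits.data, digits.target


def majority_vote_accuracy(y_true, cluster_labels):
    """Fraction of samples whose cluster's majority digit matches their label.

    Hypothetical helper: noise points (cluster label -1) count as wrong.
    """
    correct = 0
    for c in np.unique(cluster_labels):
        if c == -1:
            continue  # DBSCAN noise is never counted as correct
        members = y_true[cluster_labels == c]
        correct += np.bincount(members).max()
    return correct / len(y_true)


# K-Means with one cluster per digit class
kmeans_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
kmeans_acc = majority_vote_accuracy(y, kmeans_labels)
print(f"KMeans: acc={kmeans_acc:.3f}, "
      f"ARI={adjusted_rand_score(y, kmeans_labels):.3f}")

# Small illustrative sweep over eps for DBSCAN (values are guesses)
dbscan_results = {}
for eps in (15, 18, 20, 22, 25):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    acc = majority_vote_accuracy(y, labels)
    dbscan_results[eps] = acc
    n_found = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"DBSCAN eps={eps}: clusters={n_found}, acc={acc:.3f}, "
          f"ARI={adjusted_rand_score(y, labels):.3f}")
```

Majority-vote accuracy rises as the number of clusters grows, so it is worth reporting the cluster count next to it. Note also that this sweep selects eps using the true labels, which is fine for the evaluation asked for here but would not be available in a purely unsupervised setting.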
