CS 458 Project 1 Solved

P1-1. Curse of Dimensionality.

Reproduce a figure similar to the one on slide 37 of Chapter 2, i.e.,
(a) Generate 1000 points following a uniform distribution in a given dimension, and then
compute the difference between the maximum and minimum distances over all pairs of points.
Hint: Refer to the tutorial “Introduction to Numpy and Pandas” on how to generate random points.

(b) Repeat (a) for different dimensions from 2 to 50.
Plot $\log_{10}\frac{\max - \min}{\min}$ against the number of dimensions, where max and min
are the maximum and minimum pairwise distances from (a).
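One possible way to compute the quantity in (a) and (b) is sketched below. It is only an illustration under some assumptions: the points are drawn from the unit hypercube, the distance is Euclidean, and the function name log_distance_contrast and the fixed seed are hypothetical choices, not part of the assignment.

```python
import numpy as np


def log_distance_contrast(n_points, n_dims, rng):
    """Return log10((max - min) / min) over all pairwise Euclidean distances."""
    # Assumption: uniform samples in the unit hypercube [0, 1)^n_dims.
    points = rng.uniform(0.0, 1.0, size=(n_points, n_dims))
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (points @ points.T)
    # Keep each unordered pair once (strict upper triangle) and clip
    # tiny negative values caused by floating-point rounding.
    upper = np.triu_indices(n_points, k=1)
    dists = np.sqrt(np.clip(sq_dists[upper], 0.0, None))
    d_min, d_max = dists.min(), dists.max()
    return np.log10((d_max - d_min) / d_min)


rng = np.random.default_rng(0)  # fixed seed only for reproducibility
dims = np.arange(2, 51)         # dimensions 2, 3, ..., 50
contrast = np.array([log_distance_contrast(1000, d, rng) for d in dims])
```

The arrays dims and contrast can then be passed to a plotting call such as matplotlib's plt.plot(dims, contrast) to draw the curve. The value is expected to fall as the dimension grows, because the nearest and farthest pairs become relatively closer in distance.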

P1-2. The Iris Dataset (https://en.wikipedia.org/wiki/Iris_flower_data_set)

The Iris dataset is bundled with scikit-learn. You can install scikit-learn by following the
instructions (https://scikit-learn.org/stable/install.html). Then you can load the Iris dataset using
the following code:
from sklearn import datasets
iris = datasets.load_iris()
The Iris dataset contains the sepal length, sepal width, petal length, and petal width of 3
different types of irises (Setosa, Versicolour, and Virginica), stored in a 150×4 numpy.ndarray.
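As a quick check of the layout described above, the loaded object can be inspected as follows. The variable names X and y are illustrative choices only.

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data    # measurements, one row per flower
y = iris.target  # class index per flower: 0, 1, or 2

print(X.shape)             # number of samples and attributes
print(iris.feature_names)  # column order of X
print(iris.target_names)   # class names matching the indices in y
```

The column order reported by feature_names is what later tasks rely on when selecting the sepal or petal attributes by index.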

Tasks:

a) Data Visualization. Reproduce the following figure using a scatter plot.
b) Find the discretization of the petal length and the petal width that best separates the
Iris data and plot a figure similar to the one on slide 54 of Chapter 2. For each flower type, list
in a table how many data samples are correctly separated and how many are not.
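The counting step in task b) could be sketched as below. Everything specific here is an assumption made for illustration: the cut points in length_cuts and width_cuts are hypothetical starting values rather than the required answer, and a sample is treated as "correctly separated" only when both its petal length bin and its petal width bin match its class index.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
petal_length = iris.data[:, 2]
petal_width = iris.data[:, 3]
y = iris.target

# Hypothetical cut points (in cm); tune these to improve the separation.
length_cuts = [2.5, 5.0]
width_cuts = [0.75, 1.75]

# np.digitize maps each value to bin 0, 1, or 2 (low, medium, high).
length_bin = np.digitize(petal_length, length_cuts)
width_bin = np.digitize(petal_width, width_cuts)

# Assumed rule: correct when both bins equal the class index.
correct = (length_bin == y) & (width_bin == y)

counts = {}
for k, name in enumerate(iris.target_names):
    mask = y == k
    counts[name] = (int(correct[mask].sum()), int((~correct[mask]).sum()))
    print(name, "correct:", counts[name][0], "incorrect:", counts[name][1])
```

Trying several candidate cut points and keeping the pair with the fewest incorrectly separated samples is one simple way to search for a better discretization.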

P1-3. Principal Component Analysis for The Iris Dataset

You can use PCA embedded in scikit-learn by the following code:
from sklearn.decomposition import PCA

Tasks:

a) Use the Iris dataset and plot all the samples in a figure using Sepal Length and Sepal Width,
i.e., xlabel(‘Sepal length’) and ylabel(‘Sepal width’).
b) The Iris dataset has 4 attributes (sepal length, sepal width, petal length, and petal width). Use
PCA to reduce the dimension of the dataset from 4 to 2. Plot all the samples after the
dimensionality reduction in a 2D figure. Compare this figure with the figure in (a) and discuss
whether you can better separate the data samples after the dimensionality reduction.
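A minimal sketch of the reduction in task b) follows, assuming the default PCA settings are acceptable. The variable name X_2d is an illustrative choice.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# Project the 4 attributes onto the first 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(iris.data)

print(X_2d.shape)                     # samples x reduced dimensions
print(pca.explained_variance_ratio_)  # variance share kept by each component
```

The two columns of X_2d can be drawn with a scatter plot colored by iris.target and placed next to the sepal length versus sepal width plot from (a) for the comparison.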
