Sale!

Assignment 1 CS771 solved

Original price was: $35.00.Current price is: $30.00. $25.50

Category:

Description

5/5 - (9 votes)

Q1. For this problem, we will be working with the automobile dataset from the UCI repository. Using
this dataset,
(a) train a k-nearest neighbors regression model, and report its validation set performance using
root mean squared error. (15 points)
(b) find an optimal k for this model using cross-validation (10 points)
(c) Introduce L0 regularization into this setup and retrain the model (5 points) and,
(d) check whether L0 regularization improves generalization and which are the most important
features identified by the model for predicting prices. Comment on your findings drawing upon realworld intuitions about car prices. (10 points)
Note: You don’t have to use all the features in the dataset, if you don’t want to. Also, the choice of
the distance function is entirely up to you.
Q2. For this problem, we will be working with the census income dataset from the UCI repository.
Using this dataset,
(a) train a decision tree classification model using information gain as the splitting criterion and
using only single feature decision stumps at all non-leaf nodes and majority votes at leaf nodes, and
report its validation set performance using % accuracy (15 points)
(b) use cross-validation to optimize the tree hyperparameters (10 points)
(c) Improve on the best test set performance this classifier has to offer with a better version that
uses more complex splitting criteria than single-feature decision stumps (10 points)