CS 6375 Assignment 2: Machine Learning

Inducing Decision Trees
In this homework you will implement and test the decision tree learning
algorithm (see Mitchell, Chapter 3). You may use C/C++, Java, or Python
to implement your algorithms. Your C/C++ implementation should compile
with the Linux gcc/g++ compilers. Remember, you must write your own code.
Building a Decision Tree model
The main idea in this homework is to implement the ID3 decision tree algorithm
presented in class. You will use the training dataset to learn the tree, the
validation dataset to validate and prune the tree, and finally the test dataset
to test the tree. For the training phase, you can load the entire dataset at the
root node and recursively find the best attribute to split on at each internal node.
The terminating condition can be one of the following:
• No more attributes to split on. In this case, label the node with the majority class. For example, if you reach a leaf node with 10 instances of class + and 2 instances of class -, it should be labeled class +.
• The node is pure, i.e., it contains instances of only one class. In this case, it is easy to infer the class of the instances.
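
The recursion can be organized as in the sketch below (Python; a minimal illustration, not a required design). It assumes each instance is a list of attribute values with the class label in the last position, and all names here (Node, entropy, best_attribute, build_tree) are illustrative rather than mandated:

import itertools
import math
from collections import Counter

_ids = itertools.count()

class Node:
    def __init__(self):
        self.node_id = next(_ids)   # number each node so it can be referenced when pruning
        self.attribute = None       # index of the attribute this node splits on (None at a leaf)
        self.children = {}          # attribute value -> child Node
        self.label = None           # majority class of the instances that reached this node

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, attributes):
    labels = [r[-1] for r in rows]
    base = entropy(labels)
    def gain(a):
        rem = 0.0
        for v in set(r[a] for r in rows):
            subset = [r[-1] for r in rows if r[a] == v]
            rem += len(subset) / len(rows) * entropy(subset)
        return base - rem
    return max(attributes, key=gain)        # attribute with the highest information gain

def build_tree(rows, attributes):
    node = Node()
    labels = [r[-1] for r in rows]
    node.label = Counter(labels).most_common(1)[0][0]
    # terminating conditions: the node is pure, or there are no attributes left to split on
    if len(set(labels)) == 1 or not attributes:
        return node
    node.attribute = best_attribute(rows, attributes)
    remaining = [a for a in attributes if a != node.attribute]
    for v in set(r[node.attribute] for r in rows):
        node.children[v] = build_tree([r for r in rows if r[node.attribute] == v], remaining)
    return node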
Remember to use only the training set files to learn the tree. Also, be sure
to number the nodes of your tree, as this will make it easier to identify which
nodes to prune. For example, you can label the nodes as shown in Figure 1.

Figure 1: Example of labeling decision tree nodes
Data provided
You will be using the following datasets for programming and testing:
• Dataset 1 – Uploaded on eLearning. It is divided into 3 parts: training (for learning the model), validation (for pruning), and test (for testing performance).
• Dataset 2 – Uploaded on eLearning. It is divided into 3 parts: training (for learning the model), validation (for pruning), and test (for testing performance).
Pruning your model
After you have constructed your model, i.e., the decision tree, it is time to see
whether pruning would help it, that is, whether the model is overfitting the
training data. You will do this by randomly deleting the sub-tree below a node.
Note that pruning at a node deletes all of that node's descendants, i.e., the
sub-tree below it, but not the node itself; that node then becomes a leaf node.
The classification at that node is made by majority vote, e.g., if the node
contains 10 instances of class 1 and 5 instances of class 2, it will classify
instances as class 1. In this assignment, you will read in an argument that
indicates how many nodes to prune. You will then randomly select that many
nodes and remove the sub-tree below each of them. For example, if the value of
the argument is 3, you will randomly choose 3 nodes and delete the sub-trees
below them.
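
One possible way to carry out this random pruning, building on the Node fields from the earlier sketch (an illustration of the idea, not a required interface):

import random

def internal_nodes(node, out=None):
    # collect every node that still has children; these are the candidates for pruning
    if out is None:
        out = []
    if node.children:
        out.append(node)
        for child in node.children.values():
            internal_nodes(child, out)
    return out

def prune_random(root, num_to_prune, seed=None):
    rng = random.Random(seed)
    for _ in range(num_to_prune):
        candidates = internal_nodes(root)
        if not candidates:
            break
        node = rng.choice(candidates)
        # delete the sub-tree below the chosen node; the node itself stays and becomes a leaf
        node.children = {}
        node.attribute = None
        # node.label already holds the majority class recorded while building the tree
    return root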
Outputting your model
Implement a function to print the decision tree to standard output. We will
use the following format.
wesley = 0 :
| honor = 0 :
| | barclay = 0 : 1
| | barclay = 1 : 0
| honor > 0 :
| | tea = 0 : 0
| | tea = 1 : 1
wesley = 1 : 0
According to this tree, if wesley = 0 and honor = 0 and barclay = 0, then
the class value of the corresponding instance should be 1. In other words, the
value appearing before a colon is an attribute value, and the value appearing
after a colon is a class value.
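
The sketch below is one way to produce this format, again using the Node fields from the earlier sketches. It assumes attribute_names maps attribute indices to the names used in the dataset, and it prints every test with "="; the "honor > 0" line above suggests the real format may also use ">" for some attributes.

def print_tree(node, attribute_names, depth=0):
    # each line shows the test on an edge; a leaf's class value is printed at the end of its line
    for value, child in sorted(node.children.items()):
        line = "| " * depth + "{} = {} :".format(attribute_names[node.attribute], value)
        if child.children:
            print(line)
            print_tree(child, attribute_names, depth + 1)
        else:
            print(line, child.label)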
Testing your model
The final step is to test your pruned model on the test dataset, i.e., to compute
its accuracy on the test set. The actual class labels for each of the instances
in the test dataset are provided. You have to compute the accuracy as follows:
Accuracy = (Number of instances correctly classified) / (Total number of instances)
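
A short sketch of classification and accuracy under the same assumptions (class label in the last column, Node fields as in the sketches above):

def classify(node, row):
    # walk from the root to a leaf; fall back to the majority class if a value was never seen in training
    while node.children:
        child = node.children.get(row[node.attribute])
        if child is None:
            break
        node = child
    return node.label

def accuracy(root, rows):
    correct = sum(1 for row in rows if classify(root, row) == row[-1])
    return correct / len(rows)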
Running your program
Your code should be runnable from the command line. It should accept the
following arguments, in order:
1. Number of nodes to prune.
2. The path of the training set
3. The path of the validation set
4. The path of the test set
5. A Boolean (0 or 1) indicating whether to print the model (after pruning)
As an example (with the file paths shown as placeholders):
<executable> 10 <training-set path> <validation-set path> <test-set path> 1
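
Putting the pieces together, one way to read these arguments is sketched below. It reuses the functions from the earlier sketches and assumes the data files are comma-separated with a header row naming the attributes; if the files on eLearning use a different format, the loader would need to change.

import sys

def load_csv(path):
    # assumed format: comma-separated values, first line = attribute names, last column = class
    with open(path) as f:
        header = f.readline().strip().split(",")
        rows = [line.strip().split(",") for line in f if line.strip()]
    return header, rows

def main():
    # arguments, in order: nodes to prune, training path, validation path, test path, print flag
    num_to_prune = int(sys.argv[1])
    train_path, valid_path, test_path = sys.argv[2], sys.argv[3], sys.argv[4]
    print_model = sys.argv[5] == "1"

    header, train_rows = load_csv(train_path)
    _, valid_rows = load_csv(valid_path)       # could be used to compare accuracy before/after pruning
    _, test_rows = load_csv(test_path)

    attributes = list(range(len(header) - 1))  # last column is the class label
    root = build_tree(train_rows, attributes)
    prune_random(root, num_to_prune)
    if print_model:
        print_tree(root, header)
    print("Test set accuracy:", accuracy(root, test_rows))

if __name__ == "__main__":
    main()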
Output of your program
It should output the accuracies on the test set for the decision trees constructed.
What to Turn in
• Your code and a Readme file with instructions for compiling and running the code.
On the two datasets provided:
• Report the accuracy on the test set for the decision trees constructed for 5 different values of the “number of nodes to prune” parameter. One of the values should be 0.