Text Analytics Assignment 3: Classification

This assignment gives you hands-on experience building text classification models, applied to email spam filtering. The target variable indicates whether an email is spam (1) or non-spam (0). Follow the directions and answer the following questions.

Question 1

Explore different ways to improve classification performance (accuracy or expected cost). Do the following:

  1. Feature representation: compare 3 feature representations: binary vs. frequency vs. tf-idf.
  2. Classifier: compare 3 classifiers of your choice, such as decision trees, neural nets, etc.
  3. Feature selection (optional, extra credit): compare different feature/attribute selection methods or parameters.
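To make the three representations concrete, here is a minimal pure-Python sketch. The toy emails, the whitespace tokenizer, and the function names are all illustrative assumptions, and the tf-idf variant shown (raw count times log(N/df)) is only one common definition; libraries often apply smoothing or normalization, so their values may differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split is enough for this toy example.
    return text.lower().split()


def build_vocab(docs):
    # Sorted list of every distinct token across the corpus.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def frequency_vectors(docs, vocab):
    # Frequency representation: raw count of each vocabulary term per document.
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vectors


def binary_vectors(docs, vocab):
    # Binary representation: 1 if the term occurs at all, else 0.
    return [[1 if c > 0 else 0 for c in row]
            for row in frequency_vectors(docs, vocab)]


def tfidf_vectors(docs, vocab):
    # tf-idf (unsmoothed variant): count * log(N / document frequency).
    freq = frequency_vectors(docs, vocab)
    n_docs = len(docs)
    df = [sum(1 for row in freq if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]  # df >= 1 for every vocab term
    return [[row[j] * idf[j] for j in range(len(vocab))] for row in freq]


# Hypothetical toy corpus, not the assignment data.
toy_emails = [
    "win money win prizes now",
    "meeting agenda for now",
    "win a free meeting",
]
vocab = build_vocab(toy_emails)
```

Each function returns one row per email and one column per vocabulary term, so the same classifier can be trained on each representation in turn and the results compared.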

Evaluate each model on a held-out test set using a train/test split, and report the following:

  • Precision and recall by class
  • Confusion matrix
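As a rough illustration of how these metrics relate, the sketch below builds a confusion matrix and derives per-class precision and recall from it. It assumes rows are actual labels and columns are predicted labels; the label lists are made-up toy values, and the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, labels=(0, 1)):
    # Assumed convention: matrix[i][j] = emails with actual label i predicted as j.
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def precision_recall_by_class(matrix, labels=(0, 1)):
    # Precision for class k: correct predictions of k / all predictions of k.
    # Recall for class k: correct predictions of k / all actual members of k.
    report = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum
        actual_k = sum(matrix[k])                    # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        report[label] = (precision, recall)
    return report


# Toy labels for illustration only (1 = spam, 0 = non-spam).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

matrix = confusion_matrix(actual, predicted)
report = precision_recall_by_class(matrix)
```

If a library is used instead, check its documentation for which axis holds the actual labels, since conventions differ between tools.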

Question 2

Calculate the total cost and the expected cost (per email) based on the confusion matrix you obtained in Question 1. Assume that each spam email misclassified as non-spam costs 5, and each non-spam email misclassified as spam costs 100.

[Hint: be careful with the dimensions of the confusion matrix: which are the “actuals” and which are the “predictions”?]
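The calculation can be sketched as an element-wise product of the confusion matrix and a cost matrix. The sketch assumes rows are actuals and columns are predictions, with label order (non-spam, spam); if your matrix is oriented the other way, transpose the cost matrix accordingly. The counts below are hypothetical placeholders, not results.

```python
def total_and_expected_cost(confusion, cost):
    # Total cost: sum over all cells of (count * cost of that outcome).
    total = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    n_emails = sum(sum(row) for row in confusion)
    # Expected cost: total cost averaged over every classified email.
    return total, total / n_emails


# Assumed layout: rows = actual, columns = predicted, order = (non-spam, spam).
cost = [
    [0, 100],  # actual non-spam: correct = 0, predicted spam = 100
    [5, 0],    # actual spam: predicted non-spam = 5, correct = 0
]

# Hypothetical confusion matrix for illustration only.
confusion = [
    [50, 2],
    [7, 41],
]

total_cost, expected_cost = total_and_expected_cost(confusion, cost)
```

With these placeholder counts the total is 2 × 100 + 7 × 5 = 235 over 100 emails, or 2.35 per email. Note how two non-spam errors outweigh seven spam errors, which is why accuracy and expected cost can rank models differently.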

Based on your observations, analyze which combination of feature representation and classifier performs best.

 

Question 3 (Extra credit)

Run 10-fold cross-validation instead of a single train/test split. Does your conclusion still hold? If your observations differ, analyze the likely cause.
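For reference, the mechanics of k-fold cross-validation can be sketched in pure Python as follows. The helper names, the toy labels, and the majority-class baseline standing in for a real classifier are all assumptions for illustration; a library implementation would normally be used in practice.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    # Shuffle once, then deal indices into k folds of near-equal size.
    # Assumes n_samples >= k so that no fold is empty.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validate(labels, fit_predict, k=10, seed=0):
    # fit_predict(train_idx, test_idx) must return one prediction per test index.
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        predictions = fit_predict(train_idx, test_idx)
        correct = sum(1 for i, p in zip(test_idx, predictions) if labels[i] == p)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k, accuracies


# Toy labels for illustration only (1 = spam, 0 = non-spam).
labels = [0] * 14 + [1] * 6


def majority_baseline(train_idx, test_idx):
    # Placeholder "classifier": always predict the most common training label.
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return [majority] * len(test_idx)


mean_accuracy, fold_accuracies = cross_validate(labels, majority_baseline, k=10)
```

Because every email is used for testing exactly once, the averaged score depends less on one lucky or unlucky split, which is one possible reason a ranking from a single split may not hold.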