Programming Assignment 1 COMP 550

Sentiment Analysis

You will train models that classify a sentence as expressing either positive or negative sentiment. These
sentences come from a movie review dataset constructed by the authors of this paper:

Bo Pang and Lillian Lee, Seeing stars: Exploiting class relationships for sentiment categorization with
respect to rating scales, Proceedings of ACL 2005.

The goal of this assignment is to give you experience in using existing tools for machine learning and natural
language processing to solve a classification task. Before you attempt this assignment, you will need to
install Python 3 on the machine you plan to work on, as well as the following Python packages and their
dependencies:
- NLTK: http://www.nltk.org/
- NumPy: http://www.numpy.org/
- scikit-learn: http://scikit-learn.org/stable/

Download the text corpus provided in the attached file. It is a collection of movie review sentences split
by polarity into positive and negative examples. Your task is to train classifiers to distinguish between
them.

Data storage and format

The raw text files are stored in rt-polarity.neg for the negative cases, and rt-polarity.pos for the positive
cases.
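A minimal loading sketch, assuming each file holds one sentence per line and is decoded as Latin-1 (an assumption: the raw files are not guaranteed to be valid UTF-8, and Latin-1 decodes any byte sequence). The function name `load_polarity` and the label convention (1 = positive, 0 = negative) are illustrative choices, not part of the handout.

```python
from pathlib import Path


def load_polarity(pos_path="rt-polarity.pos", neg_path="rt-polarity.neg"):
    """Hypothetical loader: return (sentences, labels), with 1 = positive and 0 = negative."""
    sentences, labels = [], []
    for path, label in ((pos_path, 1), (neg_path, 0)):
        # Assumption: Latin-1 decoding, so non-UTF-8 bytes do not raise an error.
        text = Path(path).read_text(encoding="latin-1")
        for line in text.splitlines():
            line = line.strip()
            if line:  # skip blank lines
                sentences.append(line)
                labels.append(label)
    return sentences, labels
```

Run from the directory containing the two files, `sentences, labels = load_polarity()` returns two parallel lists that can be passed directly to scikit-learn.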

Research question

The main research question being asked by your experiments will be: what preprocessing decisions work well
for sentence-level sentiment classification?

Preprocessing and feature extraction

Your responsibility is to design and run the correct experiments in order to answer the research question
above. You must explore at least 3 preprocessing decisions that we have discussed in class. You may use
scikit-learn’s feature extraction module to help you, as well as any other tool from NLTK or NumPy. Reading
scikit-learn’s documentation will be of great help in your experimentation.
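One way to encode preprocessing decisions is as keyword arguments to scikit-learn's `CountVectorizer`, so that each decision can be switched on or off independently. The sketch below is hypothetical: the configuration names and the particular choices (lowercasing, stop-word removal with scikit-learn's built-in English list, a minimum document frequency, bigrams, and Porter stemming via NLTK) are examples, not a required or complete set.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical configurations: "lowercase" changes one decision relative to
# "baseline", and each remaining entry changes one decision relative to "lowercase".
PREPROCESSING_CONFIGS = {
    "baseline": {"lowercase": False},
    "lowercase": {"lowercase": True},
    "lowercase+stopwords": {"lowercase": True, "stop_words": "english"},
    "lowercase+min_df=2": {"lowercase": True, "min_df": 2},
    "lowercase+bigrams": {"lowercase": True, "ngram_range": (1, 2)},
}


def make_vectorizer(name):
    """Return a fresh, unfitted CountVectorizer for the named configuration."""
    return CountVectorizer(**PREPROCESSING_CONFIGS[name])


def make_stemming_vectorizer():
    """Lowercase and tokenize with scikit-learn's default analyzer, then Porter-stem each token."""
    from nltk.stem import PorterStemmer  # imported here so the rest works without NLTK

    stemmer = PorterStemmer()
    base_analyzer = CountVectorizer(lowercase=True).build_analyzer()
    return CountVectorizer(
        analyzer=lambda doc: [stemmer.stem(token) for token in base_analyzer(doc)]
    )
```

Stemming is kept in a separate function because it replaces the analyzer entirely: when `analyzer` is a callable, `CountVectorizer` passes it the raw text and skips its own lowercasing and tokenization, which is why the sketch reuses `build_analyzer()` from a default vectorizer.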

Setting up the experiments

Design and implement experiments to draw reasonable conclusions about the research question above. This
will require creating subsets of the dataset as we discussed in class. There are multiple correct ways to
set up your experiments (as well as many incorrect ways). Stick to the logistic regression model for this
assignment.
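The sketch below shows one possible setup under stated assumptions: a stratified 80/10/10 train/development/test split (the proportions and seed are arbitrary choices), every preprocessing configuration and regularization strength `C` compared on the development set only, and the single best setting evaluated once on the held-out test set. The function names and the grid of `C` values are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def split_data(sentences, labels, seed=0):
    """Hypothetical stratified 80/10/10 split into train, development and test sets."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        sentences, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    X_dev, X_test, y_dev, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed
    )
    return (X_train, y_train), (X_dev, y_dev), (X_test, y_test)


def run_experiments(sentences, labels, configs, Cs=(0.1, 1.0, 10.0), seed=0):
    """Select a (preprocessing config, C) pair on dev data; report test accuracy once.

    `configs` maps a name to CountVectorizer keyword arguments.
    Returns (dev accuracy per setting, best setting, test accuracy of best setting).
    """
    (X_train, y_train), (X_dev, y_dev), (X_test, y_test) = split_data(
        sentences, labels, seed
    )
    dev_acc, models = {}, {}
    for name, vectorizer_kwargs in configs.items():
        for C in Cs:
            # The vectorizer is fitted on training data only, inside the pipeline.
            model = make_pipeline(
                CountVectorizer(**vectorizer_kwargs),
                LogisticRegression(C=C, max_iter=1000),
            )
            model.fit(X_train, y_train)
            dev_acc[(name, C)] = accuracy_score(y_dev, model.predict(X_dev))
            models[(name, C)] = model
    best = max(dev_acc, key=dev_acc.get)
    # The test set is used exactly once, for the setting chosen on dev data.
    test_acc = accuracy_score(y_test, models[best].predict(X_test))
    return dev_acc, best, test_acc
```

Keeping the vectorizer inside the pipeline means the vocabulary and document frequencies (for example, those used by `min_df`) are computed from the training split only, so no information from the development or test sentences leaks into feature extraction.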

Report

Write a short report on your method and results that carefully documents i) the problem setup, ii) your
experimental procedure, iii) the range of parameter settings that you tried, iv) the results and conclusions,
and v) the limitations of your study. It should be no more than 1.5 pages long. Report performance in terms
of accuracy, and speculate on the successes and failures of the models.
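For reference, accuracy is the fraction of evaluation sentences whose predicted label matches the gold label. Here $N$ is the number of sentences in the evaluation set, $y_i$ is the gold label and $\hat{y}_i$ the predicted label of sentence $i$, and $\mathbb{1}[\cdot]$ is 1 when its condition holds and 0 otherwise:

```latex
\text{accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = y_i\right]
```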

Your assignment will be marked on i) how well it satisfies the requirements stated in this handout, ii) whether
your experiments adequately and correctly address the research question, iii) how well written your report is.
It will NOT be marked based on the performance that you achieve with your models on this dataset.

Submitting code

Submit your code in a file named “a1.py”.

What To Submit

Submit your report on myCourses as a single PDF file named “a1-answers.pdf”. In addition, submit one
plain-text file containing your source code, named “a1.py”. All work should be submitted to myCourses under
the Assignment 1 folder.