CISC 6210 Homework No. 2: Natural Language Processing

Tasks
1. Creating an N-Gram Model
You will build a simple n-gram language model that can be used to generate
random text resembling a source document.
In the NgramModel class, you should have the following functions:
a. __init__(self, n), which stores the order of the model and initializes any
necessary internal variables. n is the order of the ngram model.
b. update(self, text), which computes the n-grams for the input text and
updates the internal counts. The input text is padded with ‘.’ as a
prefix.
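A minimal sketch of how __init__ and update might look, assuming the internal state is kept as count dictionaries (the names ngram_counts and context_counts are assumptions, not part of the spec):

```python
from collections import defaultdict

class NgramModel:
    """Sketch of the assignment's NgramModel (only __init__ and update)."""
    def __init__(self, n):
        self.n = n                              # order of the model
        self.ngram_counts = defaultdict(int)    # (context_tuple, word) -> count
        self.context_counts = defaultdict(int)  # context_tuple -> count
        self.vocab = set()

    def update(self, text):
        # Pad with (n-1) '.' tokens as a prefix (one common reading of the spec).
        tokens = ['.'] * (self.n - 1) + text.split()
        for i in range(self.n - 1, len(tokens)):
            context = tuple(tokens[i - self.n + 1:i])
            word = tokens[i]
            self.ngram_counts[(context, word)] += 1
            self.context_counts[context] += 1
            self.vocab.add(word)
```

With this representation, get_vocab/size_vocab reduce to returning self.vocab and len(self.vocab).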
c. get_vocab(self) and size_vocab(self) are about the vocab (this is the set
of all words used by this model).
d. prob(self, context, word), which accepts a context (a string of n-1
words) and a word, and returns the probability of that word occurring
given the preceding context.
Unseen n-gram problem: if the context is novel but the word exists in
the vocabulary, the probability of the word should be 1/(size of vocab).
If the word is unknown, its probability should be 1/(size of vocab + 1).
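The probability rules above can be sketched as a standalone function over count dictionaries of the shape (context_tuple, word) -> count (the argument names are assumptions):

```python
def prob(ngram_counts, context_counts, vocab, context, word):
    """Sketch of prob() with the unseen-context / unknown-word rules."""
    if word not in vocab:
        # Unknown word: 1 / (vocab size + 1)
        return 1.0 / (len(vocab) + 1)
    if context not in context_counts:
        # Novel context, known word: uniform over the vocab
        return 1.0 / len(vocab)
    # Seen context: relative frequency of the word after this context
    return ngram_counts.get((context, word), 0) / context_counts[context]
```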
e. len_text(self), len_ngram(self), word_freq(self, word), and
ngram_freq(self, gram), which expose the model's internal counts
(text length, number of n-grams, and word and n-gram frequencies).
f. generate_text(self, context, min_length, max_length), which uses the
n-gram model to generate sentences, choosing each word with probability
proportional to the n-gram’s frequency in the model.
If no valid starting context is given (i.e. the context contains fewer
than n-1 words), the generator starts from a randomly chosen n-gram;
otherwise it conditions on the last n-1 words of the given context to
select each following word. The generator also takes minimum and
maximum length requirements, and returns a sentence whose length lies
between them.
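A sketch of the generation loop under those rules, written against the same count-dictionary representation assumed above (the dead-end fallback is a simplification, and full min_length enforcement is left out):

```python
import random

def generate_text(ngram_counts, context_counts, n, context, min_length, max_length):
    # ngram_counts maps (context_tuple, word) -> count (an assumed representation).
    words = context.split()
    if len(words) < n - 1:
        # No valid starting context: start from a randomly chosen n-gram.
        ctx, word = random.choice(list(ngram_counts))
        words = list(ctx) + [word]
    while len(words) < max_length:
        ctx = tuple(words[-(n - 1):]) if n > 1 else ()
        # Candidate next words, weighted by n-gram frequency.
        candidates = [(w, c) for (c2, w), c in ngram_counts.items() if c2 == ctx]
        if not candidates:
            break  # dead end; a fuller version would restart until min_length is met
        choices, weights = zip(*candidates)
        words.append(random.choices(choices, weights=weights)[0])
    return ' '.join(words[:max_length])
```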
g. perplexity(self, text), which calculates the perplexity of the given
text under the n-gram model. The given text is padded with ‘.’ as a prefix.
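Perplexity can be sketched as the exponential of the average negative log-probability of the text; here prob_fn stands in for the model's prob method (an assumed interface):

```python
import math

def perplexity(prob_fn, n, text):
    # prob_fn(context, word) plays the role of the model's prob() method.
    tokens = ['.'] * (n - 1) + text.split()  # pad with '.' as prefix, per the spec
    N = len(tokens) - (n - 1)                # number of predicted tokens
    log_sum = 0.0
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        log_sum += math.log(prob_fn(context, tokens[i]))
    # perplexity = exp(-(1/N) * sum of log-probabilities)
    return math.exp(-log_sum / N)
```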
2. Read the dataset from the given link and build a 3-gram model from it.
Generate a sentence of length 30 from the starting context ‘our business’.
3. For the given context ‘make you a sword for me’, compare the perplexities
of the unigram, bigram, and trigram models.
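A toy end-to-end sketch of task 3, combining the pieces into one compact class and comparing model orders on a made-up two-sentence corpus (the real dataset behind the link is not reproduced here; the internal names are assumptions):

```python
import math
from collections import defaultdict

class NgramModel:
    # Compact sketch: just enough of the class to compare perplexities.
    def __init__(self, n):
        self.n = n
        self.ngram_counts = defaultdict(int)    # (context, word) -> count
        self.context_counts = defaultdict(int)  # context -> count
        self.vocab = set()

    def update(self, text):
        tokens = ['.'] * (self.n - 1) + text.split()
        for i in range(self.n - 1, len(tokens)):
            ctx = tuple(tokens[i - self.n + 1:i])
            self.ngram_counts[(ctx, tokens[i])] += 1
            self.context_counts[ctx] += 1
            self.vocab.add(tokens[i])

    def prob(self, context, word):
        if word not in self.vocab:
            return 1.0 / (len(self.vocab) + 1)
        if context not in self.context_counts:
            return 1.0 / len(self.vocab)
        # Unsmoothed: a seen context with an unseen word gives probability 0.
        return self.ngram_counts[(context, word)] / self.context_counts[context]

    def perplexity(self, text):
        tokens = ['.'] * (self.n - 1) + text.split()
        log_sum = sum(
            math.log(self.prob(tuple(tokens[i - self.n + 1:i]), tokens[i]))
            for i in range(self.n - 1, len(tokens)))
        return math.exp(-log_sum / (len(tokens) - self.n + 1))

corpus = "make you a sword for me make you a shield for me"
query = "make you a sword for me"
for order in (1, 2, 3):
    model = NgramModel(order)
    model.update(corpus)
    print(order, model.perplexity(query))
```

On this tiny corpus the higher-order models assign the query a lower perplexity, since most of its bigrams and trigrams are near-deterministic in the training text.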