CS 6320 Project 1: Recognizing Textual Entailment

Natural Language Processing
For Project-1, you will implement a deep learning model that recognizes the textual entailment relation between two sentences.
Here, we are given two sentences: the premise, denoted by t, and the hypothesis, denoted by h. We say
that the premise entails the hypothesis, i.e., t → h, if the meaning of h can be inferred from the meaning of t [1]. To motivate
the problem, consider the examples given below:
(1)
t: Eating lots of foods that are a good source of fiber may keep your blood glucose from rising too fast after you eat.
h: Fiber improves blood sugar control.
t → h as the meaning of h can be inferred from t.
(2)
t: Scientists at the Genome Institute of Singapore (GIS) have discovered the complete genetic sequence of a coronavirus
isolated from a Singapore patient with SARS.
h: Singapore scientists reveal that SARS virus has undergone genetic changes
t ↛ h as the meaning of h cannot be inferred from t.
The task of textual entailment is set up as a binary classification problem where, given the premise and the hypothesis, the
goal is to classify the relation between them as Entails or Not Entails.
For conducting your experiments, you will use the RTE-1 dataset [2] that is provided as an addendum to this homework.
The dataset contains two XML files: a train file and a test file. The entailment relations are contained within pair tags in both
files; examples of which are provided below:
As you can observe, each pair tag contains the premise t and the hypothesis h inside t and h tags, respectively.
The value attribute of the pair tag is a boolean indicating whether t → h or not.
1 CS 6320
Model Architecture
The architecture of our model for textual entailment is fairly simple. It contains the following layers:
1. Embedding layer: This layer transforms the integer-encoded representations of the sentences into dense vectors.
2. Recurrent layer: This is a stacked bi-directional LSTM layer that takes in the vector representation from the Embedding
layer and outputs another vector.
3. Fully connected layer: This layer transforms the output of the RNN into a 2-dimensional vector (one dimension
for each label, i.e., Entails and Not Entails).
A schematic showing the architecture of the model is provided below:
We define the forward pass of our network as follows:
Let t = {t_1, t_2, ..., t_n} denote the premise and h = {h_1, h_2, ..., h_n} denote the hypothesis. We first obtain the dense vector
representations for both sentences by passing them through the same embedding layer. Let e_t = {e_{t_1}, e_{t_2}, ..., e_{t_n}} denote the
vector representations for the premise and e_h = {e_{h_1}, e_{h_2}, ..., e_{h_n}} denote the vector representations for the hypothesis, where
each vector e_{t_i} and e_{h_i} is of dimension d_1. Next, we pass the vector representations through the same LSTM to obtain
temporal sequences r_t and r_h respectively, each vector having dimension d_2. The vectors r_t and r_h are concatenated
to obtain the vector r_th. Finally, this concatenated representation r_th is passed through the fully connected layer to get the vector
f_th, with dimension d_3 = 2 (because there are 2 labels, as discussed previously).
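The forward pass described above can be sketched in PyTorch as follows. This is a minimal sketch, not the reference implementation: the class name EntailmentModel and the default sizes are hypothetical, and summarizing each sentence by the final forward and backward hidden states of the top LSTM layer is an assumption, since the handout does not say how the sequence output is reduced to a single vector. Under that assumption d_2 equals twice the LSTM hidden size.

```python
import torch
import torch.nn as nn


class EntailmentModel(nn.Module):
    """Embedding -> shared stacked BiLSTM -> fully connected layer (hypothetical sketch)."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_layers=2, num_classes=2):
        super().__init__()
        # Index 0 is assumed to be the padding index, so its embedding stays at zero.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        # r_th concatenates r_t and r_h, each of size d_2 = 2 * hidden_dim.
        self.fc = nn.Linear(2 * 2 * hidden_dim, num_classes)

    def encode(self, token_ids):
        embedded = self.embedding(token_ids)            # (batch, seq_len, d_1)
        _, (h_n, _) = self.lstm(embedded)               # h_n: (2 * num_layers, batch, hidden_dim)
        # Assumption: use the top layer's final forward and backward states as the sentence vector.
        return torch.cat([h_n[-2], h_n[-1]], dim=1)     # (batch, d_2)

    def forward(self, premise, hypothesis):
        r_t = self.encode(premise)                      # premise and hypothesis share all weights
        r_h = self.encode(hypothesis)
        r_th = torch.cat([r_t, r_h], dim=1)             # (batch, 2 * d_2)
        return self.fc(r_th)                            # f_th: (batch, 2), unnormalized scores
```

If index 0 is reserved for padding, vocab_size should be the number of distinct tokens plus one.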
Implementation and Execution Framework
You are free to use any framework such as PyTorch, TensorFlow, DyNet, Caffe, etc. (with Python) for implementing this model. We
recommend you use either TensorFlow or PyTorch, as there is a lot of help available online for
writing code in these frameworks.
For executing your code, you will use Google Colab, a virtual environment provided by Google that allows you to edit and
Here, we outline the tasks to be performed for this project.
Task – 1: Prepare dataset (15 points)
Write a method that takes the path to the (train or test) XML file as input and outputs three lists: one containing the lists
of tokens for the premises, one containing the lists of tokens for the hypotheses, and one containing the labels. For example,
consider the given input and expected output:
Input file:
<pair value="TRUE">
<t>Oracle had fought to keep the forms from being released</t>
<h>Oracle released a confidential document</h>
</pair>
<pair value="FALSE">
<t>iTunes software has seen strong sales in Europe</t>
<h>Poor sales for iTunes in Europe</h>
</pair>
Output:
# premises list of lists
p = [['oracle', 'had', 'fought', 'to', 'keep', 'the', 'forms', 'from', 'being', 'released'],
     ['itunes', 'software', 'has', 'seen', 'strong', 'sales', 'in', 'europe']]
# hypotheses list of lists
h = [['oracle', 'released', 'a', 'confidential', 'document'],
     ['poor', 'sales', 'for', 'itunes', 'in', 'europe']]
# list of labels/classes
l = ['TRUE', 'FALSE']
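One way to write such a method is sketched below using Python's standard xml.etree.ElementTree module. The function names load_rte and tokenize are hypothetical, and lowercasing plus whitespace splitting is an assumed tokenization; the handout does not prescribe one.

```python
import xml.etree.ElementTree as ET


def tokenize(sentence):
    # Assumption: lowercase, then split on whitespace. Punctuation stays attached to words.
    return sentence.lower().split()


def load_rte(path):
    """Return (premises, hypotheses, labels) read from the pair tags of an RTE XML file."""
    premises, hypotheses, labels = [], [], []
    root = ET.parse(path).getroot()
    for pair in root.iter("pair"):
        premises.append(tokenize(pair.findtext("t", default="")))
        hypotheses.append(tokenize(pair.findtext("h", default="")))
        labels.append(pair.get("value"))   # the raw attribute string, e.g. 'TRUE' or 'FALSE'
    return premises, hypotheses, labels
```

Because root.iter("pair") searches the whole tree, the sketch does not depend on the name of the file's root element.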
Next, write a function to integer-encode the premises and hypotheses. In other words, replace each word in both lists by a
unique integer (you may want to save this mapping as a dictionary to be used later). Likewise, integer-encode the labels. For the
same example given previously, the output will be:
# premises list of lists
p = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18]]
# hypotheses list of lists
h = [[1, 10, 19, 20, 21], [22, 16, 23, 11, 17, 18]]
# list of labels/classes
l = [1, 2]
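A minimal sketch of the integer encoding is given below. The names build_vocab, encode, and LABEL_IDS are hypothetical. Ids are assigned in order of first appearance starting at 1, which leaves 0 free for padding and reproduces the numbers in the example above when premises are scanned before hypotheses.

```python
def build_vocab(*corpora):
    """Map each distinct token to a unique integer, starting at 1 (0 is left for padding)."""
    vocab = {}
    for sentences in corpora:
        for sentence in sentences:
            for token in sentence:
                if token not in vocab:
                    vocab[token] = len(vocab) + 1
    return vocab


def encode(sentences, vocab):
    # Replace every token by its integer id; raises KeyError for tokens missing from vocab.
    return [[vocab[token] for token in sentence] for sentence in sentences]


# Assumed label mapping, chosen to match the handout's example output [1, 2].
LABEL_IDS = {"TRUE": 1, "FALSE": 2}

premises = [["oracle", "had", "fought", "to", "keep", "the", "forms", "from", "being", "released"],
            ["itunes", "software", "has", "seen", "strong", "sales", "in", "europe"]]
hypotheses = [["oracle", "released", "a", "confidential", "document"],
              ["poor", "sales", "for", "itunes", "in", "europe"]]
labels = ["TRUE", "FALSE"]

vocab = build_vocab(premises, hypotheses)
p = encode(premises, vocab)
h = encode(hypotheses, vocab)
l = [LABEL_IDS[label] for label in labels]
```

Words that occur only in the test file are not in a vocabulary built from the train file. The handout does not say how to treat them, so a real implementation would need some policy, for example a dedicated out-of-vocabulary id.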
Sequences can only be stacked into a single batch tensor when they all have the same length. Pad zeros at the end of the premise and hypothesis lists so
that they are of uniform length. For example, if the maximum allowed length is set to 10, the lists become:
# premises list of lists
p = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 0, 0]]
# hypotheses list of lists
h = [[1, 10, 19, 20, 21, 0, 0, 0, 0, 0], [22, 16, 23, 11, 17, 18, 0, 0, 0, 0]]
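The padding step can be sketched as a small helper. The name pad is hypothetical, and truncating sequences longer than the maximum length is an assumption; the handout only describes padding shorter ones.

```python
def pad(sequences, max_len, pad_value=0):
    """Right-pad each sequence with pad_value up to max_len; longer sequences are truncated."""
    return [seq[:max_len] + [pad_value] * (max_len - len(seq)) for seq in sequences]


p = pad([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18]], max_len=10)
h = pad([[1, 10, 19, 20, 21], [22, 16, 23, 11, 17, 18]], max_len=10)
```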
Task – 2: Preparing the inputs for training/testing (10 points)
Look into how to create batches for training and testing your model. For example, if you are using PyTorch, you can look into
the TensorDataset and DataLoader classes for effective training and testing. Note that for training you will use the RandomSampler
class, and for testing you will use the SequentialSampler class.
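For PyTorch, batching along these lines could look like the sketch below. The helper name make_loader and its arguments are hypothetical; it assumes the premises and hypotheses have already been integer-encoded and padded to a common length.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler, TensorDataset


def make_loader(premises, hypotheses, labels, batch_size, train):
    """Wrap padded, integer-encoded lists in a DataLoader that yields (premise, hypothesis, label)."""
    dataset = TensorDataset(
        torch.tensor(premises, dtype=torch.long),
        torch.tensor(hypotheses, dtype=torch.long),
        torch.tensor(labels, dtype=torch.long),
    )
    # Shuffle examples while training; keep the file order while testing.
    sampler = RandomSampler(dataset) if train else SequentialSampler(dataset)
    return DataLoader(dataset, sampler=sampler, batch_size=batch_size)
```

Keeping the test order fixed makes it straightforward to line predictions up with the original pairs.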
Task – 3: Define the model (20 points)
Create the model, following the architectural specifications provided in the previous section. Be careful when defining the
parameters of each layer in the model. You may want to use the token-integer mapping dictionary saved previously to define
the size of the embedding layer.
Task – 4: Train and Test the model (25 points)
Look into how the model can be trained and tested. Define a suitable loss function and optimizer. Define suitable values for
different hyper-parameters such as learning rate, number of epochs and batch size. To test the model, you may use
scikit-learn's classification report to get the precision, recall, f-score and accuracy values. Also report the throughput
of your model (in seconds) at the time of inference.
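A training and testing loop might be sketched as follows in PyTorch. The function names train_one_epoch and predict are hypothetical. The sketch assumes a model called as model(premise, hypothesis) that returns one score per class, batches of the form (premise, hypothesis, label), and labels encoded as 1 and 2, which are shifted to 0 and 1 because nn.CrossEntropyLoss expects class indices starting at 0.

```python
import time

import torch
import torch.nn as nn


def train_one_epoch(model, loader, optimizer, loss_fn):
    """Run one pass over the training batches and return the mean batch loss."""
    model.train()
    total_loss = 0.0
    for premise, hypothesis, label in loader:
        optimizer.zero_grad()
        logits = model(premise, hypothesis)
        loss = loss_fn(logits, label - 1)   # assumed labels 1/2 become class indices 0/1
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)


@torch.no_grad()
def predict(model, loader):
    """Return predicted labels, gold labels, and the wall-clock inference time in seconds."""
    model.eval()
    predictions, golds = [], []
    start = time.perf_counter()
    for premise, hypothesis, label in loader:
        logits = model(premise, hypothesis)
        predictions.extend((logits.argmax(dim=1) + 1).tolist())   # back to labels 1/2
        golds.extend(label.tolist())
    elapsed = time.perf_counter() - start
    return predictions, golds, elapsed
```

The returned lists can be passed to scikit-learn as classification_report(golds, predictions), and elapsed gives the inference time to report. An optimizer such as torch.optim.Adam(model.parameters(), lr=1e-3) with nn.CrossEntropyLoss() is one reasonable starting point, not a prescribed choice.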
Task – 5: Prepare a report (10 points)
Prepare a report summarizing your results. Specifically, observe the effect of hyper-parameters such as number of LSTM
layers considered, embedding dimension, hidden dimension of the LSTM layers, etc. on model performance and throughput.
To get a better understanding of how results are analyzed/summarized, consider the reference paper provided on the webpage.
To Submit
Submit the following:
1. Your source code (either as a Python notebook or regular Python file)
2. Instructions on how to run the code
3. Report