STAT 480 Homework 4 solved

Exercises for All Students
1) Create a function computeMsgLLR2 that computes the log likelihood ratio statistic as a sum of
logs of ratios of products of probabilities:
\[
\log\left(\frac{\prod_{\text{in msg}} P(\text{word present}\mid\text{spam})}{\prod_{\text{in msg}} P(\text{word present}\mid\text{ham})}\right)
+ \log\left(\frac{\prod_{\text{not in msg}} P(\text{word absent}\mid\text{spam})}{\prod_{\text{not in msg}} P(\text{word absent}\mid\text{ham})}\right)
\]
Compare the results from this definition with the results from the computeMsgLLR function used
in the text, which uses the sum of differences of log probabilities.
Specifically, assess the accuracy of this formula relative to the one used in class (Hint: to estimate
relative accuracy, look at (observed-expected)/expected, treating the results from
computeMsgLLR2 as observed and the results from computeMsgLLR as expected) and note
any issues that arise with non-representable numbers (e.g. very large or very small intermediate
results that cause your function to return infinity, an incorrect 0, or NaN).
2) Do exercise Q.13 from page 167 of Data Science in R: A Case Studies Approach to Computational
Reasoning and Problem Solving, by Deborah Nolan and Duncan Temple Lang. Within the exercise,
construct two functions: one that counts the number of yelling lines, and one that gives the
percentage.
3) Check that the hour feature in emailDF gives valid values for all of the email messages. Then
perform descriptive analysis to compare this feature for spam and ham, and comment on the
possibility of using this feature to classify email.
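For Exercise 1, the two forms are algebraically identical, so any disagreement comes from floating-point arithmetic. The following is a minimal sketch in R of where the product form can break down, not a solution to the exercise. The helper names llrBySum, llrByProduct, and relativeError are hypothetical, the probabilities 0.2 and 0.1 are made up for illustration, and only the "word present" term is shown (the "word absent" term has the same shape).

```r
# Hypothetical helpers for illustration; they are not the text's
# computeMsgLLR or the requested computeMsgLLR2.

# Sum of differences of log probabilities.
llrBySum <- function(pSpam, pHam) sum(log(pSpam) - log(pHam))

# Log of the ratio of products of probabilities.
llrByProduct <- function(pSpam, pHam) log(prod(pSpam) / prod(pHam))

# Relative error as described in the hint: (observed - expected) / expected.
relativeError <- function(observed, expected) (observed - expected) / expected

# Made-up inputs: every word has P(present | spam) = 0.2 and
# P(present | ham) = 0.1, for messages of increasing length.
nWords <- c(10, 400, 2000)

expected <- sapply(nWords, function(n) llrBySum(rep(0.2, n), rep(0.1, n)))
observed <- sapply(nWords, function(n) llrByProduct(rep(0.2, n), rep(0.1, n)))

print(data.frame(nWords, expected, observed,
                 relErr = relativeError(observed, expected)))
```

With IEEE double precision, the 10-word case should agree to within rounding error. In the 400-word case the ham product (0.1^400) underflows to 0 while the spam product does not, so the ratio is Inf. In the 2000-word case both products underflow, giving 0/0 and therefore NaN. The sum-of-logs form stays finite in all three cases.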
Additional Exercises for Graduate Students
4) Do exercise Q.14 from page 167 of Data Science in R: A Case Studies Approach to Computational
Reasoning and Problem Solving, by Deborah Nolan and Duncan Temple Lang.