EE 559 Homework 4

1. The pdf of a Γ(2, 1) random variable is p(z) = z exp(−z), z > 0, and the pmf of a Poisson random variable X is p_X(x) = λ^x e^{−λ}/x!, λ > 0, x = 0, 1, . . .. Assuming that X_1, X_2, . . . , X_n is an i.i.d. Poisson sample and that λ has a Γ(2, 1) prior distribution, find the MAP estimate of λ and prove that what you find is actually a value that maximizes the posterior. (10 pts)
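As a hint (not the full solution), note that for MAP estimation the normalizing constant can be ignored, so the posterior is proportional to the likelihood times the prior:

p(λ | x_1, . . . , x_n) ∝ [∏_{i=1}^{n} λ^{x_i} e^{−λ}/x_i!] · λ e^{−λ} ∝ λ^{Σ_i x_i + 1} e^{−(n+1)λ}.

Maximizing the logarithm of the right-hand side over λ > 0, and checking the second-order condition, is the intended route.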
2. Assume that you have an i.i.d sample from a population with Poisson pmf, i.e. pX(x) =
λ
x
e
−λ/x!, λ > 0, x = 0, 1, . . .. Calculate the MLE of λ and its asymptotic distribution
by calculating Fisher information and compare the results with those of the Central
Limit Theorem. (10 pts)
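As a hedged sketch of the route: the log-likelihood of the sample is

ℓ(λ) = Σ_{i=1}^{n} (x_i log λ − λ − log x_i!),

so the MLE solves ℓ′(λ) = 0, and the Fisher information of a single observation, I(λ) = −E[∂² log p_X(X)/∂λ²], gives the asymptotic variance 1/(n I(λ)) of the MLE for comparison with the CLT applied to the sample mean.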
3. Assume that Y = β_0 + β_1 X_1 + · · · + β_p X_p + ε, where ε ∼ N(0, σ²). Show that the MLE and least squares estimates of the β vector are the same, which means the MLE is also BLUE according to the Gauss-Markov theorem. Remember that the likelihood function here is based on the conditional density p(Y | X_1, . . . , X_p). (10 pts)
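One step that may help: conditional on the predictors, the log-likelihood of n observations is, up to additive constants,

−(1/(2σ²)) Σ_{i=1}^{n} (y_i − β_0 − β_1 x_{i1} − · · · − β_p x_{ip})²,

so maximizing it over β is exactly minimizing the residual sum of squares.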
4. Find the MAP estimate of β under the assumption that Y = β_0 + β_1 X_1 + · · · + β_p X_p + ε, where ε ∼ N(0, σ²), and that the prior distribution of the (independent) β_i, i = 1, 2, . . . , p, is N(0, σ²/λ). Interpret your results. (15 pts)
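As a hint, writing out the negative log posterior and multiplying through by 2σ² gives, up to constants,

||y − Xβ||² + λ Σ_{i=1}^{p} β_i²;

recognizing which regularized least squares criterion this is constitutes the interpretation.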
5. Find the MAP estimate of β under the assumption that Y = β_0 + β_1 X_1 + · · · + β_p X_p + ε, where ε ∼ N(0, σ²), and that the prior distribution of the (independent) β_i, i = 1, 2, . . . , p, is Lap(0, σ²/λ). Interpret your results. (15 pts)
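As a hint: with the Lap(0, b) density proportional to exp(−|β|/b), the negative log posterior (up to constants, after multiplying through by 2σ²) becomes

||y − Xβ||² + 2λ Σ_{i=1}^{p} |β_i|;

compare this penalty with the one in the previous problem.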
6. In the least squares problem, assume that the singular value decomposition of X is UΣV^T.
(a) Show that the vector of predicted values is (10 pts)

ŷ = X β̂_Ridge = ∑_{j=1}^{p} u_j (σ_j² / (σ_j² + λ)) u_j^T y

where the u_j are the columns of U. Conclude that a greater amount of shrinkage is applied to basis vectors u_j that have smaller singular values σ_j, for a fixed λ ≥ 0.
(b) Use the SVD to show that (10 pts)

tr[X(X^T X + λI)^{−1} X^T] = ∑_{j=1}^{p} σ_j² / (σ_j² + λ)

This quantity is equal to the degrees of freedom p when λ = 0 and is called the effective degrees of freedom of the Ridge-regularized model.
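A quick numerical sanity check of both identities may be useful while working on the proofs (a sketch only; the dimensions and the value of λ below are arbitrary choices):

# Numerically verify the two SVD identities in problem 6
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 4, 2.5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Thin SVD: U is n x p, s holds the singular values sigma_j
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Ridge predictions via the normal equations ...
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
y_hat = X @ beta_ridge

# ... and via the claimed expression sum_j u_j sigma_j^2/(sigma_j^2+lam) u_j^T y
y_hat_svd = U @ (s**2 / (s**2 + lam) * (U.T @ y))
print(np.allclose(y_hat, y_hat_svd))  # expected: True

# Effective degrees of freedom: trace identity from part (b)
df_trace = np.trace(X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T))
df_svd = np.sum(s**2 / (s**2 + lam))
print(np.isclose(df_trace, df_svd))  # expected: True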
7. Time Series Classification Part 1: Feature Creation/Extraction
An interesting task in machine learning is classification of time series. In this problem, we will classify the activities of humans based on time series obtained by a Wireless Sensor Network.
(a) Download the AReM data from: https://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+%28AReM%29 . The dataset contains 7 folders that represent seven types of activities. In each folder, there are multiple files, each of which represents an instance of a human performing an activity.[1] Each file contains 6 time series collected from activities of the same person, which are called avg rss12, var rss12, avg rss13, var rss13, avg rss23, and var rss23. There are 88 instances in the dataset, each of which contains 6 time series, and each time series has 480 consecutive values.
(b) Keep datasets 1 and 2 in the folders bending1 and bending2, as well as datasets 1, 2, and 3 in the other folders, as test data, and use the remaining datasets as train data.
(c) Feature Extraction
Classification of time series usually requires extracting features from them. In this problem, we focus on time-domain features.
i. Research what types of time-domain features are usually used in time series classification and list them (examples are minimum, maximum, mean, etc.).
ii. Extract the time-domain features minimum, maximum, mean, median, standard deviation, first quartile, and third quartile for all of the 6 time series in each instance. You are free to normalize/standardize the features or use them directly.[2] (20 pts)
Your new dataset will look like this:
Instance   min1   max1   mean1   median1   · · ·   1st quart6   3rd quart6
1
2
3
⋮
88

where, for example, 1st quart6 means the first quartile of the sixth time series in each of the 88 instances.
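Below is a hedged sketch of one way to build this table with pandas; the directory layout, file pattern, and column names are assumptions about how you stored the AReM data, not part of the assignment.

# Sketch: extract the seven time-domain features for each of the 6 series
import glob
import pandas as pd

def features_for_file(path):
    # AReM files are CSV-like; comment lines are assumed to start with '#'
    df = pd.read_csv(path, comment='#', header=None,
                     names=['time', 'avg_rss12', 'var_rss12', 'avg_rss13',
                            'var_rss13', 'avg_rss23', 'var_rss23'])
    row = {}
    for k, col in enumerate(df.columns[1:], start=1):  # skip the time column
        s = df[col]
        row[f'min{k}'] = s.min()
        row[f'max{k}'] = s.max()
        row[f'mean{k}'] = s.mean()
        row[f'median{k}'] = s.median()
        row[f'std{k}'] = s.std()
        row[f'1st_quart{k}'] = s.quantile(0.25)
        row[f'3rd_quart{k}'] = s.quantile(0.75)
    return row

# Hypothetical layout: one folder per activity, CSV files inside
rows = [features_for_file(p) for p in sorted(glob.glob('AReM/*/dataset*.csv'))]
features = pd.DataFrame(rows)
print(features.head())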
iii. Estimate the standard deviation of each of the time-domain features you extracted from the data. Then, use Python's bootstrapped package or any other method to build a 90% bootstrap confidence interval for the standard deviation of each feature. (10 pts)
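If you prefer not to rely on a specific package, the following is a minimal percentile-bootstrap sketch; the DataFrame features and the column name mean1 are assumptions carried over from the previous sketch.

# Sketch: 90% percentile-bootstrap CI for a feature's standard deviation
import numpy as np

def bootstrap_ci_std(values, n_boot=10000, alpha=0.10, seed=0):
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # resample with replacement and record the std of each resample
    stats = np.array([np.std(rng.choice(values, size=len(values), replace=True),
                             ddof=1) for _ in range(n_boot)])
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

lo, hi = bootstrap_ci_std(features['mean1'])
print(f'90% bootstrap CI for std of mean1: [{lo:.3f}, {hi:.3f}]')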
iv. Use your judgement to select the three most important time-domain features
(one option may be min, mean, and max).
v. Assume that you want to use the training set to classify bending from other activities, i.e. you have a binary classification problem. Depict scatter plots of the features you specified in 7(c)iv, extracted from time series 1, 2, and 6 of each instance, and use color to distinguish bending vs. other activities (see p. 129 of the ISLR textbook).[3] (10 pts)
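One way to draw such plots is a pairwise scatter matrix in the spirit of the ISLR pairs plot; in this sketch, the chosen feature columns and the activity label column are assumptions about your feature DataFrame.

# Sketch: pairwise scatter plots colored by bending vs. other
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical choice of three features from 7(c)iv, for time series 1, 2, 6
cols = [f'{feat}{ts}' for ts in (1, 2, 6) for feat in ('min', 'mean', 'max')]
is_bending = features['activity'].str.startswith('bending')
colors = np.where(is_bending, 'red', 'blue')  # red = bending, blue = other

pd.plotting.scatter_matrix(features[cols], color=colors, figsize=(12, 12),
                           diagonal='hist')
plt.show()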
[1] Some of the data files need very minor cleaning. You can do it in Excel or Python.
[2] You are welcome to experiment to see if they make a difference.
[3] You are welcome to repeat this experiment with other features as well as with time series 3, 4, and 5 in each instance.
8. Time Series Classification Part 2: Binary and Multiclass Classification
Important Note: You will NOT submit this part with Homework 4. It will be the programming assignment of Homework 5. However, because it uses the features you extracted from time series data in Homework 4, and because some of you may want to start using your features to build models earlier, you are provided with the instructions for the next programming assignment here. You may want to submit your Homework 4 code with Homework 5 again, since Homework 5 might need the feature-creation code. Also, since this part involves building various models, you are strongly recommended to start as early as you can.
(a) Binary Classification Using Logistic Regression[4]
i. Break each time series in your training set into two (approximately) equal-length time series. Now, instead of 6 time series for each of the training instances, you have 12 time series for each training instance. Repeat the experiment in 7(c)v, i.e. depict scatter plots of the features extracted from both parts of the time series 1, 2, and 12. Do you see any considerable difference in the results compared with those of 7(c)v?
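A minimal sketch of the splitting step (np.array_split handles lengths that do not divide evenly):

import numpy as np

def split_series(ts, l):
    # break one time series into l approximately equal-length pieces
    return np.array_split(np.asarray(ts), l)

parts = split_series(np.arange(480), 2)
print([len(p) for p in parts])  # [240, 240]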
ii. Break each time series in your training set into l ∈ {1, 2, . . . , 20} time series of approximately equal length and use logistic regression[5] to solve the binary classification problem, using time-domain features. Remember that breaking each of the time series does not change the number of instances; it only changes the number of features for each instance. Calculate the p-values for your logistic regression parameters in each model corresponding to each value of l, and refit a logistic regression model using your pruned set of features.[6] Alternatively, you can use backward selection via sklearn.feature_selection or glm in R. Use 5-fold cross-validation to determine the best value of the pair (l, p), where p is the number of features used in recursive feature elimination. Explain what the right way and the wrong way are to perform cross-validation in this problem.[7] Obviously, use the right way! Also, you may encounter the problem of class imbalance, which may make some of your folds not have any instances of the rare class. In such a case, you can use stratified cross-validation. Research what it means and use it if needed.
In the following, you can see an example of applying Python's Recursive Feature Elimination, which is a backward selection algorithm, to logistic regression.
[4] Some logistic regression packages have a built-in L2 regularization. To remove the effect of L2 regularization, set λ = 0 or set the budget C → ∞ (i.e. a very large value).
[5] If you encounter instability of the logistic regression problem because of linearly separable classes, modify the max_iter parameter in logistic regression to stop the algorithm early and prevent the instability.
[6] R calculates the p-values for logistic regression automatically. One way of calculating them in Python is to call R within Python. There are other ways to obtain the p-values as well.
[7] This is an interesting problem in which the number of features changes depending on the value of the parameter l that is selected via cross-validation. Another example of such a problem is Principal Component Regression, where the number of principal components is selected via cross-validation.
# Recursive Feature Elimination
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# load the iris dataset
dataset = datasets.load_iris()

# create a base classifier used to evaluate a subset of attributes
model = LogisticRegression()

# create the RFE model and select 3 attributes
rfe = RFE(model, n_features_to_select=3)
rfe = rfe.fit(dataset.data, dataset.target)

# summarize the selection of the attributes
print(rfe.support_)
print(rfe.ranking_)
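Building on that example, the following is a hedged sketch of the cross-validation over (l, p); build_features(l) is a hypothetical helper that recomputes the training feature matrix for a given number of splits l, and y holds the binary labels. Because RFE is refit inside every fold, feature selection stays inside the cross-validation loop, which is the right way referred to above.

# Sketch: stratified 5-fold CV over the pair (l, p)
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

best = (None, None, -np.inf)
for l in range(1, 21):
    X_l = build_features(l)  # hypothetical: features for l splits per series
    for p in range(1, X_l.shape[1] + 1):
        model = RFE(LogisticRegression(max_iter=1000), n_features_to_select=p)
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
        score = cross_val_score(model, X_l, y, cv=cv).mean()
        if score > best[2]:
            best = (l, p, score)
print('best (l, p):', best[:2], 'CV accuracy:', round(best[2], 3))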
iii. Report the confusion matrix and show the ROC and AUC for your classifier on the train data. Report the parameters β_i of your logistic regression as well as the p-values associated with them.
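A sketch of those diagnostics with scikit-learn and matplotlib; clf, X_train, and y_train are assumed to come from the model you just fit:

# Sketch: confusion matrix, ROC curve, and AUC on training data
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score

y_pred = clf.predict(X_train)
print(confusion_matrix(y_train, y_pred))

scores = clf.predict_proba(X_train)[:, 1]  # probability of the positive class
fpr, tpr, _ = roc_curve(y_train, scores)
plt.plot(fpr, tpr)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title(f'Train ROC (AUC = {roc_auc_score(y_train, scores):.3f})')
plt.show()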
iv. Test the classifier on the test set. Remember to break the time series in
your test set into the same number of time series into which you broke your
training set. Remember that the classifier has to be tested using the features
extracted from the test set. Compare the accuracy on the test set with the
cross-validation accuracy you obtained previously.
v. Do your classes seem to be well-separated enough to cause instability in calculating the logistic regression parameters?
vi. From the confusion matrices you obtained, do you see imbalanced classes? If yes, build a logistic regression model based on case-control sampling and adjust its parameters. Report the confusion matrix, ROC, and AUC of the model.
(b) Binary Classification Using L1-penalized Logistic Regression
i. Repeat 8(a)ii using L1-penalized logistic regression,[8] i.e. instead of using p-values for variable selection, use L1 regularization. Note that in this problem, you have to cross-validate for both l, the number of time series into which you break each of your instances, and λ, the weight of the L1 penalty in your logistic regression objective function (or C, the budget). Packages usually perform cross-validation for λ automatically.[9]
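As a hedged sketch, LogisticRegressionCV searches over C automatically, and the liblinear solver supports the L1 penalty; build_features(l) and y are the same assumed names as before:

# Sketch: cross-validated L1-penalized logistic regression for each l
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold

for l in range(1, 21):
    X_l = build_features(l)
    clf = LogisticRegressionCV(penalty='l1', solver='liblinear', Cs=10,
                               cv=StratifiedKFold(5, shuffle=True, random_state=0),
                               max_iter=5000)
    clf.fit(X_l, y)
    # clf.scores_ holds the per-fold accuracy for each C; use it to compare l's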
ii. Compare the L1-penalized model with variable selection using p-values. Which one performs better? Which one is easier to implement?
(c) Multi-class Classification (The Realistic Case)
i. Find the best l in the same way as you found it in 8(b)i to build an L1-penalized multinomial regression model to classify all activities in your training set.[10] Report your test error. Research how confusion matrices and ROC curves are defined for multiclass classification and show them for this problem if possible.[11]
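A hedged sketch of the multinomial step: with a multiclass target, recent scikit-learn fits a multinomial model by default when the solver supports it, and saga supports the L1 penalty. best_l, build_features, and the label/test arrays are assumed names:

# Sketch: L1-penalized multinomial logistic regression
from sklearn.linear_model import LogisticRegressionCV

clf = LogisticRegressionCV(penalty='l1', solver='saga', cv=5, max_iter=10000)
clf.fit(build_features(best_l), y_multi)  # best_l found as in 8(b)i
# build_features_test is a hypothetical counterpart for the test instances
print('test error:', 1 - clf.score(build_features_test(best_l), y_test))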
ii. Repeat 8(c)i using a Naïve Bayes classifier. Use both Gaussian and Multinomial pdfs and compare the results.
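A sketch of the comparison; note that MultinomialNB requires non-negative features, so a min-max rescaling is shown as one possible workaround (an assumption, not a prescribed step):

# Sketch: Gaussian vs. Multinomial Naive Bayes
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler

gnb = GaussianNB().fit(X_train, y_multi)
X_nonneg = MinMaxScaler().fit_transform(X_train)  # map features into [0, 1]
mnb = MultinomialNB().fit(X_nonneg, y_multi)
print(gnb.score(X_train, y_multi), mnb.score(X_nonneg, y_multi))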
iii. Create p principal components from the features you extracted from the l time series. Cross-validate on the (l, p) pair to build a Naïve Bayes classifier based on the PCA features to classify all activities in your data set. Report your test error and plot the scatterplot of the classes in your training data based on the first and second principal components you found from the features extracted from the l time series, where l is the value you found using cross-validation. Show confusion matrices and ROC curves.
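A hedged sketch of the (l, p) cross-validation; wrapping PCA and the classifier in a Pipeline keeps the PCA fit inside each fold, consistent with the earlier right-way caveat. build_features(l) and y_multi are assumed names:

# Sketch: cross-validate (l, p) for a PCA + Gaussian Naive Bayes pipeline
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

best = (None, None, -1.0)
for l in range(1, 21):
    X_l = build_features(l)
    for p in range(1, min(10, X_l.shape[1]) + 1):
        pipe = make_pipeline(PCA(n_components=p), GaussianNB())
        score = cross_val_score(pipe, X_l, y_multi, cv=5).mean()
        if score > best[2]:
            best = (l, p, score)
print('best (l, p):', best[:2])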
iv. Which method is better for multi-class classification in this problem?
[8] For L1-penalized logistic regression, you may want to use normalized/standardized features.
[9] Using the package Liblinear is strongly recommended.
[10] New versions of scikit-learn allow using the L1 penalty for multinomial regression.
[11] For example, the pROC package in R does the job.