EE 559 Homework 3

1. Assume that in a c-class classification problem, we have k features X1, X2, . . . , Xk that are independent conditioned on the class label and Xj | ωi ∼ Gamma(pi, λj), i.e.,

$$p_{X_j \mid \omega_i}(x_j \mid \omega_i) = \frac{1}{\Gamma(p_i)} \, \lambda_j^{p_i} \, x_j^{p_i - 1} \, e^{-\lambda_j x_j}, \qquad p_i, \lambda_j > 0.$$

(30 pts)
(a) Determine the Bayes’ optimal classifier’s decision rule under the general assumption that the prior probabilities of the classes are different.
(b) When are the decision boundaries linear functions of x1, x2, . . . , xk?
(c) Assuming that p1 = 4, p2 = 2, c = 2, k = 4, λ1 = λ3 = 1, λ2 = λ4 = 2, and that the prior probabilities of each class are equal, classify x = (0.1, 0.2, 0.3, 4) (a numerical check is sketched after this problem).
(d) Assuming that p1 = 3.2, p2 = 8, c = 2, k = 1, λ1 = 1, and that the prior probabilities of each class are equal, find the decision boundary x = x∗. Also, find the probability of type-1 and type-2 errors.
(e) Assuming that p1 = p2 = 4, c = 2, k = 2, λ1 = 8, λ2 = 0.3, and P(ω1) =
1/4, P(ω2) = 3/4, find the decision boundary f(x1, x2) = 0.
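For part (c), here is a minimal Python sketch (my own numerical check, not part of the assignment) that evaluates the Gamma log-likelihoods; with equal priors, the class with the larger total log-likelihood wins:

```python
import math

# Gamma(p_i, lambda_j) log-density:
#   p_i*log(lambda_j) + (p_i - 1)*log(x_j) - lambda_j*x_j - lgamma(p_i)
p = {1: 4.0, 2: 2.0}            # shape parameter p_i per class (part (c))
lam = [1.0, 2.0, 1.0, 2.0]      # rate lambda_j per feature: lambda_1..lambda_4
x = [0.1, 0.2, 0.3, 4.0]        # point to classify

def log_likelihood(ci):
    """Sum of per-feature Gamma log-densities for class ci."""
    return sum(p[ci] * math.log(lj) + (p[ci] - 1.0) * math.log(xj)
               - lj * xj - math.lgamma(p[ci])
               for lj, xj in zip(lam, x))

scores = {ci: log_likelihood(ci) for ci in (1, 2)}
print(scores, "-> decide class", max(scores, key=scores.get))
```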
2. Assume that in a c-class classification problem, there are k conditionally independent features and Xi | ωj ∼ Lap(mij, λi), i.e.,

$$p_{X_i \mid \omega_j}(x_i \mid \omega_j) = \frac{\lambda_i}{2} \, e^{-\lambda_i |x_i - m_{ij}|}, \qquad \lambda_i > 0, \; i \in \{1, 2, \ldots, k\}, \; j \in \{1, 2, \ldots, c\}.$$

Assuming that the prior class probabilities are equal, show that the minimum error rate classifier is also a minimum weighted Manhattan distance (or weighted L1-distance) classifier. When does the minimum error rate classifier become the minimum Manhattan distance classifier? (15 pts)
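As a hint toward the reduction (my own sketch, not part of the problem statement), taking logs of the class-conditional likelihood separates a j-independent constant from a weighted L1 distance:

```latex
% Sketch: with equal priors, the minimum error rate decision maximizes
% p(x | omega_j) over j. By conditional independence,
\ln p(\mathbf{x} \mid \omega_j)
  = \sum_{i=1}^{k} \ln \frac{\lambda_i}{2}
  - \sum_{i=1}^{k} \lambda_i \, |x_i - m_{ij}|.
% The first sum does not depend on j, so maximizing the log-likelihood is
% equivalent to minimizing the weighted L1 distance
% \sum_{i=1}^{k} \lambda_i |x_i - m_{ij}| over j.
```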
3. The class-conditional probability mass functions of a discrete random variable X for four pattern classes are shown below: (20 pts)

x    p(x|ω1)   p(x|ω2)   p(x|ω3)   p(x|ω4)
1    1/3       1/2       1/6       2/5
2    1/3       1/4       1/3       2/5
3    1/3       1/4       1/2       1/5
The loss function λ(αi|ωj) is summarized in the following table, where action αi means deciding pattern class ωi:

      ω1   ω2   ω3   ω4
α1    0    2    3    4
α2    1    0    1    8
α3    3    2    0    2
α4    5    3    1    0
Assume P(ω1) = 1/10, P(ω2) = 1/5, P(ω3) = 1/2, P(ω4) = 1/5.
(a) Compute the conditional risk for each action as

$$R(\alpha_i \mid x) = \sum_{j=1}^{4} \lambda(\alpha_i \mid \omega_j) \, P(\omega_j \mid x).$$
(b) Compute the overall risk R as

$$R = \sum_{i=1}^{3} R(\alpha(x_i) \mid x_i) \, p(x_i),$$

where α(xi) is the decision rule minimizing the conditional risk for xi.
4. The following data set was collected to classify people who evade taxes:
Tax ID   Refund   Marital Status   Taxable Income   Evade
1        Yes      Single           122K             No
2        No       Married          77K              No
3        No       Married          106K             No
4        No       Single           88K              Yes
5        Yes      Divorced         210K             No
6        No       Single           72K              No
7        Yes      Married          117K             No
8        No       Married          60K              No
9        No       Divorced         90K              Yes
10       No       Single           85K              Yes
Considering the relevant features in the table (only one feature is not relevant), assume that the features are conditionally independent. (25 pts)
(a) Estimate prior class probabilities.
(b) For the continuous feature(s), assume conditional Gaussianity and estimate the class-conditional pdfs p(x|ωi) using Maximum Likelihood Estimates.
(c) For each discrete feature X, assume that the number of instances in class ωi for which X = xj is nji and that the number of instances in class ωi is ni. Estimate the probability mass pX|ωi(xj|ωi) = P(X = xj|ωi) as nji/ni for each discrete feature. Is this a valid estimate of the pmf?
(d) There is an issue with using the estimate you calculated in 4c. Explain why the Laplace correction (nji + 1)/(ni + l), where l is the number of levels X can assume (for example, if X ∈ {apple, orange, pear, peach, blueberry}, then l = 5), solves the problem with the estimate given in 4c. Is this a valid estimate of the pmf?
(e) Estimate the minimum error rate decision rule for classifying tax evasion using the Laplace correction.
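A minimal Python sketch of this hybrid estimate (my own illustration; the helper names and the example query are assumptions, not part of the assignment):

```python
import numpy as np

# Hybrid naive Bayes on the tax table: Refund and Marital Status are
# categorical (Laplace-corrected); Taxable Income is Gaussian (MLE).
refund = np.array(['Y','N','N','N','Y','N','Y','N','N','N'])
status = np.array(['S','M','M','S','D','S','M','M','D','S'])
income = np.array([122, 77, 106, 88, 210, 72, 117, 60, 90, 85], dtype=float)
evade  = np.array(['N','N','N','Y','N','N','N','N','Y','Y'])

def laplace_pmf(feature, levels, cls):
    """Laplace-corrected pmf estimate (n_ji + 1)/(n_i + l) within class cls."""
    vals = feature[evade == cls]
    return {v: (np.sum(vals == v) + 1) / (len(vals) + len(levels)) for v in levels}

def gauss_pdf(x, cls):
    """Gaussian class-conditional with ML estimates (biased variance)."""
    vals = income[evade == cls]
    mu, var = vals.mean(), vals.var()
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def score(r, s, inc, cls):
    prior = np.mean(evade == cls)
    return (prior * laplace_pmf(refund, ['Y', 'N'], cls)[r]
                  * laplace_pmf(status, ['S', 'M', 'D'], cls)[s]
                  * gauss_pdf(inc, cls))

# Example query (my own): Refund = No, Status = Single, Income = 80K.
scores = {c: score('N', 'S', 80.0, c) for c in ('N', 'Y')}
print(scores, "-> decide", max(scores, key=scores.get))
```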
5. Programming Part: Breast Cancer Prognosis
The goal of this assignment is to determine the prognosis of breast cancer patients using the features extracted from digital images of Fine Needle Aspirates (FNA) of a breast mass. You will work with the Wisconsin Prognostic Breast Cancer data set,
WPBC. There are 34 attributes in the data set: the first attribute is a patient ID, the second is an outcome variable that shows whether the cancer recurred after two years or not (N for non-recurrent, R for recurrent), and the third is also an outcome variable that shows the time to recurrence. The other 30 attributes are the features that you will work with to build a prognosis tool for breast cancer.
Ten real-valued features are calculated for each nucleus in the digital image of the FNA of a breast mass (for more details, see https://www.researchgate.net/publication/2512520_Nuclear_Feature_Extraction_For_Breast_Tumor_Diagnosis). They are:
• radius (mean of distances from center to points on the perimeter)
• texture (standard deviation of gray-scale values)
• perimeter
• area
• smoothness (local variation in radius lengths)
• compactness (perimeter² / area − 1.0)
• concavity (severity of concave portions of the contour)
• concave points (number of concave portions of the contour)
• symmetry
• fractal dimension (“coastline approximation” − 1)
The mean, standard deviation, and mean of the three largest values of each feature have been computed for each image, so that each image is represented by 3 × 10 features.
Additionally, the diameter of the excised tumor in centimeters and the number of
positive axillary lymph nodes are also given in the data set.
Important Note: Time to recurrence (the third attribute) should not be used for classification; otherwise, you will be able to classify perfectly!
There are 198 instances in the data set, 151 of which are non-recurrent and 47 recurrent.
(a) Download the WPBC data from: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic).
(b) Select the first 130 non-recurrent cases and the first 37 recurrent cases as your
training set. Add record #197 in the data set to your training set as well. (10
pts)
(c) There are four instances in your training set that are missing the lymph node feature (denoted as ?). This is not a very severe issue, so replace the missing values with the median of the lymph node feature in your training set. (5 pts)
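A minimal pandas sketch of this step, assuming the raw wpbc.data file with no header row; the column names are my own labels:

```python
import pandas as pd

# Assumed layout: ID, outcome, time to recurrence, 30 nuclear features,
# tumor size, and lymph node count; '?' marks missing values.
cols = (['id', 'outcome', 'time'] + [f'f{i}' for i in range(1, 31)]
        + ['tumor_size', 'lymph_nodes'])
df = pd.read_csv('wpbc.data', header=None, names=cols, na_values='?')

# The median must come from the training set only.
train = df  # placeholder: restrict to the training split built in part (b)
df['lymph_nodes'] = df['lymph_nodes'].fillna(train['lymph_nodes'].median())
```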
(d) Binary Classification Using Naïve Bayes’ Classifiers
i. Solve the problem using a Naïve Bayes’ classifier. Use Gaussian class-conditional distributions. Report the confusion matrix, ROC, precision, recall, F1 score, and AUC for both the train and test data sets (a sketch covering this part and the next follows part ii). (10 pts)
ii. This data set is rather imbalanced. Balance your data set using SMOTE,
by downsampling the common class in the training set to 90 instances and
upsampling the uncommon class to 90 instances. Use k = 5 nearest neighbors
in SMOTE. Remember not to change the balance of the test set. Report the
confusion matrix, ROC, precision, recall, F1 score, and AUC for both the
train and test data sets. Does SMOTE help? (10 pts)
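A minimal sketch of parts i and ii, assuming X_train, y_train, X_test, y_test hold the split from part (b) with labels 'N'/'R', and that scikit-learn and imbalanced-learn are installed; fit_report is my own helper name:

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

def fit_report(X, y, X_te, y_te):
    """Fit Gaussian naive Bayes and print metrics on train and test sets."""
    clf = GaussianNB().fit(X, y)
    for name, (Xs, ys) in {'train': (X, y), 'test': (X_te, y_te)}.items():
        pred = clf.predict(Xs)
        score = clf.predict_proba(Xs)[:, list(clf.classes_).index('R')]
        # The ROC curve itself can be drawn with sklearn.metrics.roc_curve.
        print(name, confusion_matrix(ys, pred),
              precision_score(ys, pred, pos_label='R'),
              recall_score(ys, pred, pos_label='R'),
              f1_score(ys, pred, pos_label='R'),
              roc_auc_score(ys == 'R', score))

# (d)i: plain Gaussian naive Bayes on the imbalanced training set.
fit_report(X_train, y_train, X_test, y_test)

# (d)ii: downsample the common class to 90, then SMOTE (k = 5) the rare
# class up to 90; the test set is left untouched.
X_u, y_u = RandomUnderSampler(sampling_strategy={'N': 90}).fit_resample(X_train, y_train)
X_b, y_b = SMOTE(sampling_strategy={'R': 90}, k_neighbors=5).fit_resample(X_u, y_u)
fit_report(X_b, y_b, X_test, y_test)
```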
(e) (Extra practice, will not be graded) Solve the regression problem of estimating time to recurrence (the third attribute) using the next 32 attributes. You can use KNN regression. To do it in a principled way, select 20% of the data points from each class in your training set to choose the best k ∈ {1, 2, . . . , 20}, and use the remaining 80% as the new training set. Report your MSE on the test set using the k you found and the whole training set (not only the new training set!). For simplicity, use the Euclidean distance. Repeat this process when you apply SMOTE to your new training set to only upsample the rare class and make the data completely balanced. Does SMOTE help in reducing the MSE?
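A sketch of the model-selection loop (my own, assuming X_train, t_train, y_train, X_test, t_test are already built; KNeighborsRegressor's default Minkowski metric with p = 2 is the Euclidean distance):

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hold out 20% per class (stratified on the N/R labels) to pick k, then
# retrain on the whole training set with the chosen k.
X_new, X_val, t_new, t_val = train_test_split(
    X_train, t_train, test_size=0.2, stratify=y_train, random_state=0)

best_k = min(range(1, 21), key=lambda k: mean_squared_error(
    t_val, KNeighborsRegressor(n_neighbors=k).fit(X_new, t_new).predict(X_val)))

final = KNeighborsRegressor(n_neighbors=best_k).fit(X_train, t_train)
print(best_k, mean_squared_error(t_test, final.predict(X_test)))
```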