## Description

1. Kernel regression. Kernel regression predicts a value $d$ corresponding to a value $x$ as $\hat{d}(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i)$, where the measured data are $(d_i, x_i)$, $i = 1, 2, \ldots, N$, and $K(u, v)$ is the kernel function. We will assume Gaussian kernels, $K(u, v) = \exp\left(-(u - v)^2 / (2\sigma^2)\right)$. Scripts are provided to help you explore properties of kernel regression with respect to the kernel parameter σ and the ridge regression parameter λ.

a) Run the regression script with σ = 0.04 and λ = 0.01. Figure 1 displays several of the kernels $K(x, x_i)$. What is the value $x_i$ associated with the kernel having the third peak from the left? What property of the kernel is determined by $x_i$? What property is determined by σ?

b) Run the regression script for the following choices of regularization and kernel parameters:

i. λ = 0.01, σ = 0.04

ii. λ = 0.01, σ = 0.2

iii. λ = 0.01, σ = 1

iv. λ = 1, σ = 0.04

v. λ = 1, σ = 0.2

(Note that you need to rerun the entire script each time so that the random number generator is reset and you obtain identical data.) You may choose additional cases if doing so helps you understand the nature of the solution. Discuss how λ and σ affect the characteristics of the kernel regression fit to the measured data, and support your conclusions with rationale and plots.

c) What principle could you apply to select appropriate values for λ and σ?
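The provided regression script is not reproduced here, so the following is only a minimal sketch of how a fit of the form in Problem 1 might be computed. It assumes the ridge penalty is $\lambda \alpha^T K \alpha$, which leads to the linear system $(K + \lambda I)\alpha = d$; the actual script may use a different penalty. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_kernel(u, v, sigma):
    """Return the matrix K[i, j] = exp(-(u[i] - v[j])^2 / (2 sigma^2)) for scalar points."""
    u = np.asarray(u, dtype=float).reshape(-1, 1)
    v = np.asarray(v, dtype=float).reshape(1, -1)
    return np.exp(-(u - v) ** 2 / (2.0 * sigma ** 2))

def fit_kernel_ridge(x_train, d_train, sigma, lam):
    """Solve (K + lam * I) alpha = d (assumed penalty: lam * alpha^T K alpha)."""
    K = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x_train)), d_train)

def predict(x_new, x_train, alpha, sigma):
    """Evaluate d_hat(x) = sum_i alpha_i K(x, x_i) at each point in x_new."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

# Hypothetical data, not the data generated by the provided script.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 1.0, 30))
d_train = np.sin(2.0 * np.pi * x_train) + 0.1 * rng.standard_normal(30)

alpha = fit_kernel_ridge(x_train, d_train, sigma=0.2, lam=0.01)
d_hat = predict(x_train, x_train, alpha, sigma=0.2)
```

Changing `sigma` and `lam` in the last two lines and plotting `d_hat` against `d_train` is one way to reproduce the kind of comparison requested in part b).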

2. Kernel classification. The kernel classification script performs classification with the squared error loss and the Gaussian kernel $K(u, v) = \exp\left(-\|u - v\|_2^2 / (2\sigma^2)\right)$.

The code is set up to use N=500 training samples.

The code creates a contour plot of the predicted class before thresholding (i.e., before applying the sign function).

Run the code for the following values of the kernel parameter σ.

a) σ = 5

b) σ = 0.05


c) σ = 0.005

Use the results to discuss the impact of the kernel parameter σ. Is there a downside to choosing a very small value for σ? Try additional values of σ if needed.
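The classification script itself is not shown, so the sketch below is only an illustration of squared-error kernel classification with vector-valued features. It assumes labels in {+1, −1}, the same $(K + \lambda I)\alpha = d$ solve as a ridge-regularized least-squares fit, and a hypothetical two-dimensional data set with a circular class boundary. The value of `lam`, the function names, and the data are all assumptions.

```python
import numpy as np

def gaussian_kernel_matrix(U, V, sigma):
    """Return K[i, j] = exp(-||U[i] - V[j]||_2^2 / (2 sigma^2)) for row-vector features."""
    U = np.atleast_2d(np.asarray(U, dtype=float))
    V = np.atleast_2d(np.asarray(V, dtype=float))
    sq_dist = (U ** 2).sum(axis=1)[:, None] + (V ** 2).sum(axis=1)[None, :] - 2.0 * U @ V.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def train(X, d, sigma, lam):
    """Least-squares fit of the +/-1 labels (assumed ridge form (K + lam I) alpha = d)."""
    K = gaussian_kernel_matrix(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), d)

def score(X_new, X, alpha, sigma):
    """Predicted class before thresholding: sum_j alpha_j K(x, x_j)."""
    return gaussian_kernel_matrix(X_new, X, sigma) @ alpha

# Hypothetical training set: class +1 inside a circle, class -1 outside.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
d = np.where((X ** 2).sum(axis=1) < 0.5, 1.0, -1.0)

alpha = train(X, d, sigma=0.3, lam=0.01)
raw = score(X, X, alpha, sigma=0.3)      # the quantity a contour plot would show
labels = np.where(raw >= 0.0, 1.0, -1.0)  # thresholding with the sign function
train_accuracy = np.mean(labels == d)
```

Evaluating `score` on a grid of points rather than on `X` gives the values needed for a contour plot like the one the script produces, and rerunning with different `sigma` values allows a side-by-side comparison.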

3. SVM. You use a kernel-based support vector machine for binary classification with labels $d_i \in \{+1, -1\}$. Given training features and labels $(x_i, d_i)$, $i = 1, 2, \ldots, N$, you use a kernel $K(u, v)$ and design the classifier weights $\alpha$ as

$$\hat{\alpha} = \arg\min_{\alpha} \sum_{i=1}^{N} \left(1 - d_i \sum_{j=1}^{N} \alpha_j K(x_i, x_j)\right)_+ + \lambda \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j K(x_i, x_j)$$

a) Assume the optimization problem has been solved to obtain the weights α. Express the classification procedure for a measured feature x.

b) Suppose $N = 1000$ and $\alpha_i = 0$ for $i = 1, 2, \ldots, 99, 102, 103, \ldots, 1000$. Identify the support vectors and write the classification procedure in terms of the support vectors.
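To make the objective in Problem 3 concrete, the sketch below evaluates the hinge-loss term and the regularizer for a given weight vector, where $(z)_+ = \max(0, z)$. It only evaluates the cost and does not solve the minimization; the function name `svm_objective` and the tiny example are hypothetical.

```python
import numpy as np

def svm_objective(alpha, K, d, lam):
    """Evaluate sum_i (1 - d_i * sum_j alpha_j K_ij)_+ + lam * alpha^T K alpha."""
    margins = d * (K @ alpha)               # d_i * sum_j alpha_j K(x_i, x_j)
    hinge = np.maximum(0.0, 1.0 - margins)  # the (.)_+ operation
    return hinge.sum() + lam * (alpha @ K @ alpha)

# Hypothetical 3-sample example with an identity kernel matrix.
K = np.eye(3)
d = np.array([1.0, -1.0, 1.0])

cost_zero = svm_objective(np.zeros(3), K, d, lam=0.1)  # every hinge term equals 1, so the cost is 3.0
cost_d = svm_objective(d, K, d, lam=0.1)               # all margins equal 1, so only the penalty 0.1 * 3 remains
```

With all weights zero, each sample contributes a hinge loss of 1 and the regularizer vanishes, which is why `cost_zero` equals the number of samples.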
