## Description

1. A script is available to train two neurons using stochastic gradient descent to solve two different classification problems. The two classifier structures are shown below. Here we use a logistic activation function $\sigma(z) = (1 + e^{-z})^{-1}$. The code generates training data and labels corresponding to two decision boundaries: $x_2^i = -2x_1^i + 0.2$ and $x_2^i = 5(x_1^i)^3$.

a) Do you expect that a single neuron will be able to accurately classify data from case 1? Why or why not? Explain the impact of the bias term associated with $w_{1,0}$.

b) Do you expect that a single neuron will be able to accurately classify data from case 2? Why or why not? Explain the impact of the bias term associated with $w_{2,0}$.

c) Run SGD for one epoch. This means you cycle through all the training data one time, in random order. Repeat this five times and find the average number of errors in cases 1 and 2.

d) Run SGD over twenty epochs. This means you cycle through all the training data twenty times, in random order. Repeat this five times and find the average number of errors in cases 1 and 2.

e) Explain the differences in classification performance between the two cases, both after one epoch and after twenty epochs.
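The training loop described in this problem can be sketched in plain Python. This is an illustration only, not the script distributed with the activity. It makes several assumptions that the text does not state: the neuron sees the input (1, x1, x2), so `w[0]` acts as the bias weight; the loss is squared error; the step size is 0.5; and 500 points are drawn uniformly from [-1, 1]^2. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation (1 + e^{-z})^{-1}, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_data(boundary, n, rng):
    """Assumed sampling: n points uniform on [-1, 1]^2, label 1 if x2 > boundary(x1)."""
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((1.0, x1, x2), 1.0 if x2 > boundary(x1) else 0.0))
    return data


def train_neuron(data, epochs, step, rng):
    """SGD on squared error (an assumption) for a single logistic neuron."""
    w = [rng.gauss(0, 1) for _ in range(3)]
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # one epoch = every sample once, in random order
        for i in order:
            x, y = data[i]
            out = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            grad = (out - y) * out * (1.0 - out)  # derivative of 0.5 * (out - y)^2 w.r.t. z
            w = [wj - step * grad * xj for wj, xj in zip(w, x)]
    return w


def count_errors(w, data):
    """Count samples whose output, thresholded at 0.5, disagrees with the label."""
    errors = 0
    for x, y in data:
        pred = 1.0 if sigmoid(sum(wj * xj for wj, xj in zip(w, x))) > 0.5 else 0.0
        errors += pred != y
    return errors


def average_errors(data, epochs, trials, step, rng):
    """Train from a fresh random start `trials` times and average the error counts."""
    total = 0
    for _ in range(trials):
        total += count_errors(train_neuron(data, epochs, step, rng), data)
    return total / trials


rng = random.Random(0)
linear = make_data(lambda x1: -2 * x1 + 0.2, 500, rng)
cubic = make_data(lambda x1: 5 * x1 ** 3, 500, rng)
for name, data in (("linear", linear), ("cubic", cubic)):
    for epochs in (1, 20):
        print(name, epochs, "epoch(s):", average_errors(data, epochs, 5, 0.5, rng))
```

Errors are counted on the training set itself, which is also an assumption. The provided script may use a different loss, step size, or data range, so its error counts need not match this sketch.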

2. The remainder of this activity uses a three-layer neural network with three input nodes and two output nodes to solve two classification problems. We will vary the number of hidden nodes. The figure below depicts the structure when there are two hidden nodes.

A second script is available that generates training data and trains the network using SGD assuming a logistic activation function $\sigma(z) = (1 + e^{-z})^{-1}$.

a) Use M = 2 hidden nodes and ten epochs in SGD. Run this four or five times and comment on the performance of the two classifiers and whether it varies from run to run.

b) Repeat M = 2 but use 100 epochs in SGD. (You may use fewer epochs if it takes more than a minute or two per run.) Run this several times and comment on the performance of the classifiers and whether it varies from run to run.

c) Recall the two-layer network results from the previous problem. How do the possible decision boundaries change when you add a hidden layer?

d) Now use M = 3 hidden nodes and run 100 epochs of SGD (or as many as you can compute). Does going from two to three hidden nodes affect classifier performance?

e) Repeat the previous part for M = 4 hidden nodes and comment on classifier performance.
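For readers who want to see what training such a network involves, here is a minimal sketch of SGD with backpropagation for three inputs, M hidden nodes, and two outputs. It is not the second script from the activity. Several details are assumptions: the three inputs are taken to be (1, x1, x2); there is no separate bias into the output layer; the loss is squared error; output 1 is trained on the linear boundary and output 2 on the cubic boundary from the previous problem; and the step size, sample count, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation (1 + e^{-z})^{-1}, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_data(n, rng):
    """Assumed data: points uniform on [-1, 1]^2 with one label per output node."""
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        t = (1.0 if x2 > -2 * x1 + 0.2 else 0.0, 1.0 if x2 > 5 * x1 ** 3 else 0.0)
        data.append(((1.0, x1, x2), t))
    return data


def forward(W, V, x):
    """Return hidden activations h (length M) and outputs y (length 2)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]
    y = [sigmoid(sum(v * hj for v, hj in zip(row, h))) for row in V]
    return h, y


def train(data, M, epochs, step, rng):
    """SGD with backpropagation on squared error. W is M x 3, V is 2 x M."""
    W = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(M)]
    V = [[rng.gauss(0, 1) for _ in range(M)] for _ in range(2)]
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # one epoch = every sample once, in random order
        for i in order:
            x, t = data[i]
            h, y = forward(W, V, x)
            # Error signal at each output node
            d_out = [(y[k] - t[k]) * y[k] * (1 - y[k]) for k in range(2)]
            # Error signal at each hidden node, propagated back through V
            d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * V[k][j] for k in range(2))
                     for j in range(M)]
            for k in range(2):
                for j in range(M):
                    V[k][j] -= step * d_out[k] * h[j]
            for j in range(M):
                for n in range(3):
                    W[j][n] -= step * d_hid[j] * x[n]
    return W, V


def count_errors(W, V, data):
    """Per-output count of samples misclassified when thresholding at 0.5."""
    errors = [0, 0]
    for x, t in data:
        _, y = forward(W, V, x)
        for k in range(2):
            errors[k] += (y[k] > 0.5) != (t[k] == 1.0)
    return errors


rng = random.Random(0)
data = make_data(500, rng)
for M in (2, 3, 4):
    W, V = train(data, M, 10, 0.5, rng)
    print("M =", M, "errors per output:", count_errors(W, V, data))
```

Each call to `train` starts from new random weights, so changing the seed passed to `random.Random` is one way to observe the run-to-run variation the questions ask about.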
