# Assignment 4 CSCE 523 Solved


## Uncertainty and Machine Learning

1. (10 points) Bayes Rule: Seventy percent of the aircraft that disappear in
flight through the Bermuda Triangle are recovered (P(r) = 0.7). Sixty
percent of the recovered aircraft have an emergency locator (P(e|r) =
0.60).

Unfortunately, 90% of the aircraft that are not recovered do not have such a
locator (P(¬e|¬r) = 0.9, so P(e|¬r) = 0.1). Suppose that an aircraft with a
locator has disappeared. What is the probability that it will be recovered,
P(r|e)?
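This is a direct application of Bayes rule: P(r|e) = P(e|r)P(r) / [P(e|r)P(r) + P(e|¬r)P(¬r)]. A quick Java check of the arithmetic (the class and method names are just for illustration):

```java
public class BayesRule {
    // P(r) = 0.7, P(e|r) = 0.6; 90% of non-recovered aircraft
    // lack a locator, so P(e|~r) = 0.1.
    static double posteriorRecovered() {
        double pR = 0.7, pEgivenR = 0.6, pEgivenNotR = 0.1;
        double pE = pEgivenR * pR + pEgivenNotR * (1 - pR); // total probability of the evidence
        return pEgivenR * pR / pE;                          // Bayes rule
    }

    public static void main(String[] args) {
        System.out.println(posteriorRecovered()); // 0.42 / 0.45 ≈ 0.933
    }
}
```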

2. (20 points) Shown below is a Bayes network representing the risk of
flooding sewers (FS) in a city as dependent on rainfall (R), falling leaves
(FL), thunderstorm (TS), and autumn (A). Use the conditional probabilities
below to determine the conditional probabilities of a thunderstorm for the
four observable scenarios FS ∧ A, FS ∧ ¬A, ¬FS ∧ A, and ¬FS ∧ ¬A.

FL: P(FL | TS ∧ A) = 0.8
P(FL | TS ∧ ¬A) = 0.2
P(FL | ¬TS ∧ A) = 0.05
P(FL | ¬TS ∧ ¬A) = 0.01
R: P(R | TS) = 0.7
P(R | ¬TS) = 0.1

FS: P(FS | FL ∧ R) = 0.5
P(FS | FL ∧ ¬R) = 0.2
P(FS | ¬FL ∧ R) = 0.05
P(FS | ¬FL ∧ ¬R) = 0.01
TS: P(TS) = 0.5
A: P(A) = 0.33
[Network diagram: Thunderstorm (TS) and Autumn (A) are the parents of Falling Leaves (FL); TS is also the parent of Rain (R); FL and R are the parents of Flooding Sewers (FS).]
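The four posteriors can be checked by brute-force enumeration: sum the hidden nodes FL and R out of the joint, then normalize over TS. A Java sketch using the CPT values above (the class and method names are illustrative):

```java
public class SewerNet {
    static double pTS = 0.5, pA = 0.33;
    static double[][] pFL = {{0.01, 0.05}, {0.2, 0.8}}; // pFL[ts][a], index 1 = true
    static double[] pR = {0.1, 0.7};                    // pR[ts]
    static double[][] pFS = {{0.01, 0.05}, {0.2, 0.5}}; // pFS[fl][r]

    // Joint P(TS=ts, A=a, FS=fs), summing out FL and R.
    static double joint(boolean ts, boolean a, boolean fs) {
        int ti = ts ? 1 : 0, ai = a ? 1 : 0;
        double sum = 0.0;
        for (int fl = 0; fl <= 1; fl++) {
            double pfl = (fl == 1) ? pFL[ti][ai] : 1 - pFL[ti][ai];
            for (int r = 0; r <= 1; r++) {
                double pr = (r == 1) ? pR[ti] : 1 - pR[ti];
                double pfs = fs ? pFS[fl][r] : 1 - pFS[fl][r];
                sum += pfl * pr * pfs;
            }
        }
        return (ts ? pTS : 1 - pTS) * (a ? pA : 1 - pA) * sum;
    }

    // P(TS=true | FS=fs, A=a) by normalizing over TS.
    static double pTSgiven(boolean fs, boolean a) {
        double t = joint(true, a, fs), f = joint(false, a, fs);
        return t / (t + f);
    }

    public static void main(String[] args) {
        System.out.println("P(TS |  FS,  A) = " + pTSgiven(true, true));
        System.out.println("P(TS |  FS, ~A) = " + pTSgiven(true, false));
        System.out.println("P(TS | ~FS,  A) = " + pTSgiven(false, true));
        System.out.println("P(TS | ~FS, ~A) = " + pTSgiven(false, false));
    }
}
```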

3. (20 points) Link sees Sheik on the horizon. Sheik is fighting with magic and
has all her hearts. Link wants to determine Sheik’s potential for defeating
the enemy and whether he should enter the fray.

Shown below is a risk analysis Bayesian network that Link plans to use. His
risk drivers are the type of weapon in use and the enemy faced. Using
certain weapons on specific enemies can improve effectiveness in battle.

The effectiveness affects the number of hearts the individual has, as well
as the number of bombs they may have left. His question is: what is the
probability of success given magic and hearts, Pr(S | W=magic, H=true)?
W: Pr(W={magic,sword}) = <0.5,0.5>
E: Pr(E={poe,darknut,lizalfos}) = <0.33,0.33,0.33>
S:

| Success | W=magic, E=p | W=magic, E=d | W=magic, E=l | W=sword, E=p | W=sword, E=d | W=sword, E=l |
| --- | --- | --- | --- | --- | --- | --- |
| true | 0.2 | 0.8 | 0.3 | 0.8 | 0.2 | 0.2 |
| false | 0.8 | 0.2 | 0.7 | 0.2 | 0.8 | 0.8 |
[Network diagram: the risk drivers Weapon (W) and Enemy (E) are the parents of the control node Success (S); S is the parent of the risk indicators Hearts (H) and Bombs (B).]
H:

| Hearts | S=true | S=false |
| --- | --- | --- |
| true | 0.9 | 0.2 |
| false | 0.1 | 0.8 |

B:

| Bombs | S=true | S=false |
| --- | --- | --- |
| true | 0.8 | 0.2 |
| false | 0.2 | 0.8 |
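Since there is no evidence on Bombs, B (a leaf) drops out; Pr(S | W=magic, H=true) follows by summing E out of the Success table and then applying Bayes rule with the Hearts CPT. A Java sketch (the class and method names are illustrative):

```java
public class LinkRisk {
    static double successGivenMagicAndHearts() {
        // P(S=true | W=magic, E) for E = poe, darknut, lizalfos; E is uniform.
        double[] pSgivenMagic = {0.2, 0.8, 0.3};
        double pS = 0.0;
        for (double p : pSgivenMagic) pS += p / 3.0; // sum E out: P(S=true | W=magic)
        double pHgivenS = 0.9, pHgivenNotS = 0.2;    // Hearts CPT
        double num = pS * pHgivenS;                  // Bayes rule numerator
        return num / (num + (1 - pS) * pHgivenNotS);
    }

    public static void main(String[] args) {
        System.out.println(successGivenMagicAndHearts()); // ≈ 0.775
    }
}
```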

4. (20 points) For this problem, you need to build the Bayes network from
problem 2 in JavaBayes. Using the Bayes network, double-check your solutions
for problem 2. Then change the table for Rain to:
R: P(R | TS) = 0.9
P(R | ¬TS) = 0.3

What are the conditional probabilities of thunderstorm (TS) given the
observable scenarios FL and ¬FL? Turn in your Bayes network file with your
solutions.

5. (30 points) Ahhh, life at AFIT – it’s not just a job, it’s an AI problem! You
wake up to the sunrise after studying for a final all night, only to find
that you have fifteen minutes to get to your exam.

Unfortunately, you still must negotiate a maze of cargo between here and
the classroom. Which brings us to the problem: how many steps does it take
to reach your destination?
There are two challenges for you:

## Part I:

This challenge requires you to calculate V(s) for a given map, and output
the MEU path. The calculation of the MEU should be conducted by
performing value or policy iteration (your choice).

This is a gridworld in which a cell is either occupied or empty, and the agent
may move North, South, East, or West, one square at a time. The cost of
moving is 1.0. When the agent moves, there is a probability of 0.90 that
they end up in the state they intended, a probability of 0.07 that they
remain in the current state, and a probability of 0.03 that they go
backwards one square (i.e., if they were headed West, they would instead go
one square East).

The world is fully observable: the agent knows its own location and the
locations of the goal and obstacles.
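A minimal value-iteration sketch for this transition model follows; the maze encoding matches the data-file format given later, while the discount factor and goal reward are illustrative assumptions, not part of the spec:

```java
import java.util.Arrays;

public class GridVI {
    static final int[] DR = {-1, 1, 0, 0}, DC = {0, 0, 1, -1}; // N, S, E, W

    // Value iteration with the assignment's transition model:
    // 0.90 intended move, 0.07 stay put, 0.03 move backwards.
    static double[][] valueIteration(char[][] maze, double gamma, double goalReward) {
        int rows = maze.length, cols = maze[0].length;
        double[][] v = new double[rows][cols];
        for (int iter = 0; iter < 10000; iter++) {
            double[][] next = new double[rows][cols];
            double delta = 0.0;
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    if (maze[r][c] == 'O') continue;                              // obstacle
                    if (maze[r][c] == 'G') { next[r][c] = goalReward; continue; } // terminal
                    double best = Double.NEGATIVE_INFINITY;
                    for (int a = 0; a < 4; a++) {
                        double q = -1.0 + gamma * (0.90 * at(v, maze, r + DR[a], c + DC[a], r, c)
                                                 + 0.07 * v[r][c]
                                                 + 0.03 * at(v, maze, r - DR[a], c - DC[a], r, c));
                        best = Math.max(best, q);
                    }
                    next[r][c] = best;
                    delta = Math.max(delta, Math.abs(best - v[r][c]));
                }
            }
            v = next;
            if (delta < 1e-9) break; // converged
        }
        return v;
    }

    // Value of the cell the agent lands in; bumping into a wall or obstacle leaves it in place.
    static double at(double[][] v, char[][] maze, int r, int c, int fromR, int fromC) {
        if (r < 0 || c < 0 || r >= maze.length || c >= maze[0].length || maze[r][c] == 'O')
            return v[fromR][fromC];
        return v[r][c];
    }

    public static void main(String[] args) {
        char[][] maze = {"GVV".toCharArray(), "VVV".toCharArray(), "VVO".toCharArray()};
        for (double[] row : valueIteration(maze, 0.95, 100.0))
            System.out.println(Arrays.toString(row));
    }
}
```

The MEU path is then read off by greedily following the highest-valued neighbor from the start cell.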
## Part II:

For this challenge, solve the same problem using reinforcement learning.
Use TD(0)/SARSA and TD(λ). Solve the problem for V(s), not Q(s,a). The
values should converge close to those from Part I.
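Both approaches rest on the same one-step backup; for TD(0) on state values it is V(s) ← V(s) + α(r + γV(s′) − V(s)). A one-line Java version (the example numbers are illustrative):

```java
public class TDZero {
    // One TD(0) backup for state values: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s)).
    static double tdUpdate(double vS, double vNext, double r, double alpha, double gamma) {
        return vS + alpha * (r + gamma * vNext - vS);
    }

    public static void main(String[] args) {
        // Example: V(s)=0, next state worth 50, step reward -1, alpha=0.1, gamma=0.95.
        System.out.println(tdUpdate(0.0, 50.0, -1.0, 0.1, 0.95)); // 0.1 * (-1 + 47.5) = 4.65
    }
}
```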

Turn-in should include your code (no language stipulation) and a write-up
which draws comparisons between the solutions. Include a results graph of
path length vs. iteration (plot runs from the same start location; the
curves should go down). Testing should use world sizes of 25×25 and
50×50.

Data File Format Example:
3 3
O O O
G V V
V V V
World x size, world y size

Each map location is either O – obstacle, V – vacant, or G – goal.
Have the agent start location be manually inserted or randomly generated
at the user’s request.
Take a look at using the BURLAP (http://burlap.cs.brown.edu/) library as a
starting point. You will need to install Maven and hook it into your IDE.

Turn-ins: a write-up of your solution, with a graph comparing the
convergence between approaches. Be sure to discuss your parameter
settings and how you identified the values you used (a parameter sweep
is not unwarranted).

## Reinforcement Learning Review

For this problem, you need to learn the best state values for each state in
the domain. In order to learn these values, you will need to make use of
TD(λ). For the values to converge, you will need to run your program
several times.

To learn the state values using TD(λ), you will need to consider the problem
as a sequence of state positions s0, s1, …, sT. The reward for the goal
position sT can be your choice; I suggest 100.
The theoretical/forward view of TD(λ) evaluates the target values for the
states s0, s1, …, sT−1 as

V(s_t) ← V(s_t) + α [ G_t^λ − V(s_t) ],
where G_t^λ = (1 − λ) Σ_{k=1}^{T−t−1} λ^(k−1) G_t^(k) + λ^(T−t−1) G_t

and G_t^(k) denotes the k-step return from s_t. Here t is the index of the
state, and λ is a value between 0.0 and 1.0. The issue with applying this
representation is that it is a forward view and needs to be converted to an
implementable backward view.

The way this is handled is by adding the concept of eligibility traces,
e(s) (per Sutton), which maintain the reward decay. The resulting algorithm
is shown in Figure 2.

```
Initialize V(s) arbitrarily, for all s ∈ S
Repeat (for each game):
    Initialize s, and set e(s) = 0 for all s ∈ S
    Repeat (for each step of game):
        Take action a, observe reward r and next state s′
        δ ← r + γ V(s′) − V(s)
        e(s) ← e(s) + 1
        For all s ∈ S:
            V(s) ← V(s) + α δ e(s)
            e(s) ← γ λ e(s)
        s ← s′
    until s is terminal
```

Figure 2: Backward view of TD(λ).
Note that in the backward view of TD(λ), all of the state values and all of
the eligibility trace values are updated at every single step of the learning
iteration. In the interest of ease of implementation, we will assume that the
only eligibility traces we must track are those of the states we visit
during the course of an iteration (i.e. start to goal) and that all other
eligibility trace values are 0.

This means that we must only track and update the visited states for each
game. The reason this assumption works is that in most cases the decay
causes e(s) to decrease at a high enough rate that repeat visits to a state
within a short time frame are unlikely.
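The backward view with the visited-states-only simplification can be sketched in Java; the corridor world, rewards, and learning parameters below are illustrative assumptions, not part of the assignment:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class TDLambda {
    // Backward-view TD(lambda) for V(s) on a toy 1-D corridor: the agent
    // starts at cell 0, the goal at cell n-1 pays +100, every other step
    // costs 1, and a move right succeeds with probability 0.9 (else the
    // agent stays put). Eligibility traces are tracked only for states
    // visited in the current game, per the simplification above.
    static double[] learn(int n, int episodes, long seed) {
        double[] v = new double[n];
        double alpha = 0.1, gamma = 0.95, lambda = 0.8;
        Random rng = new Random(seed);
        for (int ep = 0; ep < episodes; ep++) {
            Map<Integer, Double> e = new HashMap<>(); // traces for visited states only
            int s = 0;
            while (s != n - 1) {
                int s2 = (rng.nextDouble() < 0.9) ? s + 1 : s;  // noisy move right
                double r = (s2 == n - 1) ? 100.0 : -1.0;
                double vNext = (s2 == n - 1) ? 0.0 : v[s2];     // terminal value is 0
                double delta = r + gamma * vNext - v[s];
                e.merge(s, 1.0, Double::sum);                   // accumulate trace on s
                for (Map.Entry<Integer, Double> en : e.entrySet()) {
                    v[en.getKey()] += alpha * delta * en.getValue();
                    en.setValue(gamma * lambda * en.getValue()); // decay tracked traces
                }
                s = s2;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(learn(10, 2000, 0)));
    }
}
```

States near the goal should settle close to their true values (around 99 for the goal's neighbor), with values falling off toward the start.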

Java Code Stub – Reading a maze file:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// This section merely parses the input file into an array.
public class MazeReader {
    public static void main(String[] args) throws IOException {
        // Open the maze file; one can change the maze file here
        // (make sure it is in the same directory).
        BufferedReader in = new BufferedReader(new FileReader("maze1"));

        // The first line holds the two dimensions, e.g. "3 3".
        String line = in.readLine();
        String[] valueStr = line.trim().split("\\s+"); // split into two integers
        int length = Integer.parseInt(valueStr[0]);
        int width = Integer.parseInt(valueStr[1]);

        String[][] newMaze = new String[length][width]; // create maze array
        // System.out.println(length + " " + width);

        // Parse the rest of the file into the array.
        for (int i = 0; i < length; i++) {
            String[] row = in.readLine().trim().split("\\s+");
            for (int j = 0; j < width; j++) {
                newMaze[i][j] = row[j];
            }
        }
        in.close();
    }
}
```