EE 466000 Homework 1: Multi-Armed Bandit

Goal
This assignment helps you get familiar with basic action-value methods for multi-armed bandit problems.
Todo
• Implement the algorithm:
✓ ε-Greedy
• Get familiar with basic Python syntax.
Details
• Problem description
o Implement a 6-armed bandit problem with q∗(1) = 0.3, q∗(2) = −5, q∗(3) = 5,
q∗(4) = −1.1, q∗(5) = 1, q∗(6) = 0.
o When a learning method is applied to the problem, the actual reward R_t is drawn from
a normal distribution with mean q∗(A_t) and variance 1, where A_t is the action selected at step t.
• File description
o hw1.ipynb: Since this is the first homework, this file contains more detailed
instructions. Please follow them to complete your homework.
o The bandit environment class is used in this assignment. You will sample from a normal
distribution to randomly generate the reward of each bandit arm. You should
modify the step function in this class.
o In the learning class, you will implement ε-Greedy action selection and update
the action values. Please modify the chooseAction and updateValue functions
in this class to complete your homework.
o We strongly recommend implementing the evaluation or plotting function on
your own to get familiar with plotting in Python. We provide a basic
plot as a reference.
• After you have implemented all the algorithms, you should write your own plotting function
to analyze different settings.
• Please write a README file explaining how to run your code if you implemented extra
functions.
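The pieces described above can be sketched roughly as follows. This is only a minimal illustration, not the contents of hw1.ipynb: the class names Bandit and EpsilonGreedyAgent, the constructor arguments, the run_episode helper, and the exact signatures of step, chooseAction, and updateValue are all assumptions. It also assumes sample-average value estimates and uniformly random tie-breaking, which are choices you are free to make differently. Arms are indexed 0 to 5 in the code, so index 2 corresponds to arm 3.

```python
import random

# True action values q*(a) for arms 1..6, taken from the problem description.
Q_STAR = [0.3, -5.0, 5.0, -1.1, 1.0, 0.0]


class Bandit:
    """Hypothetical stand-in for the notebook's bandit environment."""

    def __init__(self, q_star=Q_STAR, seed=None):
        self.q_star = list(q_star)
        self.rng = random.Random(seed)

    def step(self, action):
        # Reward ~ Normal(mean=q*(action), variance=1). random.gauss takes a
        # standard deviation, which is also 1 here.
        return self.rng.gauss(self.q_star[action], 1.0)


class EpsilonGreedyAgent:
    """Hypothetical stand-in for the notebook's learning class."""

    def __init__(self, n_arms, epsilon, seed=None):
        self.epsilon = epsilon
        self.values = [0.0] * n_arms  # Q(a): current action-value estimates
        self.counts = [0] * n_arms    # N(a): number of times each arm was pulled
        self.rng = random.Random(seed)

    def chooseAction(self):
        # Explore: with probability epsilon pick any arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Exploit: pick a greedy arm, breaking ties uniformly at random.
        best = max(self.values)
        ties = [a for a, v in enumerate(self.values) if v == best]
        return self.rng.choice(ties)

    def updateValue(self, action, reward):
        # Incremental sample average: Q <- Q + (R - Q) / N.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run_episode(env, agent, steps=1000):
    """Run one learning process and return the reward received at each step."""
    rewards = []
    for _ in range(steps):
        action = agent.chooseAction()
        reward = env.step(action)
        agent.updateValue(action, reward)
        rewards.append(reward)
    return rewards
```

For example, run_episode(Bandit(seed=0), EpsilonGreedyAgent(n_arms=6, epsilon=0.2, seed=0)) returns a list of 1000 rewards that can then be averaged or plotted.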
Requirements and Installation
• Python version: 3.7
• Please run pip install [library_name] to install the necessary libraries.
Report
• Title, name, student ID
• Implementation
✓ In ε-Greedy, how do you select an action if the action values are equal?
✓ Briefly describe your implementation.
• Experiments and Analysis
✓ Plot the average-reward curves of different settings over 1000 steps, averaging the
results of 30 learning processes, in one figure.
▪ Vary ε over the values 0, 0.2, and 0.8. What happens? Why? Please plot the curves
in one figure.
✓ Is there any way to always get the best result when ε = 0? How?
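One possible way to produce the curves requested above is sketched below. It is a self-contained illustration under several assumptions: the function names run_once, average_reward_curve, and plot_curves, the output file name, and the initial_value parameter are all hypothetical and not part of the provided notebook. It uses sample-average estimates with random tie-breaking, and matplotlib is imported only inside the plotting function. The initial_value argument is included only so that different starting estimates can be tried when investigating the ε = 0 question; it is not presented as the expected answer.

```python
import random

# True action values q*(1)..q*(6) from the problem description (0-indexed here).
Q_STAR = [0.3, -5.0, 5.0, -1.1, 1.0, 0.0]


def run_once(epsilon, steps, rng, initial_value=0.0):
    """One learning process; returns the reward received at each step."""
    n = len(Q_STAR)
    values = [initial_value] * n  # action-value estimates Q(a)
    counts = [0] * n              # pull counts N(a)
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore
        else:
            best = max(values)         # exploit, random tie-breaking
            action = rng.choice([a for a in range(n) if values[a] == best])
        reward = rng.gauss(Q_STAR[action], 1.0)  # mean q*(a), variance 1
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        rewards.append(reward)
    return rewards


def average_reward_curve(epsilon, steps=1000, runs=30, seed=0, initial_value=0.0):
    """Average the per-step reward over several independent learning processes."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(runs):
        for t, r in enumerate(run_once(epsilon, steps, rng, initial_value)):
            totals[t] += r
    return [total / runs for total in totals]


def plot_curves(epsilons=(0.0, 0.2, 0.8), filename="average_reward.png"):
    """Draw one curve per epsilon value into a single figure."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    for eps in epsilons:
        plt.plot(average_reward_curve(eps), label=f"epsilon = {eps}")
    plt.xlabel("Step")
    plt.ylabel("Average reward over 30 runs")
    plt.legend()
    plt.savefig(filename)
```

Calling plot_curves() would write one figure with three curves. Fixing the seed keeps the curves reproducible between runs, which makes it easier to compare settings in the report.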
Reminder
• Please upload your code main.py and report.pdf to iLMS before 4/4 (Sun) 23:59. No late
submissions are allowed.
• DO NOT zip your code into a single file.
• Please do not copy and paste code from your classmates.