
COMP9414 assignment help: Python programming assignment writing service

Date: 2024-07-21  Source: 合肥網 hfw.cc  Author: hfw.cc



COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment,
you are asked to implement Q-learning and SARSA methods for a taxi nav-
igation problem. To run your experiments and test your code, you should
make use of the Gym library1, an open-source Python library for developing
and comparing reinforcement learning algorithms. You can install Gym on
your computer simply by using the following command in your command
prompt:
pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:
1 https://www.gymlibrary.dev/environments/toy_text/taxi/
import gym
env = gym.make("Taxi-v3", render_mode="ansi").env
# Gym versions that accept render_mode return (observation, info) from reset()
state, info = env.reset()
rendered_env = env.render()
print(rendered_env)
In order to render the environment, there are three modes known as
“human”, “rgb_array”, and “ansi”. The “human” mode visualizes the
environment in a way suitable for human viewing, and the output is a graphical
window that displays the current state of the environment (see Fig. 1). The
“rgb_array” mode provides the environment’s state as an RGB image, and
the output is a numpy array representing the RGB image of the environment.
The “ansi” mode provides a text-based representation of the environment’s
state, and the output is a string that represents the current state of the
environment using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
Gym library.
You are free to choose the presentation mode between “human” and
“ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
description, there are six discrete deterministic actions that are presented in
Table 1.
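As a rough sketch of how the state and action spaces translate into a Q-table: Taxi-v3 in Gym exposes 500 discrete states (25 taxi squares, times 5 passenger positions counting "inside the taxi", times 4 destinations) and the six actions of Table 1. The helper names below (`encode_state`, `decode_state`, `new_q_table`) are illustrative and not part of Gym. The packing order mirrors the one documented for the environment, but treat it as an assumption to check against your installed version.

```python
N_ROWS, N_COLS = 5, 5          # 5x5 grid world
N_PASSENGER_POSITIONS = 5      # R, G, Y, B, or inside the taxi
N_DESTINATIONS = 4             # R, G, Y, B
N_STATES = N_ROWS * N_COLS * N_PASSENGER_POSITIONS * N_DESTINATIONS
N_ACTIONS = 6                  # the six actions of Table 1


def encode_state(taxi_row, taxi_col, passenger, destination):
    """Pack the four state components into one integer index."""
    index = taxi_row
    index = index * N_COLS + taxi_col
    index = index * N_PASSENGER_POSITIONS + passenger
    index = index * N_DESTINATIONS + destination
    return index


def decode_state(index):
    """Inverse of encode_state: unpack an index into its four components."""
    destination = index % N_DESTINATIONS
    index //= N_DESTINATIONS
    passenger = index % N_PASSENGER_POSITIONS
    index //= N_PASSENGER_POSITIONS
    taxi_col = index % N_COLS
    taxi_row = index // N_COLS
    return taxi_row, taxi_col, passenger, destination


def new_q_table(n_states=N_STATES, n_actions=N_ACTIONS):
    """Q-table as a list of rows, one row of action values per state."""
    return [[0.0] * n_actions for _ in range(n_states)]
```

A plain list of lists keeps the sketch dependency-free; a numpy array of shape (500, 6) would serve the same purpose.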
Figure 2: “ansi” mode presentation for the taxi navigation problem in Gym
library. Gold represents the taxi location, blue is the pickup location, and
purple is the drop-off location.
Table 1: Six possible actions in the taxi navigation environment.
Action                Number of the action
Move South            0
Move North            1
Move East             2
Move West             3
Pickup Passenger      4
Drop off Passenger    5
For this assignment, you need to implement the Q-learning and SARSA
algorithms for the taxi navigation environment. The main objective of this
assignment is for the agent (taxi) to learn how to navigate the grid world
and deliver the passenger in the minimum possible number of steps. To
accomplish the learning task, you should empirically determine hyperparameters,
e.g., the learning rate α, exploration parameters (such as ε or T), and discount
factor γ for your algorithm. Your agent should be penalized -1 per step it
takes, receive a +20 reward for delivering the passenger, and incur a -10
penalty for executing “pickup” and “drop-off” actions illegally. You should
try different exploration parameters to find the best balance between
exploration and exploitation.
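The two exploration parameters named above correspond to two common action-selection rules: ε for epsilon-greedy and a temperature T for softmax (Boltzmann) selection. The following is a minimal sketch of both, assuming a Q-table row is a plain list of action values. The function names and the decay schedule (`start`, `end`, `decay`) are illustrative placeholders, not values prescribed by the assignment.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties randomly so all-zero rows do not always yield action 0.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def softmax_probabilities(q_values, temperature):
    """Boltzmann distribution over actions; a lower temperature is greedier."""
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, temperature, rng=random):
    """Sample an action index from the Boltzmann distribution."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.995):
    """Hypothetical schedule: exponential decay from start down to a floor."""
    return max(end, start * decay ** episode)
```

Comparing a fixed ε against a decayed one, or several temperatures, is one way to carry out the empirical search the paragraph asks for.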
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Please note that the provided
plots are just examples and, therefore, your plots will not be exactly like the
provided ones, as the learning parameters will differ for your algorithm.
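One way to collect the two per-episode series requested above is to log them inside the training loop. The sketch below assumes an environment that follows the newer Gym interface, where `reset()` returns `(observation, info)` and `step()` returns a 5-tuple. The function name `train`, its default hyperparameters, and the `moving_average` helper are illustrative choices rather than recommended settings.

```python
import random


def train(env, n_states, n_actions, algorithm="q_learning", episodes=1000,
          alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning or SARSA; returns (q_table, rewards, steps)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]

    def choose(state):
        # Epsilon-greedy behaviour policy shared by both algorithms.
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        return max(range(n_actions), key=q[state].__getitem__)

    rewards_per_episode, steps_per_episode = [], []
    for _ in range(episodes):
        state, _info = env.reset()
        action = choose(state)
        total_reward, steps = 0, 0
        while steps < max_steps:
            next_state, reward, terminated, truncated, _info = env.step(action)
            total_reward += reward
            steps += 1
            if algorithm == "sarsa":
                # On-policy: bootstrap from the action that will be taken next.
                next_action = choose(next_state)
                bootstrap = q[next_state][next_action]
            else:
                # Off-policy: bootstrap from the best action in the next state.
                bootstrap = max(q[next_state])
            target = reward if terminated else reward + gamma * bootstrap
            q[state][action] += alpha * (target - q[state][action])
            if terminated or truncated:
                break
            if algorithm != "sarsa":
                next_action = choose(next_state)  # chosen after the update
            state, action = next_state, next_action
        rewards_per_episode.append(total_reward)
        steps_per_episode.append(steps)
    return q, rewards_per_episode, steps_per_episode


def moving_average(values, window=50):
    """Trailing mean, useful for smoothing noisy learning curves."""
    out, running = [], 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out
```

The two returned lists can be passed directly to a plotting library such as matplotlib, either raw or smoothed with `moving_average`.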
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.
After training your algorithm, you should save your Q-values. Based on
your saved Q-table, your algorithms will be tested on at least 100 random
grid-world scenarios with the same characteristics as the taxi environment for
both the Q-learning and SARSA algorithms using the greedy action selection
method. Therefore, your Q-table will not be updated during testing for the
new steps.
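Since the storage format is left open, here is one possible sketch for saving and reloading a Q-table and for greedy action selection at test time. It assumes the table is a list of lists of floats and uses JSON from the standard library; the function names are hypothetical. A numpy array would first need converting with `.tolist()`, or could be stored with numpy's own save routines instead.

```python
import json


def save_q_table(q_table, path):
    """Write the Q-table (list of lists of floats) to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(q_table, handle)


def load_q_table(path):
    """Read a Q-table previously written by save_q_table."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def greedy_action(q_table, state):
    """Index of the highest-valued action; no exploration and no update."""
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)
```

Because `greedy_action` only reads from the table, using it alone during testing keeps the saved Q-values unchanged, as the paragraph above requires.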
Your code should be able to visualize the trained agent for both the Q-
learning and SARSA algorithms. This means you should render the “Taxi-
v3” environment (you can use the “ansi” mode) and run your trained agent
from a random position. You should present the steps your agent is taking
and how the reward changes from one state to another. An example of the
visualized agent is shown in Fig. 7, where only the first six steps of the taxi
are displayed.
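The visualization described above could be sketched as a single greedy episode that prints each rendered frame together with the action and the running reward. This assumes an environment created with the “ansi” render mode, so that `render()` returns a string, and the newer Gym `reset()`/`step()` return shapes. `replay_greedy` and `ACTION_NAMES` are illustrative names, with the action labels taken from Table 1.

```python
ACTION_NAMES = ["South", "North", "East", "West", "Pickup", "Drop off"]


def replay_greedy(env, q_table, max_steps=100, action_names=ACTION_NAMES):
    """Run one greedy episode, printing every frame, action and reward."""
    state, _info = env.reset()
    total_reward, steps = 0, 0
    print(env.render())  # initial frame
    while steps < max_steps:
        row = q_table[state]
        action = max(range(len(row)), key=row.__getitem__)
        next_state, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        steps += 1
        print(f"Step {steps}: state {state} -> {next_state}, "
              f"action {action} ({action_names[action]}), "
              f"reward {reward}, accumulated reward {total_reward}")
        print(env.render())
        state = next_state
        if terminated or truncated:
            break
    return steps, total_reward
```

The function returns the step count and accumulated reward so the same run can be summarized after the frames have been printed.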
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by tutors
along with you in a discussion carried out in the tutorial session in week 10.
The assignment has a total of 25 marks. The discussion is mandatory and,
therefore, we will not mark any assignment not discussed with tutors.
Figure 7: The first six steps of a trained agent (taxi) based on Q-learning
algorithm.
Before your discussion session, you should prepare the necessary code for
this purpose by loading your Q-table and the “Taxi-v3” environment. You
should be able to calculate the average number of steps per episode and the
average accumulated reward (for a maximum of 100 steps for each episode)
for the test episodes (using the greedy action selection method).
You are expected to propose and build your algorithms for the taxi
navigation task. You will receive marks for each of these subsections as shown
in Table 2. Beyond what has been mentioned in the previous section, you are
welcome to include any other outcome that highlights particular aspects
when testing and discussing your code with your tutor.
For both Q-learning and SARSA algorithms, your tutor will consider the
average accumulated reward and the average number of steps taken over the test
episodes in the environment, with a maximum of 100 steps for each episode. For
your Q-learning algorithm, the agent should take at most 14 steps per episode
on average and obtain an average accumulated reward of at least 7. Numbers
worse than that will result in a score of 0 marks for that specific section.
For your SARSA algorithm, the agent should take at most 15 steps per
episode on average and obtain an average accumulated reward of at least 5.
Numbers worse than that will result in a score of 0 marks for that specific
section.
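The test procedure and thresholds above can be sketched as follows, again assuming the newer Gym `reset()`/`step()` interface and a Q-table stored as a list of lists. `evaluate`, `THRESHOLDS`, and `meets_threshold` are hypothetical names; the numeric limits are the ones stated in this section.

```python
def evaluate(env, q_table, episodes=100, max_steps=100):
    """Greedy test run; returns (average steps, average accumulated reward)."""
    total_steps, total_reward = 0, 0.0
    for _ in range(episodes):
        state, _info = env.reset()
        for _ in range(max_steps):
            row = q_table[state]
            action = max(range(len(row)), key=row.__getitem__)
            state, reward, terminated, truncated, _info = env.step(action)
            total_steps += 1
            total_reward += reward
            if terminated or truncated:
                break
    return total_steps / episodes, total_reward / episodes


# (maximum average steps, minimum average accumulated reward)
THRESHOLDS = {"q_learning": (14, 7), "sarsa": (15, 5)}


def meets_threshold(algorithm, avg_steps, avg_reward):
    """True when both averages satisfy the limits for the given algorithm."""
    max_avg_steps, min_avg_reward = THRESHOLDS[algorithm]
    return avg_steps <= max_avg_steps and avg_reward >= min_avg_reward
```

Running `evaluate` on the loaded Q-table before the discussion session gives an early indication of whether a section is at risk of scoring 0.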
Finally, you will receive 1 mark for code readability for each task, and
your tutor will also give you a maximum of 5 marks for each task depending
on the level of code understanding as follows: 5. Outstanding, 4. Great,
3. Fair, 2. Low, 1. Deficient, 0. No answer.
Table 2: Marks for each task.
Task                                                                          Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for Q-learning algorithm.   2 marks
  Accumulated rewards and steps per episode plots for SARSA algorithm.        2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode for
  Q-learning algorithm.                                                       2.5 marks
  Average accumulated rewards and average steps per episode for
  SARSA algorithm.                                                            2.5 marks
  Visualizing the trained agent for Q-learning algorithm.                     2 marks
  Visualizing the trained agent for SARSA algorithm.                          2 marks
Code understanding and discussion
  Code readability for Q-learning algorithm                                   1 mark
  Code readability for SARSA algorithm                                        1 mark
  Code understanding and discussion for Q-learning algorithm                  5 marks
  Code understanding and discussion for SARSA algorithm                       5 marks
Total marks                                                                   25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment
solution via Moodle. The submission consists of a single .zip file containing
three files: the .ipynb Jupyter notebook and your saved Q-tables for Q-learning
and SARSA (you can choose the format for the Q-tables). Remember that the files
with your Q-tables will be loaded during your discussion session to run the
test episodes. Therefore, you should also provide a script in your Python
code at submission to perform these tests. Additionally, your code should
include short text descriptions to help markers better understand your code.
Please be mindful that providing clean and easy-to-read code is a part of
your assignment.
Please indicate your full name and your zID at the top of the file as a
comment. You can submit as many times as you like before the deadline;
later submissions overwrite earlier ones. After submitting your file, it is
good practice to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission
penalty of 5% per day off your mark, applied for up to five days after the
assessment deadline. After that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the
forum on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, do not post your code there, as making
it public could lead to plagiarism. If your question requires sharing code,
use the course email cs9414@cse.unsw.edu.au instead.
Although we try to answer questions as quickly as possible, we might take
1 or 2 business days to reply, so last-minute questions might not be
answered in time.
For any questions regarding the discussion sessions, please contact your
tutor directly. Your tutor's email address is listed in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.
