
ECE 498 Assignment Writing, Python Programming Commissions

Date: 2024-11-15  Source: Hefei Net (hfw.cc)  Author: hfw.cc



ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if you are having trouble with the
training time.
2. HW3&4: During test evaluation, note that positional encodings for unseen/long
context are not trained. You are supposed to evaluate it as is. It is OK if it doesn’t
work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture with SSM-type
architectures (specifically Mamba [1]) on the associative recall problem. We provide
example code, recall.ipynb, which contains an example implementation using a 2-layer
transformer. You will adapt this code to incorporate different positional encodings, use
Mamba layers, or modify dataset generation.
Background: As you recall from class, associative recall (AR) assesses two abilities
of the model: the ability to locate relevant information and the ability to retrieve the
context around that information. The AR task can be understood via the following question:
Given the input prompt X = [a 1 b 2 c 3 b], we wish the model to locate where the last
token b occurs earlier and output the associated value Y = 2. This is crucial for
memory-related tasks or bigram retrieval (e.g. ‘Baggins’ should follow ‘Bilbo’).
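The lookup described above can be written as a few lines of plain Python. This is only a reference sketch of what the model is expected to compute, not a model: the function name associative_recall_target is hypothetical, and it assumes the value sits immediately after the earlier occurrence of the final token.

```python
def associative_recall_target(tokens):
    """Return the token that follows the earlier occurrence of the final token.

    Returns None if the final token does not occur earlier in the sequence.
    """
    query = tokens[-1]
    last_index = len(tokens) - 1
    # Scan the prefix (everything except the final query) for the same token.
    for i, tok in enumerate(tokens[:-1]):
        # Skip a match whose "value" slot would be the final query itself.
        if tok == query and i + 1 < last_index:
            return tokens[i + 1]
    return None


prompt = ["a", "1", "b", "2", "c", "3", "b"]
print(associative_recall_target(prompt))  # prints 2
```

A trained model has to learn this behavior from examples; the function is useful mainly as a ground-truth check on generated data.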
To proceed, let us formally define the associative recall task we will study in the HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with
cardinality |Q| = k. Consider a discrete input sequence X of the form X = [. . . q v . . . q]
where the query q appears exactly twice in the sequence and the value v follows the first
appearance of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
The induction head problem is a special case of the definition above where the query q is
fixed (i.e. Q is a singleton). The induction head is visualized in Figure 1. At the other
extreme, we can ask the model to solve AR for all queries in the vocabulary.
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the embedding of
the vocabulary by randomly generating a K × d matrix V with IID N(0, 1) entries, then
normalizing its rows to unit length. Here d is the embedding dimension. The embedding of
the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
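A minimal NumPy sketch of this construction follows. The function name make_embeddings is hypothetical. Note one assumption: the text indexes tokens from 1, while NumPy rows are 0-based, so token i corresponds to row i - 1 in this sketch.

```python
import numpy as np


def make_embeddings(K, d, seed=0):
    """Return a K x d matrix with IID N(0, 1) entries and unit-length rows."""
    np.random.seed(seed)                 # seed 0, as requested in the text
    V = np.random.randn(K, d)            # IID standard normal entries
    V /= np.linalg.norm(V, axis=1, keepdims=True)  # normalize each row
    return V


V = make_embeddings(K=16, d=8)
print(V.shape)
print(np.allclose(np.linalg.norm(V, axis=1), 1.0))
```

Whether these embeddings are kept frozen or used to initialize a trainable embedding layer is not stated in the text, so the sketch leaves that choice open.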
Experimental variables: Finally, for the AR task, Q will simply be the first M elements
of the vocabulary. During experiments, K, d, M are under our control. Besides these, we will
also vary two other variables:
• Context length: We will train these models up to context length L. However, we
will evaluate with up to 3L. This is to test the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
introduce a delay variable where v will appear τ tokens after q. τ = 1 is the standard.
Models: The motivation behind this HW is to reproduce the results in the Mamba paper.
However, we will also go beyond their evaluations and identify weaknesses of both
transformer and Mamba architectures. Specifically, we will consider the following models in our
evaluations:
Figure 1: We will work on the associative recall (AR) problem. AR problem requires the
model to retrieve the value associated with all queries whereas the induction head requires
the same for a specific query. Thus, the latter is an easier problem. The figure above is
directly taken from the Mamba paper [1]. The yellow-shaded regions highlight the focus of
this homework.
• Transformer: We will use the transformer architecture with 2 attention layers (no
MLP). We will try the following positional encodings: (i) learned PE (provided code),
(ii) Rotary PE (RoPE), (iii) NoPE (no positional encoding)
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
No positional encoding is used.
Hybrid architectures are inspired by the Mamba paper as well as [2] which observes the
benefit of starting the model with a Mamba layer. You should use public GitHub repos to
find implementations (e.g. RoPE encoding or Mamba layer). As a suggestion, you can use
this GitHub Repo for the Mamba model.
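For orientation on the RoPE option, here is a minimal NumPy sketch of rotary position encoding. It is not taken from recall.ipynb or from any particular repo: the function name rope, the base of 10000, and the interleaved even/odd pairing are assumptions, and public implementations differ in how they pair dimensions. In a transformer the rotation is applied to queries and keys before the attention dot product.

```python
import numpy as np


def rope(x, base=10000.0):
    """Rotate feature pairs of x (shape: seq_len x dim, dim even) by position."""
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    # One rotation frequency per pair of feature dimensions.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)          # (dim / 2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]  # (seq_len, dim / 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


# Place the same query and key vector at every position, then compare scores.
rng = np.random.default_rng(0)
q_vec, k_vec = rng.standard_normal(8), rng.standard_normal(8)
Q = rope(np.tile(q_vec, (10, 1)))
Kmat = rope(np.tile(k_vec, (10, 1)))
scores = Q @ Kmat.T
# Positions (3, 1) and (7, 5) share the same offset, so the scores should agree.
print(np.isclose(scores[3, 1], scores[7, 5]))
```

The check at the end illustrates the property that makes RoPE a relative encoding: the attention score depends on the offset between positions, not on their absolute values.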
Generating training dataset: During training, you train with minibatch SGD (e.g. with
batch size 64) until satisfactory convergence. You can generate the training sequences for
AR as follows given (K, d, M, L, τ):
1. Training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random, independently. Recall
that size of Q is |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ.
4. Place value token at the index i + τ.
5. Sample the other tokens IID from [K] \ {q}, i.e. the other tokens are drawn uniformly at
random but are not equal to q.
6. Set label token Y = v.
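The steps above can be sketched in NumPy as follows. The function name sample_ar_sequence is hypothetical, and token ids are kept 1-based to match [K]. One assumption deviates from the literal text: the first query index is capped at L − τ − 1 (1-based), because i = L − τ would place the value on the final position, which is reserved for the second query. As in step 2, v is drawn from all of [K], so it can occasionally equal q.

```python
import numpy as np


def sample_ar_sequence(K, M, L, tau, rng):
    """Return (tokens, label): a length-L array of ids in [1, K] and the value v.

    Requires K >= 2 and L >= tau + 2.
    """
    q = int(rng.integers(1, M + 1))        # query from Q = {1, ..., M}
    v = int(rng.integers(1, K + 1))        # value from [K], independent of q
    # Filler tokens uniform over [K] without q: draw from K - 1 ids, skip over q.
    filler = rng.integers(1, K, size=L)    # ids in 1 .. K - 1
    tokens = filler + (filler >= q)        # ids >= q shift up by one
    # 0-based first query index; assumption: the value never lands on the last slot.
    i = int(rng.integers(0, L - tau - 1))  # 0 .. L - tau - 2
    tokens[i] = q
    tokens[i + tau] = v
    tokens[-1] = q
    return tokens, v


rng = np.random.default_rng(0)
tokens, label = sample_ar_sequence(K=16, M=4, L=12, tau=1, rng=rng)
print(tokens, label)
```

A training minibatch can be formed by calling the function repeatedly (for example 64 times) and stacking the results.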
Test evaluation: The test dataset is generated in the same way as above. However, we will
evaluate on all sequence lengths from τ + 1 to 3L. Note that τ + 2 is the shortest possible
sequence.
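A length sweep of this kind can be organized as a small harness that is independent of the model. The sketch below is an assumption-laden illustration: accuracy_by_length, toy_sample, and oracle are hypothetical names, the sampler is a compact stand-in for your own data generator, and the oracle is a lookup rule standing in for a trained model. The sweep starts at τ + 2, the shortest possible sequence.

```python
import numpy as np


def accuracy_by_length(predict, sample_fn, lengths, n_samples=200):
    """Map each length to the fraction of samples with predict(tokens) == label."""
    results = {}
    for length in lengths:
        correct = 0
        for _ in range(n_samples):
            tokens, label = sample_fn(length)
            correct += int(predict(tokens) == label)
        results[length] = correct / n_samples
    return results


# Hypothetical stand-ins so the harness runs without a trained model.
rng = np.random.default_rng(0)
K, M, L, tau = 16, 1, 8, 1


def toy_sample(length):
    """Compact AR sampler: query at a random index and at the end, value tau later."""
    q = int(rng.integers(1, M + 1))
    v = int(rng.integers(1, K + 1))
    filler = rng.integers(1, K, size=length)
    tokens = filler + (filler >= q)          # filler tokens never equal q
    i = int(rng.integers(0, length - tau - 1))
    tokens[i], tokens[i + tau], tokens[-1] = q, v, q
    return tokens, v


def oracle(tokens):
    """Lookup rule: find the first earlier query and read the token tau later."""
    first = int(np.flatnonzero(tokens[:-1] == tokens[-1])[0])
    return int(tokens[first + tau])


lengths = range(tau + 2, 3 * L + 1)
acc = accuracy_by_length(oracle, toy_sample, lengths, n_samples=50)
print(min(acc.values()))
```

Replacing oracle with a function that runs a trained model and returns its predicted token gives the accuracy-versus-length curve; the oracle itself should score 1.0 at every length, which makes it a quick sanity check on the data pipeline.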
Empirical Evidence from Mamba Paper: Table 2 of [1] demonstrates that Mamba can do
a good job on the induction head problem i.e. AR with single query. Additionally, Mamba
is the only model that exhibits length generalization, that is, even if you train it up to context
length L, it can still solve AR for context length beyond L. On the other hand, since Mamba
is inherently a recurrent model, it may not solve the AR problem in its full generality. This
motivates the question: What are the tradeoffs between Mamba and transformer, and can
hybrid models help improve performance over both?
Your assignments are as follows. For each problem, make sure to submit the associated
code. The code can be in separate, clearly commented cells of a single Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training,
evaluate the test performance and plot the accuracy of all models as a function of
the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves
(3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the
performance of the models including length generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate
where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
test batch. Make sure to specify the batch size you are using (ideally use ** or 64).
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
with RoPE, and Hybrid. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
corresponding results of Problem 2. How does the embedding dimension d impact the results?
• Train a Mamba model for M = 16 with τ = 10. Comment on any differences.



