


Date: 2024-11-15  Source: Hefei Net (hfw.cc)  Author: hfw.cc



ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if training takes too long.
2. HW3&4: During test evaluation, note that positional encodings for unseen (longer)
context lengths are not trained. You are supposed to evaluate the model as is. It is OK
if it doesn't work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture with SSM-type
architectures (specifically Mamba [1]) on the associative recall problem. We provide
example code, recall.ipynb, which implements a 2-layer transformer. You will adapt this
code to incorporate different positional encodings, use Mamba layers, or modify the
dataset generation.
Background: As you may recall from class, associative recall (AR) assesses two abilities
of the model: the ability to locate relevant information and the ability to retrieve the
context around that information. The AR task can be understood via the following example:
given the input prompt X = [a 1 b 2 c 3 b], we want the model to locate the earlier
occurrence of the last token b and output the value that follows it, Y = 2. This is crucial
for memory-related tasks or bigram retrieval (e.g., ‘Baggins’ should follow ‘Bilbo’).
To proceed, let us formally define the associative recall task we will study in this HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with cardinality
|Q| = k. Consider a discrete input sequence X of the form X = [. . . q v . . . q], where the
query q appears exactly twice in the sequence and the value v follows the first appearance
of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
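As an illustration of Definition 1, the sketch below is a brute-force reference ("oracle") solver: it takes the last token as the query, finds the query's first earlier appearance, and returns the token τ positions after it (τ = 1 is the standard setting; the delay τ is introduced under Experimental variables). The helper name `ar_oracle` is hypothetical and not part of the provided recall.ipynb; it is only meant as a sanity check on generated data.

```python
from typing import Any, Sequence


def ar_oracle(seq: Sequence[Any], tau: int = 1) -> Any:
    """Brute-force AR solver (hypothetical helper, for sanity checks only).

    The query is the last token of `seq`; the answer is the token `tau`
    positions after the query's first appearance earlier in the sequence.
    """
    query = seq[-1]
    for i, tok in enumerate(seq[:-1]):  # skip the final query itself
        if tok == query:
            return seq[i + tau]
    raise ValueError("the query does not appear earlier in the sequence")


# Example from the Background paragraph: X = [a 1 b 2 c 3 b] -> Y = 2
print(ar_oracle(["a", 1, "b", 2, "c", 3, "b"]))  # prints 2
```

Using the first earlier appearance (rather than the most recent one) keeps the answer correct even if the sampled value happens to equal the query.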
The induction head is a special case of the definition above where the query q is fixed
(i.e., Q is a singleton). The induction head is visualized in Figure 1. At the other
extreme, we can ask the model to solve AR for all queries in the vocabulary.
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the embedding of
the vocabulary by randomly generating a K × d matrix V with IID N(0, 1) entries, then
normalizing its rows to unit length. Here d is the embedding dimension. The embedding of
the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
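A minimal NumPy sketch of this construction follows. Python indexes from 0, so the HW's token i (1 ≤ i ≤ K) corresponds to row i − 1 here; the helper name `make_vocab_embeddings` is hypothetical.

```python
import numpy as np


def make_vocab_embeddings(K: int, d: int, seed: int = 0) -> np.ndarray:
    """Return a K x d matrix of random token embeddings with unit-norm rows."""
    np.random.seed(seed)                            # fixed seed for reproducibility
    V = np.random.randn(K, d)                       # IID N(0, 1) entries
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # normalize each row to length 1
    return V


V = make_vocab_embeddings(K=16, d=8)
print(V.shape)                                      # (16, 8)
```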
Experimental variables: For the AR task, Q will simply be the first M elements of the
vocabulary. During experiments, K, d, and M are under our control. Besides these, we will
also vary two other variables:
• Context length: We will train the models with context length up to L but evaluate
them with context lengths up to 3L. This tests the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we
introduce a delay variable τ so that v appears τ tokens after q. τ = 1 is the standard
AR problem.
Models: The motivation behind this HW is to reproduce the results of the Mamba paper.
However, we will also go beyond its evaluations and identify weaknesses of both the
transformer and Mamba architectures. Specifically, we will consider the following models
in our evaluations:
Figure 1: We will work on the associative recall (AR) problem. The AR problem requires the
model to retrieve the value associated with every query, whereas the induction head requires
the same for a single, fixed query. Thus, the latter is an easier problem. The figure above is
taken directly from the Mamba paper [1]. The yellow-shaded regions highlight the focus of
this homework.
• Transformer: We will use the transformer architecture with 2 attention layers (no
MLP). We will try the following positional encodings: (i) learned PE (provided code),
(ii) Rotary PE (RoPE), (iii) NoPE (no positional encoding)
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
No positional encoding is used.
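For intuition about the attention layers (and about NoPE in particular), the sketch below implements single-head causal self-attention in NumPy and stacks two such layers with no MLP. The residual connections, the single head, the 1/√d scaling, and the random, untrained weights are assumptions made for illustration; the provided recall.ipynb may differ. With NoPE, the causal mask is the only source of order information.

```python
import numpy as np


def causal_attention(X, Wq, Wk, Wv):
    """Single-head causal self-attention without positional encoding.

    X: (n, d) token embeddings; Wq, Wk, Wv: (d, d) projections.
    Row t of the output depends only on rows 0..t of X.
    """
    n, d = X.shape
    queries, keys, values = X @ Wq, X @ Wk, X @ Wv
    scores = queries @ keys.T / np.sqrt(d)               # (n, n) attention logits
    future = np.triu(np.ones((n, n), dtype=bool), k=1)   # True where key index > query index
    scores = np.where(future, -np.inf, scores)           # hide future positions
    scores -= scores.max(axis=1, keepdims=True)          # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # each row sums to 1
    return weights @ values


def two_layer_attention(X, params):
    """Two attention layers with residual connections and no MLP (assumed layout)."""
    for Wq, Wk, Wv in params:
        X = X + causal_attention(X, Wq, Wk, Wv)
    return X


rng = np.random.default_rng(0)
d, n = 8, 10
params = [tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)) for _ in range(2)]
X = rng.standard_normal((n, d))
print(two_layer_attention(X, params).shape)  # (10, 8)
```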
Hybrid architectures are inspired by the Mamba paper as well as [2], which observes the
benefit of starting the model with a Mamba layer. You should use public GitHub repositories
to find implementations (e.g., of RoPE or the Mamba layer). As a suggestion, you can use
this GitHub repo for the Mamba model.
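If you write RoPE yourself rather than borrowing an implementation, the core operation is small. The NumPy sketch below rotates each consecutive coordinate pair (x_2j, x_2j+1) by the angle p·θ_j, with θ_j = base^(−2j/d) and the commonly used base = 10000. The function name and the pairing convention are assumptions (some implementations pair the first and second halves of the vector instead). RoPE is applied to queries and keys, not values, so the attention logit between positions p and s depends only on the offset s − p.

```python
import numpy as np


def apply_rope(x, positions, base=10000.0):
    """Rotary position embedding for x of shape (n, d) with d even.

    positions: length-n sequence of integer positions.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    assert d % 2 == 0, "RoPE needs an even embedding dimension"
    theta = base ** (-2.0 * np.arange(d // 2) / d)                 # (d/2,) rotation frequencies
    angles = np.outer(np.asarray(positions, dtype=float), theta)   # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin                      # 2-D rotation of each pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


# Usage inside attention: rotate queries and keys by their positions.
rng = np.random.default_rng(0)
q, k = rng.standard_normal((1, 8)), rng.standard_normal((1, 8))
logit_3_5 = (apply_rope(q, [3]) @ apply_rope(k, [5]).T).item()
logit_10_12 = (apply_rope(q, [10]) @ apply_rope(k, [12]).T).item()
print(np.isclose(logit_3_5, logit_10_12))  # True: both pairs have offset 2
```

Since the rotation angles at positions beyond L are never produced during training, this also illustrates why RoPE offers no guarantee at the longer test lengths.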
Generating the training dataset: Train with minibatch SGD (e.g., with batch size 64) until
satisfactory convergence. Given (K, d, M, L, τ), you can generate the training sequences
for AR as follows:
1. The training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random, independently. Recall
that |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ − 1, so that the value placed in step 4 cannot overwrite the
final q.
4. Place the value token at index i + τ.
5. Sample the other tokens IID from [K] − {q}, i.e., the other tokens are drawn uniformly
at random but are not equal to q.
6. Set the label token Y = v.
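A minimal NumPy sketch of steps 1 to 6 follows. It uses 0-based tokens and positions, so the vocabulary is 0 to K − 1 and Q is 0 to M − 1. The earlier q is placed at an index i with i + τ ≤ L − 2, so the value never lands on the final query (the last position); this is also why τ + 2 is the shortest valid length. The function name `sample_ar_batch` and the use of a NumPy `Generator` are assumptions; d does not appear because embeddings are looked up from token ids separately.

```python
import numpy as np


def sample_ar_batch(batch_size, K, M, L, tau, rng):
    """Sample AR sequences of length L with delay tau (0-based tokens and positions).

    Returns token ids X of shape (batch_size, L) and labels Y of shape (batch_size,).
    """
    if L < tau + 2:
        raise ValueError("sequence length must be at least tau + 2")
    X = np.empty((batch_size, L), dtype=np.int64)
    Y = np.empty(batch_size, dtype=np.int64)
    for b in range(batch_size):
        q = int(rng.integers(M))                # step 2: query, uniform over Q
        v = int(rng.integers(K))                # step 2: value, uniform and independent of q
        fillers = np.delete(np.arange(K), q)    # vocabulary without q
        seq = rng.choice(fillers, size=L)       # step 5: other tokens, uniform, never q
        i = int(rng.integers(0, L - tau - 1))   # step 3: i in {0, ..., L - tau - 2}
        seq[i] = q                              # step 3: earlier occurrence of q
        seq[i + tau] = v                        # step 4: value tau positions later
        seq[-1] = q                             # step 3: query at the end
        X[b], Y[b] = seq, v                     # step 6: label is the value
    return X, Y


rng = np.random.default_rng(0)
X, Y = sample_ar_batch(batch_size=64, K=16, M=1, L=32, tau=1, rng=rng)
print(X.shape, Y.shape)  # (64, 32) (64,)
```

Setting M=1 gives the induction-head data of Problem 1; M=16 with K=16 gives full AR.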
Test evaluation: The test dataset is generated in the same way. However, we will evaluate
on all sequence lengths from τ + 2 to 3L, since τ + 2 is the shortest possible sequence.
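One way to organize this evaluation is sketched below: for every test length from τ + 2 (the shortest valid sequence length) to 3L, draw a fresh test batch and record the fraction of correct predictions, giving one accuracy-versus-length curve per model. The `predict` and `sample` callables are hypothetical placeholders (for example, a wrapper around a trained model's argmax output and a batch sampler for the given length); lengths above L were never seen in training.

```python
from typing import Callable, Dict, Iterable, Tuple

import numpy as np


def eval_lengths(L: int, tau: int) -> range:
    """Test lengths from the shortest valid one (tau + 2) up to 3L inclusive."""
    return range(tau + 2, 3 * L + 1)


def accuracy_by_length(
    predict: Callable[[np.ndarray], np.ndarray],
    sample: Callable[[int, int], Tuple[np.ndarray, np.ndarray]],
    lengths: Iterable[int],
    n_test: int = 256,
) -> Dict[int, float]:
    """Map each sequence length to the accuracy of `predict` on a fresh test batch.

    predict: (n_test, length) token ids -> (n_test,) predicted tokens.
    sample:  (n_test, length) -> (X, Y) test batch of that length.
    """
    results = {}
    for length in lengths:
        X, Y = sample(n_test, length)
        results[length] = float(np.mean(predict(X) == Y))
    return results


# Sanity check with a toy task (not AR): the label is the first token.
rng = np.random.default_rng(0)


def toy_sample(n_test, length):
    X = rng.integers(16, size=(n_test, length))
    return X, X[:, 0]


acc = accuracy_by_length(lambda X: X[:, 0], toy_sample, eval_lengths(L=32, tau=1))
print(min(acc), max(acc), min(acc.values()))  # 3 96 1.0
```

Plotting the keys of the returned dictionary against its values, with a marker at L, makes the trained and unseen regimes easy to compare across models.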
Empirical Evidence from the Mamba Paper: Table 2 of [1] demonstrates that Mamba does a
good job on the induction head problem, i.e., AR with a single query. Additionally, Mamba
is the only model in that comparison that exhibits length generalization; that is, even if
you train it up to context length L, it can still solve AR for context lengths beyond L. On
the other hand, since Mamba is inherently a recurrent model, it may not solve the AR problem
in its full generality. This motivates the question: What are the tradeoffs between Mamba
and the transformer, and can hybrid models improve performance over both?
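To see why a recurrent model can struggle with full AR, the simplified NumPy sketch below runs a selective state-space recurrence in the style of Mamba's selective SSM: the step size Δ and the vectors B and C depend on the current input, and the state is updated as h ← exp(ΔA) ⊙ h + (ΔB) x. It omits the convolution, gating, skip connection, and projections of a real Mamba block, its weights are random and untrained, and it is not the implementation in any Mamba repository. The point is that the state has a fixed size D × N regardless of sequence length, so everything needed to answer any possible query must be compressed into it.

```python
import numpy as np


def selective_scan(x, A, W_delta, W_B, W_C):
    """Simplified selective SSM recurrence (sketch, not a full Mamba block).

    x: (n, D) inputs; A: (D, N) negative entries (diagonal state matrix per channel);
    W_delta: (D, D), W_B: (D, N), W_C: (D, N) input-to-parameter projections.
    Returns outputs y of shape (n, D) and the final state h of shape (D, N).
    """
    n, D = x.shape
    h = np.zeros((D, A.shape[1]))                  # fixed-size state, independent of n
    y = np.empty((n, D))
    for t in range(n):
        delta = np.logaddexp(0.0, x[t] @ W_delta)  # (D,) softplus step size, input-dependent
        B = x[t] @ W_B                             # (N,) input-dependent input projection
        C = x[t] @ W_C                             # (N,) input-dependent output projection
        A_bar = np.exp(delta[:, None] * A)         # (D, N) decay factors in (0, 1]
        B_bar = delta[:, None] * B[None, :]        # (D, N) simplified discretization of B
        h = A_bar * h + B_bar * x[t][:, None]      # state update
        y[t] = h @ C                               # (D,) readout
    return y, h


rng = np.random.default_rng(0)
D, N = 8, 4
A = -np.exp(rng.standard_normal((D, N)))           # negative entries keep the recurrence stable
W_delta, W_B, W_C = (rng.standard_normal((D, k)) for k in (D, N, N))
for n in (16, 1024):
    y, h = selective_scan(rng.standard_normal((n, D)), A, W_delta, W_B, W_C)
    print(n, y.shape, h.shape)  # the state shape stays (8, 4) for both lengths
```

A transformer, by contrast, keeps every past key and value available to attention, at a cost that grows with context length.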
Your assignments are as follows. For each problem, make sure to submit the associated
code. The code can be in separate (clearly commented) cells of a single Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training,
evaluate the test performance and plot the accuracy of all models as a function of
the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves
(3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the
performance of the models including length generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate
where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
test batch. Make sure to specify the batch size you are using (ideally use ** or 64).
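A framework-agnostic sketch of the bookkeeping for this plot follows: run minibatch updates and, every few iterations, record the accuracy on a held-out test batch. `train_step` and `eval_accuracy` are hypothetical callables you would implement around your own model and optimizer; `train_and_log` is likewise an illustrative name.

```python
from typing import Callable, List, Tuple


def train_and_log(
    train_step: Callable[[], None],
    eval_accuracy: Callable[[], float],
    n_iters: int,
    log_every: int = 50,
) -> List[Tuple[int, float]]:
    """Run n_iters updates and return (iteration, test-batch accuracy) pairs."""
    history = []
    for it in range(1, n_iters + 1):
        train_step()                                # one minibatch SGD update
        if it % log_every == 0 or it == n_iters:    # log periodically and at the end
            history.append((it, eval_accuracy()))
    return history


# Usage sketch with hypothetical callables:
# history = train_and_log(step_fn, test_acc_fn, n_iters=5000, log_every=100)
# iterations, accuracies = zip(*history)  # x-axis and y-axis of the convergence plot
```

Using the same logging schedule and batch size for every model keeps the convergence curves directly comparable.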
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, the Transformer
with RoPE, and the Hybrid model. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
corresponding results of Problem 2. How does the embedding dimension d impact the results?
• Train a Mamba model for M = 16 with τ = 10. Comment on any differences.



