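The foundation model team was set up in April and benchmarks itself against Qwen, DeepSeek and similar labs. It has 1,000 GPUs shared by 20 people. It is currently doing post-training and will move on to pre-training later.

Coding question: hand-write an MHA (multi-head attention). The focus was on the dimension transformations of q, k and v.
Input shape: (batch_size, sequence_length, emb_dim)
Shape after reshape + permute: (3, batch_size, num_head, sequence_length, head_dim)
Here self.qkv is a single linear layer mapping emb_dim to 3 * emb_dim, and head_dim = emb_dim / num_head. The permute order (2, 0, 3, 1, 4) moves the q/k/v axis to the front and puts num_head before sequence_length:

qkv = self.qkv(x).reshape(batch_size, sequence_length, 3, num_head, head_dim).permute(2, 0, 3, 1, 4)
q, k, v = qkv[0], qkv[1], qkv[2]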
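For reference, here is a minimal PyTorch sketch of a complete MHA built around that reshape + permute line. It is not the interviewer's reference answer. The output projection name `proj`, the optional `dropout`, and the boolean `mask` convention (`True` = position hidden) are assumptions added for illustration.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttention(nn.Module):
    """Self-attention MHA with one fused QKV projection (illustrative sketch)."""

    def __init__(self, emb_dim: int, num_head: int, dropout: float = 0.0):
        super().__init__()
        assert emb_dim % num_head == 0, "emb_dim must be divisible by num_head"
        self.num_head = num_head
        self.head_dim = emb_dim // num_head
        # One Linear emits q, k and v together: emb_dim -> 3 * emb_dim
        self.qkv = nn.Linear(emb_dim, 3 * emb_dim)
        # Hypothetical name for the output projection back to emb_dim
        self.proj = nn.Linear(emb_dim, emb_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        batch_size, sequence_length, emb_dim = x.shape
        # (B, L, 3E) -> (B, L, 3, H, D) -> (3, B, H, L, D)
        qkv = (
            self.qkv(x)
            .reshape(batch_size, sequence_length, 3, self.num_head, self.head_dim)
            .permute(2, 0, 3, 1, 4)
        )
        q, k, v = qkv[0], qkv[1], qkv[2]  # each (B, H, L, D)
        # (B, H, L, D) @ (B, H, D, L) -> (B, H, L, L), scaled by sqrt(head_dim)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        if mask is not None:
            # Assumed convention: bool mask broadcastable to (B, H, L, L), True = hidden
            scores = scores.masked_fill(mask, float("-inf"))
        attn = self.dropout(F.softmax(scores, dim=-1))
        out = attn @ v  # (B, H, L, D)
        # (B, H, L, D) -> (B, L, H, D) -> (B, L, E); reshape (not view) since
        # the transposed tensor is non-contiguous
        out = out.transpose(1, 2).reshape(batch_size, sequence_length, emb_dim)
        return self.proj(out)


torch.manual_seed(0)
mha = MultiHeadAttention(emb_dim=64, num_head=8)
x = torch.randn(2, 10, 64)  # (batch_size, sequence_length, emb_dim)
y = mha(x)
print(y.shape)  # torch.Size([2, 10, 64])

# Causal mask: position i may only attend to positions <= i
causal = torch.triu(torch.ones(10, 10, dtype=torch.bool), diagonal=1)
y_causal = mha(x, mask=causal)
```

Putting the q/k/v axis first lets `qkv[0]`, `qkv[1]`, `qkv[2]` pull out each tensor with a plain index, and putting num_head before sequence_length means the two attention matmuls batch over (batch_size, num_head) with no extra transposes.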