U ~d+@sdZddlZddlmZddlZddlmZddlmZde e ddd Z Gd d d ej Z Gd d d ej ZGdddej ZGdddej ZGdddej ZGdddej Zd ddZd!ddZd"ddZGdddej ZdS)#z Mostly copy-paste from timm library. https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py N)partial) trunc_normal_F) drop_probtrainingcCsd|dks |s|Sd|}|jdfd|jd}|tj||j|jd}||||}|S)Nrr)r)dtypedevice)shapendimtorchrandrr floor_div)xrrZ keep_probr Z random_tensoroutputrt/apps/aws-distributed-training-workshop-pcluster/head-node-scripts/without_snakemake/pyscripts/vision_transformer.py drop_paths rcs*eZdZdZdfdd ZddZZS)DropPathz^Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). Ncstt|||_dSN)superr__init__r)selfr __class__rrr)szDropPath.__init__cCst||j|jSr)rrrrrrrrforward-szDropPath.forward)N__name__ __module__ __qualname____doc__rr __classcell__rrrrr&srcs0eZdZddejdffdd ZddZZS)MlpNrcsNt|p|}|p|}t|||_||_t|||_t||_dSr) rrnnLinearfc1actfc2Dropoutdrop)r in_featureshidden_features out_features act_layerr+rrrr2s z Mlp.__init__cCs6||}||}||}||}||}|Sr)r'r(r+r)rrrrr;s      z Mlp.forward)rr r!r%GELUrrr#rrrrr$1s r$cs&eZdZd fdd ZddZZS) AttentionFNrcsft||_||}|p"|d|_tj||d|d|_t||_t|||_ t||_ dS)Ngbias) rr num_headsscaler%r&qkvr* attn_dropproj proj_drop)rdimr6qkv_biasqk_scaler9r;head_dimrrrrEs  zAttention.__init__c Cs|j\}}}||||d|j||jddddd}|d|d|d}}}||dd|j} | jdd} || } | |dd|||}| |}| |}|| fS) Nr3rrr<) r r8reshaper6permute transposer7softmaxr9r:r;) rrBNCr8qkvattnrrrrPs .    zAttention.forward)r2FNrr)rr r!rrr#rrrrr1Ds r1cs<eZdZddddddejejffdd Zd ddZZS) Block@FNrc spt| ||_t||||||d|_|dkr:t|nt|_| ||_ t ||} t || | |d|_ dS)N)r6r=r>r9r;r)r,r-r/r+) rrnorm1r1rOrr%Identityrnorm2intr$mlp) rr<r6 mlp_ratior=r>r+r9rr/ norm_layerZmlp_hidden_dimrrrr`s    zBlock.__init__cCsH|||\}}|r|S|||}|||||}|Sr)rOrRrrVrT)rrreturn_attentionyrOrrrrks z Block.forward)F) rr r!r%r0 LayerNormrrr#rrrrrP_s   rPcs*eZdZdZd fdd Zdd ZZS) PatchEmbedz Image to Patch Embedding r3csDt||||}||_||_||_tj||||d|_dS)N) kernel_sizestride)rrimg_size patch_size num_patchesr%Conv2dr:)rrbrcin_chans embed_dimrdrrrrws  zPatchEmbed.__init__cCs*|j\}}}}||ddd}|S)Nr@r)r r:flattenrG)rrrIrKHWrrrrszPatchEmbed.forward)r]r^r3r_rrrrrr\ts r\csteZdZdZdgdddddddd d d d d ejffd d ZddZddZddZ ddZ ddZ dddZ Z S)VisionTransformerz Vision Transformer r]r^r3rr_ rQFNrc s t|_|_t|d||d|_|jj}tt dd|_ tt d|d|_ tj d|_ddt d| |Dtf ddt|D|_|_|dkrt|nt|_t|j dd t|j dd ||jdS) Nr)rbrcrfrgr)pcSsg|] }|qSr)item).0rrrr sz.VisionTransformer.__init__..c s*g|]"}t|d qS)) r<r6rWr=r>r+r9rrX)rP)roi attn_drop_rateZdpr drop_ratergrWrXr6r>r=rrrps{Gz?std)rr num_featuresrgr\ patch_embedrdr% Parameterr zeros cls_token pos_embedr*pos_droplinspace ModuleListrangeblocksnormr&rSheadrapply _init_weights)rrbrcrf num_classesrgdepthr6rWr=r>rtrsdrop_path_raterXkwargsrdrrrrrs*    zVisionTransformer.__init__cCsrt|tjrBt|jddt|tjrn|jdk rntj|jdn,t|tjrntj|jdtj|jddS)Nrurvrg?) isinstancer%r&rweightr5init constant_r[rmrrrrs  zVisionTransformer._init_weightsc CsD|jdd}|jjdd}||kr4||kr4|jS|jdddf}|jddddf}|jd}||jj} ||jj} | d| d} } tjj|dtt |tt || dddd| t || t |fdd}t| |jd krt| |jdkst | dddd dd|}tj|d|fdd S) NrrrCg?r3r@bicubic) scale_factormoderBrD)r r}ryrcr% functional interpolaterErUmathsqrtrFAssertionErrorviewr cat unsqueeze) rrwhZnpatchrJZclass_pos_embedZpatch_pos_embedr<Zw0h0rrrinterpolate_pos_encodings$   .,z*VisionTransformer.interpolate_pos_encodingcCsV|j\}}}}||}|j|dd}tj||fdd}|||||}||S)NrCrrD)r ryr|expandr rrr~)rrrIncrrZ cls_tokensrrrprepare_tokenss  z VisionTransformer.prepare_tokenscCs8||}|jD] }||}q||}|dddfS)Nr)rrr)rrblkrrrrs     zVisionTransformer.forwardcCsN||}t|jD]4\}}|t|jdkr8||}q||ddSqdS)NrT)rY)r enumeraterlen)rrrqrrrrget_last_selfattentions   z(VisionTransformer.get_last_selfattentionrcCsP||}g}t|jD]2\}}||}t|j||kr|||q|Sr)rrrrappendr)rrnrrqrrrrget_intermediate_layerss z)VisionTransformer.get_intermediate_layers)r)rr r!r"r%r[rrrrrrrr#rrrrrks"   rkr^c Ks,tf|dddddttjddd|}|S) Nrlr3rATư>epsrcrgrr6rWr=rXrkrr%r[rcrmodelrrrvit_tinys rc Ks,tf|dddddttjddd|}|S) NirlrATrrrrrrrr vit_smalls rc Ks,tf|dddddttjddd|}|S)Nr_rlrATrrrrrrrrvit_bases rcs.eZdZd fdd Zdd Zd d ZZS) DINOHeadFTr3c s tt|d}|dkr,t|||_nt||g}|rN|t||tt |dD]8} |t|||r|t||tqh|t||tj ||_| |j tj tj||dd|_|jjjd|rd|jj_dS)Nrr@Fr4)rrmaxr%r&rVr BatchNorm1dr0r Sequentialrrutils weight_norm last_layerweight_gdatafill_ requires_grad) rin_dimout_dimuse_bnnorm_last_layerZnlayers hidden_dimZbottleneck_dimlayers_rrrrs(    zDINOHead.__init__cCsDt|tjr@t|jddt|tjr@|jdk r@tj|jddS)Nrurvr)rr%r&rrr5rrrrrrrs zDINOHead._init_weightscCs*||}tjj|ddd}||}|S)NrCr@)r<rm)rVr%r normalizerrrrrrs  zDINOHead.forward)FTr3rr)rr r!rrrr#rrrrrsr)rF)r^)r^)r^)r"r functoolsrr torch.nnr%rrfloatboolrModulerr$r1rPr\rkrrrrrrrrs      f