multi_head_self_attention.py source code

python

Project: allennlp  Author: allenai

from torch.nn import init  # import needed by the method below


def reset_parameters(self) -> None:
    # Because we are doing so many torch.bmm calls, which is fast but unstable,
    # it is critically important to initialise the parameters correctly such
    # that these matrix multiplications are well conditioned initially.
    # Without this initialisation, this (non-deterministically) produces
    # NaNs and overflows.
    # xavier_normal_ is the in-place name; the original snippet called the
    # older, now-deprecated init.xavier_normal.
    init.xavier_normal_(self._query_projections)
    init.xavier_normal_(self._key_projections)
    init.xavier_normal_(self._value_projections)
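
To make the comment above more concrete, here is a minimal pure-Python sketch of what Xavier (Glorot) normal initialisation does for a plain 2-D weight matrix: it draws entries from a zero-mean normal distribution with standard deviation `gain * sqrt(2 / (fan_in + fan_out))`, which keeps the scale of matrix products roughly constant. The helper names `xavier_normal_std` and `sample_xavier_normal` are hypothetical and are not part of allennlp or PyTorch. The shapes (512 by 64) are also assumptions chosen for illustration, since the snippet does not show the shapes of the projection parameters.

```python
import math
import random


def xavier_normal_std(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Standard deviation used by Xavier/Glorot normal initialisation."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


def sample_xavier_normal(fan_in: int, fan_out: int, seed: int = 0) -> list:
    """Hypothetical helper: sample a fan_in x fan_out matrix as nested lists."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    std = xavier_normal_std(fan_in, fan_out)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


if __name__ == "__main__":
    # Assumed shapes for illustration only: a 512-dim input projected to 64 dims.
    weights = sample_xavier_normal(512, 64)
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    empirical_std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
    print("target std:   ", xavier_normal_std(512, 64))
    print("empirical std:", empirical_std)
```

The target standard deviation shrinks as the matrix grows, so sums over many products (as in the batched matrix multiplications mentioned in the comment) start out at a moderate scale instead of overflowing.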
