multi_head_self_attention.py source code

python

Project: allennlp    Author: allenai
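from torch.nn import init


# Method excerpt: self._query_projections, self._key_projections and
# self._value_projections are the module's projection parameters.
def reset_parameters(self) -> None:
    # Because we are doing so many torch.bmm calls, which are fast but numerically
    # unstable, it is critically important to initialise the parameters correctly
    # so that these matrix multiplications are well conditioned initially.
    # Without this initialisation, this (non-deterministically) produces
    # NaNs and overflows.
    # init.xavier_normal is deprecated; the in-place xavier_normal_ replaces it.
    init.xavier_normal_(self._query_projections)
    init.xavier_normal_(self._key_projections)
    init.xavier_normal_(self._value_projections)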
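For context, here is a minimal sketch (not the allennlp implementation) of a module whose per-head projection tensors are initialised as above and then applied through a chain of torch.bmm calls. The class name MultiHeadProjections, the parameter layout (num_heads, input_dim, attention_dim), the head-major tiling of the batch, and the scaled dot-product attention in forward are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch.nn import init


class MultiHeadProjections(nn.Module):
    """Hypothetical sketch (not allennlp's code): per-head Q/K/V tensors applied with torch.bmm."""

    def __init__(self, num_heads: int, input_dim: int, attention_dim: int) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.attention_dim = attention_dim
        # Assumed layout: one (input_dim, attention_dim) matrix per head.
        shape = (num_heads, input_dim, attention_dim)
        # torch.empty leaves memory uninitialised, so reset_parameters must run.
        self._query_projections = nn.Parameter(torch.empty(shape))
        self._key_projections = nn.Parameter(torch.empty(shape))
        self._value_projections = nn.Parameter(torch.empty(shape))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # In-place Xavier (Glorot) normal initialisation of each projection tensor.
        init.xavier_normal_(self._query_projections)
        init.xavier_normal_(self._key_projections)
        init.xavier_normal_(self._value_projections)

    def _project(self, tiled: torch.Tensor, weights: torch.Tensor, batch: int) -> torch.Tensor:
        # Repeat each head's matrix across the batch: (num_heads * batch, input_dim, attention_dim).
        per_head = weights.unsqueeze(1).expand(self.num_heads, batch, *weights.shape[1:])
        per_head = per_head.reshape(self.num_heads * batch, *weights.shape[1:])
        # (num_heads * batch, seq_len, input_dim) @ (num_heads * batch, input_dim, attention_dim)
        return torch.bmm(tiled, per_head)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, seq_len, input_dim)
        batch, seq_len, input_dim = inputs.shape
        # Copy the inputs once per head, head-major: (num_heads * batch, seq_len, input_dim).
        tiled = inputs.unsqueeze(0).expand(self.num_heads, batch, seq_len, input_dim)
        tiled = tiled.reshape(self.num_heads * batch, seq_len, input_dim)

        queries = self._project(tiled, self._query_projections, batch)
        keys = self._project(tiled, self._key_projections, batch)
        values = self._project(tiled, self._value_projections, batch)

        # Scaled dot-product attention (an assumption of this sketch), again via torch.bmm.
        scores = torch.bmm(queries, keys.transpose(1, 2)) / self.attention_dim ** 0.5
        attended = torch.bmm(F.softmax(scores, dim=-1), values)

        # (num_heads * batch, seq_len, attention_dim) -> (batch, seq_len, num_heads * attention_dim)
        attended = attended.reshape(self.num_heads, batch, seq_len, self.attention_dim)
        return attended.permute(1, 2, 0, 3).reshape(batch, seq_len, -1)
```

For example, MultiHeadProjections(num_heads=2, input_dim=8, attention_dim=4) maps a (3, 5, 8) input to a (3, 5, 8) output. One design point worth checking: for a 3-D tensor, PyTorch's init functions compute fan_in = size(1) * prod(size[2:]) and fan_out = size(0) * prod(size[2:]), treating the tensor like a convolution kernel. Under the assumed layout that gives fan_in = input_dim * attention_dim and fan_out = num_heads * attention_dim, not the fans of each per-head matrix, so initialising each head slice separately may be preferable if exact per-matrix Glorot scaling matters.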