runner.py source code

python

Project: Multitask-and-Transfer-Learning  Author: AI-ON
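import torch
import torch.nn.functional as F
from torch import autograd

# Method of a class in runner.py; maybe_cuda is a helper that is not shown in this snippet.
def learn_single(self, value, value_last, last_action, reward):
    # One-step TD target: what value_last should have been if it were perfect.
    expected_value = self.gamma * value + reward

    value_loss = F.smooth_l1_loss(expected_value, value_last)
    print(value_loss.data)
    # Legacy stochastic-node API from early PyTorch releases (since replaced by
    # torch.distributions): attaches a scalar signal to the sampled action.
    last_action.reinforce(value_loss.data[0])

    self.optimizer.zero_grad()
    final_nodes = [value_loss, last_action]
    # Gradient of 1 for the loss; None for the stochastic node, which uses the reinforce() signal.
    gradients = [maybe_cuda(torch.ones(1)), None]
    autograd.backward(final_nodes, gradients, retain_graph=True)
    self.optimizer.step()
    del last_action  # only drops the local reference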
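
The arithmetic behind `expected_value` and `value_loss` can be checked by hand. The block below is a minimal pure-Python sketch, not code from the project: `td_target` and `smooth_l1` are hypothetical helper names, the numbers are made up for illustration, and `smooth_l1` assumes the default threshold of 1 used by `F.smooth_l1_loss` on a single element.

```python
import math

def td_target(gamma, value, reward):
    # Same formula as expected_value in learn_single: gamma * value + reward.
    return gamma * value + reward

def smooth_l1(prediction, target):
    # Smooth L1 (Huber-style) loss for one element, assuming a threshold of 1:
    # quadratic for small errors, linear for large ones.
    diff = abs(prediction - target)
    if diff < 1.0:
        return 0.5 * diff * diff
    return diff - 0.5

# Illustrative values only (not taken from the project).
target = td_target(gamma=0.9, value=2.0, reward=1.0)
small_error = smooth_l1(target, 2.5)   # |2.8 - 2.5| = 0.3, quadratic branch
large_error = smooth_l1(target, 0.0)   # |2.8 - 0.0| = 2.8, linear branch
print(target, small_error, large_error)
```

Because the loss grows only linearly once the error exceeds the threshold, a single badly mispredicted value contributes less to the update than it would under a squared error.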