double_dqn.py 文件源码

python
阅读 20 收藏 0 点赞 0 评论 0

项目:chainerrl 作者: chainer 项目源码 文件源码
def _compute_target_values(self, exp_batch, gamma):

        batch_next_state = exp_batch['next_state']

        with chainer.using_config('train', False):
            with state_kept(self.q_function):
                next_qout = self.q_function(batch_next_state)

        target_next_qout = self.target_q_function(batch_next_state)

        next_q_max = target_next_qout.evaluate_actions(
            next_qout.greedy_actions)

        batch_rewards = exp_batch['reward']
        batch_terminal = exp_batch['is_state_terminal']

        return batch_rewards + self.gamma * (1.0 - batch_terminal) * next_q_max
评论列表
文章目录


问题


面经


文章

微信
公众号

扫码关注公众号