policies.py source code

python

Project: rl_algorithms    Author: DanielTakeshi
def update_policy(self, ob_no, ac_n, std_adv_n, stepsize):
    """
    The input is the same as for the discrete control case, except we return
    a single log standard deviation vector in addition to our logits. Here
    the logits are really the mean vectors of the Gaussians, which differ
    among components (observations) in the minibatch. We return the *old*
    values: they are fetched in the same `sess.run` call that executes
    `self.update_op`, which then makes them outdated.
    """
    feed = {self.ob_no: ob_no,
            self.ac_na: ac_n,
            self.adv_n: std_adv_n,
            self.stepsize: stepsize}
    _, surr_loss, oldmean_na, oldlogstd_a = self.sess.run(
            [self.update_op, self.surr_loss, self.mean_na, self.logstd_a],
            feed_dict=feed)
    return surr_loss, oldmean_na, oldlogstd_a
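
For context, this method assumes a TensorFlow 1.x graph with placeholders `self.ob_no`, `self.ac_na`, `self.adv_n`, `self.stepsize` and tensors `self.mean_na`, `self.logstd_a`, `self.surr_loss`, `self.update_op`. Below is a minimal sketch of one plausible way to wire these up for a diagonal-Gaussian policy; the hidden-layer size, initializers, and the vanilla policy-gradient surrogate loss are illustrative assumptions, not necessarily the repo's exact graph.

# A minimal sketch, assuming TF 1.x. Names mirror the snippet above; the
# network shape and loss form are assumptions for illustration.
import numpy as np
import tensorflow as tf

class GaussianPolicy(object):

    def __init__(self, ob_dim, ac_dim):
        self.ob_no = tf.placeholder(tf.float32, [None, ob_dim])
        self.ac_na = tf.placeholder(tf.float32, [None, ac_dim])
        self.adv_n = tf.placeholder(tf.float32, [None])
        self.stepsize = tf.placeholder(tf.float32, [])

        # The "logits": a state-dependent mean vector for the Gaussian.
        hidden = tf.layers.dense(self.ob_no, 32, activation=tf.nn.tanh)
        self.mean_na = tf.layers.dense(hidden, ac_dim)

        # A single log-std vector, shared across the whole minibatch.
        self.logstd_a = tf.get_variable(
                "logstd", [ac_dim], initializer=tf.zeros_initializer())

        # Log-probability of the taken actions under the current policy.
        std_a = tf.exp(self.logstd_a)
        z_na = (self.ac_na - self.mean_na) / std_a
        logprob_n = (-0.5 * tf.reduce_sum(tf.square(z_na), axis=1)
                     - tf.reduce_sum(self.logstd_a)
                     - 0.5 * ac_dim * np.log(2.0 * np.pi))

        # Vanilla policy-gradient surrogate; minimizing it maximizes
        # E[log pi(a|s) * advantage].
        self.surr_loss = -tf.reduce_mean(logprob_n * self.adv_n)
        self.update_op = tf.train.AdamOptimizer(
                self.stepsize).minimize(self.surr_loss)

        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())

With `update_policy` attached to this class, one training iteration would look like `surr_loss, old_mean, old_logstd = policy.update_policy(obs, acts, std_advs, 1e-3)`.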