def _flatgrad(self, loss, var_list):
""" A Tensorflow version of John Schulman's `flatgrad` function. It
computes the gradients but does NOT apply them (for now).
This is only called during the `init` of the TRPO graph, so I think it's
OK. Otherwise, wouldn't it be constantly rebuilding the computational
graph? Or doing something else? Eh, for now I think it's OK.
Params:
loss: The loss function we're optimizing, which I assume is always
scalar-valued.
var_list: The list of variables (from `tf.trainable_variables()`) to
take gradients. This should only be for the policynets.
Returns:
A single flat vector with all gradients concatenated.
"""
    grads = tf.gradients(loss, var_list)
    # `tf.gradients` returns None for any variable not on a path from
    # `loss`; substitute zeros so the reshape/concat below doesn't fail.
    grads = [g if g is not None else tf.zeros_like(v)
             for (g, v) in zip(grads, var_list)]
    return tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)
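The flattening itself is just reshape-and-concatenate, independent of TensorFlow. A minimal NumPy sketch of the same operation (the arrays here are made-up stand-ins for per-variable gradients, e.g. a weight matrix and a bias):

```python
import numpy as np

# Hypothetical per-variable gradients: a 2x3 weight matrix and a bias of size 2.
grads = [np.arange(6.0).reshape(2, 3), np.array([10.0, 20.0])]

# Mirrors tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0):
flat = np.concatenate([g.reshape(-1) for g in grads], axis=0)

print(flat.shape)  # → (8,)  i.e. 2*3 + 2 entries in one flat vector
```

TRPO needs this flat view because the conjugate-gradient step and the KL line search treat all policy parameters as one long vector rather than a list of tensors.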