def _flatgrad(self, loss, var_list):
""" A Tensorflow version of John Schulman's `flatgrad` function. It
computes the gradients but does NOT apply them (for now).
This is only called during the `init` of the TRPO graph, so I think it's
OK. Otherwise, wouldn't it be constantly rebuilding the computational
graph? Or doing something else? Eh, for now I think it's OK.
Params:
loss: The loss function we're optimizing, which I assume is always
scalar-valued.
var_list: The list of variables (from `tf.trainable_variables()`) to
take gradients. This should only be for the policynets.
Returns:
A single flat vector with all gradients concatenated.
"""
    grads = tf.gradients(loss, var_list)
    # `tf.gradients` returns None for any variable not on a path from
    # `loss`; substitute zeros so the reshape/concat below doesn't fail.
    grads = [g if g is not None else tf.zeros_like(v)
             for (g, v) in zip(grads, var_list)]
    return tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)
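The flattening itself is just reshape-and-concatenate, independent of TensorFlow. A minimal NumPy sketch of the same operation (the arrays here are made-up stand-ins for per-variable gradients, e.g. a weight matrix and a bias):

```python
import numpy as np

# Hypothetical per-variable gradients: a 2x3 weight matrix and a bias of size 2.
grads = [np.arange(6.0).reshape(2, 3), np.array([10.0, 20.0])]

# Mirrors tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0):
flat = np.concatenate([g.reshape(-1) for g in grads], axis=0)

print(flat.shape)  # → (8,)  i.e. 2*3 + 2 entries in one flat vector
```

TRPO needs this flat view because the conjugate-gradient step and the KL line search treat all policy parameters as one long vector rather than a list of tensors.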