import tensorflow as tf


def gauss_KL(mu1, logstd1, mu2, logstd2):
    """ Returns the KL divergence between two multivariate Gaussians,
    component-wise over the minibatch. It assumes the covariance matrices are
    diagonal; all inputs have shape (n,a). Per dimension,

        KL = log(s2/s1) + (s1^2 + (mu1-mu2)^2) / (2*s2^2) - 1/2,

    so there is no need to know the number of actions explicitly: reduce_sum
    over the action dimension accumulates the per-dimension -1/2 terms into
    the -d/2 constant offset. The trace term of the matrix formula is blended
    with the squared mean difference because both share the common
    "denominator" var2_na; the formula generalizes to an arbitrary number of
    actions. I think mu2 and logstd2 should represent the policy before the
    update.

    Returns the KL divergence for each of the n components in the minibatch;
    we apply a reduce_mean outside this function.
    """
    var1_na = tf.exp(2.*logstd1)
    var2_na = tf.exp(2.*logstd2)
    tmp_matrix = 2.*(logstd2 - logstd1) + (var1_na + tf.square(mu1-mu2))/var2_na - 1
    kl_n = tf.reduce_sum(0.5 * tmp_matrix, axis=[1]) # Don't forget the 1/2 !!
    # KL divergence is non-negative; allow a small tolerance for floating-point error.
    assert_op = tf.Assert(tf.reduce_all(kl_n >= -1e-7), [kl_n])
    with tf.control_dependencies([assert_op]):
        kl_n = tf.identity(kl_n)
    return kl_n
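
# A minimal sanity check, not part of the original function: it verifies
# gauss_KL against the closed-form KL between diagonal Gaussians, computed in
# NumPy as a sum of per-dimension univariate KLs. Assumes TF2 eager execution;
# the shapes and seed below are arbitrary choices for illustration.
if __name__ == "__main__":
    import numpy as np

    rng = np.random.default_rng(0)
    n, a = 4, 3
    mu1_, mu2_ = rng.normal(size=(n, a)), rng.normal(size=(n, a))
    logstd1_, logstd2_ = rng.normal(size=(n, a)), rng.normal(size=(n, a))

    kl_tf = gauss_KL(tf.constant(mu1_), tf.constant(logstd1_),
                     tf.constant(mu2_), tf.constant(logstd2_)).numpy()

    # Reference: per-dimension KL = log(s2/s1) + (s1^2 + (mu1-mu2)^2)/(2*s2^2) - 1/2,
    # summed over the action dimension.
    s1, s2 = np.exp(logstd1_), np.exp(logstd2_)
    kl_ref = np.sum(np.log(s2/s1) + (s1**2 + (mu1_-mu2_)**2)/(2.*s2**2) - 0.5, axis=1)
    assert np.allclose(kl_tf, kl_ref)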