import tensorflow as tf


def gauss_KL(mu1, logstd1, mu2, logstd2):
    """ Returns the KL divergence between two multivariate Gaussians,
    component-wise over the minibatch. It assumes the covariance matrices are
    diagonal; all inputs have shape (n,a). Per dimension,

        KL = log(s2/s1) + (s1^2 + (mu1-mu2)^2) / (2*s2^2) - 1/2,

    so there is no need to know the number of actions explicitly: reduce_sum
    over the action dimension accumulates the per-dimension -1/2 terms into
    the -d/2 constant offset. The trace term of the matrix formula is blended
    with the squared mean difference because both share the common
    "denominator" var2_na; the formula generalizes to an arbitrary number of
    actions. I think mu2 and logstd2 should represent the policy before the
    update.

    Returns the KL divergence for each of the n components in the minibatch;
    we apply a reduce_mean outside this function.
    """
    var1_na = tf.exp(2.*logstd1)
    var2_na = tf.exp(2.*logstd2)
    tmp_matrix = 2.*(logstd2 - logstd1) + (var1_na + tf.square(mu1-mu2))/var2_na - 1
    kl_n = tf.reduce_sum(0.5 * tmp_matrix, axis=[1]) # Don't forget the 1/2 !!
    # KL divergence is non-negative; allow a small tolerance for floating-point error.
    assert_op = tf.Assert(tf.reduce_all(kl_n >= -1e-7), [kl_n])
    with tf.control_dependencies([assert_op]):
        kl_n = tf.identity(kl_n)
    return kl_n
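
# A minimal sanity check, not part of the original function: it verifies
# gauss_KL against the closed-form KL between diagonal Gaussians, computed in
# NumPy as a sum of per-dimension univariate KLs. Assumes TF2 eager execution;
# the shapes and seed below are arbitrary choices for illustration.
if __name__ == "__main__":
    import numpy as np

    rng = np.random.default_rng(0)
    n, a = 4, 3
    mu1_, mu2_ = rng.normal(size=(n, a)), rng.normal(size=(n, a))
    logstd1_, logstd2_ = rng.normal(size=(n, a)), rng.normal(size=(n, a))

    kl_tf = gauss_KL(tf.constant(mu1_), tf.constant(logstd1_),
                     tf.constant(mu2_), tf.constant(logstd2_)).numpy()

    # Reference: per-dimension KL = log(s2/s1) + (s1^2 + (mu1-mu2)^2)/(2*s2^2) - 1/2,
    # summed over the action dimension.
    s1, s2 = np.exp(logstd1_), np.exp(logstd2_)
    kl_ref = np.sum(np.log(s2/s1) + (s1**2 + (mu1_-mu2_)**2)/(2.*s2**2) - 0.5, axis=1)
    assert np.allclose(kl_tf, kl_ref)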