import numpy as np
import tensorflow as tf


def gauss_log_prob(mu, logstd, x):
    """Compute the log probability of x under a Gaussian with diagonal covariance.

    All the inputs should have shape (n,a). `gp_na` contains the component-wise
    log probabilities; the reduce_sum over axis 1 then yields a tensor of shape
    (n,) which contains the log probability of each of the n elements. (We
    later perform a mean on this.)

    The 2*pi term needs a factor of 1/2 but no explicit factor of a (the number
    of actions), because the reduce_sum adds it once per component. logstd
    needs no 1/2 factor because 0.5*log(sigma_i^2) = log(sigma_i).

    This formula generalizes to an arbitrary number of actions, BUT it assumes
    that the covariance matrix is diagonal.
    """
    # Per-component variance sigma_i^2, shape (n, a).
    var_na = tf.exp(2*logstd)
    # Per-component log density of a univariate Gaussian, shape (n, a).
    gp_na = -tf.square(x - mu)/(2*var_na) - 0.5*tf.log(tf.constant(2*np.pi)) - logstd
    # Sum over the action dimension to get the joint log density, shape (n,).
    return tf.reduce_sum(gp_na, axis=[1])
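The docstring's claim that summing per-component terms reproduces the multivariate density (for a diagonal covariance) can be checked without TensorFlow. The following is a minimal sketch using only the standard library; the names `gauss_log_prob_reference` and `full_density_log_prob` are hypothetical helpers introduced here for illustration and are not part of the original code.

```python
import math


def gauss_log_prob_reference(mu, logstd, x):
    """Per-row log density, summing the same per-component terms as gp_na.

    mu, logstd, x are lists of n rows, each row a list of a floats.
    """
    out = []
    for mu_row, logstd_row, x_row in zip(mu, logstd, x):
        total = 0.0
        for m, ls, xi in zip(mu_row, logstd_row, x_row):
            var = math.exp(2.0 * ls)  # sigma_i^2
            total += -(xi - m) ** 2 / (2.0 * var) - 0.5 * math.log(2.0 * math.pi) - ls
        out.append(total)
    return out


def full_density_log_prob(mu_row, logstd_row, x_row):
    """Textbook multivariate Gaussian log density with a diagonal covariance.

    log p(x) = -0.5 * (a*log(2*pi) + log|Sigma| + (x-mu)^T Sigma^{-1} (x-mu))
    """
    a = len(mu_row)
    log_det = sum(2.0 * ls for ls in logstd_row)  # log of prod(sigma_i^2)
    maha = sum(
        (xi - m) ** 2 / math.exp(2.0 * ls)
        for m, ls, xi in zip(mu_row, logstd_row, x_row)
    )
    return -0.5 * (a * math.log(2.0 * math.pi) + log_det + maha)


# Arbitrary illustrative values with n = 2 rows and a = 3 actions.
mu = [[0.0, 1.0, -1.0], [0.5, -0.5, 2.0]]
logstd = [[0.0, -0.3, 0.2], [0.1, 0.4, -1.0]]
x = [[0.2, 0.8, -1.5], [0.0, 0.0, 2.1]]

per_component = gauss_log_prob_reference(mu, logstd, x)
textbook = [full_density_log_prob(m, ls, xi) for m, ls, xi in zip(mu, logstd, x)]
for p, t in zip(per_component, textbook):
    assert abs(p - t) < 1e-12
```

The two routes agree because, for a diagonal covariance, the log-determinant is the sum of `2*logstd` over components and the quadratic form splits into one term per component, which is exactly what the sum over axis 1 collects.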