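def init_policy(self):
    # Network output (one value per action, per observation), scaled by 1 / self._c.
    output_vec = L.get_output(self._output_vec_layer, deterministic=True) / self._c
    # Softmax over the scaled outputs gives the action probabilities.
    prob = tf.nn.softmax(output_vec)
    # Log-sum-exp over axis 1: a soft maximum of the scaled values, not a hard max.
    max_qval = tf.reduce_logsumexp(output_vec, [1])
    # Compile callables that map a batch of observations to each tensor.
    self._f_prob = tensor_utils.compile_function([self._obs_layer.input_var], prob)
    self._f_max_qvals = tensor_utils.compile_function([self._obs_layer.input_var], max_qval)
    # Categorical distribution of dimension self._n.
    self._dist = Categorical(self._n)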
Source file: stochastic_discrete_mlp_q_function.py
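The method relies on helpers (`L`, `tensor_utils`, `Categorical`) whose definitions are not shown here. As a rough, framework-free illustration of the two quantities it builds, the sketch below computes the softmax probabilities and the log-sum-exp of temperature-scaled values in NumPy. The names `boltzmann_policy` and `sample_actions` are hypothetical and do not come from the source file, and the sketch assumes `c` plays the role of `self._c`.

```python
import numpy as np


def boltzmann_policy(q_values, c=1.0):
    """Return (action probabilities, log-sum-exp) for each row of q_values.

    Hypothetical helper: q_values has shape (batch, n_actions) and c is
    assumed to correspond to self._c in the method above.
    """
    scaled = np.asarray(q_values, dtype=np.float64) / c
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    row_max = scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled - row_max)
    sum_exp = exp.sum(axis=1, keepdims=True)
    prob = exp / sum_exp
    # log(sum(exp(scaled))) per row; always >= the row-wise max of scaled.
    lse = (row_max + np.log(sum_exp)).squeeze(axis=1)
    return prob, lse


def sample_actions(prob, rng):
    """Draw one action index per row from the categorical probabilities."""
    return np.array([rng.choice(len(p), p=p) for p in prob])


if __name__ == "__main__":
    q = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
    prob, lse = boltzmann_policy(q, c=0.5)
    actions = sample_actions(prob, np.random.default_rng(0))
    print(prob, lse, actions)
```

Note that the log-sum-exp is taken over the already scaled values, exactly as in the method above, so it is a soft maximum of `q / c` rather than of `q` itself. Whether the caller multiplies the result back by `c` is not visible in this snippet.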