def reward_from_reward_rnn_scores(self, action, reward_scores):
  """Computes a data-derived reward: the log-softmax score of the action.

  Interprets the reward network's raw output scores as unnormalized
  log-probabilities and returns the log-probability it assigns to the
  chosen note. Using these values as rewards encourages the model to
  retain what the reward network learned from data.

  Args:
    action: A one-hot encoding of the chosen action.
    reward_scores: The value for each note output by the reward_rnn.

  Returns:
    Float reward value.
  """
  log_partition = logsumexp(reward_scores)
  chosen_note = np.argmax(action)
  return reward_scores[chosen_note] - log_partition