import numpy as np

# epsilon and n_actions are module-level globals defined elsewhere in the file
def choose_action(d, c, q_table):
    global epsilon
    state_actions = q_table[d][c][:]
    # random move, or no data recorded for this state yet (all values still zero)
    if (np.random.uniform() < epsilon) or not np.any(state_actions):
        action_chose = np.random.randint(n_actions)
        # decrease random moves over time, clamped to a minimum of 10%
        epsilon = max(0.1, epsilon * 0.9)
    else:
        # greedy move: pick the action with the highest recorded value
        action_chose = state_actions.argmax()
    return action_chose
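
The epsilon-greedy rule used by choose_action can also be sketched without module-level globals, which makes it easier to try in isolation. This is only a minimal sketch under stated assumptions: the name choose_action_sketch, the parameters eps_min and decay, and the use of a NumPy Generator are illustrative and do not come from BlackRobot_SARSA_Trace.py. The sketch takes one state's row of action values, returns the chosen action together with the updated epsilon, and clamps epsilon at eps_min so it never falls below the stated minimum.

```python
import numpy as np


def choose_action_sketch(state_actions, epsilon, rng, eps_min=0.1, decay=0.9):
    """Epsilon-greedy pick over one state's action values.

    Returns (action, new_epsilon). All names here are illustrative
    assumptions, not taken from BlackRobot_SARSA_Trace.py.
    """
    state_actions = np.asarray(state_actions, dtype=float)
    # explore with probability epsilon, or when nothing is recorded yet
    explore = rng.uniform() < epsilon or not np.any(state_actions)
    if explore:
        action = int(rng.integers(len(state_actions)))
        # decay only on exploratory moves, clamped at eps_min
        epsilon = max(eps_min, epsilon * decay)
    else:
        # greedy move: index of the largest recorded value
        action = int(state_actions.argmax())
    return action, epsilon


rng = np.random.default_rng(0)

# epsilon = 0 and a non-empty row: the greedy action is always taken
action, eps = choose_action_sketch([0.0, 2.5, 1.0], 0.0, rng)
print(action, eps)

# all-zero row: exploration is forced, epsilon decays but stops at eps_min
action, eps = choose_action_sketch([0.0, 0.0, 0.0], 0.105, rng)
print(action, eps)
```

Passing epsilon in and out, rather than mutating a global, keeps the decay schedule visible to the caller; whether that fits the surrounding training loop is a design choice this sketch does not settle.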
BlackRobot_SARSA_Trace.py source code
python