Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (the context), takes an action, and receives a reward based on the action and context.
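The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm from the paper: the toy environment (three discrete contexts, Bernoulli rewards with probabilities 0.9 and 0.1), the epsilon-greedy rule, and the names `run_bandit`, `epsilon`, and `means` are all assumptions made for the example. The point it shows is that the learner observes the reward only for the action it actually took.

```python
import random


def run_bandit(num_rounds=2000, num_actions=3, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a toy contextual bandit (hypothetical setup)."""
    rng = random.Random(seed)
    # Running mean reward and pull count for every (context, action) pair.
    means = [[0.0] * num_actions for _ in range(num_actions)]
    counts = [[0] * num_actions for _ in range(num_actions)]
    total_reward = 0.0

    for _ in range(num_rounds):
        # 1. The learner receives a context (here: one of a few discrete values).
        context = rng.randrange(num_actions)

        # 2. The learner takes an action: explore with probability epsilon,
        #    otherwise pick the action with the best estimated reward.
        if rng.random() < epsilon:
            action = rng.randrange(num_actions)
        else:
            action = max(range(num_actions), key=lambda a: means[context][a])

        # 3. The learner receives a reward that depends on action and context.
        #    Assumed toy rule: the matching action pays off with probability 0.9,
        #    any other action with probability 0.1.
        p_reward = 0.9 if action == context else 0.1
        reward = 1.0 if rng.random() < p_reward else 0.0

        # Only the chosen action's estimate is updated (bandit feedback).
        counts[context][action] += 1
        means[context][action] += (reward - means[context][action]) / counts[context][action]
        total_reward += reward

    return total_reward / num_rounds, means


if __name__ == "__main__":
    average_reward, estimates = run_bandit()
    print("average reward per round:", average_reward)
```

In this toy setting a uniformly random policy earns about 0.37 per round on average, so an average well above that indicates the learner is using the context to choose its actions.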