Reinforcement learning

Reinforcement learning (RL), in Chinese 强化学习 and sometimes also translated as 增强学习, is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize some notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
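To make "cumulative reward" concrete, here is a minimal sketch of one common way to add up the rewards an agent collects. The discount factor `gamma` and the function name `discounted_return` are assumptions introduced for illustration; the text above only speaks of "some notion" of cumulative reward and does not fix a particular one.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t.

    `gamma` is an assumed discount factor in [0, 1]; gamma=1 gives the
    plain (undiscounted) sum of rewards.
    """
    g = 0.0
    # Work backwards so each step needs one multiply and one add:
    # G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three steps with reward 1 each, discounted by 0.5 per step:
# 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1, 1, 1], gamma=0.5))
```

With `gamma` below 1, rewards that arrive sooner count for more than rewards that arrive later, which is one way an agent's objective can be stated.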

Reinforcement learning differs from supervised learning in that it does not need labelled input/output pairs to be presented, and it does not need sub-optimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
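The balance between exploration and exploitation can be illustrated with an epsilon-greedy rule on a multi-armed bandit. This is a sketch under stated assumptions, not a method taken from the text: the bandit setting, the function name `epsilon_greedy_bandit`, the arm probabilities, and the value of `epsilon` are all chosen here for illustration.

```python
import random


def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule.

    arm_probs: assumed success probability of each arm (unknown to the agent).
    Returns the agent's reward estimates, pull counts, and total reward.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try an arm at random, whatever we currently believe.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pull the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the mean; no labels or corrections are used,
        # only the reward signal.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print("estimates:", estimates)
print("pull counts:", counts)
```

Setting `epsilon` to 0 makes the agent purely exploit its current knowledge, and setting it to 1 makes it purely explore; values in between trade one off against the other.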

The environment is typically stated in the form of a Markov decision process (MDP), because many reinforcement learning algorithms for this context use dynamic programming techniques.[2] The main difference between classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the MDP, and they target large MDPs where exact methods become infeasible.
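The contrast between dynamic programming and reinforcement learning can be sketched on a toy MDP. Everything specific below is an assumption made for illustration: the four-state chain, the reward of 1 for reaching the last state, the discount factor `GAMMA`, and the choice of value iteration and tabular Q-learning as representatives of the two families. Value iteration sweeps over every state using the model directly, while Q-learning only sees transitions it samples by acting.

```python
import random

N_STATES, TERMINAL = 4, 3      # assumed chain: 0 - 1 - 2 - 3 (terminal)
ACTIONS = (0, 1)               # 0 = move left, 1 = move right
GAMMA = 0.9                    # assumed discount factor


def step(s, a):
    """Toy deterministic model: returns (next_state, reward)."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == TERMINAL else 0.0)


def value_iteration(tol=1e-9):
    """Dynamic programming: uses the model `step` for every state and action."""
    v = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(TERMINAL):  # the terminal state keeps value 0
            best = max(r + GAMMA * v[s2]
                       for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


def q_learning(episodes=500, alpha=0.5, seed=0):
    """Model-free: learns only from transitions sampled while acting."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.choice(ACTIONS)          # random behaviour policy (off-policy)
            s2, r = step(s, a)               # the agent observes only this sample
            target = r + GAMMA * max(q[s2])  # q[TERMINAL] is never updated, stays 0
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


v = value_iteration()
q = q_learning()
print("V from value iteration:", v)
print("greedy policy from Q-learning:",
      [max(ACTIONS, key=lambda a: q[s][a]) for s in range(TERMINAL)])
```

On a problem this small both approaches are cheap. The sweep over all states in `value_iteration` is the part that stops scaling as the MDP grows, which is where sample-based methods are aimed.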