| Reinforcement learning is an area of machine learning concerned with how [[software agent]]s ought to take [[Action selection|actions]] in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is also studied in many other disciplines, such as [[game theory]], [[control theory]], [[operations research]], [[information theory]], [[simulation-based optimization]], [[multi-agent system]]s, [[swarm intelligence]], [[statistics]] and [[genetic algorithm]]s. In machine learning, the environment is typically represented as a [[Markov Decision Process]] (MDP). Many reinforcement learning algorithms use [[dynamic programming]] techniques.<ref>{{Cite book|title=Reinforcement learning and markov decision processes|author1=van Otterlo, M.|author2=Wiering, M.|journal=Reinforcement Learning |volume=12|pages=3–42 |year=2012 |doi=10.1007/978-3-642-27645-3_1|series=Adaptation, Learning, and Optimization|isbn=978-3-642-27644-6}}</ref> Unlike classical dynamic programming methods, reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and they are used when exact models are infeasible. Typical applications include autonomous vehicles and learning to play a game against a human opponent. |
| Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is also studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In machine learning, the environment is typically represented as a Markov decision process (MDP). Many reinforcement learning algorithms use dynamic programming techniques. Unlike classical dynamic programming methods, reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and they are used when exact models are infeasible. Typical applications include autonomous vehicles and learning to play a game against a human opponent. |
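− | Reinforcement learning is a field of machine learning that studies how software agents should choose actions in an environment so as to maximize some notion of cumulative reward. Because of its generality, the field is also studied in many other disciplines, such as '''game theory''', '''control theory''', '''operations research''', '''information theory''', '''simulation-based optimization''', '''multi-agent systems''', '''swarm intelligence''', '''statistics''' and '''genetic algorithms'''. In machine learning, the environment is typically represented as a '''Markov decision process (MDP)'''. Many reinforcement learning algorithms use dynamic programming techniques. Reinforcement learning algorithms do not need to know an exact mathematical model of the MDP, and they are used when an exact model is infeasible. They are commonly applied to autonomous driving and to games played between humans and machines.
| + | Reinforcement learning is a branch of machine learning that studies how software agents should choose actions in an environment so as to maximize some notion of cumulative reward. Because of its generality, the field is also studied in many other disciplines, such as '''game theory''', '''control theory''', '''operations research''', '''information theory''', '''simulation-based optimization''', '''multi-agent systems''', '''swarm intelligence''', '''statistics''' and '''genetic algorithms'''. In machine learning, the environment is typically represented as a '''Markov decision process (MDP)'''. Many reinforcement learning algorithms use dynamic programming techniques. Reinforcement learning algorithms do not need to know an exact mathematical model of the MDP, and they are used when an exact model is infeasible. They are commonly applied to autonomous driving and to games played between humans and machines. |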
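
The idea of an agent that maximizes cumulative reward without knowing the model of the MDP can be illustrated with a small sketch. The example below is not taken from the article: it assumes a hypothetical five-state chain in which only the rightmost state pays a reward, and it uses tabular Q-learning, one common model-free algorithm, with illustrative values for the discount factor and learning rate. The learner only calls `step` and observes the result; it never inspects the transition rule.

```python
import random

# Hypothetical toy environment: a chain of states 0..4, where 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor (assumed value)
ALPHA = 0.5        # learning rate (assumed value)


def step(state, action):
    """Environment dynamics. The learner treats this as a black box."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only on reaching the last state
    return next_state, reward, done


def q_learning(episodes=500, max_steps=100, seed=0):
    """Estimate action values from sampled transitions only."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Q-learning is off-policy, so a uniformly random
            # behaviour policy is enough for this small example.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus the discounted
            # value of the best action in the next state.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

The bootstrapped target is the same Bellman backup that dynamic programming uses, but it is applied to sampled transitions instead of a known transition model. In this chain the learned greedy policy should choose "right" in every non-terminal state.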