==== Reinforcement learning ====
 
{{Main|Reinforcement learning}}
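Reinforcement learning is concerned with how an ''agent'' ought to take ''actions'' in an ''environment'' so as to maximize some notion of long-term ''reward''. Reinforcement learning algorithms attempt to find a ''policy'' that maps ''states'' of the world to the actions the agent ought to take in those states. Reinforcement learning differs from the [https://en.wikipedia.org/wiki/Supervised_learning supervised learning] problem in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected.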
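
As a minimal sketch of these terms, assuming a hypothetical five-state corridor environment, the Python code below represents a policy as a plain mapping from states to actions and measures the cumulative reward the agent collects by following it. The `step` function, the reward of 1.0 for reaching the goal, and the discount factor `gamma` used to weight later rewards are illustrative assumptions, not part of the text above.

```python
from typing import Dict, Tuple

# Hypothetical toy environment: a corridor of states 0..4 with the goal at state 4.
N_STATES = 5
GOAL = N_STATES - 1


def step(state: int, action: str) -> Tuple[int, float, bool]:
    """Environment: move one cell; reward 1.0 on reaching the goal, otherwise 0.0."""
    if action == "right":
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def run_episode(policy: Dict[int, str], start: int = 0,
                gamma: float = 0.9, max_steps: int = 100) -> float:
    """Follow `policy` from `start`; return the discounted cumulative reward."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(max_steps):
        action = policy[state]  # the policy maps the current state to an action
        state, reward, done = step(state, action)
        total += discount * reward
        discount *= gamma  # later rewards count less (assumed discounting)
        if done:
            break
    return total


# A policy: a mapping from each state of the world to the action taken there.
always_right = {s: "right" for s in range(N_STATES)}
always_left = {s: "left" for s in range(N_STATES)}
print(run_episode(always_right), run_episode(always_left))
```

Following `always_right` from state 0 reaches the goal after four steps, so its return is 0.9³ ≈ 0.729. A policy that always moves left never reaches the goal and earns nothing. Nobody tells the agent which action was "correct"; it only sees the reward.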
    
Reinforcement learning is an area of machine learning concerned with how [[software agent]]s ought to take [[Action selection|actions]] in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as [[game theory]], [[control theory]], [[operations research]], [[information theory]], [[simulation-based optimization]], [[multi-agent system]]s, [[swarm intelligence]], [[statistics]] and [[genetic algorithm]]s. In machine learning, the environment is typically represented as a [[Markov Decision Process]] (MDP). Many reinforcement learning algorithms use [[dynamic programming]] techniques.<ref>{{Cite book|title=Reinforcement learning and markov decision processes|author1=van Otterlo, M.|author2=Wiering, M.|journal=Reinforcement Learning |volume=12|pages=3–42 |year=2012 |doi=10.1007/978-3-642-27645-3_1|series=Adaptation, Learning, and Optimization|isbn=978-3-642-27644-6}}</ref> Reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and are used when exact models are infeasible. Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game against a human opponent.
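
To illustrate the distinction drawn above, here is a hedged sketch on a hypothetical toy MDP: a five-state corridor. All names, including `step`, `value_iteration`, `q_learning`, and the constants, are assumptions for illustration. `value_iteration` is a dynamic-programming method that needs the exact model to back up every state. Tabular Q-learning, chosen here as one example of a model-free reinforcement learning algorithm, instead learns only from transitions it samples by acting.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy MDP: corridor states 0..4, state 4 is a terminal goal.
# Actions: 0 = move left, 1 = move right. The discount factor is an assumption.
N_STATES, GOAL, ACTIONS, GAMMA = 5, 4, (0, 1), 0.9


def step(state: int, action: int) -> Tuple[int, float]:
    """Deterministic model: next state and reward (1.0 for entering the goal)."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def value_iteration(tol: float = 1e-10) -> List[float]:
    """Dynamic programming: uses the known model to back up every state exactly."""
    v = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # the terminal state keeps value 0
            best = max(r + GAMMA * v[n] for n, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return v


def q_learning(episodes: int = 500, alpha: float = 0.5, epsilon: float = 0.2,
               max_steps: int = 1000, seed: int = 0) -> Dict[Tuple[int, int], float]:
    """Model-free: learns action values only from sampled (s, a, r, s') transitions."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy exploration; ties between equal values are broken randomly.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            n, r = step(s, a)  # the agent only observes the sampled outcome
            target = r if n == GOAL else r + GAMMA * max(q[(n, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = n
            if s == GOAL:
                break
    return q


v = value_iteration()
q = q_learning()
# Greedy policy for each non-goal state (1 = right) under each method.
dp_policy = [max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * v[step(s, a)[0]])
             for s in range(GOAL)]
ql_policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print("DP policy:", dp_policy, "Q-learning policy:", ql_policy)
```

On this toy problem both methods should arrive at the same greedy policy: move right in every non-goal state. Q-learning reaches it only after many sampled episodes, which is the cost of not assuming an exact model of the MDP.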
 
Reinforcement learning is an area of machine learning concerned with how [[software agent]]s ought to take [[Action selection|actions]] in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as [[game theory]], [[control theory]], [[operations research]], [[information theory]], [[simulation-based optimization]], [[multi-agent system]]s, [[swarm intelligence]], [[statistics]] and [[genetic algorithm]]s. In machine learning, the environment is typically represented as a [[Markov Decision Process]] (MDP). Many reinforcement learning algorithms use [[dynamic programming]] techniques.<ref>{{Cite book|title=Reinforcement learning and markov decision processes|author1=van Otterlo, M.|author2=Wiering, M.|journal=Reinforcement Learning |volume=12|pages=3–42 |year=2012 |doi=10.1007/978-3-642-27645-3_1|series=Adaptation, Learning, and Optimization|isbn=978-3-642-27644-6}}</ref> Reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and are used when exact models are infeasible. Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game against a human opponent.
第372行: 第373行:  
Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In machine learning, the environment is typically represented as a Markov Decision Process (MDP). Many reinforcement learning algorithms use dynamic programming techniques. Reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and are used when exact models are infeasible. Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game against a human opponent.
 
Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In machine learning, the environment is typically represented as a Markov Decision Process (MDP). Many reinforcement learning algorithms use dynamic programming techniques. Reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and are used when exact models are infeasible. Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game against a human opponent.
    
==== Self learning ====
 