Changes

Machine learning (view source)

Revision as of 09:44, 17 July 2020 (Fri)

268 bytes added, 09:44, 17 July 2020 (Fri)

→‎Reinforcement learning

Line 505: Line 505:

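In '''weak supervision''', the training labels are noisy, limited, or imprecise; however, such labels are often cheaper to obtain, which makes it easier to assemble a larger effective training set.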

−

==== Reinforcement learning ====

+

==== Reinforcement learning (强化学习) ====

Line 513: Line 513:

Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In machine learning, the environment is typically represented as a Markov Decision Process (MDP). Many reinforcement learning algorithms use dynamic programming techniques. Reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and are used when exact models are infeasible. Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game against a human opponent.

−

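Reinforcement learning learning is an area of machine learning that studies how software agents should act in an environment so as to maximize some notion of cumulative return. Because of its generality, the field is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In machine learning, the environment is usually represented as a Markov decision process. Many reinforcement learning algorithms use dynamic programming techniques. Reinforcement learning algorithms do not need to know the MDP's ~~exact mathematical model; rather, they are used when an exact model is infeasible. Reinforcement learning algorithms are used in autonomous vehicles or in learning to play games against human opponents.~~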

+

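Reinforcement learning is an area of machine learning that studies how software components should decide on actions in an environment so as to maximize some notion of cumulative reward. Because of its generality, the field is also studied in many other disciplines, such as '''game theory''', '''control theory''', '''operations research''', '''information theory''', '''simulation-based optimization''', '''multi-agent systems''', '''swarm intelligence''', '''statistics''' and '''genetic algorithms'''. In machine learning, the environment is typically represented as a '''Markov decision process (MDP)'''. Many reinforcement learning algorithms use dynamic programming techniques. Reinforcement learning algorithms do not need to know an exact mathematical model of the MDP, and are used when exact models are infeasible. They are commonly applied to autonomous driving and to games played between humans and machines.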
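As an illustration of the model-free setting described in this paragraph, the following is a minimal sketch of tabular Q-learning, one well-known reinforcement learning algorithm; the article itself does not prescribe it. The five-state chain environment, the `step` function, and all hyperparameter values are hypothetical and chosen only for the example. The agent never reads the transition rules directly; it only observes the results of its own actions.

```python
import random

N_STATES = 5      # hypothetical chain MDP: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values from sampled transitions only (no model of the MDP)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for each non-terminal state under the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(q_learning())` returns one action per non-terminal state. In this toy chain the only reward lies at the right end, so the learned policy should move right everywhere. The update rule bootstraps from the estimated value of the next state, which is the link to the dynamic programming techniques mentioned above.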

−

==== Self learning (自学习) ====

Yillia Jing

463 edits