Changes

Artificial neural network (view source)

Revision as of 14:03, 24 August 2018 (Fri)

13 bytes added, 14:03, 24 August 2018 (Fri)

Line 156: Line 156:

====Reinforcement learning====

−

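In [https://en.wikipedia.org/wiki/Reinforcement_learning reinforcement learning], the data <math>\textstyle x</math> are usually not given but are generated by an agent's interactions with the environment. At each time step <math>\textstyle {t}</math>, the agent performs an action <math>\textstyle y_t</math>, and the environment produces an observation <math>\textstyle x_t</math> and an instantaneous cost <math>\textstyle c_t</math> according to some (usually unknown) dynamics. The aim is to find a policy for selecting actions that minimizes some measure of long-term cost, for example the expected cumulative cost. The environment's dynamics and the long-term cost of each policy are usually unknown, but they can be estimated.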

+

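In [https://en.wikipedia.org/wiki/Reinforcement_learning reinforcement learning], the data <math>\textstyle x</math> are usually not given but are generated by an agent's interactions with the environment. At each time step <math>{\textstyle{t}}</math>, the agent performs an action <math>\textstyle y_t</math>, and the environment produces an observation <math>\textstyle x_t</math> and an instantaneous cost <math>\textstyle c_t</math> according to some (usually unknown) dynamics. The aim is to find a policy for selecting actions that minimizes some measure of long-term cost, for example the expected cumulative cost. The environment's dynamics and the long-term cost of each policy are usually unknown, but they can be estimated.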

−

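More formally, the environment is modeled as a [https://en.wikipedia.org/wiki/Markov_decision_process Markov decision process] (MDP) with states <math>\textstyle {s_1,...,s_n}\in S </math> and actions <math>\textstyle {{a_1,...,a_m} \in A}</math>, together with the following probability distributions: the instantaneous cost distribution <math>\textstyle P(c_t|s_t)</math>, the observation distribution <math>\textstyle P(x_t|s_t)</math>, and the transition distribution <math>\textstyle P(s_{t+1}|s_t, a_t)</math>. A policy is defined as a conditional distribution over actions given the observations. Taken together, the MDP and the policy define a [https://en.wikipedia.org/wiki/Markov_chain Markov chain] (MC). The aim is to find the policy (that is, the MC) that minimizes the cost.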

+

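More formally, the environment is modeled as a [https://en.wikipedia.org/wiki/Markov_decision_process Markov decision process] (MDP) with states <math>\textstyle {s_1,...,s_n}\in S </math> and actions <math>\textstyle {{a_1,...,a_m} \in A}</math>, together with the following probability distributions: the instantaneous cost distribution <math>\textstyle P(c_t|s_t)</math>, the observation distribution <math>\textstyle {P({x_t}|{s_t})}</math>, and the transition distribution <math>\textstyle {P({s_{t+1}}|{s_t}, {a_t})}</math>. A policy is defined as a conditional distribution over actions given the observations. Taken together, the MDP and the policy define a [https://en.wikipedia.org/wiki/Markov_chain Markov chain] (MC). The aim is to find the policy (that is, the MC) that minimizes the cost.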

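In reinforcement learning, ANNs are frequently used as one part of the overall algorithm. [https://en.wikipedia.org/wiki/Dimitri_Bertsekas Bertsekas] and Tsitsiklis coupled [https://en.wikipedia.org/wiki/Dynamic_programming dynamic programming] with ANNs (giving neuro-dynamic programming) and applied it to multi-dimensional nonlinear problems such as those arising in [https://en.wikipedia.org/wiki/Vehicle_routing vehicle routing], [https://en.wikipedia.org/wiki/Natural_resource_management natural resource management], or [https://en.wikipedia.org/wiki/Medicine medicine]. ANNs suit these problems because they can mitigate the loss of accuracy even when the density of the discretization grid used to numerically approximate the solution of the original control problem is reduced.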
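
The statement that an MDP and a policy together define a Markov chain can be made concrete with a small numerical sketch. Everything below is hypothetical and chosen only for illustration: the three states, the two actions, the probabilities, the costs, the helper names <code>induced_chain</code> and <code>expected_discounted_cost</code>, and the discount factor <code>GAMMA</code> (the text above does not fix a particular long-term cost measure). The sketch also assumes the state is fully observed, so that <math>\textstyle x_t = s_t</math>, and uses the expected cost per state in place of the full distribution <math>\textstyle P(c_t|s_t)</math>.

```python
# Hypothetical toy MDP; every number below is invented for illustration.
# P[a][s][s2] = P(s_{t+1} = s2 | s_t = s, a_t = a)
P = [
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],  # action 0
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]],  # action 1
]
# COST[s] = expected instantaneous cost E[c_t | s_t = s]
COST = [1.0, 0.0, 2.0]
# POLICY[s][a] = probability of choosing action a in state s
# (assumes full observability, i.e. the observation equals the state)
POLICY = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
# Discount factor: an assumption, not something the article specifies
GAMMA = 0.9


def induced_chain(P, policy):
    """Markov chain defined by the MDP transitions and the policy.

    chain[s][s2] = sum over a of policy[s][a] * P[a][s][s2]
    """
    n_states = len(policy)
    n_actions = len(P)
    return [
        [
            sum(policy[s][a] * P[a][s][s2] for a in range(n_actions))
            for s2 in range(n_states)
        ]
        for s in range(n_states)
    ]


def expected_discounted_cost(P, policy, cost, gamma, sweeps=2000):
    """Expected discounted cumulative cost from each start state.

    Repeatedly applies v <- cost + gamma * chain @ v, which converges
    to the fixed point of that equation because gamma < 1.
    """
    chain = induced_chain(P, policy)
    n = len(cost)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            cost[s] + gamma * sum(chain[s][s2] * v[s2] for s2 in range(n))
            for s in range(n)
        ]
    return v


if __name__ == "__main__":
    for row in induced_chain(P, POLICY):
        print(row)
    print(expected_discounted_cost(P, POLICY, COST, GAMMA))
```

Finding the policy that minimizes the cost then amounts to comparing the values returned by <code>expected_discounted_cost</code> across candidate policies. In the setting described above, where the dynamics are unknown, these quantities would have to be estimated from interaction rather than computed from known tables as in this sketch.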

Anonymous user

http://c2.com/cgi/wiki?$1>Cynthia