==Model==
An ''artificial neural network'' is a network of simple elements called [https://en.wikipedia.org/wiki/Artificial_neurons artificial neurons], which receive input, change their internal state (''activation'') according to that input, and produce output depending on the input and activation. The ''network'' forms by connecting the output of certain neurons to the input of other neurons, making up a [https://en.wikipedia.org/wiki/Directed_graph directed], [https://en.wikipedia.org/wiki/Weighted_graph weighted graph]. The weights as well as the [https://en.wikipedia.org/wiki/Activation_function functions that compute the activation] can be modified by a process called ''learning'', which is governed by a [https://en.wikipedia.org/wiki/Learning_rule learning rule].<ref name=Zell1994ch5.2>{{cite book |last=Zell |first=Andreas |year=1994 |title=Simulation Neuronaler Netze |trans-title=Simulation of Neural Networks |language=German |edition=1st |publisher=Addison-Wesley |chapter=chapter 5.2 |isbn=3-89319-554-8}}</ref>
===Components of an artificial neural network===
====Connections and weights====
The network consists of connections, each connection transferring the output of a neuron <math>i</math> to the input of a neuron <math>j</math>. In this sense, <math>i</math> is the predecessor of <math>j</math>, and <math>j</math> is the successor of <math>i</math>. Each connection is assigned a weight <math>w_{ij}</math>.<ref name=Zell1994ch5.2 /> Sometimes a bias term is added to the total weighted sum of inputs to serve as a threshold that shifts the activation function.<ref name="Abbod2007">{{cite journal|year=2007|title=Application of Artificial Intelligence to the Management of Urological Cancer|url=https://www.sciencedirect.com/science/article/pii/S0022534707013936|journal=The Journal of Urology|volume=178|issue=4|pages=1150-1156|doi=10.1016/j.juro.2007.05.122|last1=Abbod|first1=Maysam F}}</ref>
====Propagation function====
The ''propagation function'' computes the input <math>p_j(t)</math> to the neuron <math>j</math> from the outputs <math>o_i(t)</math> of its predecessor neurons, and typically has the form:<ref name=Zell1994ch5.2 />
: <math> p_j(t) = \sum_{i} o_i(t) w_{ij} </math>
When a bias value is added to the function, the above form changes to the following:<ref name="DAWSON1998">{{cite journal|year=1998|title=An artificial neural network approach to rainfall-runoff modelling|url=https://www.tandfonline.com/doi/abs/10.1080/02626669809492102|journal=Hydrological Sciences Journal|volume=43|issue=1|pages=47-66|doi=10.1080/02626669809492102|last1=DAWSON|first1=CHRISTIAN W}}</ref>
: <math> p_j(t) = \sum_{i} o_i(t) w_{ij} + w_{0j} </math>,
where <math>w_{0j}</math> is a bias.
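A minimal sketch of this propagation function in Python (using NumPy; the three predecessor outputs and all weight values are invented for illustration):

<syntaxhighlight lang="python">
import numpy as np

# Outputs o_i(t) of three predecessor neurons at time t (example values).
o = np.array([0.5, -1.0, 0.25])
# Connection weights w_ij from each predecessor i to neuron j.
w = np.array([0.4, 0.1, -0.6])
# Bias w_0j, acting as a threshold that shifts the activation function.
w0 = 0.2

# p_j(t) = sum_i o_i(t) * w_ij + w_0j
p_j = np.dot(o, w) + w0
print(p_j)  # 0.2 - 0.1 - 0.15 + 0.2 ≈ 0.15
</syntaxhighlight>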
===Neural networks as functions===
Neural network models can be viewed as simple mathematical models defining a function <math>\textstyle f : X \rightarrow Y </math>, or a distribution over <math>\textstyle X</math> or over both <math>\textstyle X</math> and <math>\textstyle Y</math>. Sometimes a model is intimately associated with a particular learning rule. A common use of the phrase "ANN model" is really the definition of a ''class'' of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity).
Mathematically, a neuron's network function <math>\textstyle f(x)</math> is defined as a composition of other functions <math>g_i(x)</math>, which can themselves be further decomposed into other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between functions. A widely used type of composition is the ''nonlinear weighted sum'', <math>\textstyle f(x) = K \left(\sum_i w_i g_i(x)\right) </math>, where <math>\textstyle K</math> (commonly referred to as the [https://en.wikipedia.org/wiki/Activation_function activation function]<ref>{{cite web|url=http://www.cse.unsw.edu.au/~billw/mldict.html#activnfn|title=The Machine Learning Dictionary}}</ref>) is some predefined function, such as the [https://en.wikipedia.org/wiki/Hyperbolic_function#Standard_analytic_expressions hyperbolic tangent], [https://en.wikipedia.org/wiki/Sigmoid_function sigmoid function], [https://en.wikipedia.org/wiki/Softmax_function softmax function], or [https://en.wikipedia.org/wiki/ReLU rectifier function]. The most important characteristic of the activation function is that it provides a smooth transition as input values change, i.e. a small change in input produces a small change in output. The following refers to a collection of functions <math>\textstyle g_i</math> as a [https://en.wikipedia.org/wiki/Vector_(mathematics_and_physics) vector] <math>\textstyle g = (g_1, g_2, \ldots, g_n)</math>.
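A minimal sketch of such a nonlinear weighted sum, assuming tanh as the activation function <math>\textstyle K</math> (the two component functions <math>\textstyle g_i</math> and the weights are arbitrary example choices):

<syntaxhighlight lang="python">
import numpy as np

# Example component functions g_i(x) and weights w_i.
g = [lambda x: x, lambda x: x ** 2]
w = [0.8, -0.3]

def f(x, K=np.tanh):
    """Nonlinear weighted sum: f(x) = K(sum_i w_i * g_i(x))."""
    return K(sum(w_i * g_i(x) for w_i, g_i in zip(w, g)))

print(f(0.50))  # tanh(0.8*0.50 - 0.3*0.25) = tanh(0.325)
print(f(0.51))  # a small change in input gives a small change in output
</syntaxhighlight>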
[[File:Ann_dependency_(graph).svg.png|150px|ANN dependency graph]]
A [https://en.wikipedia.org/wiki/Deep_neural_network deep neural network] can be discriminatively trained with the standard backpropagation algorithm. Backpropagation is a method to calculate the [https://en.wikipedia.org/wiki/Gradient gradient] of the [https://en.wikipedia.org/wiki/Loss_function loss function] (which produces the cost associated with a given state) with respect to the weights in an ANN.
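To make this concrete, the following sketch computes the gradient of a squared-error loss with respect to the weights of a single sigmoid neuron via the chain rule (which is what backpropagation applies layer by layer) and verifies it against a finite-difference estimate; the inputs, weights and target are invented for illustration:

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.3, -0.2])  # inputs
w = np.array([0.5, 0.8])   # weights
d = 1.0                    # target output

def loss(w):
    return 0.5 * (sigmoid(np.dot(w, x)) - d) ** 2

# Chain rule, as computed by backpropagation:
# dC/dw_i = (y - d) * y * (1 - y) * x_i
y = sigmoid(np.dot(w, x))
grad_backprop = (y - d) * y * (1.0 - y) * x

# Finite-difference check of the same gradient.
eps = 1e-6
grad_numeric = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                         for e in np.eye(2)])
print(grad_backprop, grad_numeric)  # the two should agree closely
</syntaxhighlight>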
The basics of continuous backpropagation<ref name="SCHIDHUB2"/><ref name="scholarpedia2">{{cite journal|year=2015|title=Deep Learning|url=http://www.scholarpedia.org/article/Deep_Learning|journal=Scholarpedia|volume=10|issue=11|page=32832|doi=10.4249/scholarpedia.32832|last1=Schmidhuber|first1=Jürgen|bibcode=2015SchpJ..1032832S}}</ref><ref name=":5">{{Cite journal|last=Dreyfus|first=Stuart E.|date=1990-09-01|title=Artificial neural networks, back propagation, and the Kelley-Bryson gradient procedure|url=http://arc.aiaa.org/doi/10.2514/3.25422|journal=Journal of Guidance, Control, and Dynamics|volume=13|issue=5|pages=926–928|doi=10.2514/3.25422|issn=0731-5090}}</ref><ref name="mizutani2000">Eiji Mizutani, [https://en.wikipedia.org/wiki/Stuart_Dreyfus Stuart Dreyfus], Kenichi Nishio (2000). On derivation of MLP backpropagation from the Kelley-Bryson optimal-control gradient formula and its application. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2000), Como Italy, July 2000. [http://queue.ieor.berkeley.edu/People/Faculty/dreyfus-pubs/ijcnn2k.pdf Online]</ref> were derived in the context of [https://en.wikipedia.org/wiki/Control_theory control theory] by [https://en.wikipedia.org/wiki/Henry_J._Kelley Kelley]<ref name="kelley1960">{{cite journal|year=1960|title=Gradient theory of optimal flight paths|url=http://arc.aiaa.org/doi/abs/10.2514/8.5282?journalCode=arsj|journal=Ars Journal|volume=30|issue=10|pages=947–954|doi=10.2514/8.5282|last1=Kelley|first1=Henry J.}}</ref> in 1960 and by [https://en.wikipedia.org/wiki/Arthur_E._Bryson Bryson] in 1961,<ref name="bryson1961">[https://en.wikipedia.org/wiki/Arthur_E._Bryson Arthur E. Bryson] (1961, April). A gradient method for optimizing multi-stage allocation processes. In Proceedings of the Harvard Univ. Symposium on digital computers and their applications.</ref> using principles of [https://en.wikipedia.org/wiki/Dynamic_programming dynamic programming]. In 1962, [https://en.wikipedia.org/wiki/Stuart_Dreyfus Dreyfus] published a simpler derivation based only on the [https://en.wikipedia.org/wiki/Chain_rule chain rule].<ref name="dreyfus1962">{{cite journal|year=1962|title=The numerical solution of variational problems|url=https://www.researchgate.net/publication/256244271_The_numerical_solution_of_variational_problems|journal=Journal of Mathematical Analysis and Applications|volume=5|issue=1|pages=30–45|doi=10.1016/0022-247x(62)90004-5|last1=Dreyfus|first1=Stuart}}</ref> In 1969, Bryson and [https://en.wikipedia.org/wiki/Yu-Chi_Ho Ho] described it as a multi-stage dynamic system optimization method.<ref>{{cite book|url=https://books.google.com/books?id=8jZBksh-bUMC&pg=PA578|title=Artificial Intelligence A Modern Approach|last1=Russell|first1=Stuart J.|last2=Norvig|first2=Peter|publisher=Prentice Hall|year=2010|isbn=978-0-13-604259-4|page=578|quote=The most popular method for learning in multilayer networks is called Back-propagation.}}</ref><ref name="Bryson1969">{{cite book|url=https://books.google.com/books?id=1bChDAEACAAJ&pg=PA481|title=Applied Optimal Control: Optimization, Estimation and Control|last=Bryson|first=Arthur Earl|publisher=Blaisdell Publishing Company or Xerox College Publishing|year=1969|page=481}}</ref> In 1970, [https://en.wikipedia.org/wiki/Seppo_Linnainmaa Linnainmaa] finally published the general method for [https://en.wikipedia.org/wiki/Automatic_differentiation automatic differentiation] (AD) of discrete connected networks of nested [https://en.wikipedia.org/wiki/Differentiable_function differentiable functions].<ref name="lin1970">[https://en.wikipedia.org/wiki/Seppo_Linnainmaa Seppo Linnainmaa] (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6–7.</ref><ref name="lin1976">{{cite journal|year=1976|title=Taylor expansion of the accumulated rounding error|journal=BIT Numerical Mathematics|volume=16|issue=2|pages=146–160|doi=10.1007/bf01931367|last1=Linnainmaa|first1=Seppo}}</ref> This corresponds to the modern version of backpropagation, which remains efficient even when the networks are sparse.<ref name="SCHIDHUB2"/><ref name="scholarpedia2"/><ref name="grie2012">{{Cite journal|last=Griewank|first=Andreas|date=2012|title=Who Invented the Reverse Mode of Differentiation?|url=http://www.math.uiuc.edu/documenta/vol-ismp/52_griewank-andreas-b.pdf|journal=Documenta Matematica, Extra Volume ISMP|pages=389–400}}</ref><ref name="grie2008">{{cite book|url=https://books.google.com/books?id=xoiiLaRxcbEC|title=Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition|last1=Griewank|first1=Andreas|last2=Walther|first2=Andrea|publisher=SIAM|year=2008|isbn=978-0-89871-776-1}}</ref> In 1973,<ref name="dreyfus1973">{{cite journal|year=1973|title=The computational solution of optimal control problems with time lag|journal=IEEE Transactions on Automatic Control|volume=18|issue=4|pages=383–385|doi=10.1109/tac.1973.1100330|last1=Dreyfus|first1=Stuart}}</ref> Dreyfus used backpropagation to adapt [https://en.wikipedia.org/wiki/Parameter parameters] of controllers in proportion to error gradients. In 1974, [https://en.wikipedia.org/wiki/Paul_Werbos Werbos] mentioned the possibility of applying this principle to ANNs,<ref name="werbos1974">[https://en.wikipedia.org/wiki/Paul_Werbos Paul Werbos] (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.</ref> and in 1982 he applied Linnainmaa's AD method to neural networks in the way that is widely used today.<ref name="scholarpedia2"/><ref name="werbos1982">{{Cite book|url=http://werbos.com/Neural/SensitivityIFIPSeptember1981.pdf|title=System modeling and optimization|last=Werbos|first=Paul|publisher=Springer|year=1982|pages=762–770|chapter=Applications of advances in nonlinear sensitivity analysis}}</ref> In 1986, [https://en.wikipedia.org/wiki/David_E._Rumelhart Rumelhart], Hinton and [https://en.wikipedia.org/wiki/Ronald_J._Williams Williams] noted that this method can generate useful internal representations of incoming data in the hidden layers of neural networks.<ref name=":4">{{Cite journal|last1=Rumelhart|first1=David E.|last2=Hinton|first2=Geoffrey E.|last3=Williams|first3=Ronald J.|title=Learning representations by back-propagating errors|url=http://www.nature.com/articles/Art323533a0|journal=Nature|volume=323|issue=6088|pages=533–536|doi=10.1038/323533a0|year=1986|bibcode=1986Natur.323..533R}}</ref> In 1993, Wan was the first<ref name="SCHIDHUB2"/> to win an international pattern recognition contest through backpropagation.<ref name="wan1993">Eric A. Wan (1993). "Time series prediction by using a connectionist network with internal delay lines." In ''Proceedings of the Santa Fe Institute Studies in the Sciences of Complexity'', '''15''': p. 195. Addison-Wesley Publishing Co.</ref>
The weight updates of backpropagation can be done via [https://en.wikipedia.org/wiki/Stochastic_gradient_descent stochastic gradient descent], using the following equation:
: <math> w_{ij}(t + 1) = w_{ij}(t) - \eta\frac{\partial C}{\partial w_{ij}} + \xi(t) </math>
where <math> \eta </math> is the learning rate, <math> C </math> is the cost function, and <math>\xi(t)</math> is a stochastic term. The choice of the cost function depends on factors such as the learning type (supervised, unsupervised, reinforcement, etc.) and the [https://en.wikipedia.org/wiki/Activation_function activation function]. For example, when performing supervised learning on a [https://en.wikipedia.org/wiki/Multiclass_classification multiclass classification] problem, common choices for the activation function and the cost function are the [https://en.wikipedia.org/wiki/Softmax_activation_function softmax function] and the [https://en.wikipedia.org/wiki/Cross_entropy cross entropy] function, respectively. The softmax function is defined as <math> p_j = \frac{\exp(x_j)}{\sum_k \exp(x_k)} </math>, where <math> p_j </math> represents the class probability (output of unit <math> j </math>) and <math> x_j </math> and <math> x_k </math> represent the total input to units <math> j </math> and <math> k </math> of the same level, respectively. The cross entropy is defined as <math> C = -\sum_j d_j \log(p_j) </math>, where <math> d_j </math> represents the target probability for output unit <math> j </math> and <math> p_j </math> is the probability output for unit <math> j </math> after applying the activation function.<ref>{{Cite journal|last1=Hinton|first1=G.|last2=Deng|first2=L.|last3=Yu|first3=D.|last4=Dahl|first4=G. E.|last5=Mohamed|first5=A. r|last6=Jaitly|first6=N.|last7=Senior|first7=A.|last8=Vanhoucke|first8=V.|last9=Nguyen|first9=P.|date=November 2012|title=Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups|url=http://ieeexplore.ieee.org/document/6296526/|journal=IEEE Signal Processing Magazine|volume=29|issue=6|pages=82–97|doi=10.1109/msp.2012.2205597|issn=1053-5888|bibcode=2012ISPM...29...82H}}</ref>
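A short worked example of these two definitions (all numbers invented for illustration): the softmax turns the total inputs <math>x_j</math> into class probabilities <math>p_j</math>, and the cross entropy compares them with the target probabilities <math>d_j</math>. For this particular pairing, the gradient of <math>C</math> with respect to <math>x_j</math> reduces to <math>p_j - d_j</math>, which is what backpropagation propagates backward from the output layer:

<syntaxhighlight lang="python">
import numpy as np

x = np.array([2.0, 1.0, 0.1])  # total inputs x_j to the output units
d = np.array([1.0, 0.0, 0.0])  # target probabilities d_j (one-hot here)

# Softmax: p_j = exp(x_j) / sum_k exp(x_k)
# (shifting by max(x) avoids overflow without changing the result)
e = np.exp(x - np.max(x))
p = e / e.sum()

# Cross entropy: C = -sum_j d_j * log(p_j)
C = -np.sum(d * np.log(p))
print(p, C)

# For this pair of functions, dC/dx_j reduces to p_j - d_j.
print(p - d)
</syntaxhighlight>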
This can be used to output object [https://en.wikipedia.org/wiki/Minimum_bounding_box bounding boxes] in the form of a binary mask. They are also used for multivariate regression to increase localization precision. DNN-based regression can learn features that capture geometric information, in addition to serving as a good classifier. It removes the requirement to explicitly model parts and their relations, which helps to broaden the variety of objects that can be learned. The model consists of multiple layers, each of which has a [https://en.wikipedia.org/wiki/Rectified_linear_unit rectified linear unit] as its activation function for nonlinear transformation. Some layers are convolutional, while others are fully connected. Every convolutional layer has an additional max pooling. The network is trained to [https://en.wikipedia.org/wiki/Minimum_mean_square_error minimize] the [https://en.wikipedia.org/wiki/L2_norm ''L''<sup>2</sup> error] for predicting the mask over the entire training set containing bounding boxes represented as masks.
Alternatives to backpropagation include [https://en.wikipedia.org/wiki/Extreme_Learning_Machines extreme learning machines],<ref>{{cite journal|last1=Huang|first1=Guang-Bin|last2=Zhu|first2=Qin-Yu|last3=Siew|first3=Chee-Kheong|year=2006|title=Extreme learning machine: theory and applications|journal=Neurocomputing|volume=70|issue=1|pages=489–501|doi=10.1016/j.neucom.2005.12.126}}</ref> networks trained without backtracking,<ref>{{cite arXiv|eprint=1507.07680|first1=Yann|last1=Ollivier|first2=Guillaume|last2=Charpiat|title=Training recurrent networks without backtracking|year=2015|class=cs.NE}}</ref> "weightless" networks,<ref>ESANN. 2009</ref><ref name="RBMTRAIN">{{Cite journal|last=Hinton|first=G. E.|date=2010|title=A Practical Guide to Training Restricted Boltzmann Machines|url=https://www.researchgate.net/publication/221166159_A_brief_introduction_to_Weightless_Neural_Systems|journal=Tech. Rep. UTML TR 2010-003}}</ref> "no-prop" networks,<ref>{{cite journal|year=2013|title=The no-prop algorithm: A new learning algorithm for multilayer neural networks|journal=Neural Networks|volume=37|pages=182–188|doi=10.1016/j.neunet.2012.09.020|last1=Widrow|first1=Bernard|display-authors=etal}}</ref> and [https://en.wikipedia.org/wiki/Holographic_associative_memory non-connectionist neural networks].
===Learning paradigms===
==== Supervised learning ====
[https://en.wikipedia.org/wiki/Supervised_learning Supervised learning] uses a set of example pairs <math>(x, y), x \in X, y \in Y</math>, and the aim is to find a function <math> f : X \rightarrow Y </math> in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain.<ref>{{Cite journal|last1=Ojha|first1=Varun Kumar|last2=Abraham|first2=Ajith|last3=Snášel|first3=Václav|date=2017-04-01|title=Metaheuristic design of feedforward neural networks: A review of two decades of research|url=http://www.sciencedirect.com/science/article/pii/S0952197617300234|journal=Engineering Applications of Artificial Intelligence|volume=60|pages=97–116|doi=10.1016/j.engappai.2017.01.013}}</ref>
A commonly used cost is the [https://en.wikipedia.org/wiki/Mean-squared_error mean-squared error], which tries to minimize the average squared error between the network's output <math> f(x)</math> and the target value <math> y</math> over all the example pairs. Minimizing this cost using [https://en.wikipedia.org/wiki/Gradient_descent gradient descent] for the class of neural networks called [https://en.wikipedia.org/wiki/Multilayer_perceptron multilayer perceptrons] (MLP) produces the [https://en.wikipedia.org/wiki/Backpropagation backpropagation algorithm] for training neural networks.
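A minimal sketch of this combination, assuming a one-hidden-layer tanh MLP trained by gradient descent on the mean-squared error (the toy sine-fitting data, layer sizes and learning rate are all invented for illustration):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Toy example pairs (x, y): learn y = sin(x) on a few points.
X = np.linspace(-2, 2, 20).reshape(-1, 1)
Y = np.sin(X)

# One hidden layer with tanh units, linear output.
W1 = rng.normal(scale=0.5, size=(1, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros(1)
eta = 0.05  # learning rate

for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    f = h @ W2 + b2
    # Mean-squared error between network output f(x) and target y.
    err = f - Y
    mse = np.mean(err ** 2)
    # Backward pass (backpropagation: the chain rule, layer by layer).
    g_f = 2 * err / len(X)
    g_W2 = h.T @ g_f
    g_b2 = g_f.sum(axis=0)
    g_z = (g_f @ W2.T) * (1 - h ** 2)  # derivative of tanh is 1 - tanh^2
    g_W1 = X.T @ g_z
    g_b1 = g_z.sum(axis=0)
    # Gradient descent step.
    W1 -= eta * g_W1; b1 -= eta * g_b1
    W2 -= eta * g_W2; b2 -= eta * g_b2

print(mse)  # cost after training; should be small
</syntaxhighlight>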
More formally, the environment is modeled as a [https://en.wikipedia.org/wiki/Markov_decision_process Markov decision process] (MDP) with states <math>\textstyle {s_1,...,s_n}\in S </math> and actions <math>\textstyle {a_1,...,a_m} \in A</math>, with the following probability distributions: the instantaneous cost distribution <math>\textstyle P(c_t|s_t)</math>, the observation distribution <math>\textstyle P(x_t|s_t)</math> and the transition distribution <math>P(s_{t+1}|s_t, a_t)</math>; a policy is defined as the conditional distribution over actions given the observations. Taken together, the two define a [https://en.wikipedia.org/wiki/Markov_chain Markov chain] (MC). The aim is to discover the policy (i.e., the MC) that minimizes the cost.
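A sketch of this formalization on a made-up two-state, two-action MDP (all probabilities and costs are illustrative): the policy maps each state to a distribution over actions, and simulating the resulting Markov chain accumulates the cost by which the policy is judged:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions = 2, 2
# Transition probabilities P(s_{t+1} | s_t, a_t), indexed [s, a, s'].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
# Expected instantaneous cost of being in each state.
cost = np.array([0.0, 1.0])
# A policy: conditional distribution over actions given the observed state.
policy = np.array([[0.8, 0.2],
                   [0.3, 0.7]])

def rollout(T=1000):
    """Sample the Markov chain defined by the MDP plus the policy."""
    s, total = 0, 0.0
    for _ in range(T):
        total += cost[s]
        a = rng.choice(n_actions, p=policy[s])
        s = rng.choice(n_states, p=P[s, a])
    return total / T

print(rollout())  # average cost; the aim is a policy that minimizes this
</syntaxhighlight>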
In reinforcement learning, ANNs are often used as part of the overall algorithm.<ref>{{cite conference| author = Dominic, S. |author2=Das, R. |author3=Whitley, D. |author4=Anderson, C. |date=July 1991 | title = Genetic reinforcement learning for neural networks | conference = IJCNN-91-Seattle International Joint Conference on Neural Networks | booktitle = IJCNN-91-Seattle International Joint Conference on Neural Networks | publisher = IEEE | location = Seattle, Washington, USA | doi = 10.1109/IJCNN.1991.155315 | isbn = 0-7803-0164-1 }}</ref><ref>{{cite journal |last=Hoskins |first=J.C. |author2=Himmelblau, D.M. |title=Process control via artificial neural networks and reinforcement learning |journal=Computers & Chemical Engineering |year=1992 |volume=16 |pages=241–251 |doi=10.1016/0098-1354(92)80045-B |issue=4}}</ref> [https://en.wikipedia.org/wiki/Dimitri_Bertsekas Bertsekas] and Tsitsiklis<ref>{{cite book|url=https://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images|title=Neuro-dynamic programming|author=Bertsekas|first=D.P.|author2=Tsitsiklis|first2=J.N.|publisher=Athena Scientific|year=1996|isbn=1-886529-10-8|page=512}}</ref> coupled [https://en.wikipedia.org/wiki/Dynamic_programming dynamic programming] with ANNs (giving neuro-dynamic programming) and applied it to multi-dimensional nonlinear problems in fields such as [https://en.wikipedia.org/wiki/Vehicle_routing vehicle routing],<ref>{{cite journal |last=Secomandi |first=Nicola |title=Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands |journal=Computers & Operations Research |year=2000 |volume=27 |pages=1201–1225 |doi=10.1016/S0305-0548(99)00146-X |issue=11–12}}</ref> [https://en.wikipedia.org/wiki/Natural_resource_management natural resource management]<ref>{{cite conference| author = de Rigo, D. |author2=Rizzoli, A. E. |author3=Soncini-Sessa, R. |author4=Weber, E. |author5=Zenesi, P. | year = 2001 | title = Neuro-dynamic programming for the efficient management of reservoir networks | conference = MODSIM 2001, International Congress on Modelling and Simulation | conferenceurl = http://www.mssanz.org.au/MODSIM01/MODSIM01.htm | booktitle = Proceedings of MODSIM 2001, International Congress on Modelling and Simulation | publisher = Modelling and Simulation Society of Australia and New Zealand | location = Canberra, Australia | doi = 10.5281/zenodo.7481 | url = https://zenodo.org/record/7482/files/de_Rigo_etal_MODSIM2001_activelink_authorcopy.pdf | accessdate = 29 July 2012 | isbn = 0-867405252 }}</ref><ref>{{cite conference| author = Damas, M. |author2=Salmeron, M. |author3=Diaz, A. |author4=Ortega, J. |author5=Prieto, A. |author6=Olivares, G.| year = 2000 | title = Genetic algorithms and neuro-dynamic programming: application to water supply networks | conference = 2000 Congress on Evolutionary Computation | booktitle = Proceedings of 2000 Congress on Evolutionary Computation | publisher = IEEE | location = La Jolla, California, USA | doi = 10.1109/CEC.2000.870269 | isbn = 0-7803-6375-2 }}</ref> and [https://en.wikipedia.org/wiki/Medicine medicine],<ref>{{cite journal |last=Deng |first=Geng |author2=Ferris, M.C. |title=Neuro-dynamic programming for fractionated radiotherapy planning |journal=Springer Optimization and Its Applications |year=2008 |volume=12 |pages=47–70 |doi=10.1007/978-0-387-73299-2_3 |citeseerx=10.1.1.137.8288 |series=Springer Optimization and Its Applications |isbn=978-0-387-73298-5}}</ref> because ANNs are able to mitigate losses of accuracy even when the discretization grid density is reduced for numerically approximating the solution of the original control problems.
Tasks in the reinforcement learning paradigm are control problems, [https://en.wikipedia.org/wiki/Game games] and other sequential decision making tasks.
* [https://en.wikipedia.org/wiki/Gradient_descent Steepest descent] (with variable learning rate and [https://en.wikipedia.org/wiki/Gradient_descent#The_momentum_method momentum], [https://en.wikipedia.org/wiki/Rprop resilient backpropagation]; a momentum sketch follows this list);
* Quasi-Newton ([https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm Broyden-Fletcher-Goldfarb-Shanno], [https://en.wikipedia.org/wiki/Secant_method one step secant]);
* [https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm Levenberg-Marquardt] and [https://en.wikipedia.org/wiki/Conjugate_gradient_method conjugate gradient] (Fletcher-Reeves update, Polak-Ribiére update, Powell-Beale restart, scaled conjugate gradient).<ref>{{cite conference|author1=M. Forouzanfar |author2=H. R. Dajani |author3=V. Z. Groza |author4=M. Bolic |author5=S. Rajan |last-author-amp=yes |date=July 2010 | title = Comparison of Feed-Forward Neural Network Training Algorithms for Oscillometric Blood Pressure Estimation | conference = 4th Int. Workshop Soft Computing Applications | publisher = IEEE| location = Arad, Romania |url=https://www.researchgate.net/profile/Mohamad_Forouzanfar/publication/224173336_Comparison_of_Feed-Forward_Neural_Network_training_algorithms_for_oscillometric_blood_pressure_estimation/links/00b7d533829c3a7484000000.pdf?ev=pub_int_doc_dl&origin=publication_detail&inViewer=true&msrp=TyT96%2BjWOHJo%2BVhkMF4IzwHPAImSd442n%2BAkEuXj9qBmQSZ495CpxqlaOYon%2BSlEzWQElBGyJmbBCiiUOV8ImeEqPFXiIRivcrWsWmlPBYU%3D }}</ref>
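A minimal sketch of the first family above, assuming the standard momentum update rule applied to a toy quadratic cost (the matrix and all constants are illustrative):

<syntaxhighlight lang="python">
import numpy as np

# Toy cost C(w) = 0.5 * w^T A w with an ill-conditioned A.
A = np.diag([1.0, 25.0])
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
v = np.zeros_like(w)
eta, mu = 0.03, 0.9  # learning rate and momentum coefficient

for _ in range(200):
    # Momentum: keep a decaying running sum of past gradients...
    v = mu * v - eta * grad(w)
    # ...and step along it rather than along the raw gradient.
    w = w + v

print(w)  # approaches the minimizer [0, 0]
</syntaxhighlight>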
[https://en.wikipedia.org/wiki/Evolutionary_methods Evolutionary methods],<ref>{{cite conference| author1 = de Rigo, D. | author2 = Castelletti, A. | author3 = Rizzoli, A. E. | author4 = Soncini-Sessa, R. | author5 = Weber, E. |date=January 2005 | title = A selective improvement technique for fastening Neuro-Dynamic Programming in Water Resources Network Management | conference = 16th IFAC World Congress | conferenceurl = http://www.nt.ntnu.no/users/skoge/prost/proceedings/ifac2005/Index.html | booktitle = Proceedings of the 16th IFAC World Congress – IFAC-PapersOnLine | editor = Pavel Zítek | volume = 16 | publisher = IFAC | location = Prague, Czech Republic | url = http://www.nt.ntnu.no/users/skoge/prost/proceedings/ifac2005/Papers/Paper4269.html | accessdate = 30 December 2011 | doi = 10.3182/20050703-6-CZ-1902.02172 | isbn = 978-3-902661-75-3 }}</ref> [https://en.wikipedia.org/wiki/Gene_expression_programming gene expression programming],<ref>{{cite web|last=Ferreira|first=C.|year=2006|title=Designing Neural Networks Using Gene Expression Programming|url=http://www.gene-expression-programming.com/webpapers/Ferreira-ASCT2006.pdf|publisher=In A. Abraham, B. de Baets, M. Köppen, and B. Nickolay, eds., Applied Soft Computing Technologies: The Challenge of Complexity, pages 517–536, Springer-Verlag}}</ref> [https://en.wikipedia.org/wiki/Simulated_annealing simulated annealing],<ref>{{cite conference| author = Da, Y. |author2=Xiurun, G. |date=July 2005 | title = An improved PSO-based ANN with simulated annealing technique | conference = New Aspects in Neurocomputing: 11th European Symposium on Artificial Neural Networks | conferenceurl = http://www.dice.ucl.ac.be/esann/proceedings/electronicproceedings.htm | editor = T. Villmann | publisher = Elsevier | doi = 10.1016/j.neucom.2004.07.002 }}</ref> [https://en.wikipedia.org/wiki/Expectation-maximization expectation-maximization], [https://en.wikipedia.org/wiki/Non-parametric_methods non-parametric methods] and [https://en.wikipedia.org/wiki/Particle_swarm_optimization particle swarm optimization]<ref>{{cite conference| author = Wu, J. |author2=Chen, E. |date=May 2009 | title = A Novel Nonparametric Regression Ensemble for Rainfall Forecasting Using Particle Swarm Optimization Technique Coupled with Artificial Neural Network | conference = 6th International Symposium on Neural Networks, ISNN 2009 | conferenceurl = http://www2.mae.cuhk.edu.hk/~isnn2009/ | editors = Wang, H., Shen, Y., Huang, T., Zeng, Z. | publisher = Springer | doi = 10.1007/978-3-642-01513-7-6 | isbn = 978-3-642-01215-0 }}</ref> are other methods for training neural networks.
==== Convergent recursive learning algorithm ====