Artificial neural network

Revision as of 12:09, 9 January 2022 (Sunday): 780 bytes removed, no edit summary


To overcome this problem, [https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber Schmidhuber] adopted a multi-level hierarchy of networks, pre-trained one level at a time by [https://en.wikipedia.org/wiki/Unsupervised_learning unsupervised learning] and then fine-tuned by backpropagation.<ref name="SCHMID1992">J. Schmidhuber, "Learning complex, extended sequences using the principle of history compression," ''Neural Computation'', 4, pp. 234–242, 1992.</ref> Behnke relied only on the sign of the gradient in problems such as image reconstruction and face localization.<ref>{{cite book|url=http://www.ais.uni-bonn.de/books/LNCS2766.pdf|title=Hierarchical Neural Networks for Image Interpretation.|publisher=Springer|year=2003|series=Lecture Notes in Computer Science|volume=2766|author=Sven Behnke}}</ref>
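Updating weights from the sign of the gradient alone can be illustrated with a simplified Rprop-style rule: each weight keeps its own step size, which grows while the gradient sign stays the same and shrinks when the sign flips, and the gradient's magnitude is never used. This is a minimal sketch under stated assumptions; the helper `sign_update` and its constants (1.2, 0.5, step bounds) are hypothetical choices for illustration and are not taken from Behnke's work.

```python
from typing import List, Tuple


def sign(x: float) -> int:
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_update(weights: List[float], grads: List[float], prev_grads: List[float],
                steps: List[float], inc: float = 1.2, dec: float = 0.5,
                step_min: float = 1e-6, step_max: float = 50.0
                ) -> Tuple[List[float], List[float], List[float]]:
    """One simplified Rprop-style step (hypothetical helper).

    Each weight moves by its own step size against the gradient's sign;
    the gradient's magnitude is ignored.
    Returns (new_weights, gradients_to_remember, new_steps).
    """
    new_w, remembered, new_steps = [], [], []
    for w, g, pg, s in zip(weights, grads, prev_grads, steps):
        if g * pg > 0:      # same sign as last step: enlarge the step
            s = min(s * inc, step_max)
        elif g * pg < 0:    # sign flipped, a minimum was jumped over: shrink the step
            s = max(s * dec, step_min)
            g = 0.0         # skip this update and forget the gradient
        new_w.append(w - sign(g) * s)
        remembered.append(g)
        new_steps.append(s)
    return new_w, remembered, new_steps


if __name__ == "__main__":
    # Minimise f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
    w, prev, steps = [0.0], [0.0], [0.1]
    for _ in range(100):
        w, prev, steps = sign_update(w, [2 * (w[0] - 3.0)], prev, steps)
    print(w[0])  # approaches 3.0
```

Because only the sign enters the update, gradients of very different scale (for example 5.0 and 500.0) produce exactly the same step.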

[https://en.wikipedia.org/wiki/Geoffrey_Hinton Hinton] proposed learning a high-level representation using successive layers of binary or real-valued latent variables, with a [https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine restricted Boltzmann machine]<ref name="smolensky1986">{{cite book|url=http://portal.acm.org/citation.cfm?id=104290|title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition|year=1986|editors=D. E. Rumelhart, J. L. McClelland, & the PDP Research Group|volume=1|pages=194–281|chapter=Information processing in dynamical systems: Foundations of harmony theory.|last1=Smolensky|first1=P.}}</ref> modelling each layer. Once sufficiently many layers have been learned, the deep architecture can be used as a [https://en.wikipedia.org/wiki/Generative_model generative model] by reproducing the data when sampling down the model (an "ancestral pass") from the top-level feature activations.<ref name="hinton2006">{{cite journal|last2=Osindero|first2=S.|last3=Teh|first3=Y.|year=2006|title=A fast learning algorithm for deep belief nets|url=http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf|journal=[https://en.wikipedia.org/wiki/Neural_Computation_(journal) Neural Computation]|volume=18|issue=7|pages=1527–1554|last1=Hinton|first1=G. E.}}</ref><ref>{{Cite journal|year=2009|title=Deep belief networks|url=http://www.scholarpedia.org/article/Deep_belief_networks|journal=Scholarpedia|volume=4|issue=5|pages=5947|last1=Hinton|first1=G.}}</ref> In 2012, [https://en.wikipedia.org/wiki/Andrew_Ng Ng] and [https://en.wikipedia.org/wiki/Jeff_Dean_(computer_scientist) Dean] created a network that learned to recognize higher-level concepts, such as cats, only from watching unlabeled images taken from [https://en.wikipedia.org/wiki/YouTube YouTube] videos.<ref name="ng2012">{{cite journal|url=https://arxiv.org/abs/1112.6209|first2=Jeff|last2=Dean|title=Building High-level Features Using Large Scale Unsupervised Learning|last1=Ng|first1=Andrew|year=2012|class=cs.LG}}</ref>
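The generative use of such a stack can be sketched as follows: alternating Gibbs sampling in the undirected top-level RBM yields a sample of the penultimate layer, after which a single top-down ("ancestral") pass through the directed lower layers yields a data vector. This is a minimal pure-Python sketch assuming binary units and weights that have already been trained; the function names and the `(W, b)` layer layout are hypothetical, and no training procedure is shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # W[i][j]: weight between lower unit i and upper unit j


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Draw independent binary units with the given on-probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def down_probs(W: Matrix, b_lower: Sequence[float], upper: Sequence[int]) -> List[float]:
    """p(lower_i = 1 | upper) = sigmoid(b_i + sum_j W[i][j] * upper_j)."""
    return [sigmoid(b + sum(w * u for w, u in zip(row, upper)))
            for row, b in zip(W, b_lower)]


def up_probs(W: Matrix, c_upper: Sequence[float], lower: Sequence[int]) -> List[float]:
    """p(upper_j = 1 | lower) = sigmoid(c_j + sum_i W[i][j] * lower_i)."""
    return [sigmoid(c + sum(W[i][j] * v for i, v in enumerate(lower)))
            for j, c in enumerate(c_upper)]


def ancestral_pass(top: Tuple[Matrix, List[float], List[float]],
                   lower_layers: List[Tuple[Matrix, List[float]]],
                   n_gibbs: int, rng: random.Random) -> List[int]:
    """Generate one visible vector from a trained DBN-style stack (sketch).

    top          -- (W, b, c) of the top-level RBM; b belongs to its lower layer.
    lower_layers -- [(W_k, b_k), ...] directed generative layers, top to bottom.
    """
    W, b, c = top
    lower = sample([0.5] * len(b), rng)          # arbitrary initial state
    for _ in range(n_gibbs):                     # alternating Gibbs sampling in the top RBM
        upper = sample(up_probs(W, c, lower), rng)
        lower = sample(down_probs(W, b, upper), rng)
    state = lower
    for W_k, b_k in lower_layers:                # one top-down sweep: the ancestral pass
        state = sample(down_probs(W_k, b_k, state), rng)
    return state


if __name__ == "__main__":
    # Hypothetical weights: a 2x2 top RBM and one directed layer from 2 units to 3 visible units.
    top = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [0.0, 0.0])
    layers = [([[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]], [0.0, 0.0, 0.0])]
    print(ancestral_pass(top, layers, n_gibbs=50, rng=random.Random(0)))
```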

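Earlier challenges in training deep neural networks were successfully addressed with methods such as unsupervised pre-training, while the available computing power increased through the use of GPUs and distributed computing. Neural networks were deployed on a large scale, particularly in image and visual recognition problems. This became known as "[https://en.wikipedia.org/wiki/Deep_learning deep learning]".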


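A [https://en.wikipedia.org/wiki/Deep_neural_network deep neural network] can be trained discriminatively with the standard backpropagation algorithm. Backpropagation is a method for calculating the [https://en.wikipedia.org/wiki/Gradient gradient] of the [https://en.wikipedia.org/wiki/Loss_function loss function] (which produces the cost associated with a given state) with respect to the weights in an ANN.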
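To make the definition concrete, the sketch below applies the chain rule to a hypothetical network with one tanh hidden layer and a single linear output, using the squared-error loss L = 0.5 (y - t)^2, and returns the derivative of L with respect to every weight. The architecture, the loss and the parameter names (`W1`, `b1`, `w2`, `b2`) are assumptions for illustration; comparing against a central finite difference is a common sanity check for hand-derived gradients.

```python
import math
from typing import Any, Dict, List, Tuple

Params = Dict[str, Any]  # "W1": hidden x input, "b1": hidden, "w2": hidden, "b2": scalar


def forward(p: Params, x: List[float]) -> Tuple[List[float], float]:
    """Hidden activations h = tanh(W1 x + b1) and scalar output y = w2 . h + b2."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
    return h, y


def loss(p: Params, x: List[float], t: float) -> float:
    """Squared error L = 0.5 * (y - t)^2 for one example with target t."""
    _, y = forward(p, x)
    return 0.5 * (y - t) ** 2


def backprop(p: Params, x: List[float], t: float) -> Params:
    """Gradient of L with respect to every weight, obtained with the chain rule."""
    h, y = forward(p, x)
    dy = y - t                                   # dL/dy
    grad_w2 = [dy * hj for hj in h]              # dL/dw2_j = dL/dy * h_j
    grad_b2 = dy                                 # dL/db2 = dL/dy
    # Back through the output weights and tanh'(a) = 1 - tanh(a)^2
    # gives dL/da_j for each hidden pre-activation a_j.
    da = [dy * w * (1.0 - hj * hj) for w, hj in zip(p["w2"], h)]
    grad_W1 = [[daj * xi for xi in x] for daj in da]   # dL/dW1_jk = dL/da_j * x_k
    return {"W1": grad_W1, "b1": da, "w2": grad_w2, "b2": grad_b2}


if __name__ == "__main__":
    params = {"W1": [[0.1, -0.2], [0.4, 0.3]], "b1": [0.0, 0.1],
              "w2": [0.5, -0.7], "b2": 0.2}
    x, t, eps = [1.0, 2.0], 0.5, 1e-6
    grads = backprop(params, x, t)
    # Central finite difference for the weight W1[0][1], as a sanity check.
    plus = {**params, "W1": [[0.1, -0.2 + eps], [0.4, 0.3]]}
    minus = {**params, "W1": [[0.1, -0.2 - eps], [0.4, 0.3]]}
    numeric = (loss(plus, x, t) - loss(minus, x, t)) / (2 * eps)
    print(grads["W1"][0][1], numeric)
```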

The basics of continuous backpropagation<ref name="SCHIDHUB2"/><ref name="scholarpedia2">{{cite journal|year=2015|title=Deep Learning|url=http://www.scholarpedia.org/article/Deep_Learning|journal=Scholarpedia|volume=10|issue=11|page=32832|last1=Schmidhuber|first1=Jürgen}}</ref><ref name=":5">{{Cite journal|last=Dreyfus|first=Stuart E.|date=1990-09-01|title=Artificial neural networks, back propagation, and the Kelley-Bryson gradient procedure|url=http://arc.aiaa.org/doi/10.2514/3.25422|journal=Journal of Guidance, Control, and Dynamics|volume=13|issue=5|pages=926–928}}</ref><ref name="mizutani2000">Eiji Mizutani, [https://en.wikipedia.org/wiki/Stuart_Dreyfus Stuart Dreyfus], Kenichi Nishio (2000). On derivation of MLP backpropagation from the Kelley-Bryson optimal-control gradient formula and its application. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2000), Como Italy, July 2000. [http://queue.ieor.berkeley.edu/People/Faculty/dreyfus-pubs/ijcnn2k.pdf Online]</ref> were derived in the context of [https://en.wikipedia.org/wiki/Control_theory control theory] by [https://en.wikipedia.org/wiki/Henry_J._Kelley Kelley]<ref name="kelley1960">{{cite journal|year=1960|title=Gradient theory of optimal flight paths|url=http://arc.aiaa.org/doi/abs/10.2514/8.5282?journalCode=arsj|journal=Ars Journal|volume=30|issue=10|pages=947–954|last1=Kelley|first1=Henry J.}}</ref> in 1960 and by [https://en.wikipedia.org/wiki/Arthur_E._Bryson Bryson] in 1961,<ref name="bryson1961">[https://en.wikipedia.org/wiki/Arthur_E._Bryson Arthur E. Bryson] (1961, April). A gradient method for optimizing multi-stage allocation processes. In Proceedings of the Harvard Univ. Symposium on digital computers and their applications.</ref> using principles of [https://en.wikipedia.org/wiki/Dynamic_programming dynamic programming]. In 1962, [https://en.wikipedia.org/wiki/Stuart_Dreyfus Dreyfus] published a simpler derivation based only on the [https://en.wikipedia.org/wiki/Chain_rule chain rule].<ref name="dreyfus1962">{{cite journal|year=1962|title=The numerical solution of variational problems|url=https://www.researchgate.net/publication/256244271_The_numerical_solution_of_variational_problems|journal=Journal of Mathematical Analysis and Applications|volume=5|issue=1|pages=30–45|last1=Dreyfus|first1=Stuart}}</ref> In 1969, Bryson and [https://en.wikipedia.org/wiki/Yu-Chi_Ho Ho] described it as a multi-stage dynamic system optimization method.<ref>{{cite book|url=https://books.google.com/books?id=8jZBksh-bUMC&pg=PA578|title=Artificial Intelligence A Modern Approach|last2=Norvig|first2=Peter|publisher=Prentice Hall|year=2010|page=578|quote=The most popular method for learning in multilayer networks is called Back-propagation.|first1=Stuart J.|last1=Russell}}</ref><ref name="Bryson1969">{{cite book|url=https://books.google.com/books?id=1bChDAEACAAJ&pg=PA481|title=Applied Optimal Control: Optimization, Estimation and Control|last=Bryson|first=Arthur Earl|publisher=Blaisdell Publishing Company or Xerox College Publishing|year=1969|page=481}}</ref> In 1970, [https://en.wikipedia.org/wiki/Seppo_Linnainmaa Linnainmaa] finally published the general method for [https://en.wikipedia.org/wiki/Automatic_differentiation automatic differentiation] (AD) of discrete connected networks of nested [https://en.wikipedia.org/wiki/Differentiable_function differentiable functions].<ref name="lin1970">[https://en.wikipedia.org/wiki/Seppo_Linnainmaa Seppo Linnainmaa] (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6–7.</ref><ref name="lin1976">{{cite journal|year=1976|title=Taylor expansion of the accumulated rounding error|url=|journal=BIT Numerical Mathematics|volume=16|issue=2|pages=146–160|last1=Linnainmaa|first1=Seppo}}</ref> This corresponds to the modern version of backpropagation, which is efficient even when the networks are sparse.<ref name="SCHIDHUB2"/><ref name="scholarpedia2"/><ref name="grie2012">{{Cite journal|last=Griewank|first=Andreas|date=2012|title=Who Invented the Reverse Mode of Differentiation?|url=http://www.math.uiuc.edu/documenta/vol-ismp/52_griewank-andreas-b.pdf|journal=Documenta Matematica, Extra Volume ISMP|volume=|pages=389–400|via=}}</ref><ref name="grie2008">{{cite book|url=https://books.google.com/books?id=xoiiLaRxcbEC|title=Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition|last2=Walther|first2=Andrea|publisher=SIAM|year=2008|first1=Andreas|last1=Griewank}}</ref> In 1973,<ref name="dreyfus1973">{{cite journal|year=1973|title=The computational solution of optimal control problems with time lag|url=|journal=IEEE Transactions on Automatic Control|volume=18|issue=4|pages=383–385|last1=Dreyfus|first1=Stuart}}</ref> Dreyfus used backpropagation to adapt the [https://en.wikipedia.org/wiki/Parameter parameters] of controllers in proportion to error gradients. In 1974, [https://en.wikipedia.org/wiki/Paul_Werbos Werbos] mentioned the possibility of applying this principle to ANNs,<ref name="werbos1974">[https://en.wikipedia.org/wiki/Paul_Werbos Paul Werbos] (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.</ref> and in 1982 he applied Linnainmaa's AD method to neural networks in the way that is widely used today.<ref name="scholarpedia2"/><ref name="werbos1982">{{Cite book|url=http://werbos.com/Neural/SensitivityIFIPSeptember1981.pdf|title=System modeling and optimization|last=Werbos|first=Paul|publisher=Springer|year=1982|location=|pages=762–770|chapter=Applications of advances in nonlinear sensitivity analysis}}</ref> In 1986, [https://en.wikipedia.org/wiki/David_E._Rumelhart Rumelhart], Hinton and [https://en.wikipedia.org/wiki/Ronald_J._Williams Williams] noted that this method can generate useful internal representations of incoming data in the hidden layers of neural networks.<ref name=":4">{{Cite journal|last=Rumelhart|first=David E.|last2=Hinton|first2=Geoffrey E.|last3=Williams|first3=Ronald J.|title=Learning representations by back-propagating errors|url=http://www.nature.com/articles/Art323533a0|journal=Nature|volume=323|issue=6088|pages=533–536|year=1986}}</ref> In 1993, Wan was the first<ref name="SCHIDHUB2"/> to win an international pattern recognition contest using backpropagation.<ref name="wan1993">Eric A. Wan (1993). "Time series prediction by using a connectionist network with internal delay lines." In ''Proceedings of the Santa Fe Institute Studies in the Sciences of Complexity'', '''15''': p. 195. Addison-Wesley Publishing Co.</ref>

The weight updates of backpropagation can be performed by [https://en.wikipedia.org/wiki/Stochastic_gradient_descent stochastic gradient descent], using the following equation:


== Variants ==

=== Group method of data handling ===

The group method of data handling (GMDH)<ref name="ivak1968">{{cite journal|year=1968|title=The group method of data handling – a rival of the method of stochastic approximation|url=|journal=Soviet Automatic Control|volume=13|issue=3|pages=43–55|last1=Ivakhnenko|first1=Alexey Grigorevich}}</ref> features fully automatic structural and parametric model optimization. The node activation functions are [https://en.wikipedia.org/wiki/Andrey_Kolmogorov Kolmogorov]-Gabor polynomials that permit additions and multiplications. It used a deep feedforward multilayer perceptron with eight layers.<ref name="ivak1971">{{Cite journal|last=Ivakhnenko|first=Alexey|date=1971|title=Polynomial theory of complex systems|url=|journal=IEEE Transactions on Systems, Man and Cybernetics (4)|issue=4|pages=364–378|access-date=}}</ref> It is a [https://en.wikipedia.org/wiki/Supervised_learning supervised learning] network that grows layer by layer, where each layer is trained by [https://en.wikipedia.org/wiki/Regression_analysis regression analysis]. Useless items are detected using a validation set and pruned through [https://en.wikipedia.org/wiki/Regularization_(mathematics) regularization]. The size and depth of the resulting network depend on the task.<ref name="kondo2008">{{cite journal|last2=Ueno|first2=J.|date=|year=2008|title=Multi-layered GMDH-type neural network self-selecting optimum neural network architecture and its application to 3-dimensional medical image recognition of blood vessels|url=https://www.researchgate.net/publication/228402366_GMDH-Type_Neural_Network_Self-Selecting_Optimum_Neural_Network_Architecture_and_Its_Application_to_3-Dimensional_Medical_Image_Recognition_of_the_Lungs|journal=International Journal of Innovative Computing, Information and Control|volume=4|issue=1|pages=175–187|via=|last1=Kondo|first1=T.}}</ref>
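The layer-by-layer growth can be sketched as follows: every pair of current inputs receives a quadratic two-input polynomial fitted by least squares on a training set, the candidates are ranked by their error on a separate validation set, the best few become the inputs of the next layer, and growth stops once the validation error no longer decreases. This is a simplified pure-Python illustration; the pairwise quadratic form, the `keep` and `max_layers` parameters and the tiny ridge term added for numerical stability are assumptions, not details taken from the cited papers.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Rows = List[List[float]]


def quad_terms(a: float, b: float) -> List[float]:
    """Terms of a two-input quadratic partial polynomial (Kolmogorov-Gabor form)."""
    return [1.0, a, b, a * b, a * a, b * b]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def least_squares(X: Rows, y: Sequence[float], ridge: float = 1e-8) -> List[float]:
    """Solve (X^T X + ridge * I) c = X^T y by Gaussian elimination with pivoting."""
    n = len(X[0])
    A = [[dot([r[i] for r in X], [r[j] for r in X]) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [dot([r[i] for r in X], y) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            rhs[r] -= f * rhs[col]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (rhs[i] - dot(A[i][i + 1:], c[i + 1:])) / A[i][i]
    return c


def mse(pred: Sequence[float], y: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def gmdh_layer(tr_X: Rows, tr_y: Sequence[float], va_X: Rows, va_y: Sequence[float],
               keep: int) -> Tuple[float, Rows, Rows]:
    """Fit a polynomial to every input pair; keep the `keep` best by validation error."""
    cands = []
    for i, j in combinations(range(len(tr_X[0])), 2):
        coef = least_squares([quad_terms(r[i], r[j]) for r in tr_X], tr_y)
        tr_out = [dot(coef, quad_terms(r[i], r[j])) for r in tr_X]
        va_out = [dot(coef, quad_terms(r[i], r[j])) for r in va_X]
        cands.append((mse(va_out, va_y), tr_out, va_out))
    cands.sort(key=lambda cand: cand[0])
    best = cands[:keep]
    # Outputs of the surviving units become the inputs of the next layer.
    new_tr = [list(row) for row in zip(*(b[1] for b in best))]
    new_va = [list(row) for row in zip(*(b[2] for b in best))]
    return best[0][0], new_tr, new_va


def gmdh(tr_X: Rows, tr_y: Sequence[float], va_X: Rows, va_y: Sequence[float],
         keep: int = 4, max_layers: int = 8) -> List[float]:
    """Grow layers until validation error stops decreasing; return best error per layer."""
    history: List[float] = []
    for _ in range(max_layers):
        if len(tr_X[0]) < 2:          # need at least one pair of inputs
            break
        err, tr_X, va_X = gmdh_layer(tr_X, tr_y, va_X, va_y, keep)
        if history and err >= history[-1]:
            break                     # validation error no longer decreases: stop growing
        history.append(err)
    return history
```

The number of entries in the returned list is the depth that the validation criterion selected, so the depth depends on the data rather than being fixed in advance.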

=== Convolutional neural networks ===


Long short-term memory (LSTM) networks are RNNs that avoid the [https://en.wikipedia.org/wiki/Vanishing_gradient_problem vanishing gradient problem].<ref name=":03">{{Cite journal|last=Hochreiter|first=Sepp|last2=Schmidhuber|first2=Jürgen|date=1997-11-01|title=Long Short-Term Memory|url=http://www.mitpressjournals.org/doi/10.1162/neco.1997.9.8.1735|journal=Neural Computation|volume=9|issue=8|pages=1735–1780|via=}}</ref> LSTM is normally augmented by recurrent gates called forget gates.<ref name=":10">{{Cite journal|url=https://www.researchgate.net/publication/220320057_Learning_Precise_Timing_with_LSTM_Recurrent_Networks|title=Learning Precise Timing with LSTM Recurrent Networks|journal=ResearchGate|language=en|access-date=2017-06-13|pp=115–143}}</ref> LSTM networks prevent backpropagated errors from vanishing or exploding.<ref name="HOCH19912"/> Instead, errors can flow backwards through an unlimited number of virtual layers in LSTM unfolded in space. That is, LSTM can learn "very deep learning" tasks<ref name="SCHIDHUB2" /> that require memories of events that happened thousands or even millions of discrete time steps earlier. Problem-specific LSTM-like topologies can be evolved,<ref>{{Cite journal|last=Bayer|first=Justin|last2=Wierstra|first2=Daan|last3=Togelius|first3=Julian|last4=Schmidhuber|first4=Jürgen|date=2009-09-14|title=Evolving Memory Cell Structures for Sequence Learning|url=https://link.springer.com/chapter/10.1007/978-3-642-04277-5_76|journal=Artificial Neural Networks – ICANN 2009|volume=5769|language=en|publisher=Springer, Berlin, Heidelberg|pages=755–764|series=Lecture Notes in Computer Science}}</ref> and LSTM can handle long delays and signals that mix low- and high-frequency components.
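A single forward time step of a forget-gated LSTM cell can be written as c_t = f_t * c_{t-1} + i_t * g_t and h_t = o_t * tanh(c_t), with elementwise products. The sketch assumes this common formulation without peephole connections, dense weights acting on the concatenation of the input and the previous hidden state, and hypothetical gate names; it shows only the forward step, not training. The additive cell update is the mechanism usually credited with letting information, and error signals during backpropagation through time, persist across many steps when the forget gate stays close to 1.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical layout: gate name -> (W, b); W has one row per hidden unit and
# acts on the concatenation [x_t; h_{t-1}].
Params = Dict[str, Tuple[List[Vector], Vector]]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W: List[Vector], b: Vector, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(p: Params, x: Vector, h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
    """One forward time step of an LSTM cell with input, forget and output gates."""
    v = x + h_prev                                   # list concatenation [x_t; h_{t-1}]
    f = [sigmoid(z) for z in affine(*p["forget"], v)]
    i = [sigmoid(z) for z in affine(*p["input"], v)]
    o = [sigmoid(z) for z in affine(*p["output"], v)]
    g = [math.tanh(z) for z in affine(*p["cell"], v)]
    # Additive cell update: with f close to 1 and i close to 0 the state is
    # carried forward almost unchanged from step to step.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def constant_params(forget_bias: float, input_bias: float) -> Params:
    """Hypothetical one-unit cell (input size 1) with zero weights and chosen gate biases."""
    zero = [[0.0, 0.0]]
    return {"forget": (zero, [forget_bias]), "input": (zero, [input_bias]),
            "output": (zero, [0.0]), "cell": (zero, [0.0])}


if __name__ == "__main__":
    # Forget gate almost open, input gate almost closed: the stored value survives.
    p = constant_params(20.0, -20.0)
    h, c = [0.0], [1.0]
    for _ in range(1000):
        h, c = lstm_step(p, [0.0], h, c)
    print(c[0])
```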

Stacks of LSTM RNNs<ref>{{Cite journal|last=Fernández|first=Santiago|last2=Graves|first2=Alex|last3=Schmidhuber|first3=Jürgen|date=2007|title=Sequence labelling in structured domains with hierarchical recurrent neural networks|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.1887|journal=In Proc. 20th Int. Joint Conf. on Artificial Intelligence, IJCAI 2007|pages=774–779}}</ref> trained by connectionist temporal classification (CTC)<ref name=":12">{{Cite journal|last=Graves|first=Alex|last2=Fernández|first2=Santiago|last3=Gomez|first3=Faustino|date=2006|title=Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.6306|journal=In Proceedings of the International Conference on Machine Learning, ICML 2006|pages=369–376}}</ref> can find an RNN weight matrix that maximizes the probability of the label sequences in a training set, given the corresponding input sequences. CTC achieves both alignment and recognition.
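The quantity that CTC training maximizes (in log form) over the training set, the probability of a label sequence given the input, is a sum over every frame-level path that collapses to that label once repeated symbols are merged and blanks are removed. The sketch below computes it with the standard forward (alpha) recursion and, for comparison, by brute-force enumeration. It assumes the per-frame softmax outputs are given as plain lists with the blank at index 0, and it works in linear rather than log space, so it is only suitable for short sequences; all names are illustrative, and gradient computation is not shown.

```python
from itertools import product
from typing import List, Sequence

BLANK = 0  # assumption: index 0 of each per-frame distribution is the CTC blank


def ctc_probability(probs: Sequence[Sequence[float]], label: Sequence[int]) -> float:
    """p(label | input) summed over all CTC alignments (forward algorithm).

    probs[t][k] is the network's softmax output for symbol k at frame t.
    """
    ext = [BLANK]
    for s in label:
        ext += [s, BLANK]              # l' = blank, l1, blank, l2, ..., blank
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


def collapse(path: Sequence[int]) -> List[int]:
    """CTC's many-to-one map: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def ctc_probability_bruteforce(probs: Sequence[Sequence[float]], label: Sequence[int]) -> float:
    """Same quantity by enumerating every path; exponential, for tiny checks only."""
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path) == list(label):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


if __name__ == "__main__":
    probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
    print(ctc_probability(probs, [1, 2]), ctc_probability_bruteforce(probs, [1, 2]))
```

The forward recursion visits each (frame, extended-label position) pair once, whereas the brute-force sum grows exponentially with the number of frames.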

In 2003, LSTM started to become competitive with traditional speech recognizers.<ref name="graves2003">{{Cite journal|url=Ftp://ftp.idsia.ch/pub/juergen/bioadit2004.pdf|title=Biologically Plausible Speech Recognition with LSTM Neural Nets|last=Graves|first=Alex|last2=Eck|first2=Douglas|date=2003|journal=1st Intl. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland|pages=175–184|archive-url=|archive-date=|dead-url=|access-date=|last3=Beringer|first3=Nicole|last4=Schmidhuber|first4=Jürgen}}</ref> In 2007, its combination with CTC achieved the first good results on speech data.<ref name="fernandez2007keyword">{{Cite journal|last=Fernández|first=Santiago|last2=Graves|first2=Alex|last3=Schmidhuber|first3=Jürgen|date=2007|title=An Application of Recurrent Neural Networks to Discriminative Keyword Spotting|url=http://dl.acm.org/citation.cfm?id=1778066.1778092|journal=Proceedings of the 17th International Conference on Artificial Neural Networks|series=ICANN'07|location=Berlin, Heidelberg|publisher=Springer-Verlag|pages=220–229}}</ref> In 2009, a CTC-trained LSTM became the first RNN to win pattern recognition contests, when it won several competitions in connected [https://en.wikipedia.org/wiki/Handwriting_recognition handwriting recognition].<ref name="SCHIDHUB2" /><ref name="graves20093"/> In 2014, [https://en.wikipedia.org/wiki/Baidu Baidu] used CTC-trained RNNs to break the Switchboard Hub5'00 speech recognition benchmark without using traditional speech processing methods.<ref name="hannun2014">{{cite journal|last=Hannun|first=Awni|last2=Case|first2=Carl|last3=Casper|first3=Jared|last4=Catanzaro|first4=Bryan|last5=Diamos|first5=Greg|last6=Elsen|first6=Erich|last7=Prenger|first7=Ryan|last8=Satheesh|first8=Sanjeev|last9=Sengupta|first9=Shubho|date=2014-12-17|title=Deep Speech: Scaling up end-to-end speech recognition|url=https://arxiv.org/abs/1412.5567}}</ref> LSTM also improved large-vocabulary speech recognition,<ref name="sak2014">{{Cite journal|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf|title=Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling|last=Sak|first=Hasim|last2=Senior|first2=Andrew|date=2014|archive-url=|archive-date=|dead-url=|access-date=|last3=Beaufays|first3=Francoise}}</ref><ref name="liwu2015">{{cite journal|last=Li|first=Xiangang|last2=Wu|first2=Xihong|date=2014-10-15|title=Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition|url=https://arxiv.org/abs/1410.4281}}</ref> text-to-speech synthesis,<ref>{{Cite journal|url=https://www.researchgate.net/publication/287741874_TTS_synthesis_with_bidirectional_LSTM_based_Recurrent_Neural_Networks|title=TTS synthesis with bidirectional LSTM based Recurrent Neural Networks|last=Fan|first=Y.|last2=Qian|first2=Y.|date=2014|journal=ResearchGate|language=en|archive-url=|archive-date=|dead-url=|access-date=2017-06-13|last3=Xie|first3=F.|last4=Soong|first4=F. K.}}</ref> including for Google Android,<ref name="scholarpedia2"/><ref name="zen2015">{{Cite journal|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf|title=Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis|last=Zen|first=Heiga|last2=Sak|first2=Hasim|date=2015|journal=Google.com|publisher=ICASSP|pages=4470–4474|archive-url=|archive-date=|dead-url=|access-date=}}</ref> and photo-real talking heads.<ref name="fan2015">{{Cite journal|last=Fan|first=Bo|last2=Wang|first2=Lijuan|last3=Soong|first3=Frank K.|last4=Xie|first4=Lei|date=2015|title=Photo-Real Talking Head with Deep Bidirectional LSTM|url=https://www.microsoft.com/en-us/research/wp-content/uploads/2015/04/icassp2015_fanbo_1009.pdf|journal=Proceedings of ICASSP|volume=|pages=|via=}}</ref> In 2015, Google's speech recognition improved by 49% through CTC-trained LSTM.<ref name="sak2015">{{Cite journal|url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html|title=Google voice search: faster and more accurate|last=Sak|first=Haşim|last2=Senior|first2=Andrew|date=September 2015|archive-url=|archive-date=|dead-url=|access-date=|last3=Rao|first3=Kanishka|last4=Beaufays|first4=Françoise|last5=Schalkwyk|first5=Johan}}</ref>

LSTM became popular in [https://en.wikipedia.org/wiki/Natural_Language_Processing natural language processing]. Unlike earlier models based on [https://en.wikipedia.org/wiki/Hidden_Markov_model hidden Markov models] and similar concepts, LSTM can learn to recognize [https://en.wikipedia.org/wiki/Context-sensitive_languages context-sensitive languages].<ref name="gers2001">{{cite journal|last2=Schmidhuber|first2=Jürgen|year=2001|title=LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages|url=|journal=IEEE Transactions on Neural Networks|volume=12|issue=6|pages=1333–1340|last1=Gers|first1=Felix A.}}</ref> LSTM improved machine translation,<ref>{{cite journal | last=Huang | first=Jie | last2=Zhou | first2=Wengang | last3=Zhang | first3=Qilin | last4=Li | first4=Houqiang | last5=Li | first5=Weiping | title=Video-based Sign Language Recognition without Temporal Segmentation | date=2018-01-30 | url=https://arxiv.org/pdf/1801.10111.pdf}}</ref><ref name="NIPS2014">{{Cite journal|last=Sutskever|first=I.|last2=Vinyals|first2=O.|last3=Le|first3=Q.|date=2014|title=Sequence to Sequence Learning with Neural Networks|url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf|journal=NIPS'14 Proceedings of the 27th International Conference on Neural Information Processing Systems |volume=2 |pages=3104–3112 |url=https://arxiv.org/abs/1409.3215}}</ref> [https://en.wikipedia.org/wiki/Language_modeling language modeling]<ref name="vinyals2016">{{cite journal|last=Jozefowicz|first=Rafal|last2=Vinyals|first2=Oriol|last3=Schuster|first3=Mike|last4=Shazeer|first4=Noam|last5=Wu|first5=Yonghui|date=2016-02-07|title=Exploring the Limits of Language Modeling|url=https://arxiv.org/abs/1602.02410}}</ref> and multilingual language processing.<ref name="gillick2015">{{cite journal|last=Gillick|first=Dan|last2=Brunk|first2=Cliff|last3=Vinyals|first3=Oriol|last4=Subramanya|first4=Amarnag|date=2015-11-30|title=Multilingual Language Processing From Bytes|url=https://arxiv.org/abs/1512.00103}}</ref> LSTM combined with CNNs improved automatic image captioning.<ref name="vinyals2015">{{cite journal|last=Vinyals|first=Oriol|last2=Toshev|first2=Alexander|last3=Bengio|first3=Samy|last4=Erhan|first4=Dumitru|date=2014-11-17|title=Show and Tell: A Neural Image Caption Generator|url=https://arxiv.org/abs/1411.4555}}</ref>

Line 657: Line 657:

* {{Cite book|url=https://www.worldcat.org/oclc/837524179|title=Computational intelligence: a methodological introduction|first1=Rudolf|last1=Kruse|first2=Christian|last2=Borgelt|first3=F.|last3=Klawonn|first4=Christian|last4=Moewes|first5=Matthias|last5=Steinbrecher|first6=Pascal|last6=Held|year=2013|publisher=Springer}}

* {{Cite book|url=https://www.worldcat.org/oclc/32179420|title=Introduction to neural networks: design, theory and applications|last=Lawrence|first=Jeanette|year=1994|publisher=California Scientific Software}}

−

* {{cite book| last=MacKay | first=David, J.C.~~| authorlink=https://en.wikipedia.org/wiki/David_J.C._MacKay~~|year=2003|publisher=[https://en.wikipedia.org/wiki/Cambridge_University_Press Cambridge University Press]| url=http://www.inference.phy.cam.ac.uk/itprnn/book.pdf|title=Information Theory, Inference, and Learning Algorithms|ref=harv}}

+

* {{cite book| last=MacKay | first=David, J.C.|year=2003|publisher=[https://en.wikipedia.org/wiki/Cambridge_University_Press Cambridge University Press]| url=http://www.inference.phy.cam.ac.uk/itprnn/book.pdf|title=Information Theory, Inference, and Learning Algorithms|ref=harv}}

* {{Cite book|url=https://www.worldcat.org/oclc/29877717|title=Signal and image processing with neural networks: a C++ sourcebook|first=Timothy|last=Masters|year=1994|publisher=J. Wiley}}

−

* {{cite book|url=https://books.google.com/books?id=m12UR8QmLqoC|title=Pattern Recognition and Neural Networks|last=Ripley|first=Brian D.~~|authorlink=https://en.wikipedia.org/wiki/Brian_D._Ripley~~|publisher=Cambridge University Press|year=2007}}

+

* {{cite book|url=https://books.google.com/books?id=m12UR8QmLqoC|title=Pattern Recognition and Neural Networks|last=Ripley|first=Brian D.|publisher=Cambridge University Press|year=2007}}

* {{cite journal|last1=Siegelmann |first1=H.T. |first2=Eduardo D.|last2=Sontag|year=1994|title=Analog computation via neural networks |journal=Theoretical Computer Science |volume= 131 |issue= 2 |pp=331–360|url=https://pdfs.semanticscholar.org/861e/de32115d157e1568622b153e7ed3dca28467.pdf}}

* {{Cite book|url=https://www.worldcat.org/oclc/27145760|title=Neural networks for statistical modeling|last1=Smith |first1=Murray|date=1993|publisher=Van Nostrand Reinhold}}

薄荷

7,129 edits
