=== Backpropagation ===
[https://en.wikipedia.org/wiki/Paul_Werbos Werbos]'s [https://en.wikipedia.org/wiki/Backpropagation backpropagation] algorithm revived interest in neural networks and learning. It effectively solved the exclusive-or problem and, more generally, accelerated the training of multi-layer networks. Backpropagation distributes the error term back through the layers, modifying the weights at each node.<ref name="Werbos 1975" />
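The mechanism is easiest to see in a worked example. The following is a minimal sketch (not Werbos's original formulation): a tiny two-layer sigmoid network trained by gradient descent on the exclusive-or problem, where the layer sizes, learning rate, and iteration count are all illustrative choices.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # exclusive-or targets

# Illustrative sizes: 2 inputs -> 4 hidden sigmoid units -> 1 output.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5  # illustrative learning rate
for _ in range(20000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: the error term at the output is distributed back
    # through the hidden layer, and each weight is adjusted downhill.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # typically approaches [[0.], [1.], [1.], [0.]]
</syntaxhighlight>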
In the mid-1980s, parallel distributed processing became popular under the name [https://en.wikipedia.org/wiki/Connectionism connectionism]. [https://en.wikipedia.org/wiki/David_E._Rumelhart Rumelhart] and [https://en.wikipedia.org/wiki/James_McClelland_(psychologist) McClelland] described the use of connectionism to simulate neural processes.<ref>{{cite book|url={{google books |plainurl=y |id=davmLgzusB8C}}|title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition|last=Rumelhart|first=D.E|first2=James|publisher=MIT Press|year=1986|isbn=978-0-262-63110-5|location=Cambridge|pages=|author2=McClelland}}</ref>
[https://en.wikipedia.org/wiki/Support_vector_machine Support vector machines] and other, simpler methods such as [https://en.wikipedia.org/wiki/Linear_classifier linear classifiers] gradually overtook neural networks in machine-learning popularity. However, neural networks transformed some domains, such as the prediction of protein structures.<ref>{{cite article|id=Qian1988|title=Predicting the secondary structure of globular proteins using neural network models.|last=Qian|first=N.|last2=Sejnowski|first2=T.J.|journal=Journal of Molecular Biology|volume=202|pages=865-884|year=1988}}</ref><ref>{{cite article|id=Rost1993|title=Prediction of protein secondary structure at better than 70% accuracy|last=Rost|first=B.|last2=Sander|first2=C.|journal=Journal of Molecular Biology|volume=232|pages=584-599|year=1993}}</ref>
In 1992, [https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer max-pooling] was introduced to help with least-shift invariance and tolerance to deformation, aiding 3D object recognition.<ref name="Weng1992">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCNN1992.pdf Cresceptron: a self-organizing neural network which grows adaptively]," ''Proc. International Joint Conference on Neural Networks'', Baltimore, Maryland, vol I, pp. 576–581, June, 1992.</ref><ref name="Weng19932">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronICCV1993.pdf Learning recognition and segmentation of 3-D objects from 2-D images]," ''Proc. 4th International Conf. Computer Vision'', Berlin, Germany, pp. 121–128, May, 1993.</ref><ref name="Weng1997">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCV.pdf Learning recognition and segmentation using the Cresceptron]," ''International Journal of Computer Vision'', vol. 25, no. 2, pp. 105–139, Nov. 1997.</ref>
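The idea of the operation itself is simple: keep only the largest activation in each local window, so that small shifts of the input leave the pooled output unchanged. A minimal numpy sketch, using an assumed 4×4 feature map and non-overlapping 2×2 windows:
<syntaxhighlight lang="python">
import numpy as np

# An assumed 4x4 feature map for illustration.
fmap = np.arange(16, dtype=float).reshape(4, 4)

# Split into non-overlapping 2x2 blocks and take the maximum of each block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[ 5.  7.]
               #  [13. 15.]]
</syntaxhighlight>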
In 2010, backpropagation training through [https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer max-pooling] was accelerated by GPUs and shown to perform better than other pooling variants.<ref name="Scherer2010">Dominik Scherer, Andreas C. Müller, and Sven Behnke: "[https://www.ais.uni-bonn.de/papers/icann2010_maxpool.pdf Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition]," ''In 20th International Conference Artificial Neural Networks (ICANN)'', pp. 92–101, 2010. {{doi|10.1007/978-3-642-15825-4_10}}.</ref>
The [https://en.wikipedia.org/wiki/Vanishing_gradient_problem vanishing gradient problem] affects many-layered [https://en.wikipedia.org/wiki/Feedforward_neural_network feedforward networks] that use backpropagation, as well as [https://en.wikipedia.org/wiki/Recurrent_neural_network recurrent neural networks] (RNNs).<ref name="HOCH19912">S. Hochreiter., "[http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf Untersuchungen zu dynamischen neuronalen Netzen]," ''Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber'', 1991.</ref><ref name="HOCH2001">{{cite book|url={{google books |plainurl=y |id=NWOcMVA64aAC}}|title=A Field Guide to Dynamical Recurrent Networks|last=Hochreiter|first=S.|last2=et al.|date=15 January 2001|publisher=John Wiley & Sons|year=|isbn=978-0-7803-5369-5|location=|pages=|chapter=Gradient flow in recurrent nets: the difficulty of learning long-term dependencies|editor-last2=Kremer|editor-first2=Stefan C.|editor-first1=John F.|editor-last1=Kolen}}</ref> As gradients propagate from layer to layer, they shrink exponentially with the number of layers, impeding the adjustment of neuron weights that relies on those errors and particularly affecting deep networks.
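The shrinkage can be made concrete with a small numerical sketch: pushing a unit gradient down a deep chain of sigmoid layers multiplies it at every layer by the sigmoid's derivative (at most 0.25), so its norm decays roughly exponentially with depth. The depth, width, and weight scale below are illustrative assumptions.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
depth, width = 20, 10  # illustrative depth and width

# Forward pass through `depth` sigmoid layers, keeping each activation.
weights = [rng.normal(size=(width, width)) / np.sqrt(width) for _ in range(depth)]
acts = [rng.normal(size=width)]
for W in weights:
    acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W))))

# Backward pass: start from a unit gradient at the top and apply the
# chain rule through each sigmoid layer on the way down.
grad = np.ones(width)
for W, a in zip(reversed(weights), reversed(acts[1:])):
    grad = (grad * a * (1.0 - a)) @ W.T

# The gradient reaching the bottom layer is orders of magnitude smaller
# than where it started, so the early layers barely learn.
print(np.linalg.norm(grad))
</syntaxhighlight>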
To overcome this problem, [https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber Schmidhuber] adopted a multi-level hierarchy of networks, pre-trained one level at a time by [https://en.wikipedia.org/wiki/Unsupervised_learning unsupervised learning] and fine-tuned by backpropagation<ref name="SCHMID1992">J. Schmidhuber., "Learning complex, extended sequences using the principle of history compression," ''Neural Computation'', 4, pp. 234–242, 1992.</ref>. Behnke relied only on the sign of the gradient on problems such as image reconstruction and face localization.<ref>{{cite book|url=http://www.ais.uni-bonn.de/books/LNCS2766.pdf|title=Hierarchical Neural Networks for Image Interpretation.|publisher=Springer|year=2003|series=Lecture Notes in Computer Science|volume=2766|author=Sven Behnke}}</ref>
[https://en.wikipedia.org/wiki/Geoffrey_Hinton Hinton] proposed learning a high-level representation using successive layers of binary or real-valued latent variables, with a [https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine restricted Boltzmann machine]<ref name="smolensky1986">{{cite book|url=http://portal.acm.org/citation.cfm?id=104290|title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition|year=1986|editors=D. E. Rumelhart, J. L. McClelland, & the PDP Research Group|volume=1|pages=194–281|chapter=Information processing in dynamical systems: Foundations of harmony theory.|last1=Smolensky|first1=P.|authorlink1=Paul Smolensky}}</ref> modeling each layer. Once sufficiently many layers have been learned, the deep architecture can be used as a [https://en.wikipedia.org/wiki/Generative_model generative model], reproducing the data by sampling down the model from the top-level feature activations.<ref name="hinton2006">{{cite journal|last2=Osindero|first2=S.|last3=Teh|first3=Y.|year=2006|title=A fast learning algorithm for deep belief nets|url=http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf|journal=[[Neural Computation (journal)|Neural Computation]]|volume=18|issue=7|pages=1527–1554|doi=10.1162/neco.2006.18.7.1527|pmid=16764513|last1=Hinton|first1=G. E.|authorlink1=Geoffrey Hinton}}</ref><ref>{{Cite journal|year=2009|title=Deep belief networks|url=http://www.scholarpedia.org/article/Deep_belief_networks|journal=Scholarpedia|volume=4|issue=5|pages=5947|doi=10.4249/scholarpedia.5947|pmc=|pmid=|last1=Hinton|first1=G.|bibcode=2009SchpJ...4.5947H}}</ref> In 2012, [https://en.wikipedia.org/wiki/Andrew_Ng Ng] and [https://en.wikipedia.org/wiki/Jeff_Dean_(computer_scientist) Dean] created a network that learned to recognize higher-level concepts, such as cats, only from watching unlabeled images taken from [https://en.wikipedia.org/wiki/YouTube YouTube] videos.<ref name="ng2012">{{cite arXiv|eprint=1112.6209|first2=Jeff|last2=Dean|title=Building High-level Features Using Large Scale Unsupervised Learning|last1=Ng|first1=Andrew|year=2012|class=cs.LG}}</ref>
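The building block of this layer-wise scheme is the restricted Boltzmann machine, commonly trained by contrastive divergence. Below is a minimal sketch of a single CD-1 update for a binary RBM; the layer sizes, learning rate, and input vector are illustrative assumptions rather than Hinton's actual setup.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
nv, nh, lr = 6, 3, 0.1  # illustrative visible/hidden sizes, learning rate
W = rng.normal(scale=0.01, size=(nv, nh))
bv, bh = np.zeros(nv), np.zeros(nh)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

v0 = (rng.random(nv) < 0.5).astype(float)  # one binary training vector

# Positive phase: hidden probabilities given the data, then a sample.
ph0 = sigmoid(v0 @ W + bh)
h0 = (rng.random(nh) < ph0).astype(float)

# Negative phase: reconstruct the visibles, recompute hidden probabilities.
pv1 = sigmoid(h0 @ W.T + bv)
ph1 = sigmoid(pv1 @ W + bh)

# CD-1 update: move toward the data statistics, away from the reconstruction's.
W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
bv += lr * (v0 - pv1)
bh += lr * (ph0 - ph1)
</syntaxhighlight>
Repeating such updates over a dataset trains one layer; its hidden activations then serve as the "data" for the next RBM in the stack.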
The early challenges in training deep neural networks were successfully addressed with methods such as unsupervised pre-training, while available computing power grew through the use of GPUs and distributed computing. Neural networks were deployed on a large scale, particularly in image and visual recognition problems. This became known as "[https://en.wikipedia.org/wiki/Deep_learning deep learning]".