== History ==
[https://en.wikipedia.org/wiki/Warren_McCulloch Warren McCulloch] and [https://en.wikipedia.org/wiki/Walter_Pitts Walter Pitts]<ref>{{cite journal|last=McCulloch|first=Warren|author2=Walter Pitts|title=A Logical Calculus of Ideas Immanent in Nervous Activity|journal=Bulletin of Mathematical Biophysics|year=1943|volume=5|pages=115–133|doi=10.1007/BF02478259|issue=4}}</ref> created a computational model for neural networks based on [https://en.wikipedia.org/wiki/Mathematics mathematics] and [https://en.wikipedia.org/wiki/Algorithm algorithms], called threshold logic. This model paved two paths for neural network research: one focused on biological processes in the brain, the other on the application of neural networks to [https://en.wikipedia.org/wiki/Artificial_intelligence artificial intelligence]. This work led to further work on nerve networks and their link to [https://en.wikipedia.org/wiki/Finite_state_machine finite state machines]<ref>{{Cite news|url=https://www.degruyter.com/view/books/9781400882618/9781400882618-002/9781400882618-002.xml|title=Representation of Events in Nerve Nets and Finite Automata|last=Kleene|first=S.C.|date=|work=Annals of Mathematics Studies|access-date=2017-06-17|archive-url=|archive-date=|dead-url=|publisher=Princeton University Press|year=1956|issue=34|pages=3–41|language=en}}</ref>.
    
=== Hebbian learning ===
In the late 1940s, [https://en.wikipedia.org/wiki/Donald_O._Hebb D. O. Hebb]<ref>{{cite book|url={{google books |plainurl=y |id=ddB4AgAAQBAJ}}|title=The Organization of Behavior|last=Hebb|first=Donald|publisher=Wiley|year=1949|isbn=978-1-135-63190-1|location=New York|pages=}}</ref> created a learning hypothesis based on the mechanism of [https://en.wikipedia.org/wiki/Neuroplasticity neural plasticity] that became known as [https://en.wikipedia.org/wiki/Hebbian_learning Hebbian learning]. Hebbian learning is [https://en.wikipedia.org/wiki/Unsupervised_learning unsupervised learning]. This evolved into models for [https://en.wikipedia.org/wiki/Long_term_potentiation long-term potentiation]. In 1948, researchers began applying these ideas to computational models with [https://en.wikipedia.org/wiki/Unorganized_machine B-type Turing machines].
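Hebb's postulate (connections strengthen when pre- and post-synaptic neurons are active together) can be sketched as a simple weight update. The learning rate and activity values below are illustrative assumptions, not part of Hebb's original formulation:

```python
import numpy as np

eta = 0.1                          # learning rate (assumed for illustration)
w = np.zeros(3)                    # synaptic weights
pre = np.array([1.0, 0.0, 1.0])    # presynaptic activity pattern
post = 1.0                         # postsynaptic activity

# Hebbian update: weights grow only where pre- and post-synaptic
# activity coincide; no error signal is used (unsupervised).
for _ in range(5):
    w += eta * post * pre

print(w.tolist())  # -> [0.5, 0.0, 0.5]
```

Note that nothing in the update depends on a target output, which is what makes the rule unsupervised.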
Farley and [https://en.wikipedia.org/wiki/Wesley_A._Clark Clark]<ref>{{cite journal|last=Farley|first=B.G.|author2=W.A. Clark|title=Simulation of Self-Organizing Systems by Digital Computer|journal=IRE Transactions on Information Theory|year=1954|volume=4|pages=76–84|doi=10.1109/TIT.1954.1057468|issue=4}}</ref> first used computational machines, then called "calculators", to simulate a Hebbian network. Other neural network computational machines were created by [https://en.wikipedia.org/wiki/Nathaniel_Rochester_(computer_scientist) Rochester], Holland, Habit and Duda<ref>{{cite journal|last=Rochester|first=N. |author2=J.H. Holland |author3=L.H. Habit |author4=W.L. Duda|title=Tests on a cell assembly theory of the action of the brain, using a large digital computer|journal=IRE Transactions on Information Theory|year=1956|volume=2|pages=80–93|doi=10.1109/TIT.1956.1056810|issue=3}}</ref>.
[https://en.wikipedia.org/wiki/Frank_Rosenblatt Rosenblatt]<ref>{{cite journal|last=Rosenblatt|first=F.|title=The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain|journal=Psychological Review|year=1958|volume=65|pages=386–408|doi=10.1037/h0042519|pmid=13602029|issue=6|citeseerx=10.1.1.588.3775}}</ref> created the [https://en.wikipedia.org/wiki/Perceptron perceptron], an algorithm for pattern recognition. With mathematical notation, Rosenblatt described circuitry that cannot be recognized by a basic perceptron, such as the exclusive-or circuit, which could not be processed by neural networks at the time.<ref name="Werbos 1975">{{cite book|url={{google books |plainurl=y |id=z81XmgEACAAJ}}|title=Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences|last=Werbos|first=P.J.|publisher=|year=1975|isbn=|location=|pages=}}</ref>
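The limitation can be made concrete with a minimal sketch of Rosenblatt's learning rule: a single-layer perceptron learns the linearly separable OR function, but no weight vector can ever fit XOR, however long training runs. The learning rate and epoch count here are illustrative assumptions:

```python
import numpy as np

def train_perceptron(X, targets, epochs=100, lr=0.1):
    """Rosenblatt's rule: w += lr * (target - prediction) * input."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, t in zip(Xb, targets):
            pred = 1 if xi @ w > 0 else 0
            w += lr * (t - pred) * xi
    return (Xb @ w > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(train_perceptron(X, np.array([0, 1, 1, 1])))  # OR:  [0 1 1 1], learned
print(train_perceptron(X, np.array([0, 1, 1, 0])))  # XOR: never all four correct
```

OR is learned exactly; for XOR the rule cycles forever, since the four points are not linearly separable.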
 
 
In 1959, [https://en.wikipedia.org/wiki/Nobel_laureate Nobel laureates] [https://en.wikipedia.org/wiki/David_H._Hubel Hubel] and [https://en.wikipedia.org/wiki/Torsten_Wiesel Wiesel] discovered two types of cells in the primary visual cortex: simple cells and complex cells<ref>{{cite book|url=https://books.google.com/books?id=8YrxWojxUA4C&pg=PA106|title=Brain and visual perception: the story of a 25-year collaboration|publisher=Oxford University Press US|year=2005|isbn=978-0-19-517618-6|page=106|author=David H. Hubel and Torsten N. Wiesel}}</ref>, and proposed a biological model based on their findings.
The first functional networks with many layers were published by [https://en.wikipedia.org/wiki/Alexey_Grigorevich_Ivakhnenko Ivakhnenko] and Lapa in 1965, becoming the [https://en.wikipedia.org/wiki/Group_method_of_data_handling group method of data handling]<ref name="SCHIDHUB2">{{cite journal|last=Schmidhuber|first=J.|year=2015|title=Deep Learning in Neural Networks: An Overview|journal=Neural Networks|volume=61|pages=85–117|arxiv=1404.7828|doi=10.1016/j.neunet.2014.09.003|pmid=25462637}}</ref><ref name="ivak1965">{{cite book|url={{google books |plainurl=y |id=FhwVNQAACAAJ}}|title=Cybernetic Predicting Devices|last=Ivakhnenko|first=A. G.|publisher=CCM Information Corporation|year=1973}}</ref><ref name="ivak1967">{{cite book|url={{google books |plainurl=y |id=rGFgAAAAMAAJ}}|title=Cybernetics and forecasting techniques|last2=Grigorʹevich Lapa|first2=Valentin|publisher=American Elsevier Pub. Co.|year=1967|first1=A. G.|last1=Ivakhnenko}}</ref>
Neural network research stagnated after [https://en.wikipedia.org/wiki/Machine_learning machine learning] research by [https://en.wikipedia.org/wiki/Marvin_Minsky Minsky] and [https://en.wikipedia.org/wiki/Seymour_Papert Papert]<ref>{{cite book|url={{google books |plainurl=y |id=Ow1OAQAAIAAJ}}|title=Perceptrons: An Introduction to Computational Geometry|last=Minsky|first=Marvin|first2=Seymour|publisher=MIT Press|year=1969|isbn=0-262-63022-2|location=|pages=|author2=Papert}}</ref>, who discovered two key issues with the computational machines that processed neural networks. The first was that basic perceptrons were incapable of processing the exclusive-or circuit. The second was that computers lacked the processing power to effectively handle the work required by large neural networks. Neural network research slowed until computers achieved far greater processing power.
    
Much of [https://en.wikipedia.org/wiki/Artificial_intelligence artificial intelligence] had focused on high-level (symbolic) models processed by [https://en.wikipedia.org/wiki/Algorithm algorithms], characterized by [https://en.wikipedia.org/wiki/Expert_system expert systems] with knowledge embodied in if-then rules, until in the late 1980s research expanded to low-level (sub-symbolic) [https://en.wikipedia.org/wiki/Machine_learning machine learning], characterized by knowledge embodied in the parameters of a [https://en.wikipedia.org/wiki/Cognitive_model cognitive model].
In the mid-1980s, parallel distributed processing became popular under the name [https://en.wikipedia.org/wiki/Connectionism connectionism]. [https://en.wikipedia.org/wiki/David_E._Rumelhart Rumelhart] and [https://en.wikipedia.org/wiki/James_McClelland_(psychologist) McClelland] described the use of connectionism to simulate neural processes.<ref>{{cite book|url={{google books |plainurl=y |id=davmLgzusB8C}}|title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition|last=Rumelhart|first=D.E|first2=James|publisher=MIT Press|year=1986|isbn=978-0-262-63110-5|location=Cambridge|pages=|author2=McClelland}}</ref>
[https://en.wikipedia.org/wiki/Support_vector_machine Support vector machines] and other, much simpler methods such as [https://en.wikipedia.org/wiki/Linear_classifier linear classifiers] gradually overtook neural networks in machine learning popularity. However, using neural networks transformed some domains, such as the prediction of protein structures.<ref>{{cite journal|id=Qian1988|title=Predicting the secondary structure of globular proteins using neural network models|last=Qian|first=N.|last2=Sejnowski|first2=T.J.|journal=Journal of Molecular Biology|volume=202|pages=865–884|year=1988}}</ref><ref>{{cite journal|id=Rost1993|title=Prediction of protein secondary structure at better than 70% accuracy|last=Rost|first=B.|last2=Sander|first2=C.|journal=Journal of Molecular Biology|volume=232|pages=584–599|year=1993}}</ref>
    
In 1992, [https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer max-pooling] was introduced to help with least-shift invariance and tolerance to deformation, aiding 3-D object recognition.<ref name="Weng1992">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCNN1992.pdf Cresceptron: a self-organizing neural network which grows adaptively]," ''Proc. International Joint Conference on Neural Networks'', Baltimore, Maryland, vol I, pp. 576–581, June, 1992.</ref><ref name="Weng19932">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronICCV1993.pdf Learning recognition and segmentation of 3-D objects from 2-D images]," ''Proc. 4th International Conf. Computer Vision'', Berlin, Germany, pp. 121–128, May, 1993.</ref><ref name="Weng1997">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCV.pdf Learning recognition and segmentation using the Cresceptron]," ''International Journal of Computer Vision'', vol. 25, no. 2, pp. 105–139, Nov. 1997.</ref>
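The effect can be sketched in a few lines: non-overlapping max-pooling keeps only the strongest response in each window, so a small shift of the input pattern leaves the pooled output unchanged. The array values below are illustrative:

```python
import numpy as np

def max_pool(img, k=2):
    """Non-overlapping k x k max-pooling (stride = k)."""
    h, w = img.shape
    return img[:h // k * k, :w // k * k].reshape(h // k, k, w // k, k).max(axis=(1, 3))

a = np.array([[1, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 0]])
b = np.roll(a, 1, axis=1)          # shift the pattern one pixel to the right

print(max_pool(a))                 # [[1 0]
                                   #  [0 2]]
print(np.array_equal(max_pool(a), max_pool(b)))  # True: the shift survives pooling
```

Because each peak stays inside its pooling window after the shift, the pooled feature map is identical, which is the shift tolerance the text describes.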
In 2010, backpropagation training through [https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer max-pooling] was accelerated by GPUs and shown to perform better than other pooling variants.<ref name="Scherer2010">Dominik Scherer, Andreas C. Müller, and Sven Behnke: "[https://www.ais.uni-bonn.de/papers/icann2010_maxpool.pdf Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition]," ''In 20th International Conference Artificial Neural Networks (ICANN)'', pp. 92–101, 2010. {{doi|10.1007/978-3-642-15825-4_10}}.</ref>
The [https://en.wikipedia.org/wiki/Vanishing_gradient_problem vanishing gradient problem] affects many-layered [https://en.wikipedia.org/wiki/Feedforward_neural_network feedforward neural networks] that use backpropagation, as well as [https://en.wikipedia.org/wiki/Recurrent_neural_network recurrent neural networks] (RNNs).<ref name="HOCH19912">S. Hochreiter., "[http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf Untersuchungen zu dynamischen neuronalen Netzen]," ''Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber'', 1991.</ref><ref name="HOCH2001">{{cite book|url={{google books |plainurl=y |id=NWOcMVA64aAC}}|title=A Field Guide to Dynamical Recurrent Networks|last=Hochreiter|first=S.|last2=et al.|date=15 January 2001|publisher=John Wiley & Sons|year=|isbn=978-0-7803-5369-5|location=|pages=|chapter=Gradient flow in recurrent nets: the difficulty of learning long-term dependencies|editor-last2=Kremer|editor-first2=Stefan C.|editor-first1=John F.|editor-last1=Kolen}}</ref> As errors propagate from layer to layer, they shrink exponentially with the number of layers, impeding the tuning of neuron weights that depends on those errors, particularly affecting deep networks.
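The exponential shrinkage can be made concrete with a small numerical sketch; the depth, layer width, weight scale, and the choice of a sigmoid nonlinearity are all illustrative assumptions. Backpropagation multiplies one Jacobian per layer, and since the sigmoid derivative is at most 0.25, the gradient norm collapses geometrically with depth:

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 30, 64

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass through a deep stack of sigmoid layers (toy weights).
Ws = [0.1 * rng.standard_normal((width, width)) for _ in range(n_layers)]
acts = [rng.standard_normal(width)]
for W in Ws:
    acts.append(sigmoid(W @ acts[-1]))

# Backward pass: the chain rule multiplies one local Jacobian per layer.
# sigmoid'(z) = a * (1 - a) <= 0.25, so the norm shrinks roughly
# geometrically as the error signal travels toward the first layer.
grad = np.ones(width)
norms = []
for W, a in zip(reversed(Ws), reversed(acts[1:])):
    grad = W.T @ (grad * a * (1.0 - a))
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])   # the earliest layers receive almost no signal
```

The many-orders-of-magnitude gap between the last and first recorded norms is exactly the effect that stalled the training of deep networks.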
    
To overcome this problem, [https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber Schmidhuber] adopted a multi-level hierarchy of networks pre-trained one level at a time by [https://en.wikipedia.org/wiki/Unsupervised_learning unsupervised learning] and then fine-tuned by backpropagation<ref name="SCHMID1992">J. Schmidhuber., "Learning complex, extended sequences using the principle of history compression," ''Neural Computation'', 4, pp. 234–242, 1992.</ref>. Behnke, for example, relied only on the sign of the gradient in image reconstruction and face localization.<ref>{{cite book|url=http://www.ais.uni-bonn.de/books/LNCS2766.pdf|title=Hierarchical Neural Networks for Image Interpretation.|publisher=Springer|year=2003|series=Lecture Notes in Computer Science|volume=2766|author=Sven Behnke}}</ref>
[https://en.wikipedia.org/wiki/Geoffrey_Hinton Hinton] proposed learning a high-level representation at each layer using successive layers of binary or real-valued latent variables with a [https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine restricted Boltzmann machine]<ref name="smolensky1986">{{cite book|url=http://portal.acm.org/citation.cfm?id=104290|title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition|year=1986|editors=D. E. Rumelhart, J. L. McClelland, & the PDP Research Group|volume=1|pages=194–281|chapter=Information processing in dynamical systems: Foundations of harmony theory.|last1=Smolensky|first1=P.|authorlink1=Paul Smolensky}}</ref> to model each layer. Once sufficiently many layers have been learned, the deep architecture may be used as a [https://en.wikipedia.org/wiki/Generative_model generative model], reproducing the data when sampling down the model (an "ancestral pass") from the top-level feature activations.<ref name="hinton2006">{{cite journal|last2=Osindero|first2=S.|last3=Teh|first3=Y.|year=2006|title=A fast learning algorithm for deep belief nets|url=http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf|journal=[https://en.wikipedia.org/wiki/Neural_Computation_(journal) Neural Computation]|volume=18|issue=7|pages=1527–1554|doi=10.1162/neco.2006.18.7.1527|pmid=16764513|last1=Hinton|first1=G. E.|authorlink1=Geoffrey Hinton}}</ref><ref>{{Cite journal|year=2009|title=Deep belief networks|url=http://www.scholarpedia.org/article/Deep_belief_networks|journal=Scholarpedia|volume=4|issue=5|pages=5947|doi=10.4249/scholarpedia.5947|pmc=|pmid=|last1=Hinton|first1=G.|bibcode=2009SchpJ...4.5947H}}</ref> In 2012, [https://en.wikipedia.org/wiki/Andrew_Ng Ng] and [https://en.wikipedia.org/wiki/Jeff_Dean_(computer_scientist) Dean] created a network that learned to recognize higher-level concepts, such as cats, only from watching unlabeled images taken from [https://en.wikipedia.org/wiki/YouTube YouTube] videos.<ref name="ng2012">{{cite arXiv|eprint=1112.6209|first2=Jeff|last2=Dean|title=Building High-level Features Using Large Scale Unsupervised Learning|last1=Ng|first1=Andrew|year=2012|class=cs.LG}}</ref>
    
Earlier challenges in training deep neural networks were successfully addressed with methods such as unsupervised pre-training, while available computing power increased through the use of GPUs and distributed computing. Neural networks were deployed on a large scale, particularly in image and visual recognition problems. This became known as "[https://en.wikipedia.org/wiki/Deep_learning deep learning]".
    
=== Hardware-based designs ===
Computational devices for biological simulation and [https://en.wikipedia.org/wiki/Neuromorphic_computing neuromorphic computing]<ref>{{cite journal | last1 = Yang | first1 = J. J. | last2 = Pickett | first2 = M. D. | last3 = Li | first3 = X. M. | last4 = Ohlberg | first4 = D. A. A. | last5 = Stewart | first5 = D. R. | last6 = Williams | first6 = R. S. | year = 2008 | title =  Memristive switching mechanism for metal/oxide/metal nanodevices| url = | journal = Nat. Nanotechnol. | volume = 3 | issue = 7| pages = 429–433 | doi = 10.1038/nnano.2008.160 }}</ref> were created in [https://en.wikipedia.org/wiki/CMOS CMOS]. Nanodevices for very large-scale [https://en.wikipedia.org/wiki/Principal_component principal components] analysis and [https://en.wikipedia.org/wiki/Convolution convolution] may create a new class of neural computing, because they are fundamentally [https://en.wikipedia.org/wiki/Analog_signal analog] rather than [https://en.wikipedia.org/wiki/Digital_data digital] (even though the first implementations used digital devices)<ref>{{cite journal | last1 = Strukov | first1 = D. B. | last2 = Snider | first2 = G. S. | last3 = Stewart | first3 = D. R. | last4 = Williams | first4 = R. S. | year = 2008 | title =  The missing memristor found| url = | journal = Nature | volume = 453 | issue = 7191| pages = 80–83 | doi=10.1038/nature06932 | pmid=18451858| bibcode = 2008Natur.453...80S }}</ref>. Ciresan and colleagues in Schmidhuber's group<ref name=":3">{{Cite journal|last=Cireşan|first=Dan Claudiu|last2=Meier|first2=Ueli|last3=Gambardella|first3=Luca Maria|last4=Schmidhuber|first4=Jürgen|date=2010-09-21|title=Deep, Big, Simple Neural Nets for Handwritten Digit Recognition|url=http://www.mitpressjournals.org/doi/10.1162/NECO_a_00052|journal=Neural Computation|volume=22|issue=12|pages=3207–3220|doi=10.1162/neco_a_00052|issn=0899-7667}}</ref> showed that, despite the vanishing gradient problem, GPUs make [https://en.wikipedia.org/wiki/Backpropagation backpropagation] feasible for many-layered feedforward neural networks.
    
=== Competitions ===
Between 2009 and 2012, the [https://en.wikipedia.org/wiki/Recurrent_neural_network recurrent neural networks] and deep feedforward neural networks developed in [https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber Schmidhuber]'s research group won eight international competitions in [https://en.wikipedia.org/wiki/Pattern_recognition pattern recognition] and [https://en.wikipedia.org/wiki/Machine_learning machine learning]<ref>[http://www.kurzweilai.net/how-bio-inspired-deep-learning-keeps-winning-competitions 2012 Kurzweil AI Interview] with [https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber Jürgen Schmidhuber] on the eight competitions won by his Deep Learning team 2009–2012</ref><ref>{{Cite web|url=http://www.kurzweilai.net/how-bio-inspired-deep-learning-keeps-winning-competitions|title=How bio-inspired deep learning keeps winning competitions {{!}} KurzweilAI|last=|first=|date=|website=www.kurzweilai.net|language=en-US|archive-url=|archive-date=|dead-url=|access-date=2017-06-16}}</ref>. For example, the bi-directional and multi-dimensional [https://en.wikipedia.org/wiki/Long_short-term_memory long short-term memory] (LSTM)<ref>Graves, Alex; and Schmidhuber, Jürgen; ''[http://www.idsia.ch/~juergen/nips2009.pdf Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks]'', in Bengio, Yoshua; Schuurmans, Dale; Lafferty, John; Williams, Chris K. I.; and Culotta, Aron (eds.), ''Advances in Neural Information Processing Systems 22 (NIPS'22), 7–10 December 2009, Vancouver, BC'', Neural Information Processing Systems (NIPS) Foundation, 2009, pp. 545–552.</ref><ref name="graves 855" /><ref name="graves20093">{{Cite journal|last2=Schmidhuber|first2=Jürgen|date=2009|editor-last=Bengio|editor-first=Yoshua|title=Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks|url=https://papers.nips.cc/paper/3449-offline-handwriting-recognition-with-multidimensional-recurrent-neural-networks|journal=Neural Information Processing Systems (NIPS) Foundation|volume=|pages=545–552|via=|editor-last2=Schuurmans|editor-first2=Dale|editor-last3=Lafferty|editor-first3=John|editor-last4=Williams|editor-first4=Chris K. I.|editor-last5=Culotta|editor-first5=Aron|last1=Graves|first1=Alex}}</ref><ref>{{Cite journal|last=Graves|first=A.|last2=Liwicki|first2=M.|last3=Fernández|first3=S.|last4=Bertolami|first4=R.|last5=Bunke|first5=H.|last6=Schmidhuber|first6=J.|date=May 2009|title=A Novel Connectionist System for Unconstrained Handwriting Recognition|url=http://ieeexplore.ieee.org/document/4531750/|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=31|issue=5|pages=855–868|doi=10.1109/tpami.2008.137|issn=0162-8828}}</ref> of [https://en.wikipedia.org/wiki/Alex_Graves_(computer_scientist) Graves] won three competitions in connected handwriting recognition at the 2009 [https://en.wikipedia.org/wiki/International_Conference_on_Document_Analysis_and_Recognition International Conference on Document Analysis and Recognition], without any prior knowledge about the three languages to be learned.<ref name="graves20093"/><ref name="graves 855">{{cite journal|last2=Liwicki|first2=M.|last3=Fernandez|first3=S.|last4=Bertolami|first4=R.|last5=Bunke|first5=H.|last6=Schmidhuber|first6=J.|year=2009|title=A Novel Connectionist System for Improved Unconstrained Handwriting Recognition|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=31|issue=5|pages=855–868|doi=10.1109/tpami.2008.137|last1=Graves|first1=A.| url = http://www.idsia.ch/~juergen/tpami_2008.pdf | format = PDF}}</ref>
Ciresan and colleagues won [https://en.wikipedia.org/wiki/Pattern_recognition pattern recognition] contests, including the IJCNN 2011 Traffic Sign Recognition Competition<ref name=":72">{{Cite journal|last=Cireşan|first=Dan|last2=Meier|first2=Ueli|last3=Masci|first3=Jonathan|last4=Schmidhuber|first4=Jürgen|date=August 2012|title=Multi-column deep neural network for traffic sign classification|url=http://www.sciencedirect.com/science/article/pii/S0893608012000524|journal=Neural Networks|series=Selected Papers from IJCNN 2011|volume=32|pages=333–338|doi=10.1016/j.neunet.2012.02.023}}</ref>, the ISBI 2012 Segmentation of Neuronal Structures in Electron Microscopy Stacks challenge<ref name=":8"/> and others. Their neural networks were the first pattern recognizers to achieve human-competitive or even superhuman performance<ref name=":92">{{Cite journal|last=Ciresan|first=Dan|last2=Meier|first2=U.|last3=Schmidhuber|first3=J.|date=June 2012|title=Multi-column deep neural networks for image classification|url=http://ieeexplore.ieee.org/document/6248110/|journal=2012 IEEE Conference on Computer Vision and Pattern Recognition|volume=|pages=3642–3649|doi=10.1109/cvpr.2012.6248110|via=|isbn=978-1-4673-1228-8|arxiv=1202.2745}}</ref> on benchmarks such as traffic sign recognition (IJCNN 2012) or the [https://en.wikipedia.org/wiki/MNIST_database MNIST handwritten digits problem].
 
   
 
   
 
Researchers demonstrated that deep neural networks interfaced with [https://en.wikipedia.org/wiki/Hidden_Markov_model hidden Markov models], with context-dependent states that define the neural network's output layer, can reduce errors in large-vocabulary speech recognition tasks such as voice search.
GPU-based implementations of this approach<ref name=":6">{{Cite journal|last=Ciresan|first=D. C.|last2=Meier|first2=U.|last3=Masci|first3=J.|last4=Gambardella|first4=L. M.|last5=Schmidhuber|first5=J.|date=2011|editor-last=|title=Flexible, High Performance Convolutional Neural Networks for Image Classification|url=http://ijcai.org/papers11/Papers/IJCAI11-210.pdf|journal=International Joint Conference on Artificial Intelligence|volume=|pages=|doi=10.5591/978-1-57735-516-8/ijcai11-210|via=}}</ref> won many pattern recognition contests, including the IJCNN 2011 Traffic Sign Recognition Competition<ref name=":72"/>, the ISBI 2012 Segmentation of Neuronal Structures in Electron Microscopy Stacks challenge<ref name=":8">{{Cite book|url=http://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images.pdf|title=Advances in Neural Information Processing Systems 25|last=Ciresan|first=Dan|last2=Giusti|first2=Alessandro|last3=Gambardella|first3=Luca M.|last4=Schmidhuber|first4=Juergen|date=2012|publisher=Curran Associates, Inc.|editor-last=Pereira|editor-first=F.|pages=2843–2851|editor-last2=Burges|editor-first2=C. J. C.|editor-last3=Bottou|editor-first3=L.|editor-last4=Weinberger|editor-first4=K. Q.}}</ref>, the [https://en.wikipedia.org/wiki/ImageNet_Competition ImageNet Competition]<ref name="krizhevsky2012">{{cite journal|last2=Sutskever|first2=Ilya|last3=Hinton|first3=Geoffry|date=2012|title=ImageNet Classification with Deep Convolutional Neural Networks|url=https://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf|journal=NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada|last1=Krizhevsky|first1=Alex}}</ref> and other contests.
Deep, highly nonlinear neural architectures similar to the [https://en.wikipedia.org/wiki/Neocognitron neocognitron]<ref name="K. Fukushima. Neocognitron 1980">{{cite journal|year=1980|title=Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position|journal=Biological Cybernetics|volume=36|issue=4|pages=93–202|doi=10.1007/BF00344251|pmid=7370364|author=Fukushima, K.}}</ref> and the "standard architecture of vision"<ref>{{cite journal|last2=Poggio|first2=T|year=1999|title=Hierarchical models of object recognition in cortex|journal=Nature Neuroscience|volume=2|issue=11|pages=1019–1025|doi=10.1038/14819|last1=Riesenhuber|first1=M}}</ref>, inspired by [https://en.wikipedia.org/wiki/Simple_cell simple] and [https://en.wikipedia.org/wiki/Complex_cell complex cells], were pre-trained with unsupervised methods proposed by Hinton<ref name=":1">{{Cite journal|last=Hinton|first=Geoffrey|date=2009-05-31|title=Deep belief networks|url=http://www.scholarpedia.org/article/Deep_belief_networks|journal=Scholarpedia|language=en|volume=4|issue=5|pages=5947|doi=10.4249/scholarpedia.5947|issn=1941-6016|bibcode=2009SchpJ...4.5947H}}</ref><ref name="hinton2006" />. A team from his lab won a 2012 contest, sponsored by [https://en.wikipedia.org/wiki/Merck_%26_Co. Merck], to design software that helps find molecules that might identify new drugs.<ref>{{cite news|url=https://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html|title=Scientists See Promise in Deep-Learning Programs|last=Markoff|first=John|date=November 23, 2012|author=|newspaper=New York Times}}</ref>
    
=== Convolutional networks ===
Since 2011, the state of the art in deep learning feedforward networks has alternated convolutional layers and max-pooling layers<ref name=":6" /><ref name="martines2013">{{cite journal|last2=Bengio|first2=Y.|last3=Yannakakis|first3=G. N.|year=2013|title=Learning Deep Physiological Models of Affect|url=|journal=IEEE Computational Intelligence|volume=8|issue=2|pages=20–33|doi=10.1109/mci.2013.2247823|last1=Martines|first1=H.}}</ref>, topped by several fully or sparsely connected layers followed by a final classification layer. Learning is usually done without unsupervised pre-training.
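The layer ordering described above (convolution, max-pooling, then dense layers feeding a classifier) can be sketched as a minimal NumPy forward pass. This is an illustrative toy, not any of the cited systems: the input size, filter count, and random weights are assumptions chosen only to show how the shapes flow through the stack.

```python
import numpy as np

def conv2d(x, kernels):
    """Valid 2-D convolution of a single-channel image with a bank of kernels,
    followed by a ReLU nonlinearity."""
    kh, kw = kernels.shape[1:]
    H, W = x.shape
    out = np.empty((kernels.shape[0], H - kh + 1, W - kw + 1))
    for k, kern in enumerate(kernels):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kern)
    return np.maximum(out, 0)  # ReLU

def max_pool(x, size=2):
    """Non-overlapping max-pooling over each feature map."""
    c, h, w = x.shape
    h, w = h - h % size, w - w % size
    x = x[:, :h, :w].reshape(c, h // size, size, w // size, size)
    return x.max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))             # hypothetical MNIST-sized input
kernels = rng.standard_normal((4, 3, 3))          # 4 small convolutional filters
W_fc = rng.standard_normal((10, 4 * 13 * 13))     # fully connected classifier weights

features = max_pool(conv2d(image, kernels))       # conv layer -> max-pooling layer
probs = softmax(W_fc @ features.ravel())          # final classification layer
print(probs.shape)  # prints (10,)
```

Real systems of the period stacked several such conv/pool pairs and trained all weights by supervised gradient descent; the single pair here only illustrates the data flow.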
    
Such supervised deep learning methods were the first to achieve human-competitive performance on certain tasks.<ref name=":92"/>