== History ==
[https://en.wikipedia.org/wiki/Warren_McCulloch Warren McCulloch] and [https://en.wikipedia.org/wiki/Walter_Pitts Walter Pitts]<ref>{{cite journal|last=McCulloch|first=Warren|author2=Walter Pitts|title=A Logical Calculus of Ideas Immanent in Nervous Activity|journal=Bulletin of Mathematical Biophysics|year=1943|volume=5|pages=115–133|issue=4}}</ref> created a computational model for neural networks based on [https://en.wikipedia.org/wiki/Mathematics mathematics] and [https://en.wikipedia.org/wiki/Algorithm algorithms], called threshold logic. This model paved the way for two approaches to neural network research: one focused on biological processes in the brain, the other on the application of neural networks to [https://en.wikipedia.org/wiki/Artificial_intelligence artificial intelligence]. This work led to work on nerve networks and their link to [https://en.wikipedia.org/wiki/Finite_state_machine finite state machines]<ref>{{Cite news|url=https://www.degruyter.com/view/books/9781400882618/9781400882618-002/9781400882618-002.xml|title=Representation of Events in Nerve Nets and Finite Automata|last=Kleene|first=S.C.|date=|work=Annals of Mathematics Studies|access-date=2017-06-17|archive-url=|archive-date=|dead-url=|publisher=Princeton University Press|year=1956|issue=34|pages=3–41|language=en}}</ref>.
=== Hebbian learning ===
In the late 1940s, [https://en.wikipedia.org/wiki/Donald_O._Hebb D. O. Hebb]<ref>{{cite book|url={{google books |plainurl=y |id=ddB4AgAAQBAJ}}|title=The Organization of Behavior|last=Hebb|first=Donald|publisher=Wiley|year=1949|location=New York|pages=}}</ref> created a learning hypothesis based on the mechanism of [https://en.wikipedia.org/wiki/Neuroplasticity neural plasticity] that became known as [https://en.wikipedia.org/wiki/Hebbian_learning Hebbian learning]. Hebbian learning is [https://en.wikipedia.org/wiki/Unsupervised_learning unsupervised learning]. This evolved into models for [https://en.wikipedia.org/wiki/Long_term_potentiation long-term potentiation]. In 1948, researchers started applying these ideas to computational models with Turing's [https://en.wikipedia.org/wiki/Unorganized_machine B-type machines].
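The Hebbian rule can be summarized as "neurons that fire together wire together": each synaptic weight grows in proportion to the product of pre- and post-synaptic activity. A minimal sketch, where the learning rate, the random binary inputs, and the linear neuron are illustrative assumptions rather than part of Hebb's original formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.01                               # learning rate (assumed)
w = rng.normal(0.0, 0.1, size=3)         # small random synaptic weights
w_start = w.copy()
for _ in range(200):
    x = rng.integers(0, 2, size=3).astype(float)  # pre-synaptic activity
    y = w @ x                                     # post-synaptic response
    w += eta * y * x                              # Hebbian update: dw = eta * y * x

# Plain Hebbian learning is unstable: correlated activity drives the weight
# norm up without bound, a limitation later rules (e.g. Oja's) address.
print(np.linalg.norm(w_start), np.linalg.norm(w))
```

Because each update multiplies the weight vector by a positive-semidefinite factor, the norm can only grow, which is why Hebbian learning on its own needs some form of normalization.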
Farley and [https://en.wikipedia.org/wiki/Wesley_A._Clark Clark]<ref>{{cite journal|last=Farley|first=B.G.|author2=W.A. Clark|title=Simulation of Self-Organizing Systems by Digital Computer|journal=IRE Transactions on Information Theory|year=1954|volume=4|pages=76–84|issue=4}}</ref> first used computational machines, then called "calculators", to simulate a Hebbian network. Other neural network computational machines were created by [https://en.wikipedia.org/wiki/Nathaniel_Rochester_(computer_scientist) Rochester], Holland, Habit and Duda<ref>{{cite journal|last=Rochester|first=N. |author2=J.H. Holland |author3=L.H. Habit |author4=W.L. Duda|title=Tests on a cell assembly theory of the action of the brain, using a large digital computer|journal=IRE Transactions on Information Theory|year=1956|volume=2|pages=80–93|issue=3}}</ref>.
[https://en.wikipedia.org/wiki/Frank_Rosenblatt Rosenblatt]<ref>{{cite journal|last=Rosenblatt|first=F.|title=The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain|journal=Psychological Review|year=1958|volume=65|pages=386–408|issue=6|citeseerx=10.1.1.588.3775}}</ref> created the [https://en.wikipedia.org/wiki/Perceptron perceptron], an algorithm for pattern recognition. With mathematical notation, Rosenblatt described circuitry that the basic perceptron could not recognize, such as the exclusive-or circuit, which neural networks could not process at the time<ref name="Werbos 1975">{{cite book|url={{google books |plainurl=y |id=z81XmgEACAAJ}}|title=Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences|last=Werbos|first=P.J.|publisher=|year=1975|location=|pages=}}</ref>.
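The exclusive-or limitation is easy to demonstrate: a single-layer perceptron draws one linear decision boundary, and no line separates the XOR outputs. A hedged sketch of a Rosenblatt-style training loop (the learning rate and epoch count are illustrative choices):

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Single-layer perceptron; returns its predictions after training."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if w @ xi + b > 0 else 0
            w += lr * (yi - pred) * xi   # update weights only on mistakes
            b += lr * (yi - pred)
    return [1 if w @ xi + b > 0 else 0 for xi in X]

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(train_perceptron(X, np.array([0, 0, 0, 1])))  # AND: linearly separable, learned
print(train_perceptron(X, np.array([0, 1, 1, 0])))  # XOR: no linear separator exists
```

The same loop that solves AND can never output the XOR labels, no matter how long it trains, because the targets are not linearly separable.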
In 1959, [https://en.wikipedia.org/wiki/Nobel_laureate Nobel laureates] [https://en.wikipedia.org/wiki/David_H._Hubel Hubel] and [https://en.wikipedia.org/wiki/Torsten_Wiesel Wiesel] discovered two types of cells in the primary visual cortex: simple cells and complex cells<ref>{{cite book|url=https://books.google.com/books?id=8YrxWojxUA4C&pg=PA106|title=Brain and visual perception: the story of a 25-year collaboration|publisher=Oxford University Press US|year=2005|page=106|author=David H. Hubel and Torsten N. Wiesel}}</ref>, and proposed a biological model based on their findings.
The first functional networks with many layers were published by [https://en.wikipedia.org/wiki/Alexey_Grigorevich_Ivakhnenko Ivakhnenko] and Lapa in 1965, becoming the [https://en.wikipedia.org/wiki/Group_method_of_data_handling Group Method of Data Handling]<ref name="SCHIDHUB2">{{cite journal|last=Schmidhuber|first=J.|year=2015|title=Deep Learning in Neural Networks: An Overview|journal=Neural Networks|volume=61|pages=85–117|url=https://arxiv.org/abs/1404.7828}}</ref><ref name="ivak1965">{{cite book|url={{google books |plainurl=y |id=FhwVNQAACAAJ}}|title=Cybernetic Predicting Devices|last=Ivakhnenko|first=A. G.|publisher=CCM Information Corporation|year=1973}}</ref><ref name="ivak1967">{{cite book|url={{google books |plainurl=y |id=rGFgAAAAMAAJ}}|title=Cybernetics and forecasting techniques|last2=Grigorʹevich Lapa|first2=Valentin|publisher=American Elsevier Pub. Co.|year=1967|first1=A. G.|last1=Ivakhnenko}}</ref>.
Neural network research stagnated after the [https://en.wikipedia.org/wiki/Machine_learning machine learning] research of [https://en.wikipedia.org/wiki/Marvin_Minsky Minsky] and [https://en.wikipedia.org/wiki/Seymour_Papert Papert]<ref>{{cite book|url={{google books |plainurl=y |id=Ow1OAQAAIAAJ}}|title=Perceptrons: An Introduction to Computational Geometry|last=Minsky|first=Marvin|first2=Seymour|publisher=MIT Press|year=1969|location=|pages=|author2=Papert}}</ref>, who discovered two key issues with the computational machines that processed neural networks. The first was that basic perceptrons were incapable of processing the exclusive-or circuit. The second was that computers did not have enough processing power to effectively handle the work required by large neural networks. Neural network research slowed until computers achieved far greater processing power.
Much of [https://en.wikipedia.org/wiki/Artificial_intelligence artificial intelligence] had focused on high-level (symbolic) models processed by [https://en.wikipedia.org/wiki/Algorithm algorithms], characterized by [https://en.wikipedia.org/wiki/Expert_system expert systems] with knowledge embodied in if-then rules, until in the late 1980s research expanded to low-level (sub-symbolic) [https://en.wikipedia.org/wiki/Machine_learning machine learning], characterized by knowledge embodied in the parameters of a [https://en.wikipedia.org/wiki/Cognitive_model cognitive model].
In the mid-1980s, parallel distributed processing became popular under the name [https://en.wikipedia.org/wiki/Connectionism connectionism]. [https://en.wikipedia.org/wiki/David_E._Rumelhart Rumelhart] and [https://en.wikipedia.org/wiki/James_McClelland_(psychologist) McClelland] described the use of connectionism to simulate neural processes.<ref>{{cite book|url={{google books |plainurl=y |id=davmLgzusB8C}}|title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition|last=Rumelhart|first=D.E|first2=James|publisher=MIT Press|year=1986|location=Cambridge|pages=|author2=McClelland}}</ref>
[https://en.wikipedia.org/wiki/Support_vector_machine Support vector machines] and other, much simpler methods such as [https://en.wikipedia.org/wiki/Linear_classifier linear classifiers] gradually overtook neural networks in machine learning popularity. However, using neural networks transformed some domains, such as the prediction of protein structures.<ref>{{cite journal|id=Qian1988|title=}}</ref>
The [https://en.wikipedia.org/wiki/Vanishing_gradient_problem vanishing gradient problem] affects many-layered [https://en.wikipedia.org/wiki/Feedforward_neural_network feedforward neural networks] that use backpropagation, as well as [https://en.wikipedia.org/wiki/Recurrent_neural_network recurrent neural networks] (RNNs).<ref name="HOCH19912">S. Hochreiter., "[http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf Untersuchungen zu dynamischen neuronalen Netzen]," ''Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber'', 1991.</ref><ref name="HOCH2001">{{cite book|url={{google books |plainurl=y |id=NWOcMVA64aAC}}|title=A Field Guide to Dynamical Recurrent Networks|last=Hochreiter|first=S.|last2=et al.|date=15 January 2001|publisher=John Wiley & Sons|year=|location=|pages=|chapter=Gradient flow in recurrent nets: the difficulty of learning long-term dependencies|editor-last2=Kremer|editor-first2=Stefan C.|editor-first1=John F.|editor-last1=Kolen}}</ref> As gradients propagate from layer to layer, they shrink exponentially with the number of layers, impeding the tuning of neuron weights that depends on those errors and particularly affecting deep networks.
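The exponential shrinkage is visible with a little arithmetic: each sigmoid layer contributes a chain-rule factor of at most sigmoid'(z) = 0.25 to the backpropagated gradient. A minimal sketch, where the depth of 30 and the pre-activation z = 0 at every layer are assumptions chosen for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

depth = 30
z = 0.0              # pre-activation at every layer (assumed)
grad = 1.0           # error signal arriving at the output layer
history = []
for _ in range(depth):
    grad *= sigmoid(z) * (1.0 - sigmoid(z))   # chain-rule factor, at most 0.25
    history.append(grad)

# After one layer the gradient is 0.25; after 30 layers it is 0.25**30,
# far below anything useful for adjusting the earliest layers' weights.
print(history[0], history[-1])
```

This is why the early layers of a deep sigmoid network trained with plain backpropagation learn almost nothing, motivating the remedies described below.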
To overcome this problem, [https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber Schmidhuber] adopted a multi-level hierarchy of networks, pre-trained one level at a time through [https://en.wikipedia.org/wiki/Unsupervised_learning unsupervised learning] and fine-tuned with backpropagation<ref name="SCHMID1992">J. Schmidhuber., "Learning complex, extended sequences using the principle of history compression," ''Neural Computation'', 4, pp. 234–242, 1992.</ref>. Behnke, for example, relied only on the sign of the gradient in image reconstruction and face localization.<ref>{{cite book|url=http://www.ais.uni-bonn.de/books/LNCS2766.pdf|title=Hierarchical Neural Networks for Image Interpretation.|publisher=Springer|year=2003|series=Lecture Notes in Computer Science|volume=2766|author=Sven Behnke}}</ref>
[https://en.wikipedia.org/wiki/Geoffrey_Hinton Hinton] proposed learning a high-level representation using successive layers of binary or real-valued latent variables, with a [https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine restricted Boltzmann machine]<ref name="smolensky1986">{{cite book|url=http://portal.acm.org/citation.cfm?id=104290|title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition|year=1986|editors=D. E. Rumelhart, J. L. McClelland, & the PDP Research Group|volume=1|pages=194–281|chapter=Information processing in dynamical systems: Foundations of harmony theory.|last1=Smolensky|first1=P.|authorlink1=Paul Smolensky}}</ref> to model each layer. Once sufficiently many layers have been learned, the deep architecture may be used as a [https://en.wikipedia.org/wiki/Generative_model generative model] by reproducing the data when sampling down the model (an "ancestral pass") from the top-level feature activations.<ref name="hinton2006">{{cite journal|last2=Osindero|first2=S.|last3=Teh|first3=Y.|year=2006|title=A fast learning algorithm for deep belief nets|url=http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf|journal=[https://en.wikipedia.org/wiki/Neural_Computation_(journal) Neural Computation]|volume=18|issue=7|pages=1527–1554|last1=Hinton|first1=G. E.|authorlink1=Geoffrey Hinton}}</ref><ref>{{Cite journal|year=2009|title=Deep belief networks|url=http://www.scholarpedia.org/article/Deep_belief_networks|journal=Scholarpedia|volume=4|issue=5|pages=5947|last1=Hinton|first1=G.}}</ref> In 2012, [https://en.wikipedia.org/wiki/Andrew_Ng Ng] and [https://en.wikipedia.org/wiki/Jeff_Dean_(computer_scientist) Dean] created a network that learned to recognize higher-level concepts, such as cats, only from watching unlabeled images taken from [https://en.wikipedia.org/wiki/YouTube YouTube] videos.<ref name="ng2012">{{cite journal|url=https://arxiv.org/abs/1112.6209|first2=Jeff|last2=Dean|title=Building High-level Features Using Large Scale Unsupervised Learning|last1=Ng|first1=Andrew|year=2012|class=cs.LG}}</ref>
Earlier challenges in training deep neural networks were successfully addressed with methods such as unsupervised pre-training, while available computing power increased through the use of GPUs and distributed computing. Neural networks were deployed on a large scale, particularly in image and visual recognition problems. This became known as "[https://en.wikipedia.org/wiki/Deep_learning deep learning]".
GPU-based implementations of this approach<ref name=":6">{{Cite journal|last=Ciresan|first=D. C.|last2=Meier|first2=U.|last3=Masci|first3=J.|last4=Gambardella|first4=L. M.|last5=Schmidhuber|first5=J.|date=2011|editor-last=|title=Flexible, High Performance Convolutional Neural Networks for Image Classification|url=http://ijcai.org/papers11/Papers/IJCAI11-210.pdf|journal=International Joint Conference on Artificial Intelligence|volume=|pages=|via=}}</ref> won many pattern recognition contests, including the IJCNN 2011 Traffic Sign Recognition Competition<ref name=":72"/>, the ISBI 2012 Segmentation of Neuronal Structures in Electron Microscopy Stacks challenge<ref name=":8">{{Cite book|url=http://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images.pdf|title=Advances in Neural Information Processing Systems 25|last=Ciresan|first=Dan|last2=Giusti|first2=Alessandro|last3=Gambardella|first3=Luca M.|last4=Schmidhuber|first4=Juergen|date=2012|publisher=Curran Associates, Inc.|editor-last=Pereira|editor-first=F.|pages=2843–2851|editor-last2=Burges|editor-first2=C. J. C.|editor-last3=Bottou|editor-first3=L.|editor-last4=Weinberger|editor-first4=K. Q.}}</ref>, the [https://en.wikipedia.org/wiki/ImageNet_Competition ImageNet Competition]<ref name="krizhevsky2012">{{cite journal|last2=Sutskever|first2=Ilya|last3=Hinton|first3=Geoffry|date=2012|title=ImageNet Classification with Deep Convolutional Neural Networks|url=https://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf|journal=NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada|last1=Krizhevsky|first1=Alex}}</ref>, and others.
Deep, highly nonlinear neural architectures similar to the [https://en.wikipedia.org/wiki/Neocognitron neocognitron]<ref name="K. Fukushima. Neocognitron 1980">{{cite journal|year=1980|title=Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position|journal=Biological Cybernetics|volume=36|issue=4|pages=93–202|author=Fukushima, K.}}</ref> and the "standard architecture of vision"<ref>{{cite journal|last2=Poggio|first2=T|year=1999|title=Hierarchical models of object recognition in cortex|journal=Nature Neuroscience|volume=2|issue=11|pages=1019–1025|last1=Riesenhuber|first1=M}}</ref>, inspired by [https://en.wikipedia.org/wiki/Simple_cell simple] and [https://en.wikipedia.org/wiki/Complex_cell complex cells], were pre-trained with the unsupervised methods of Hinton<ref name=":1">{{Cite journal|last=Hinton|first=Geoffrey|date=2009-05-31|title=Deep belief networks|url=http://www.scholarpedia.org/article/Deep_belief_networks|journal=Scholarpedia|language=en|volume=4|issue=5|pages=5947}}</ref><ref name="hinton2006" />. A team from his lab won a 2012 contest sponsored by [https://en.wikipedia.org/wiki/Merck_%26_Co. Merck] to design software to help find molecules that might identify new drugs.<ref>{{cite news|url=https://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html|title=Scientists See Promise in Deep-Learning Programs|last=Markoff|first=John|date=November 23, 2012|author=|newspaper=New York Times}}</ref>
=== Convolutional networks ===