{{About||deep versus shallow learning in educational psychology|Student approaches to learning|more information|Artificial neural network}}

{{short description|Branch of machine learning}}



{{machine learning bar}}



'''Deep learning''' (also known as '''deep structured learning''') is part of a broader family of [[machine learning]] methods based on [[artificial neural networks]] with [[representation learning]]. Learning can be [[Supervised learning|supervised]], [[Semi-supervised learning|semi-supervised]] or [[Unsupervised learning|unsupervised]].<ref name="BENGIO2012" /><ref name="SCHIDHUB" /><ref name="NatureBengio">{{cite journal |last1=Bengio |first1=Yoshua |last2=LeCun |first2= Yann| last3=Hinton | first3= Geoffrey|year=2015 |title=Deep Learning |journal=Nature |volume=521 |issue=7553 |pages=436–444 |doi=10.1038/nature14539 |pmid=26017442|bibcode=2015Natur.521..436L |url=https://www.semanticscholar.org/paper/a4cec122a08216fe8a3bc19b22e78fbaea096256 }}</ref>

Deep learning architectures such as [[#Deep_neural_networks|deep neural network]]s, [[deep belief network]]s, [[recurrent neural networks]] and [[convolutional neural networks]] have been applied to fields including [[computer vision]], [[automatic speech recognition|speech recognition]], [[natural language processing]], [[audio recognition]], social network filtering, [[machine translation]], [[bioinformatics]], [[drug design]], medical image analysis, material inspection and [[board game]] programs, where they have produced results comparable to and in some cases surpassing human expert performance.<ref name=":9">{{Cite book |doi=10.1109/cvpr.2012.6248110 |isbn=978-1-4673-1228-8|arxiv=1202.2745|chapter=Multi-column deep neural networks for image classification|title=2012 IEEE Conference on Computer Vision and Pattern Recognition|pages=3642–3649|year=2012|last1=Ciresan|first1=D.|last2=Meier|first2=U.|last3=Schmidhuber|first3=J.}}</ref><ref name="krizhevsky2012">{{cite journal|last1=Krizhevsky|first1=Alex|last2=Sutskever|first2=Ilya|last3=Hinton|first3=Geoffry|date=2012|title=ImageNet Classification with Deep Convolutional Neural Networks|url=https://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf|journal=NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada}}

</ref><ref>{{cite web |title=Google's AlphaGo AI wins three-match series against the world's best Go player |url=https://techcrunch.com/2017/05/24/alphago-beats-planets-best-human-go-player-ke-jie/amp/ |website=TechCrunch |date=25 May 2017}}</ref>

[[Artificial neural network]]s (ANNs) were inspired by information processing and distributed communication nodes in biological systems. ANNs have various differences from biological [[brain]]s. Specifically, neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analog.<ref>{{Cite journal|last=Marblestone|first=Adam H.|last2=Wayne|first2=Greg|last3=Kording|first3=Konrad P.|date=2016|title=Toward an Integration of Deep Learning and Neuroscience |journal=Frontiers in Computational Neuroscience |volume=10|pages=94|doi=10.3389/fncom.2016.00094 |pmc=5021692|pmid=27683554|bibcode=2016arXiv160603813M|arxiv=1606.03813|url=https://www.semanticscholar.org/paper/2dec4f52b1ce552b416f086d4ea1040626675dfa}}</ref><ref>{{cite journal|last1=Olshausen|first1=B. A.|year=1996|title=Emergence of simple-cell receptive field properties by learning a sparse code for natural images|journal=Nature|volume=381|issue=6583|pages=607–609|bibcode=1996Natur.381..607O|doi=10.1038/381607a0|pmid=8637596|url=https://www.semanticscholar.org/paper/8012c4a1e2ca663f1a04e80cbb19631a00cbab27}}</ref><ref>{{cite arxiv|last=Bengio|first=Yoshua|last2=Lee|first2=Dong-Hyun|last3=Bornschein|first3=Jorg|last4=Mesnard|first4=Thomas|last5=Lin|first5=Zhouhan|date=2015-02-13|title=Towards Biologically Plausible Deep Learning|eprint=1502.04156|class=cs.LG}}</ref>

The adjective "deep" in deep learning comes from the use of multiple layers in the network. Early work showed that a linear [[perceptron]] cannot be a universal classifier, and then that a network with a nonpolynomial activation function with one hidden layer of unbounded width can on the other hand so be. Deep learning is a modern variation which is concerned with an unbounded number of layers of bounded size, which permits practical application and optimized implementation, while retaining theoretical universality under mild conditions. In deep learning the layers are also permitted to be heterogeneous and to deviate widely from biologically informed [[connectionism|connectionist]] models, for the sake of efficiency, trainability and understandability, whence the "structured" part.

The adjective "deep" in deep learning comes from the use of multiple layers in the network. Early work showed that a linear perceptron cannot be a universal classifier, and then that a network with a nonpolynomial activation function with one hidden layer of unbounded width can on the other hand so be. Deep learning is a modern variation which is concerned with an unbounded number of layers of bounded size, which permits practical application and optimized implementation, while retaining theoretical universality under mild conditions. In deep learning the layers are also permitted to be heterogeneous and to deviate widely from biologically informed connectionist models, for the sake of efficiency, trainability and understandability, whence the "structured" part.

深度学习中的形容词“深度”来自于网络中多层次的使用。早期的工作表明线性感知器不可能是一个通用的分类器,然后一个拥有一个无限宽度的隐藏层的非多项式激活函数的网络可以是这样的。深度学习是一种现代变体,它涉及到有限的层数,允许实际应用和优化实现,同时在温和的条件下保持理论的普遍性。在深度学习中,为了提高效率、可训练性和可理解性,允许层次结构具有异质性和广泛偏离生物信息连接主义模型。



{{toclimit|3}}



== Definition ==

[[File:Deep Learning.jpg|alt=Representing Images on Multiple Layers of Abstraction in Deep Learning|thumb|Representing Images on Multiple Layers of Abstraction in Deep Learning <ref>{{Cite journal|last=Schulz|first=Hannes|last2=Behnke|first2=Sven|date=2012-11-01|title=Deep Learning|journal=KI - Künstliche Intelligenz|language=en|volume=26|issue=4|pages=357–363|doi=10.1007/s13218-012-0198-z|issn=1610-1987|url=https://www.semanticscholar.org/paper/51a80649d16a38d41dbd20472deb3bc9b61b59a0}}</ref>]]

Deep learning is a class of [[machine learning]] [[algorithm]]s that<ref name="BOOK2014">{{cite journal|last2=Yu|first2=D.|year=2014|title=Deep Learning: Methods and Applications|url=http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf|journal=Foundations and Trends in Signal Processing|volume=7|issue=3–4|pages=1–199|doi=10.1561/2000000039|last1=Deng|first1=L.}}</ref>{{rp|pages=199–200}} uses multiple layers to progressively extract higher level features from the raw input. For example, in [[image processing]], lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits or letters or faces.

== Overview ==

Most modern deep learning models are based on artificial neural networks, specifically, [[Convolutional Neural Network]]s (CNN)s, although they can also include [[propositional formula]]s or latent variables organized layer-wise in deep [[generative model]]s such as the nodes in [[deep belief network]]s and deep [[Boltzmann machine]]s.<ref name="BENGIODEEP">{{cite journal|last=Bengio|first=Yoshua|year=2009|title=Learning Deep Architectures for AI|url=http://sanghv.com/download/soft/machine%20learning,%20artificial%20intelligence,%20mathematics%20ebooks/ML/learning%20deep%20architectures%20for%20AI%20%282009%29.pdf|journal=Foundations and Trends in Machine Learning|volume=2|issue=1|pages=1–127|doi=10.1561/2200000006|citeseerx=10.1.1.701.9550|access-date=2015-09-03|archive-url=https://web.archive.org/web/20160304084250/http://sanghv.com/download/soft/machine%20learning,%20artificial%20intelligence,%20mathematics%20ebooks/ML/learning%20deep%20architectures%20for%20AI%20(2009).pdf|archive-date=2016-03-04|url-status=dead}}</ref>

In deep learning, each level learns to transform its input data into a slightly more abstract and composite representation. In an image recognition application, the raw input may be a [[Matrix (mathematics)|matrix]] of pixels; the first representational layer may abstract the pixels and encode edges; the second layer may compose and encode arrangements of edges; the third layer may encode a nose and eyes; and the fourth layer may recognize that the image contains a face. Importantly, a deep learning process can learn which features to optimally place in which level ''on its own''. (Of course, this does not completely eliminate the need for hand-tuning; for example, varying numbers of layers and layer sizes can provide different degrees of abstraction.)<ref name="BENGIO2012">{{cite journal|last2=Courville|first2=A.|last3=Vincent|first3=P.|year=2013|title=Representation Learning: A Review and New Perspectives|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=35|issue=8|pages=1798–1828|arxiv=1206.5538|doi=10.1109/tpami.2013.50|pmid=23787338|last1=Bengio|first1=Y.}}</ref><ref>{{cite journal|last1=LeCun|first1=Yann|last2=Bengio|first2=Yoshua|last3=Hinton|first3=Geoffrey|title=Deep learning|journal=Nature|date=28 May 2015|volume=521|issue=7553|pages=436–444|doi=10.1038/nature14539|pmid=26017442|bibcode=2015Natur.521..436L|url=https://www.semanticscholar.org/paper/a4cec122a08216fe8a3bc19b22e78fbaea096256}}</ref>
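
The following is a minimal, illustrative sketch of such a layered model, assuming the PyTorch library; the layer sizes and the mapping of layers to "edges" or "faces" are purely illustrative and are not taken from any cited system:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Lower layers operate on raw pixels; successive layers build increasingly
# abstract features; the final layer maps to human-level concepts such as
# the ten digit classes. All sizes here are illustrative.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),   # low level: edge-like filters
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),  # mid level: arrangements of edges
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),                            # high level: e.g. digit classes
)

x = torch.randn(1, 1, 28, 28)  # a dummy 28x28 grayscale image
print(model(x).shape)          # torch.Size([1, 10])
</syntaxhighlight>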

The word "deep" in "deep learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial ''credit assignment path'' (CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe potentially causal connections between input and output. For a [[feedforward neural network]], the depth of the CAPs is that of the network and is the number of hidden layers plus one (as the output layer is also parameterized). For [[recurrent neural network]]s, in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited.<ref name="SCHIDHUB" /> No universally agreed upon threshold of depth divides shallow learning from deep learning, but most researchers agree that deep learning involves CAP depth higher than 2. CAP of depth 2 has been shown to be a universal approximator in the sense that it can emulate any function.<ref>{{Cite book|url=https://books.google.com/books?id=9CqQDwAAQBAJ&pg=PA15&dq#v=onepage&q&f=false|title=Human Behavior and Another Kind in Consciousness: Emerging Research and Opportunities: Emerging Research and Opportunities|last=Shigeki|first=Sugiyama|date=2019-04-12|publisher=IGI Global|isbn=978-1-5225-8218-2|language=en}}</ref> Beyond that, more layers do not add to the function approximator ability of the network. Deep models (CAP > 2) are able to extract better features than shallow models and hence, extra layers help in learning the features effectively.

The word "deep" in "deep learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial credit assignment path (CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe potentially causal connections between input and output. For a feedforward neural network, the depth of the CAPs is that of the network and is the number of hidden layers plus one (as the output layer is also parameterized). For recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited. Beyond that, more layers do not add to the function approximator ability of the network. Deep models (CAP > 2) are able to extract better features than shallow models and hence, extra layers help in learning the features effectively.

“深度学习”中的“深度”一词指的是数据转换所经过的层数。更准确地说,深度学习系统有一个实质性的学分分配路径(CAP)深度。Cap 是从输入到输出的转换链。Caps 描述了输入和输出之间潜在的因果关系。对于一个前馈神经网络,CAPs 的深度是网络的深度,是隐藏层的数量加上1(因为输出层也是参数化的)。对于回归神经网络,其中一个信号可以通过一个层传播多次,CAP 的深度是潜在的无限的。除此之外,更多的层不会增加网络的函数逼近能力。深层模型(CAP 2)能够比浅层模型更好地提取特征,因此,额外的层有助于有效地学习特征。



Deep learning architectures can be constructed with a [[greedy algorithm|greedy]] layer-by-layer method.<ref name=BENGIO2007>{{cite conference | first1=Yoshua | last1=Bengio | first2=Pascal | last2=Lamblin | first3=Dan|last3=Popovici |first4=Hugo|last4=Larochelle | title=Greedy layer-wise training of deep networks| year=2007 | url=http://papers.nips.cc/paper/3048-greedy-layer-wise-training-of-deep-networks.pdf| conference = Advances in neural information processing systems | pages= 153–160}}</ref> Deep learning helps to disentangle these abstractions and pick out which features improve performance.<ref name="BENGIO2012" />

For [[supervised learning]] tasks, deep learning methods eliminate [[feature engineering]], by translating the data into compact intermediate representations akin to [[Principal Component Analysis|principal components]], and derive layered structures that remove redundancy in representation.

Deep learning algorithms can be applied to unsupervised learning tasks. This is an important benefit because unlabeled data are more abundant than the labeled data. Examples of deep structures that can be trained in an unsupervised manner are neural history compressors<ref name="scholarpedia">Jürgen Schmidhuber (2015). Deep Learning. Scholarpedia, 10(11):32832. [http://www.scholarpedia.org/article/Deep_Learning Online]</ref> and [[deep belief network]]s.<ref name="BENGIO2012" /><ref name="SCHOLARDBNS">{{cite journal | last1 = Hinton | first1 = G.E. | year = 2009| title = Deep belief networks | url= | journal = Scholarpedia | volume = 4 | issue = 5| page = 5947 | doi=10.4249/scholarpedia.5947| bibcode = 2009SchpJ...4.5947H}}</ref>

== Interpretations ==

Deep neural networks are generally interpreted in terms of the [[universal approximation theorem]]<ref name="ReferenceB">Balázs Csanád Csáji (2001). Approximation with Artificial Neural Networks; Faculty of Sciences; Eötvös Loránd University, Hungary</ref><ref name=cyb>{{cite journal | last1 = Cybenko | year = 1989 | title = Approximations by superpositions of sigmoidal functions | url = http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf | journal = [[Mathematics of Control, Signals, and Systems]] | volume = 2 | issue = 4 | pages = 303–314 | doi = 10.1007/bf02551274 | url-status = dead | archiveurl = https://web.archive.org/web/20151010204407/http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf | archivedate = 2015-10-10 }}</ref><ref name=horn>{{cite journal | last1 = Hornik | first1 = Kurt | year = 1991 | title = Approximation Capabilities of Multilayer Feedforward Networks | url= | journal = Neural Networks | volume = 4 | issue = 2| pages = 251–257 | doi=10.1016/0893-6080(91)90009-t}}</ref><ref name="Haykin, Simon 1998">{{cite book|first=Simon S. |last=Haykin|title=Neural Networks: A Comprehensive Foundation|url={{google books |plainurl=y |id=bX4pAQAAMAAJ}}|year=1999|publisher=Prentice Hall|isbn=978-0-13-273350-2}}</ref><ref name="Hassoun, M. 1995 p. 48">{{cite book|first=Mohamad H. |last=Hassoun|title=Fundamentals of Artificial Neural Networks|url={{google books |plainurl=y |id=Otk32Y3QkxQC|page=48}}|year=1995|publisher=MIT Press|isbn=978-0-262-08239-6|p=48}}</ref><ref name=ZhouLu>Lu, Z., Pu, H., Wang, F., Hu, Z., & Wang, L. (2017). [http://papers.nips.cc/paper/7203-the-expressive-power-of-neural-networks-a-view-from-the-width The Expressive Power of Neural Networks: A View from the Width]. Neural Information Processing Systems, 6231-6239.

</ref> or [[Bayesian inference|probabilistic inference]].<ref name="BOOK2014" /><ref name="BENGIODEEP" /><ref name="BENGIO2012" /><ref name="SCHIDHUB">{{cite journal|last=Schmidhuber|first=J.|year=2015|title=Deep Learning in Neural Networks: An Overview|journal=Neural Networks|volume=61|pages=85–117|arxiv=1404.7828|doi=10.1016/j.neunet.2014.09.003|pmid=25462637|url=https://www.semanticscholar.org/paper/126df9f24e29feee6e49e135da102fbbd9154a48}}</ref><ref name="SCHOLARDBNS" /><ref name = MURPHY>{{cite book|first=Kevin P. |last=Murphy|title=Machine Learning: A Probabilistic Perspective|url={{google books |plainurl=y |id=NZP6AQAAQBAJ}}|date=24 August 2012|publisher=MIT Press|isbn=978-0-262-01802-9}}</ref><ref name= "Patel NIPS 2016">{{Cite journal|url=https://papers.nips.cc/paper/6231-a-probabilistic-framework-for-deep-learning.pdf|title=A Probabilistic Framework for Deep Learning|last=Patel|first=Ankit|last2=Nguyen|first2=Tan|last3=Baraniuk|first3=Richard|date=2016|journal=Advances in Neural Information Processing Systems|pages=|bibcode=2016arXiv161201936P|arxiv=1612.01936}}</ref>

The classic universal approximation theorem concerns the capacity of [[feedforward neural networks]] with a single hidden layer of finite size to approximate [[continuous functions]].<ref name="ReferenceB"/><ref name="cyb"/><ref name="horn"/><ref name="Haykin, Simon 1998"/><ref name="Hassoun, M. 1995 p. 48"/> In 1989, the first proof was published by [[George Cybenko]] for [[sigmoid function|sigmoid]] activation functions<ref name="cyb" /> and was generalised to feed-forward multi-layer architectures in 1991 by Kurt Hornik.<ref name="horn" /> Recent work also showed that universal approximation also holds for non-bounded activation functions such as the rectified linear unit.<ref name=sonoda17>{{cite journal | last1 = Sonoda | first1 = Sho | last2=Murata | first2=Noboru | year = 2017 | title = Neural network with unbounded activation functions is universal approximator | journal = Applied and Computational Harmonic Analysis | volume = 43 | issue = 2 | pages = 233–268 | doi = 10.1016/j.acha.2015.12.005| arxiv = 1505.03654 | url = https://www.semanticscholar.org/paper/d0e48a4d5d6d0b4aa2dbab2c50560945e62a3817 }}</ref>
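
A common informal statement of this single-hidden-layer case (a paraphrase, not the exact formulation of any one of the cited papers) is: for every continuous function <math>f</math> on a compact set <math>K \subset \mathbb{R}^n</math>, every <math>\varepsilon > 0</math>, and a suitable (for example sigmoidal) activation <math>\sigma</math>, there exist a width <math>N</math> and parameters <math>\alpha_i, b_i \in \mathbb{R}</math> and <math>w_i \in \mathbb{R}^n</math> such that

<math display="block">\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^{\mathsf{T}} x + b_i\right) \right| < \varepsilon .</math>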

The universal approximation theorem for [[deep neural network]]s concerns the capacity of networks with bounded width whose depth is allowed to grow. Lu et al.<ref name=ZhouLu/> proved that if the width of a [[deep neural network]] with [[ReLU]] activation is strictly larger than the input dimension, then the network can approximate any [[Lebesgue integration|Lebesgue integrable function]]; if the width is smaller than or equal to the input dimension, then a [[deep neural network]] is not a universal approximator.

The [[probabilistic]] interpretation<ref name="MURPHY" /> derives from the field of [[machine learning]]. It features inference,<ref name="BOOK2014" /><ref name="BENGIODEEP" /><ref name="BENGIO2012" /><ref name="SCHIDHUB" /><ref name="SCHOLARDBNS" /><ref name="MURPHY" /> as well as the [[optimization]] concepts of [[training]] and [[test (assessment)|testing]], related to fitting and [[generalization]], respectively. More specifically, the probabilistic interpretation considers the activation nonlinearity as a [[cumulative distribution function]].<ref name="MURPHY" /> The probabilistic interpretation led to the introduction of [[dropout (neural networks)|dropout]] as [[Regularization (mathematics)|regularizer]] in neural networks.<ref name="DROPOUT">{{cite arXiv |last1=Hinton |first1=G. E. |last2=Srivastava| first2 =N.|last3=Krizhevsky| first3=A.| last4 =Sutskever| first4=I.| last5=Salakhutdinov| first5=R.R.|eprint=1207.0580 |class=math.LG |title=Improving neural networks by preventing co-adaptation of feature detectors |date=2012}}</ref> The probabilistic interpretation was introduced by researchers including [[John Hopfield|Hopfield]], [[Bernard Widrow|Widrow]] and [[Kumpati S. Narendra|Narendra]] and popularized in surveys such as the one by [[Christopher Bishop|Bishop]].<ref name="prml">{{cite book|title=Pattern Recognition and Machine Learning|author=Bishop, Christopher M.|year=2006|publisher=Springer|url=http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf|isbn=978-0-387-31073-2}}</ref>
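
A minimal sketch of dropout used as a regularizer between layers, assuming the PyTorch library; the drop probability of 0.5 and the layer sizes are illustrative only:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations while training
    nn.Linear(50, 10),
)

net.train()                          # dropout active during training
y_train = net(torch.randn(4, 100))
net.eval()                           # dropout disabled at test time
y_test = net(torch.randn(4, 100))
</syntaxhighlight>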

== History ==

The first general, working learning algorithm for supervised, deep, feedforward, multilayer [[perceptron]]s was published by [[Alexey Ivakhnenko]] and Lapa in 1967.<ref name="ivak1965">{{cite book|first1=A. G. |last1=Ivakhnenko |first2=V. G. |last2=Lapa |title=Cybernetics and Forecasting Techniques|url={{google books |plainurl=y |id=rGFgAAAAMAAJ}}|year=1967|publisher=American Elsevier Publishing Co.|isbn=978-0-444-00020-0}}</ref> A 1971 paper already described a deep network with 8 layers trained by the [[group method of data handling]] algorithm.<ref name="ivak1971">{{Cite journal|last=Ivakhnenko|first=Alexey|date=1971|title=Polynomial theory of complex systems|url=http://gmdh.net/articles/history/polynomial.pdf |journal=IEEE Transactions on Systems, Man and Cybernetics |pages=364–378|doi=10.1109/TSMC.1971.4308320|pmid=|accessdate=|volume=SMC-1|issue=4}}</ref> Other working deep learning architectures, specifically those built for [[computer vision]], began with the [[Neocognitron]] introduced by [[Kunihiko Fukushima]] in 1980.<ref name="FUKU1980">{{cite journal | last1 = Fukushima | first1 = K. | year = 1980 | title = Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position | url= | journal = Biol. Cybern. | volume = 36 | issue = 4| pages = 193–202 | doi=10.1007/bf00344251 | pmid=7370364}}</ref>

The term ''Deep Learning'' was introduced to the machine learning community by [[Rina Dechter]] in 1986,<ref name="dechter1986">[[Rina Dechter]] (1986). Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory.[https://www.researchgate.net/publication/221605378_Learning_While_Searching_in_Constraint-Satisfaction-Problems Online]</ref><ref name="scholarpedia" /> and to [[Artificial Neural Networks|artificial neural networks]] by Igor Aizenberg and colleagues in 2000, in the context of [[Boolean network|Boolean]] threshold neurons.<ref name="aizenberg2000">Igor Aizenberg, Naum N. Aizenberg, Joos P.L. Vandewalle (2000). Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Springer Science & Business Media.</ref><ref>Co-evolving recurrent neurons learn deep memory POMDPs. Proc. GECCO, Washington, D. C., pp. 1795-1802, ACM Press, New York, NY, USA, 2005.</ref>

In 1989, [[Yann LeCun]] et al. applied the standard backpropagation algorithm, which had been around as the reverse mode of [[automatic differentiation]] since 1970,<ref name="lin1970">[[Seppo Linnainmaa]] (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6-7.</ref><ref name="grie2012">{{Cite journal|last=Griewank|first=Andreas|date=2012|title=Who Invented the Reverse Mode of Differentiation?|url=http://www.math.uiuc.edu/documenta/vol-ismp/52_griewank-andreas-b.pdf|journal=Documenta Mathematica|issue=Extra Volume ISMP|pages=389–400|access-date=2017-06-11|archive-url=https://web.archive.org/web/20170721211929/http://www.math.uiuc.edu/documenta/vol-ismp/52_griewank-andreas-b.pdf|archive-date=2017-07-21|url-status=dead}}</ref><ref name="WERBOS1974">{{Cite journal|last=Werbos|first=P.|date=1974|title=Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences |url=https://www.researchgate.net/publication/35657389 |journal=Harvard University |accessdate=12 June 2017}}</ref><ref name="werbos1982">{{Cite book|chapter-url=ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf|title=System modeling and optimization|last=Werbos|first=Paul|publisher=Springer|year=1982|isbn=|location=|pages=762–770|chapter=Applications of advances in nonlinear sensitivity analysis}}</ref> to a deep neural network with the purpose of [[Handwriting_recognition|recognizing handwritten ZIP code]]s on mail. While the algorithm worked, training required 3 days.<ref name="LECUN1989">LeCun ''et al.'', "Backpropagation Applied to Handwritten Zip Code Recognition," ''Neural Computation'', 1, pp. 541–551, 1989.</ref>

By 1991 such systems were used for recognizing isolated 2-D hand-written digits, while [[3D object recognition|recognizing 3-D objects]] was done by matching 2-D images with a handcrafted 3-D object model. Weng ''et al.'' suggested that a human brain does not use a monolithic 3-D object model and in 1992 they published Cresceptron,<ref name="Weng1992">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCNN1992.pdf Cresceptron: a self-organizing neural network which grows adaptively]," ''Proc. International Joint Conference on Neural Networks'', Baltimore, Maryland, vol I, pp. 576-581, June, 1992.</ref><ref name="Weng1993">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronICCV1993.pdf Learning recognition and segmentation of 3-D objects from 2-D images]," ''Proc. 4th International Conf. Computer Vision'', Berlin, Germany, pp. 121-128, May, 1993.</ref><ref name="Weng1997">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCV.pdf Learning recognition and segmentation using the Cresceptron]," ''International Journal of Computer Vision'', vol. 25, no. 2, pp. 105-139, Nov. 1997.</ref> a method for performing 3-D object recognition in cluttered scenes. Because it directly used natural images, Cresceptron started the beginning of general-purpose visual learning for natural 3D worlds. Cresceptron is a cascade of layers similar to Neocognitron. But while Neocognitron required a human programmer to hand-merge features, Cresceptron learned an open number of features in each layer without supervision, where each feature is represented by a [[Convolution|convolution kernel]]. Cresceptron segmented each learned object from a cluttered scene through back-analysis through the network. [[Max pooling]], now often adopted by deep neural networks (e.g. [[ImageNet]] tests), was first used in Cresceptron to reduce the position resolution by a factor of (2x2) to 1 through the cascade for better generalization.
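
A toy NumPy illustration of the 2×2 max pooling operation described above (the function name is hypothetical):

<syntaxhighlight lang="python">
import numpy as np

def max_pool_2x2(feature_map):
    # Each non-overlapping 2x2 block of the feature map is reduced to its
    # maximum, halving the position resolution in each dimension.
    h, w = feature_map.shape
    return feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))
# [[ 5  7]
#  [13 15]]
</syntaxhighlight>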

In 1994, André de Carvalho, together with Mike Fairhurst and David Bisset, published experimental results of a multi-layer Boolean neural network, also known as a weightless neural network, composed of a three-layer self-organising feature extraction neural network module (SOFT) followed by a multi-layer classification neural network module (GSN), the two modules being trained independently. Each layer in the feature extraction module extracted features of growing complexity relative to the previous layer.<ref>{{Cite journal |title=An integrated Boolean neural network for pattern classification |journal=Pattern Recognition Letters |date=1994-08-08 |pages=807–813 |volume=15 |issue=8 |doi=10.1016/0167-8655(94)90009-4 |first=Andre C. L. F. |last1=de Carvalho |first2 = Mike C. |last2=Fairhurst |first3=David |last3 = Bisset}}</ref>

In 1995, [[Brendan Frey]] demonstrated that it was possible to train (over two days) a network containing six fully connected layers and several hundred hidden units using the [[wake-sleep algorithm]], co-developed with [[Peter Dayan]] and [[Geoffrey Hinton|Hinton]].<ref>{{Cite journal|title = The wake-sleep algorithm for unsupervised neural networks |journal = Science|date = 1995-05-26|pages = 1158–1161|volume = 268|issue = 5214|doi = 10.1126/science.7761831|pmid = 7761831|first = Geoffrey E.|last = Hinton|first2 = Peter|last2 = Dayan|first3 = Brendan J.|last3 = Frey|first4 = Radford|last4 = Neal|bibcode = 1995Sci...268.1158H}}</ref> Many factors contribute to the slow speed, including the [[vanishing gradient problem]] analyzed in 1991 by [[Sepp Hochreiter]].<ref name="HOCH1991">S. Hochreiter., "[http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf Untersuchungen zu dynamischen neuronalen Netzen]," ''Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber'', 1991.</ref><ref name="HOCH2001">{{cite book|chapter-url={{google books |plainurl=y |id=NWOcMVA64aAC}}|title=A Field Guide to Dynamical Recurrent Networks|last=Hochreiter|first=S.|display-authors=etal|date=15 January 2001|publisher=John Wiley & Sons|isbn=978-0-7803-5369-5|location=|pages=|chapter=Gradient flow in recurrent nets: the difficulty of learning long-term dependencies|editor-last2=Kremer|editor-first2=Stefan C.|editor-first1=John F.|editor-last1=Kolen}}</ref>

Simpler models that use task-specific handcrafted features such as [[Gabor filter]]s and [[support vector machine]]s (SVMs) were a popular choice in the 1990s and 2000s, because of [[artificial neural network]]'s (ANN) computational cost and a lack of understanding of how the brain wires its biological networks.

Both shallow and deep learning (e.g., recurrent nets) of ANNs have been explored for many years.<ref>{{Cite journal|last=Morgan|first=Nelson|last2=Bourlard |first2=Hervé |last3=Renals |first3=Steve |last4=Cohen |first4=Michael|last5=Franco |first5=Horacio |date=1993-08-01 |title=Hybrid neural network/hidden markov model systems for continuous speech recognition |journal=International Journal of Pattern Recognition and Artificial Intelligence|volume=07|issue=4|pages=899–916|doi=10.1142/s0218001493000455|issn=0218-0014}}</ref><ref name="Robinson1992">{{Cite journal|last=Robinson|first=T.|authorlink=Tony Robinson (speech recognition)|date=1992|title=A real-time recurrent error propagation network word recognition system|url=http://dl.acm.org/citation.cfm?id=1895720|journal=ICASSP|pages=617–620|via=|isbn=9780780305328|series=Icassp'92}}</ref><ref>{{Cite journal|last=Waibel|first=A.|last2=Hanazawa|first2=T.|last3=Hinton|first3=G.|last4=Shikano|first4=K.|last5=Lang|first5=K. J.|date=March 1989|title=Phoneme recognition using time-delay neural networks|journal=IEEE Transactions on Acoustics, Speech, and Signal Processing|volume=37|issue=3|pages=328–339|doi=10.1109/29.21701|issn=0096-3518|hdl=10338.dmlcz/135496|url=http://dml.cz/bitstream/handle/10338.dmlcz/135496/Kybernetika_38-2002-6_2.pdf}}</ref> These methods never outperformed non-uniform internal-handcrafting Gaussian [[mixture model]]/[[Hidden Markov model]] (GMM-HMM) technology based on generative models of speech trained discriminatively.<ref name="Baker2009">{{cite journal | last1 = Baker | first1 = J. | last2 = Deng | first2 = Li | last3 = Glass | first3 = Jim | last4 = Khudanpur | first4 = S. | last5 = Lee | first5 = C.-H. | last6 = Morgan | first6 = N. | last7 = O'Shaughnessy | first7 = D. | year = 2009 | title = Research Developments and Directions in Speech Recognition and Understanding, Part 1 | url= | journal = IEEE Signal Processing Magazine | volume = 26 | issue = 3| pages = 75–80 | doi=10.1109/msp.2009.932166| bibcode = 2009ISPM...26...75B }}</ref> Key difficulties have been analyzed, including gradient diminishing<ref name="HOCH1991" /> and weak temporal correlation structure in neural predictive models.<ref name="Bengio1991">{{Cite web|url=https://www.researchgate.net/publication/41229141|title=Artificial Neural Networks and their Application to Speech/Sequence Recognition|last=Bengio|first=Y.|date=1991|website=|publisher=McGill University Ph.D. thesis|accessdate=}}</ref><ref name="Deng1994">{{cite journal | last1 = Deng | first1 = L. | last2 = Hassanein | first2 = K. | last3 = Elmasry | first3 = M. | year = 1994 | title = Analysis of correlation structure for a neural predictive model with applications to speech recognition | url= | journal = Neural Networks | volume = 7 | issue = 2| pages = 331–339 | doi=10.1016/0893-6080(94)90027-2}}</ref> Additional difficulties were the lack of training data and limited computing power.

Most [[speech recognition]] researchers moved away from neural nets to pursue generative modeling. An exception was at [[SRI International]] in the late 1990s. Funded by the US government's [[National Security Agency|NSA]] and [[DARPA]], SRI studied deep neural networks in speech and speaker recognition. The speaker recognition team led by [[Larry Heck]] reported significant success with deep neural networks in speech processing in the 1998 [[National Institute of Standards and Technology]] Speaker Recognition evaluation.<ref name="Doddington2000">{{cite journal | last1 = Doddington | first1 = G. | last2 = Przybocki | first2 = M. | last3 = Martin | first3 = A. | last4 = Reynolds | first4 = D. | year = 2000 | title = The NIST speaker recognition evaluation ± Overview, methodology, systems, results, perspective | url= | journal = Speech Communication | volume = 31 | issue = 2| pages = 225–254 | doi=10.1016/S0167-6393(99)00080-1}}</ref> The SRI deep neural network was then deployed in the Nuance Verifier, representing the first major industrial application of deep learning.<ref name="Heck2000">{{cite journal | last1 = Heck | first1 = L. | last2 = Konig | first2 = Y. | last3 = Sonmez | first3 = M. | last4 = Weintraub | first4 = M. | year = 2000 | title = Robustness to Telephone Handset Distortion in Speaker Recognition by Discriminative Feature Design | url= | journal = Speech Communication | volume = 31 | issue = 2| pages = 181–192 | doi=10.1016/s0167-6393(99)00077-1}}</ref>

The principle of elevating "raw" features over hand-crafted optimization was first explored successfully in the architecture of deep autoencoder on the "raw" spectrogram or linear filter-bank features in the late 1990s,<ref name="Heck2000" /> showing its superiority over the Mel-Cepstral features that contain stages of fixed transformation from spectrograms. The raw features of speech, [[waveform]]s, later produced excellent larger-scale results.<ref>{{Cite web|url=https://www.researchgate.net/publication/266030526|title=Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR (PDF Download Available)|website=ResearchGate|accessdate=2017-06-14}}</ref>

The principle of elevating "raw" features over hand-crafted optimization was first explored successfully in the architecture of deep autoencoder on the "raw" spectrogram or linear filter-bank features in the late 1990s,

提升“原始”特性的原则超过手工优化是第一次探索成功的结构深自动编码器的“原始”光谱图或线性滤波器组功能,在20世纪90年代后期,



Many aspects of speech recognition were taken over by a deep learning method called [[long short-term memory]] (LSTM), a recurrent neural network published by Hochreiter and [[Jürgen Schmidhuber|Schmidhuber]] in 1997.<ref name=":0">{{Cite journal|last=Hochreiter|first=Sepp|last2=Schmidhuber|first2=Jürgen|date=1997-11-01|title=Long Short-Term Memory|journal=Neural Computation|volume=9|issue=8|pages=1735–1780|doi=10.1162/neco.1997.9.8.1735|issn=0899-7667|pmid=9377276|url=https://www.semanticscholar.org/paper/44d2abe2175df8153f465f6c39b68b76a0d40ab9}}</ref> LSTM RNNs avoid the vanishing gradient problem and can learn "Very Deep Learning" tasks<ref name="SCHIDHUB" /> that require memories of events that happened thousands of discrete time steps before, which is important for speech. In 2003, LSTM started to become competitive with traditional speech recognizers on certain tasks.<ref name="graves2003">{{Cite web|url=Ftp://ftp.idsia.ch/pub/juergen/bioadit2004.pdf|title=Biologically Plausible Speech Recognition with LSTM Neural Nets|last=Graves|first=Alex|last2=Eck|first2=Douglas|date=2003|website=1st Intl. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland|pages=175–184|last3=Beringer|first3=Nicole|last4=Schmidhuber|first4=Jürgen}}</ref> Later it was combined with connectionist temporal classification (CTC)<ref name=":1">{{Cite journal|last=Graves|first=Alex|last2=Fernández|first2=Santiago|last3=Gomez|first3=Faustino|date=2006|title=Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks|journal=Proceedings of the International Conference on Machine Learning, ICML 2006|pages=369–376|citeseerx=10.1.1.75.6306}}</ref> in stacks of LSTM RNNs.<ref name="fernandez2007keyword">Santiago Fernandez, Alex Graves, and Jürgen Schmidhuber (2007). [https://mediatum.ub.tum.de/doc/1289941/file.pdf An application of recurrent neural networks to discriminative keyword spotting]. Proceedings of ICANN (2), pp. 220–229.</ref> In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of 49% through CTC-trained LSTM, which they made available through [[Google Voice Search]].<ref name="sak2015">{{Cite web|url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html|title=Google voice search: faster and more accurate|last=Sak|first=Haşim|last2=Senior|first2=Andrew|date=September 2015|website=|accessdate=|last3=Rao|first3=Kanishka|last4=Beaufays|first4=Françoise|last5=Schalkwyk|first5=Johan}}</ref>
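
A minimal sketch of an LSTM acoustic model whose per-frame outputs are trained with a CTC loss, in the spirit of the CTC-trained LSTM stacks described above; it assumes the PyTorch library, and all sizes and label inventories are illustrative rather than taken from the cited systems:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=40, hidden_size=128, num_layers=2, batch_first=True)
proj = nn.Linear(128, 30)              # e.g. 29 output symbols + 1 CTC "blank"
ctc_loss = nn.CTCLoss(blank=0)

features = torch.randn(8, 200, 40)     # 8 utterances, 200 frames, 40 filterbank features
out, _ = lstm(features)
log_probs = proj(out).log_softmax(-1)  # (batch, time, classes)

targets = torch.randint(1, 30, (8, 20))      # dummy label sequences
input_lengths = torch.full((8,), 200)
target_lengths = torch.full((8,), 20)
loss = ctc_loss(log_probs.transpose(0, 1),   # CTCLoss expects (time, batch, classes)
                targets, input_lengths, target_lengths)
</syntaxhighlight>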

In 2006, publications by [[Geoffrey Hinton|Geoff Hinton]], [[Russ Salakhutdinov|Ruslan Salakhutdinov]], Osindero and [[Yee Whye Teh|Teh]]<ref>{{Cite journal|last=Hinton|first=Geoffrey E.|date=2007-10-01|title=Learning multiple layers of representation|url=http://www.cell.com/trends/cognitive-sciences/abstract/S1364-6613(07)00217-3|journal=Trends in Cognitive Sciences|volume=11|issue=10|pages=428–434|doi=10.1016/j.tics.2007.09.004|issn=1364-6613|pmid=17921042}}</ref>
<ref name=hinton06>{{Cite journal | last1 = Hinton | first1 = G. E. |authorlink1=Geoff Hinton| last2 = Osindero | first2 = S. | last3 = Teh | first3 = Y. W. | doi = 10.1162/neco.2006.18.7.1527 | title = A Fast Learning Algorithm for Deep Belief Nets | journal = [[Neural Computation (journal)|Neural Computation]]| volume = 18 | issue = 7 | pages = 1527–1554 | year = 2006 | pmid = 16764513| pmc = | url = http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf}}</ref><ref name=bengio2012>{{cite arXiv |last=Bengio |first=Yoshua |author-link=Yoshua Bengio |eprint=1206.5533 |title=Practical recommendations for gradient-based training of deep architectures |class=cs.LG|year=2012 }}</ref> showed how a many-layered [[feedforward neural network]] could be effectively pre-trained one layer at a time, treating each layer in turn as an unsupervised [[restricted Boltzmann machine]], then fine-tuning it using supervised [[backpropagation]].<ref name="HINTON2007">G. E. Hinton., "[http://www.csri.utoronto.ca/~hinton/absps/ticsdraft.pdf Learning multiple layers of representation]," ''Trends in Cognitive Sciences'', 11, pp. 428–434, 2007.</ref> The papers referred to ''learning'' for ''deep belief nets.''
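
A compact NumPy sketch of this greedy layer-wise idea, using one-step contrastive divergence (CD-1) for each restricted Boltzmann machine; it is illustrative only and omits many details of the cited papers (mini-batching, momentum, learning-rate schedules and the supervised fine-tuning stage):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """A restricted Boltzmann machine trained with one-step contrastive divergence."""
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0, lr=0.1):
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
        pv1 = self.visible_probs(h0)                      # reconstruct visible units
        ph1 = self.hidden_probs(pv1)
        n = len(v0)
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)

def greedy_pretrain(data, layer_sizes, epochs=10):
    """Train each RBM on the hidden activations of the layer below it."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_update(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # feed the learned representation upward
    return rbms

# Toy usage: pretrain a three-layer stack on random binary "data"; the stacked
# weights would then initialize a feedforward network that is fine-tuned with
# supervised backpropagation.
data = (rng.random((100, 64)) > 0.5).astype(float)
stack = greedy_pretrain(data, [32, 16, 8])
</syntaxhighlight>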

Deep learning is part of state-of-the-art systems in various disciplines, particularly computer vision and [[automatic speech recognition]] (ASR). Results on commonly used evaluation sets such as [[TIMIT]] (ASR) and [[MNIST database|MNIST]] ([[image classification]]), as well as a range of large-vocabulary speech recognition tasks have steadily improved.<ref name="HintonDengYu2012" /><ref>{{cite journal|url=https://www.microsoft.com/en-us/research/publication/new-types-of-deep-neural-network-learning-for-speech-recognition-and-related-applications-an-overview/|title=New types of deep neural network learning for speech recognition and related applications: An overview|journal=Microsoft Research|first1=Li|last1=Deng|first2=Geoffrey|last2=Hinton|first3=Brian|last3=Kingsbury|date=1 May 2013|via=research.microsoft.com|citeseerx=10.1.1.368.1123}}</ref><ref>{{Cite book |doi=10.1109/icassp.2013.6639345|isbn=978-1-4799-0356-6|chapter=Recent advances in deep learning for speech research at Microsoft|title=2013 IEEE International Conference on Acoustics, Speech and Signal Processing|pages=8604–8608|year=2013|last1=Deng|first1=Li|last2=Li|first2=Jinyu|last3=Huang|first3=Jui-Ting|last4=Yao|first4=Kaisheng|last5=Yu|first5=Dong|last6=Seide|first6=Frank|last7=Seltzer|first7=Michael|last8=Zweig|first8=Geoff|last9=He|first9=Xiaodong|last10=Williams|first10=Jason|last11=Gong|first11=Yifan|last12=Acero|first12=Alex}}</ref> [[Convolutional neural network]]s (CNNs) were superseded for ASR by CTC<ref name=":1" /> for LSTM.<ref name=":0" /><ref name="sak2015" /><ref name="sak2014">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf|title=Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling|last=Sak|first=Hasim|last2=Senior|first2=Andrew|date=2014|website=|accessdate=|last3=Beaufays|first3=Francoise|archive-url=https://web.archive.org/web/20180424203806/https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf|archive-date=2018-04-24|url-status=dead}}</ref><ref name="liwu2015">{{cite arxiv |eprint=1410.4281|last1=Li|first1=Xiangang|title=Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition|last2=Wu|first2=Xihong|class=cs.CL|year=2014}}</ref><ref name="zen2015">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf|title=Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis|last=Zen|first=Heiga|last2=Sak|first2=Hasim|date=2015|website=Google.com|publisher=ICASSP|pages=4470–4474|accessdate=}}</ref><ref name="CNNspeech2013">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf|title=A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion|last=Deng|first=L.|last2=Abdel-Hamid|first2=O.|date=2013|website=Google.com|publisher=ICASSP|accessdate=|last3=Yu|first3=D.}}</ref><ref name=":2">{{Cite book |doi=10.1109/icassp.2013.6639347|isbn=978-1-4799-0356-6|chapter=Deep convolutional neural networks for LVCSR|title=2013 IEEE International Conference on Acoustics, Speech and Signal Processing|pages=8614–8618|year=2013|last1=Sainath|first1=Tara N.|last2=Mohamed|first2=Abdel-Rahman|last3=Kingsbury|first3=Brian|last4=Ramabhadran|first4=Bhuvana}}</ref> but are more successful in computer vision.

The impact of deep learning in industry began in the early 2000s, when CNNs already processed an estimated 10% to 20% of all the checks written in the US, according to Yann LeCun.<ref name="lecun2016slides">[[Yann LeCun]] (2016). Slides on Deep Learning [https://indico.cern.ch/event/510372/ Online]</ref> Industrial applications of deep learning to large-scale speech recognition started around 2010.

The 2009 NIPS Workshop on Deep Learning for Speech Recognition<ref name="NIPS2009" /> was motivated by the limitations of deep generative models of speech, and the possibility that given more capable hardware and large-scale data sets that deep neural nets (DNN) might become practical. It was believed that pre-training DNNs using generative models of deep belief nets (DBN) would overcome the main difficulties of neural nets.<ref name="HintonKeynoteICASSP2013" /> However, it was discovered that replacing pre-training with large amounts of training data for straightforward backpropagation when using DNNs with large, context-dependent output layers produced error rates dramatically lower than then-state-of-the-art Gaussian mixture model (GMM)/Hidden Markov Model (HMM) and also than more-advanced generative model-based systems.<ref name="HintonDengYu2012">{{cite journal | last1 = Hinton | first1 = G. | last2 = Deng | first2 = L. | last3 = Yu | first3 = D. | last4 = Dahl | first4 = G. | last5 = Mohamed | first5 = A. | last6 = Jaitly | first6 = N. | last7 = Senior | first7 = A. | last8 = Vanhoucke | first8 = V. | last9 = Nguyen | first9 = P. | last10 = Sainath | first10 = T. | last11 = Kingsbury | first11 = B. | year = 2012 | title = Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups| url= | journal = IEEE Signal Processing Magazine | volume = 29 | issue = 6| pages = 82–97 | doi=10.1109/msp.2012.2205597}}</ref><ref name="patent2011">D. Yu, L. Deng, G. Li, and F. Seide (2011). "Discriminative pretraining of deep neural networks," U.S. Patent Filing.</ref> The nature of the recognition errors produced by the two types of systems was characteristically different,<ref name="ReferenceICASSP2013" /><ref name="NIPS2009">NIPS Workshop: Deep Learning for Speech Recognition and Related Applications, Whistler, BC, Canada, Dec. 2009 (Organizers: Li Deng, Geoff Hinton, D. Yu).</ref> offering technical insights into how to integrate deep learning into the existing highly efficient, run-time speech decoding system deployed by all major speech recognition systems.<ref name="BOOK2014" /><ref name="ReferenceA">{{cite book|last2=Deng|first2=L.|date=2014|title=Automatic Speech Recognition: A Deep Learning Approach (Publisher: Springer)|url={{google books |plainurl=y |id=rUBTBQAAQBAJ}}|pages=|isbn=978-1-4471-5779-3|via=|last1=Yu|first1=D.}}</ref><ref>{{cite web|title=Deng receives prestigious IEEE Technical Achievement Award - Microsoft Research|url=https://www.microsoft.com/en-us/research/blog/deng-receives-prestigious-ieee-technical-achievement-award/|website=Microsoft Research|date=3 December 2015}}</ref> Analysis around 2009–2010, contrasted the GMM (and other generative speech models) vs. DNN models, stimulated early industrial investment in deep learning for speech recognition,<ref name="ReferenceICASSP2013" /><ref name="NIPS2009" /> eventually leading to pervasive and dominant use in that industry. 
That analysis was done with comparable performance (less than 1.5% in error rate) between discriminative DNNs and generative models.<ref name="HintonDengYu2012" /><ref name="ReferenceICASSP2013">{{cite journal|last2=Hinton|first2=G.|last3=Kingsbury|first3=B.|date=2013|title=New types of deep neural network learning for speech recognition and related applications: An overview (ICASSP)|url=https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ICASSP-2013-DengHintonKingsbury-revised.pdf|journal=|pages=|via=|last1=Deng|first1=L.}}</ref><ref name="HintonKeynoteICASSP2013">Keynote talk: Recent Developments in Deep Neural Networks. ICASSP, 2013 (by Geoff Hinton).</ref><ref name="interspeech2014Keynote">{{Cite web|url=https://www.superlectures.com/interspeech2014/downloadFile?id=6&type=slides&filename=achievements-and-challenges-of-deep-learning-from-speech-analysis-and-recognition-to-language-and-multimodal-processing|title=Keynote talk: 'Achievements and Challenges of Deep Learning - From Speech Analysis and Recognition To Language and Multimodal Processing'|last=Li|first=Deng|date=September 2014|website=Interspeech|accessdate=}}</ref>

In 2010, researchers extended deep learning from TIMIT to large vocabulary speech recognition, by adopting large output layers of the DNN based on context-dependent HMM states constructed by [[decision tree]]s.<ref name="Roles2010">{{cite journal|last1=Yu|first1=D.|last2=Deng|first2=L.|date=2010|title=Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition|url=https://www.microsoft.com/en-us/research/publication/roles-of-pre-training-and-fine-tuning-in-context-dependent-dbn-hmms-for-real-world-speech-recognition/|journal=NIPS Workshop on Deep Learning and Unsupervised Feature Learning|pages=|via=}}</ref><ref>{{Cite journal|last=Seide|first=F.|last2=Li|first2=G.|last3=Yu|first3=D.|date=2011|title=Conversational speech transcription using context-dependent deep neural networks|url=https://www.microsoft.com/en-us/research/publication/conversational-speech-transcription-using-context-dependent-deep-neural-networks|journal=Interspeech|pages=|via=}}</ref><ref>{{Cite journal|last=Deng|first=Li|last2=Li|first2=Jinyu|last3=Huang|first3=Jui-Ting|last4=Yao|first4=Kaisheng|last5=Yu|first5=Dong|last6=Seide|first6=Frank|last7=Seltzer|first7=Mike|last8=Zweig|first8=Geoff|last9=He|first9=Xiaodong|date=2013-05-01|title=Recent Advances in Deep Learning for Speech Research at Microsoft|url=https://www.microsoft.com/en-us/research/publication/recent-advances-in-deep-learning-for-speech-research-at-microsoft/|journal=Microsoft Research}}</ref><ref name="ReferenceA" />

Advances in hardware have enabled renewed interest in deep learning. In 2009, [[Nvidia]] was involved in what was called the “big bang” of deep learning, “as deep-learning neural networks were trained with Nvidia [[graphics processing unit]]s (GPUs).”<ref>{{cite web|url=https://venturebeat.com/2016/04/05/nvidia-ceo-bets-big-on-deep-learning-and-vr/|title=Nvidia CEO bets big on deep learning and VR|date=April 5, 2016|publisher=[[Venture Beat]]}}</ref> That year, [[Google Brain]] used Nvidia GPUs to create capable DNNs. While there, [[Andrew Ng]] determined that GPUs could increase the speed of deep-learning systems by about 100 times.<ref>{{cite news|url=https://www.economist.com/news/special-report/21700756-artificial-intelligence-boom-based-old-idea-modern-twist-not|title=From not working to neural networking|newspaper=[[The Economist]]}}</ref> In particular, GPUs are well-suited for the matrix/vector computations involved in machine learning.<ref name="jung2004">{{cite journal | last1 = Oh | first1 = K.-S. | last2 = Jung | first2 = K. | year = 2004 | title = GPU implementation of neural networks | url= | journal = Pattern Recognition | volume = 37 | issue = 6| pages = 1311–1314 | doi=10.1016/j.patcog.2004.01.013}}</ref><ref>"[https://www.academia.edu/40135801 A Survey of Techniques for Optimizing Deep Learning on GPUs]", S. Mittal and S. Vaishay, Journal of Systems Architecture, 2019</ref><ref name="chellapilla2006">Chellapilla, K., Puri, S., and Simard, P. (2006). High performance convolutional neural networks for document processing. International Workshop on Frontiers in Handwriting Recognition.</ref> GPUs speed up training algorithms by orders of magnitude, reducing running times from weeks to days.<ref name=":3">{{Cite journal|last=Cireşan|first=Dan Claudiu|last2=Meier|first2=Ueli|last3=Gambardella|first3=Luca Maria|last4=Schmidhuber|first4=Jürgen|date=2010-09-21|title=Deep, Big, Simple Neural Nets for Handwritten Digit Recognition|journal=Neural Computation|volume=22|issue=12|pages=3207–3220|doi=10.1162/neco_a_00052|pmid=20858131|issn=0899-7667|arxiv=1003.0358}}</ref><ref>{{Cite journal|last=Raina|first=Rajat|last2=Madhavan|first2=Anand|last3=Ng|first3=Andrew Y.|date=2009|title=Large-scale Deep Unsupervised Learning Using Graphics Processors|journal=Proceedings of the 26th Annual International Conference on Machine Learning|series=ICML '09|location=New York, NY, USA|publisher=ACM|pages=873–880|doi=10.1145/1553374.1553486|isbn=9781605585161|citeseerx=10.1.1.154.372|url=https://www.semanticscholar.org/paper/e337c5e4c23999c36f64bcb33ebe6b284e1bcbf1}}</ref> Further, specialized hardware and algorithm optimizations can be used for efficient processing of deep learning models.<ref name="sze2017">{{cite arXiv

|title= Efficient Processing of Deep Neural Networks: A Tutorial and Survey
|last1=Sze |first1=Vivienne
|last2=Chen |first2=Yu-Hsin
|last3=Yang |first3=Tien-Ju
|last4=Emer |first4=Joel
|eprint=1703.09039
|year=2017
|class=cs.CV }}</ref>
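The reason GPUs fit this workload is that the forward and backward passes of a neural network reduce to dense matrix products over mini-batches. A minimal sketch, assuming PyTorch is installed and a CUDA-capable GPU is present (the layer sizes are arbitrary illustrative values):

<syntaxhighlight lang="python">
import torch

# One fully connected layer applied to a mini-batch is a single matrix
# product plus a bias -- exactly the kind of operation GPUs accelerate.
batch, n_in, n_out = 256, 4096, 4096
x = torch.randn(batch, n_in)   # mini-batch of inputs
w = torch.randn(n_in, n_out)   # layer weights
b = torch.randn(n_out)         # layer bias

y_cpu = torch.relu(x @ w + b)  # forward pass on the CPU

if torch.cuda.is_available():  # the same computation, offloaded to the GPU
    y_gpu = torch.relu(x.cuda() @ w.cuda() + b.cuda())
    print(y_gpu.shape, y_gpu.device)
</syntaxhighlight>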



=== Deep learning revolution ===

[[File:AI-ML-DL.png|thumb|How deep learning is a subset of machine learning and how machine learning is a subset of artificial intelligence (AI).]]

In 2012, a team led by George E. Dahl won the "Merck Molecular Activity Challenge" using multi-task deep neural networks to predict the [[biomolecular target]] of one drug.<ref name="MERCK2012">{{cite web|url=https://www.kaggle.com/c/MerckActivity/details/winners|title=Announcement of the winners of the Merck Molecular Activity Challenge}}</ref><ref name=":5">{{Cite web|url=http://www.datascienceassn.org/content/multi-task-neural-networks-qsar-predictions|title=Multi-task Neural Networks for QSAR Predictions {{!}} Data Science Association|website=www.datascienceassn.org|accessdate=2017-06-14}}</ref> In 2014, Hochreiter's group used deep learning to detect off-target and toxic effects of environmental chemicals in nutrients, household products and drugs and won the "Tox21 Data Challenge" of [[NIH]], [[FDA]] and [[National Center for Advancing Translational Sciences|NCATS]].<ref name="TOX21">"Toxicology in the 21st century Data Challenge"</ref><ref name="TOX21Data">{{cite web|url=https://tripod.nih.gov/tox21/challenge/leaderboard.jsp|title=NCATS Announces Tox21 Data Challenge Winners}}</ref><ref name=":11">{{cite web|url=http://www.ncats.nih.gov/news-and-events/features/tox21-challenge-winners.html|title=Archived copy|archiveurl=https://web.archive.org/web/20150228225709/http://www.ncats.nih.gov/news-and-events/features/tox21-challenge-winners.html|archivedate=2015-02-28|url-status=dead|accessdate=2015-03-05}}</ref>

Significant additional impacts in image or object recognition were felt from 2011 to 2012. Although CNNs trained by backpropagation had been around for decades, and GPU implementations of NNs for years, including CNNs, fast implementations of CNNs with max-pooling on GPUs in the style of Ciresan and colleagues were needed to progress on computer vision.<ref name="jung2004" /><ref name="chellapilla2006" /><ref name="LECUN1989" /><ref name=":6">{{Cite journal|last=Ciresan|first=D. C.|last2=Meier|first2=U.|last3=Masci|first3=J.|last4=Gambardella|first4=L. M.|last5=Schmidhuber|first5=J.|date=2011|title=Flexible, High Performance Convolutional Neural Networks for Image Classification|url=http://ijcai.org/papers11/Papers/IJCAI11-210.pdf|journal=International Joint Conference on Artificial Intelligence|pages=|doi=10.5591/978-1-57735-516-8/ijcai11-210|via=}}</ref><ref name="SCHIDHUB" /> In 2011, this approach achieved for the first time superhuman performance in a visual pattern recognition contest. Also in 2011, it won the ICDAR Chinese handwriting contest, and in May 2012, it won the ISBI image segmentation contest.<ref name=":8">{{Cite book|url=http://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images.pdf|title=Advances in Neural Information Processing Systems 25|last=Ciresan|first=Dan|last2=Giusti|first2=Alessandro|last3=Gambardella|first3=Luca M.|last4=Schmidhuber|first4=Juergen|date=2012|publisher=Curran Associates, Inc.|editor-last=Pereira|editor-first=F.|pages=2843–2851|editor-last2=Burges|editor-first2=C. J. C.|editor-last3=Bottou|editor-first3=L.|editor-last4=Weinberger|editor-first4=K. Q.}}</ref> Until 2011, CNNs did not play a major role at computer vision conferences, but in June 2012, a paper by Ciresan et al. at the leading conference CVPR<ref name=":9" /> showed how max-pooling CNNs on GPU can dramatically improve many vision benchmark records. In October 2012, a similar system by Krizhevsky et al.<ref name="krizhevsky2012" /> won the large-scale [[ImageNet competition]] by a significant margin over shallow machine learning methods. In November 2012, Ciresan et al.'s system also won the ICPR contest on analysis of large medical images for cancer detection, and in the following year also the MICCAI Grand Challenge on the same topic.<ref name="ciresan2013miccai">{{Cite journal|last=Ciresan|first=D.|last2=Giusti|first2=A.|last3=Gambardella|first3=L.M.|last4=Schmidhuber|first4=J.|date=2013|title=Mitosis Detection in Breast Cancer Histology Images using Deep Neural Networks|journal=Proceedings MICCAI|volume=7908|issue=Pt 2|pages=411–418|doi=10.1007/978-3-642-40763-5_51|pmid=24579167|series=Lecture Notes in Computer Science|isbn=978-3-642-38708-1}}</ref> In 2013 and 2014, the error rate on the ImageNet task using deep learning was further reduced, following a similar trend in large-scale speech recognition. The [[Stephen Wolfram|Wolfram]] Image Identification project publicized these improvements.<ref>{{Cite web|url=https://www.imageidentify.com/|title=The Wolfram Language Image Identification Project|website=www.imageidentify.com|accessdate=2017-03-22}}</ref>

Image classification was then extended to the more challenging task of [[Automatic image annotation|generating descriptions]] (captions) for images, often as a combination of CNNs and LSTMs.<ref name="1411.4555">{{cite arxiv |eprint=1411.4555|last1=Vinyals|first1=Oriol|title=Show and Tell: A Neural Image Caption Generator|last2=Toshev|first2=Alexander|last3=Bengio|first3=Samy|last4=Erhan|first4=Dumitru|class=cs.CV|year=2014}}.</ref><ref name="1411.4952">{{cite arxiv |eprint=1411.4952|last1=Fang|first1=Hao|title=From Captions to Visual Concepts and Back|last2=Gupta|first2=Saurabh|last3=Iandola|first3=Forrest|last4=Srivastava|first4=Rupesh|last5=Deng|first5=Li|last6=Dollár|first6=Piotr|last7=Gao|first7=Jianfeng|last8=He|first8=Xiaodong|last9=Mitchell|first9=Margaret|last10=Platt|first10=John C|last11=Lawrence Zitnick|first11=C|last12=Zweig|first12=Geoffrey|class=cs.CV|year=2014}}.</ref><ref name="1411.2539">{{cite arxiv |eprint=1411.2539|last1=Kiros|first1=Ryan|title=Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models|last2=Salakhutdinov|first2=Ruslan|last3=Zemel|first3=Richard S|class=cs.LG|year=2014}}.</ref><ref>{{Cite journal|last=Zhong|first=Sheng-hua|last2=Liu|first2=Yan|last3=Liu|first3=Yang|date=2011|title=Bilinear Deep Learning for Image Classification|journal=Proceedings of the 19th ACM International Conference on Multimedia|series=MM '11|location=New York, NY, USA|publisher=ACM|pages=343–352|doi=10.1145/2072298.2072344|isbn=9781450306164|url=https://www.semanticscholar.org/paper/e1bbfb2c7ef74445b4fad9199b727464129df582}}</ref>

Some researchers assess that the October 2012 ImageNet victory anchored the start of a "deep learning revolution" that has transformed the AI industry.<ref>{{cite news|title=Why Deep Learning Is Suddenly Changing Your Life|url=http://fortune.com/ai-artificial-intelligence-deep-machine-learning/|accessdate=13 April 2018|work=Fortune|date=2016}}</ref>

In March 2019, [[Yoshua Bengio]], [[Geoffrey Hinton]] and [[Yann LeCun]] were awarded the [[Turing Award]] for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.

== Neural networks ==



=== Artificial neural networks ===

{{Main|Artificial neural network}}

'''Artificial neural networks''' ('''ANNs''') or '''[[Connectionism|connectionist]] systems''' are computing systems inspired by the [[biological neural network]]s that constitute animal brains. Such systems learn (progressively improve their ability) to do tasks by considering examples, generally without task-specific programming. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually [[Labeled data|labeled]] as "cat" or "no cat" and using the analytic results to identify cats in other images. They have found most use in applications difficult to express with a traditional computer algorithm using [[rule-based programming]].

An ANN is based on a collection of connected units called [[artificial neuron]]s (analogous to biological neurons in a [[Brain|biological brain]]). Each connection ([[synapse]]) between neurons can transmit a signal to another neuron. The receiving (postsynaptic) neuron can process the signal(s) and then signal downstream neurons connected to it. Neurons may have state, generally represented by [[real numbers]], typically between 0 and 1. Neurons and synapses may also have a weight that varies as learning proceeds, which can increase or decrease the strength of the signal sent downstream.

Typically, neurons are organized in layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first (input), to the last (output) layer, possibly after traversing the layers multiple times.
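As a concrete illustration of the two paragraphs above, the sketch below (using NumPy, with made-up layer sizes and random, untrained weights) passes an input signal through one hidden layer and an output layer, squashing each weighted sum into the range 0 to 1:

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    """Squash a real-valued signal into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Weights of a 3-4-2 network: 3 inputs, one hidden layer of 4 units, 2 outputs.
w_hidden = rng.normal(size=(3, 4))   # connection weights, input -> hidden
b_hidden = np.zeros(4)
w_out = rng.normal(size=(4, 2))      # connection weights, hidden -> output
b_out = np.zeros(2)

x = np.array([0.5, -1.2, 3.0])       # states of the input neurons

# Signals travel from the input layer, through the hidden layer, to the output layer.
hidden = sigmoid(x @ w_hidden + b_hidden)
output = sigmoid(hidden @ w_out + b_out)
print(output)                        # two numbers between 0 and 1
</syntaxhighlight>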

The original goal of the neural network approach was to solve problems in the same way that a human brain would. Over time, attention focused on matching specific mental abilities, leading to deviations from biology such as backpropagation, or passing information in the reverse direction and adjusting the network to reflect that information.

Neural networks have been used on a variety of tasks, including computer vision, [[speech recognition]], [[machine translation]], [[social network]] filtering, [[general game playing|playing board and video games]] and medical diagnosis.

As of 2017, neural networks typically have a few thousand to a few million units and millions of connections. Despite this number being several orders of magnitude less than the number of neurons in a human brain, these networks can perform many tasks at a level beyond that of humans (e.g., recognizing faces, playing "Go"<ref>{{Cite journal|last=Silver|first=David|last2=Huang|first2=Aja|last3=Maddison|first3=Chris J.|last4=Guez|first4=Arthur|last5=Sifre|first5=Laurent|last6=Driessche|first6=George van den|last7=Schrittwieser|first7=Julian|last8=Antonoglou|first8=Ioannis|last9=Panneershelvam|first9=Veda|date=January 2016|title=Mastering the game of Go with deep neural networks and tree search|journal=Nature|volume=529|issue=7587|pages=484–489|doi=10.1038/nature16961|issn=1476-4687|pmid=26819042|bibcode=2016Natur.529..484S|url=https://www.semanticscholar.org/paper/846aedd869a00c09b40f1f1f35673cb22bc87490}}</ref>).

=== Deep neural networks ===

{{technical|section|date=July 2016}}

A deep neural network (DNN) is an [[artificial neural network]] (ANN) with multiple layers between the input and output layers.<ref name="BENGIODEEP" /><ref name="SCHIDHUB" /> The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a [[linear relationship]] or a non-linear relationship. The network moves through the layers, calculating the probability of each output. For example, a DNN that is trained to recognize dog breeds will go over the given image and calculate the probability that the dog in the image is a certain breed. The user can review the results, select which probabilities the network should display (those above a certain threshold, etc.), and return the proposed label. Each such mathematical manipulation is considered a layer, and complex DNNs have many layers, hence the name "deep" networks.
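The review-and-threshold step described here can be illustrated with a purely hypothetical classifier output; the breed names, probabilities and threshold below are invented for the example:

<syntaxhighlight lang="python">
# Hypothetical output of a breed classifier for one image:
# one probability per breed (e.g., a softmax output summing to 1).
probabilities = {
    "labrador": 0.62,
    "golden_retriever": 0.27,
    "beagle": 0.08,
    "poodle": 0.03,
}

threshold = 0.25  # only report breeds the network is reasonably confident about

proposed_labels = {breed: p for breed, p in probabilities.items() if p >= threshold}
print(proposed_labels)   # {'labrador': 0.62, 'golden_retriever': 0.27}
</syntaxhighlight>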

DNNs can model complex non-linear relationships. DNN architectures generate compositional models where the object is expressed as a layered composition of [[Primitive data type|primitives]].<ref>{{Cite journal|last=Szegedy|first=Christian|last2=Toshev|first2=Alexander|last3=Erhan|first3=Dumitru|date=2013|title=Deep neural networks for object detection|url=https://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection|journal=Advances in Neural Information Processing Systems|pages=2553–2561|via=}}</ref> The extra layers enable composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network.<ref name="BENGIODEEP" />

Deep architectures include many variants of a few basic approaches. Each architecture has found success in specific domains. It is not always possible to compare the performance of multiple architectures, unless they have been evaluated on the same data sets.

DNNs are typically feedforward networks in which data flows from the input layer to the output layer without looping back. At first, the DNN creates a map of virtual neurons and assigns random numerical values, or "weights", to connections between them. The weights and inputs are multiplied and return an output between 0 and 1. If the network did not accurately recognize a particular pattern, an algorithm would adjust the weights.<ref>{{Cite news|url=https://www.technologyreview.com/s/513696/deep-learning/|title=Is Artificial Intelligence Finally Coming into Its Own?|last=Hof|first=Robert D.|work=MIT Technology Review|access-date=2018-07-10}}</ref> That way the algorithm can make certain parameters more influential, until it determines the correct mathematical manipulation to fully process the data.
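In its simplest form, the weight-adjustment procedure sketched in this paragraph is gradient descent on a single sigmoid unit. The toy example below (an OR-style task with an arbitrary learning rate) starts from random weights and repeatedly nudges them until the 0-to-1 outputs match the targets; it is only an illustration of the idea, not of any particular published algorithm:

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Toy task: learn the OR function from four labelled examples.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 1, 1, 1], dtype=float)

weights = rng.normal(size=2)   # random initial connection weights
bias = 0.0
learning_rate = 0.5            # arbitrary value for this sketch

for step in range(2000):
    outputs = sigmoid(inputs @ weights + bias)      # outputs between 0 and 1
    error = outputs - targets
    # Gradient of the squared error w.r.t. the weights (chain rule through the sigmoid):
    # this is the "adjustment" that makes certain parameters more or less influential.
    delta = error * outputs * (1 - outputs)
    weights -= learning_rate * (inputs.T @ delta)
    bias -= learning_rate * np.sum(delta)

print(np.round(sigmoid(inputs @ weights + bias), 2))   # close to [0, 1, 1, 1]
</syntaxhighlight>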

[[Recurrent neural networks]] (RNNs), in which data can flow in any direction, are used for applications such as [[language model]]ing.<ref name="gers2001">{{cite journal|last1=Gers|first1=Felix A.|last2=Schmidhuber|first2=Jürgen|year=2001|title=LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages|url=http://elartu.tntu.edu.ua/handle/lib/30719|journal= IEEE Transactions on Neural Networks|volume=12|issue=6|pages=1333–1340|doi=10.1109/72.963769|pmid=18249962}}</ref><ref name="NIPS2014"/><ref name="vinyals2016">{{cite arxiv |eprint=1602.02410|last1=Jozefowicz|first1=Rafal|title=Exploring the Limits of Language Modeling|last2=Vinyals|first2=Oriol|last3=Schuster|first3=Mike|last4=Shazeer|first4=Noam|last5=Wu|first5=Yonghui|class=cs.CL|year=2016}}</ref><ref name="gillick2015">{{cite arxiv |eprint=1512.00103|last1=Gillick|first1=Dan|title=Multilingual Language Processing from Bytes|last2=Brunk|first2=Cliff|last3=Vinyals|first3=Oriol|last4=Subramanya|first4=Amarnag|class=cs.CL|year=2015}}</ref><ref name="MIKO2010">{{Cite journal|last=Mikolov|first=T.|display-authors=etal|date=2010|title=Recurrent neural network based language model|url=http://www.fit.vutbr.cz/research/groups/speech/servite/2010/rnnlm_mikolov.pdf|journal=Interspeech|pages=|via=}}</ref> Long short-term memory is particularly effective for this use.<ref name=":0" /><ref name=":10">{{Cite web|url=https://www.researchgate.net/publication/220320057|title=Learning Precise Timing with LSTM Recurrent Networks (PDF Download Available)|website=ResearchGate|accessdate=2017-06-13}}</ref>

[[Convolutional neural network|Convolutional deep neural networks (CNNs)]] are used in computer vision.<ref name="LECUN86">{{cite journal |last1=LeCun |first1=Y. |display-authors=etal |year= 1998|title=Gradient-based learning applied to document recognition |url= |journal=Proceedings of the IEEE |volume=86 |issue=11 |pages=2278–2324 |doi=10.1109/5.726791}}</ref> CNNs also have been applied to [[acoustic model]]ing for automatic speech recognition (ASR).<ref name=":2" />

==== Challenges ====

As with ANNs, many issues can arise with naively trained DNNs. Two common issues are [[overfitting]] and computation time.

DNNs are prone to overfitting because of the added layers of abstraction, which allow them to model rare dependencies in the training data. [[Regularization (mathematics)|Regularization]] methods such as Ivakhnenko's unit pruning<ref name="ivak1971"/> or [[weight decay]] (<math> \ell_2 </math>-regularization) or [[sparse matrix|sparsity]] (<math> \ell_1 </math>-regularization) can be applied during training to combat overfitting.<ref>{{Cite book |doi=10.1109/icassp.2013.6639349|isbn=978-1-4799-0356-6|arxiv=1212.0901|citeseerx=10.1.1.752.9151|chapter=Advances in optimizing recurrent networks|title=2013 IEEE International Conference on Acoustics, Speech and Signal Processing|pages=8624–8628|year=2013|last1=Bengio|first1=Yoshua|last2=Boulanger-Lewandowski|first2=Nicolas|last3=Pascanu|first3=Razvan}}</ref> Alternatively dropout regularization randomly omits units from the hidden layers during training. This helps to exclude rare dependencies.<ref name="DAHL2013">{{Cite journal|last=Dahl|first=G.|display-authors=etal|date=2013|title=Improving DNNs for LVCSR using rectified linear units and dropout|url=http://www.cs.toronto.edu/~gdahl/papers/reluDropoutBN_icassp2013.pdf|journal=ICASSP|pages=|via=}}</ref> Finally, data can be augmented via methods such as cropping and rotating such that smaller training sets can be increased in size to reduce the chances of overfitting.<ref>{{Cite web|url=https://www.coursera.org/learn/convolutional-neural-networks/lecture/AYzbX/data-augmentation|title=Data Augmentation - deeplearning.ai {{!}} Coursera|website=Coursera|accessdate=2017-11-30}}</ref>
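A minimal NumPy sketch of two of these remedies, weight decay (<math> \ell_2 </math> regularization) and dropout; the penalty strength and dropout probability are arbitrary illustrative values:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

weights = rng.normal(size=(128, 64))
lam = 1e-4             # weight-decay (L2) strength -- illustrative value
learning_rate = 0.01

def l2_penalty_gradient(w, lam):
    # d/dw of (lam/2) * ||w||^2: shrinks every weight a little on each step.
    return lam * w

def apply_dropout(activations, drop_prob, rng):
    """Randomly omit hidden units during training ("inverted" dropout)."""
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob   # rescale so the expected value is unchanged

# Inside a training step, weight decay is simply added to the data gradient ...
data_gradient = rng.normal(size=weights.shape)   # stand-in for the backpropagated gradient
weights -= learning_rate * (data_gradient + l2_penalty_gradient(weights, lam))

# ... and dropout is applied to the hidden-layer activations.
hidden = rng.normal(size=(32, 64))               # stand-in for hidden activations
hidden_dropped = apply_dropout(hidden, drop_prob=0.5, rng=rng)
</syntaxhighlight>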

DNNs must consider many training parameters, such as the size (number of layers and number of units per layer), the [[learning rate]], and initial weights. [[Hyperparameter optimization#Grid search|Sweeping through the parameter space]] for optimal parameters may not be feasible due to the cost in time and computational resources. Various tricks, such as batching (computing the gradient on several training examples at once rather than individual examples)<ref name="RBMTRAIN">{{Cite journal|last=Hinton|first=G. E.|date=2010|title=A Practical Guide to Training Restricted Boltzmann Machines|url=https://www.researchgate.net/publication/221166159|journal=Tech. Rep. UTML TR 2010-003|pages=|via=}}</ref> speed up computation. Large processing capabilities of many-core architectures (such as GPUs or the Intel Xeon Phi) have produced significant speedups in training, because of the suitability of such processing architectures for the matrix and vector computations.<ref>{{cite book|last1=You|first1=Yang|title=Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17|pages=1–12|last2=Buluç|first2=Aydın|last3=Demmel|first3=James|chapter=Scaling deep learning on GPU and knights landing clusters|chapter-url=https://dl.acm.org/citation.cfm?doid=3126908.3126912|publisher=SC '17, ACM|date=November 2017|accessdate=5 March 2018|doi=10.1145/3126908.3126912|isbn=9781450351140|url=http://www.escholarship.org/uc/item/6ch40821}}</ref><ref>{{cite journal|last1=Viebke|first1=André|last2=Memeti|first2=Suejb|last3=Pllana|first3=Sabri|last4=Abraham|first4=Ajith|title=CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi|journal=The Journal of Supercomputing|volume=75|pages=197–227|doi=10.1007/s11227-017-1994-x|accessdate=|arxiv=1702.07908|bibcode=2017arXiv170207908V|url=https://www.semanticscholar.org/paper/aa8a4d2de94cc0a8ccff21f651c005613e8ec0e8|year=2019}}</ref>
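Batching, as described above, means estimating the gradient from several training examples at once, so that each update becomes one matrix operation rather than many vector operations. A hedged sketch of shuffled mini-batch iteration (array shapes and batch size are arbitrary stand-ins):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 784))       # stand-in training inputs
y = rng.integers(0, 10, size=10_000)     # stand-in labels
batch_size = 128                         # typical illustrative value

def minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches; the gradient is then computed once per batch."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

for xb, yb in minibatches(X, y, batch_size, rng):
    # gradient = backprop(xb, yb); weights -= learning_rate * gradient
    pass
</syntaxhighlight>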

Alternatively, engineers may look for other types of neural networks with more straightforward and convergent training algorithms. CMAC ([[cerebellar model articulation controller]]) is one such kind of neural network. CMAC does not require learning rates or randomized initial weights. The training process can be guaranteed to converge in one step with a new batch of data, and the computational complexity of the training algorithm is linear with respect to the number of neurons involved.<ref name=Qin1>Ting Qin, et al. "A learning algorithm of CMAC based on RLS." Neural Processing Letters 19.1 (2004): 49-61.</ref><ref name=Qin2>Ting Qin, et al. "[http://www-control.eng.cam.ac.uk/Homepage/papers/cued_control_997.pdf Continuous CMAC-QRLS and its systolic array]." Neural Processing Letters 22.1 (2005): 1-16.</ref>

== Applications ==



=== Automatic speech recognition ===

{{Main|Speech recognition}}



Large-scale automatic speech recognition is the first and most convincing successful case of deep learning. LSTM RNNs can learn "Very Deep Learning" tasks<ref name="SCHIDHUB"/> that involve multi-second intervals containing speech events separated by thousands of discrete time steps, where one time step corresponds to about 10 ms. LSTM with forget gates<ref name=":10" /> is competitive with traditional speech recognizers on certain tasks.<ref name="graves2003"/>

The initial success in speech recognition was based on small-scale recognition tasks using TIMIT. The data set contains 630 speakers from eight major [[dialect]]s of [[American English]], where each speaker reads 10 sentences.<ref name="LDCTIMIT">''TIMIT Acoustic-Phonetic Continuous Speech Corpus'' Linguistic Data Consortium, Philadelphia.</ref> Its small size lets many configurations be tried. More importantly, the TIMIT task concerns phone-sequence recognition, which, unlike word-sequence recognition, allows weak phone [[bigram]] language models. This lets the strength of the acoustic modeling aspects of speech recognition be more easily analyzed. The error rates listed below, including these early results, are measured as percent phone error rates (PER) and summarize results dating back to 1991.

{| class="wikitable"

{| class="wikitable"

{ | class“ wikitable”

|-

|-

|-

! Method !! Percent phone<br>error rate (PER) (%)

! Method !! Percent phone<br>error rate (PER) (%)

!方法! !电话误码率(PER)(%)

|-

|-

|-

| Randomly Initialized RNN<ref>{{cite journal |last1=Robinson |first1=Tony |authorlink=Tony Robinson (speech recognition)|title=Several Improvements to a Recurrent Error Propagation Network Phone Recognition System |journal=Cambridge University Engineering Department Technical Report |date=30 September 1991 |volume=CUED/F-INFENG/TR82 |doi=10.13140/RG.2.2.15418.90567 }}</ref>|| 26.1

| Randomly Initialized RNN|| 26.1

随机初始化的 RNN | | 26.1

|-

|-

|-

| Bayesian Triphone GMM-HMM || 25.6

| Bayesian Triphone GMM-HMM || 25.6

| Bayesian Triphone GMM-HMM | | 25.6

|-

|-

|-

| Hidden Trajectory (Generative) Model|| 24.8

| Hidden Trajectory (Generative) Model|| 24.8

隐藏轨迹(生成)模型 | 24.8

|-

|-

|-

| Monophone Randomly Initialized DNN|| 23.4

| Monophone Randomly Initialized DNN|| 23.4

随机初始化的 DNN | | 23.4

|-

|-

|-

| Monophone DBN-DNN|| 22.4

| Monophone DBN-DNN|| 22.4

| Monophone DBN-DNN | | 22.4

|-

|-

|-

| Triphone GMM-HMM with BMMI Training|| 21.7

| Triphone GMM-HMM with BMMI Training|| 21.7

| Triphone GMM-HMM with BMMI Training | | 21.7

|-

|-

|-

| Monophone DBN-DNN on fbank || 20.7

| Monophone DBN-DNN on fbank || 20.7

| Monophone DBN-DNN on fbank | 20.7

|-

|-

|-

| Convolutional DNN<ref name="CNN-2014">{{cite journal|last1=Abdel-Hamid|first1=O.|title=Convolutional Neural Networks for Speech Recognition|journal=IEEE/ACM Transactions on Audio, Speech, and Language Processing|date=2014|volume=22|issue=10|pages=1533–1545|doi=10.1109/taslp.2014.2339736|display-authors=etal|url=https://zenodo.org/record/891433}}</ref>|| 20.0

| Convolutional DNN|| 20.0

20.0

|-

|-

|-

| Convolutional DNN w. Heterogeneous Pooling|| 18.7

| Convolutional DNN w. Heterogeneous Pooling|| 18.7

| 卷积 dnw 异质池 | | 18.7

|-

|-

|-

| Ensemble DNN/CNN/RNN<ref name="EnsembleDL">{{cite journal|last2=Platt|first2=J.|date=2014|title=Ensemble Deep Learning for Speech Recognition|url=https://pdfs.semanticscholar.org/8201/55ecb57325503183253b8796de5f4535eb16.pdf|journal=Proc. Interspeech|pages=|via=|last1=Deng|first1=L.}}</ref>|| 18.3

| Ensemble DNN/CNN/RNN|| 18.3

| ensemblednn / cnn / rnn | | 18.3

|-

|-

|-

| Bidirectional LSTM|| 17.9

| Bidirectional LSTM|| 17.9

双向 LSTM | | 17.9

|-

|-

|-

| Hierarchical Convolutional Deep Maxout Network<ref name="HCDMM">{{cite journal|last1=Tóth|first1=Laszló|date=2015|title=Phone Recognition with Hierarchical Convolutional Deep Maxout Networks|journal=EURASIP Journal on Audio, Speech, and Music Processing|volume=2015|doi=10.1186/s13636-015-0068-3|url=http://publicatio.bibl.u-szeged.hu/5976/1/EURASIP2015.pdf}}</ref> || 16.5

| Hierarchical Convolutional Deep Maxout Network || 16.5

| 分层卷积深度最大网络 | | 16.5

|}

|}

|}



The debut of DNNs for speaker recognition in the late 1990s, for speech recognition around 2009–2011, and of LSTM around 2003–2007 accelerated progress in eight major areas:<ref name="BOOK2014" /><ref name="interspeech2014Keynote" /><ref name="ReferenceA" />

* Scale-up/out and accelerated DNN training and decoding

* Sequence discriminative training

* Feature processing by deep models with solid understanding of the underlying mechanisms

* Adaptation of DNNs and related deep models

* [[Multi-task learning|Multi-task]] and [[Inductive transfer|transfer learning]] by DNNs and related deep models

* CNNs and how to design them to best exploit [[domain knowledge]] of speech

* RNN and its rich LSTM variants

* Other types of deep models including tensor-based models and integrated deep generative/discriminative models.



All major commercial speech recognition systems (e.g., Microsoft [[Cortana (software)|Cortana]], [[Xbox]], [[Skype Translator]], [[Amazon Alexa]], [[Google Now]], [[Siri|Apple Siri]], [[Baidu]] and [[IFlytek|iFlyTek]] voice search, and a range of [[Nuance Communications|Nuance]] speech products, etc.) are based on deep learning.<ref name=BOOK2014 /><ref>{{Cite journal|url=https://www.wired.com/2014/12/skype-used-ai-build-amazing-new-language-translator/|title=How Skype Used AI to Build Its Amazing New Language Translator {{!}} WIRED|journal=Wired|accessdate=2017-06-14|date=2014-12-17|last1=McMillan|first1=Robert}}</ref><ref name="Baidu">{{cite arxiv |eprint=1412.5567|last1=Hannun|first1=Awni|title=Deep Speech: Scaling up end-to-end speech recognition|last2=Case|first2=Carl|last3=Casper|first3=Jared|last4=Catanzaro|first4=Bryan|last5=Diamos|first5=Greg|last6=Elsen|first6=Erich|last7=Prenger|first7=Ryan|last8=Satheesh|first8=Sanjeev|last9=Sengupta|first9=Shubho|last10=Coates|first10=Adam|last11=Ng|first11=Andrew Y|class=cs.CL|year=2014}}</ref><ref>{{Cite web|url=http://research.microsoft.com/en-US/people/deng/ieee-icassp-plenary-2016-mar24-lideng-posted.pdf|title=Plenary presentation at ICASSP-2016|date=|website=|accessdate=}}</ref>

=== Image recognition ===

{{Main|Computer vision}}



A common evaluation set for image classification is the MNIST database data set. MNIST is composed of handwritten digits and includes 60,000 training examples and 10,000 test examples. As with TIMIT, its small size lets users test multiple configurations. A comprehensive list of results on this set is available.<ref name="YANNMNIST">{{cite web|url=http://yann.lecun.com/exdb/mnist/.|title=MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges|website=yann.lecun.com}}</ref>
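As a sketch of how the train/test split is typically inspected, the Keras convenience loader (assuming TensorFlow is installed; the data are downloaded on first use) returns the 60,000/10,000 partition directly:

<syntaxhighlight lang="python">
# Assumes TensorFlow/Keras is available; the loader fetches MNIST on the first call.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)   # (60000, 28, 28)  -- 60,000 training examples
print(x_test.shape)    # (10000, 28, 28)  -- 10,000 test examples
print(y_train[:10])    # digit labels 0-9
</syntaxhighlight>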

Deep learning-based image recognition has become "superhuman", producing more accurate results than human contestants. This first occurred in 2011.<ref name=":7">{{Cite journal|last=Cireşan|first=Dan|last2=Meier|first2=Ueli|last3=Masci|first3=Jonathan|last4=Schmidhuber|first4=Jürgen|date=August 2012|title=Multi-column deep neural network for traffic sign classification|journal=Neural Networks|series=Selected Papers from IJCNN 2011|volume=32|pages=333–338|doi=10.1016/j.neunet.2012.02.023|pmid=22386783|citeseerx=10.1.1.226.8219}}</ref>

Deep learning-trained vehicles now interpret 360° camera views.<ref>[http://www.technologyreview.com/news/533936/nvidia-demos-a-car-computer-trained-with-deep-learning/ Nvidia Demos a Car Computer Trained with "Deep Learning"] (2015-01-06), David Talbot, ''[[MIT Technology Review]]''</ref> Another example is Facial Dysmorphology Novel Analysis (FDNA) used to analyze cases of human malformation connected to a large database of genetic syndromes.

=== Visual art processing ===

Closely related to the progress that has been made in image recognition is the increasing application of deep learning techniques to various visual art tasks. DNNs have proven themselves capable, for example, of a) identifying the style period of a given painting, b) [[Neural Style Transfer]] - capturing the style of a given artwork and applying it in a visually pleasing manner to an arbitrary photograph or video, and c) generating striking imagery based on random visual input fields.<ref>{{cite journal |author1=G. W. Smith|author2=Frederic Fol Leymarie|date=10 April 2017|title=The Machine as Artist: An Introduction|journal=Arts|volume=6|issue=4|pages=5|doi=10.3390/arts6020005}}</ref><ref>{{cite journal |author=Blaise Agüera y Arcas|date=29 September 2017|title=Art in the Age of Machine Intelligence|journal=Arts|volume=6|issue=4|pages=18|doi=10.3390/arts6040018}}</ref>

=== Natural language processing ===

{{Main|Natural language processing}}

Neural networks have been used for implementing language models since the early 2000s.<ref name="gers2001" /><ref>{{Cite journal|last=Bengio|first=Yoshua|last2=Ducharme|first2=Réjean|last3=Vincent|first3=Pascal|last4=Janvin|first4=Christian|date=March 2003|title=A Neural Probabilistic Language Model|url=http://dl.acm.org/citation.cfm?id=944919.944966|journal=J. Mach. Learn. Res.|volume=3|pages=1137–1155|issn=1532-4435}}</ref> LSTM helped to improve machine translation and language modeling.<ref name="NIPS2014" /><ref name="vinyals2016" /><ref name="gillick2015" />

Other key techniques in this field are negative sampling<ref name="GoldbergLevy2014">{{cite arXiv|last1=Goldberg|first1=Yoav|last2=Levy|first2=Omar|title=word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method|eprint=1402.3722|class=cs.CL|year=2014}}</ref> and [[word embedding]]. Word embedding, such as ''[[word2vec]]'', can be thought of as a representational layer in a deep learning architecture that transforms an atomic word into a positional representation of the word relative to other words in the dataset; the position is represented as a point in a [[vector space]]. Using word embedding as an RNN input layer allows the network to parse sentences and phrases using an effective compositional vector grammar. A compositional vector grammar can be thought of as [[probabilistic context free grammar]] (PCFG) implemented by an RNN.<ref name="SocherManning2014">{{cite web|last1=Socher|first1=Richard|last2=Manning|first2=Christopher|title=Deep Learning for NLP|url=http://nlp.stanford.edu/courses/NAACL2013/NAACL2013-Socher-Manning-DeepLearning.pdf|accessdate=26 October 2014}}</ref> Recursive auto-encoders built atop word embeddings can assess sentence similarity and detect paraphrasing.<ref name="SocherManning2014" /> Deep neural architectures provide the best results for [[Statistical parsing|constituency parsing]],<ref>{{Cite journal |url= http://aclweb.org/anthology/P/P13/P13-1045.pdf|title = Parsing With Compositional Vector Grammars|last = Socher|first = Richard|date = 2013|journal = Proceedings of the ACL 2013 Conference|accessdate = |doi = |pmid = |last2 = Bauer|first2 = John|last3 = Manning|first3 = Christopher|last4 = Ng|first4 = Andrew}}</ref> [[sentiment analysis]],<ref>{{Cite journal |url= http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf|title = Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank|last = Socher|first = Richard|date = 2013 |accessdate = |doi = |pmid =}}</ref> information retrieval,<ref>{{Cite journal|last=Shen|first=Yelong|last2=He|first2=Xiaodong|last3=Gao|first3=Jianfeng|last4=Deng|first4=Li|last5=Mesnil|first5=Gregoire|date=2014-11-01|title=A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval|url=https://www.microsoft.com/en-us/research/publication/a-latent-semantic-model-with-convolutional-pooling-structure-for-information-retrieval/|journal=Microsoft Research}}</ref><ref>{{Cite journal|last=Huang|first=Po-Sen|last2=He|first2=Xiaodong|last3=Gao|first3=Jianfeng|last4=Deng|first4=Li|last5=Acero|first5=Alex|last6=Heck|first6=Larry|date=2013-10-01|title=Learning Deep Structured Semantic Models for Web Search using Clickthrough Data|url=https://www.microsoft.com/en-us/research/publication/learning-deep-structured-semantic-models-for-web-search-using-clickthrough-data/|journal=Microsoft Research}}</ref> spoken language understanding,<ref name="IEEE-TASL2015">{{cite journal | last1 = Mesnil | first1 = G. | last2 = Dauphin | first2 = Y. | last3 = Yao | first3 = K. | last4 = Bengio | first4 = Y. | last5 = Deng | first5 = L. | last6 = Hakkani-Tur | first6 = D. | last7 = He | first7 = X. | last8 = Heck | first8 = L. | last9 = Tur | first9 = G. | last10 = Yu | first10 = D. | last11 = Zweig | first11 = G. 
| year = 2015 | title = Using recurrent neural networks for slot filling in spoken language understanding | url= https://www.semanticscholar.org/paper/41911ef90a225a82597a2b576346759ea9c34247| journal = IEEE Transactions on Audio, Speech, and Language Processing | volume = 23 | issue = 3| pages = 530–539 | doi=10.1109/taslp.2014.2383614}}</ref> machine translation,<ref name="NIPS2014">{{Cite journal|last=Sutskever|first=L.|last2=Vinyals|first2=O.|last3=Le|first3=Q.|date=2014|title=Sequence to Sequence Learning with Neural Networks|url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf|journal=Proc. NIPS|pages=|via=|bibcode=2014arXiv1409.3215S|arxiv=1409.3215}}</ref><ref name="auto">{{Cite journal|last=Gao|first=Jianfeng|last2=He|first2=Xiaodong|last3=Yih|first3=Scott Wen-tau|last4=Deng|first4=Li|date=2014-06-01|title=Learning Continuous Phrase Representations for Translation Modeling|url=https://www.microsoft.com/en-us/research/publication/learning-continuous-phrase-representations-for-translation-modeling/|journal=Microsoft Research}}</ref> contextual entity linking,<ref name="auto"/> writing style recognition,<ref name="BROC2017">{{Cite journal |doi = 10.1002/dac.3259|title = Authorship verification using deep belief network systems|journal = International Journal of Communication Systems|volume = 30|issue = 12|pages = e3259|year = 2017|last1 = Brocardo|first1 = Marcelo Luiz|last2 = Traore|first2 = Issa|last3 = Woungang|first3 = Isaac|last4 = Obaidat|first4 = Mohammad S.}}</ref> Text classification and others.<ref>{{Cite news|url=https://www.microsoft.com/en-us/research/project/deep-learning-for-natural-language-processing-theory-and-practice-cikm2014-tutorial/|title=Deep Learning for Natural Language Processing: Theory and Practice (CIKM2014 Tutorial) - Microsoft Research|work=Microsoft Research|accessdate=2017-06-14}}</ref>
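The notion of a word as a point in a vector space can be illustrated with a toy embedding table and cosine similarity; the three-dimensional vectors below are invented for the example, whereas real word2vec embeddings have hundreds of dimensions learned from a corpus:

<syntaxhighlight lang="python">
import numpy as np

# Toy "embedding layer": each word maps to a point in a 3-dimensional vector space.
# The vectors are made up; a trained word2vec model would supply them instead.
embeddings = {
    "king":  np.array([0.9, 0.7, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low: unrelated words
</syntaxhighlight>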

Other key techniques in this field are negative sampling and word embedding. Word embedding, such as word2vec, can be thought of as a representational layer in a deep learning architecture that transforms an atomic word into a positional representation of the word relative to other words in the dataset; the position is represented as a point in a vector space. Using word embedding as an RNN input layer allows the network to parse sentences and phrases using an effective compositional vector grammar. A compositional vector grammar can be thought of as probabilistic context free grammar (PCFG) implemented by an RNN. Recursive auto-encoders built atop word embeddings can assess sentence similarity and detect paraphrasing. Deep neural architectures provide the best results for constituency parsing, sentiment analysis, information retrieval, spoken language understanding, machine translation, contextual entity linking, writing style recognition, text classification and others.

该领域的其他关键技术包括负采样和字嵌入。单词嵌入,比如 word2vec,可以被认为是深度学习架构中的一个表征层,该架构将一个原子单词转换为该单词相对于数据集中其他单词的位置表示; 位置表示为矢量空间中的一个点。使用词嵌入作为一个 RNN 输入层允许网络解析句子和短语使用一个有效的组合向量文法。合成向量文法可以看作是由 RNN 实现的概率上下文无关文法(PCFG)。构建在词嵌入之上的递归自动编码器可以评估句子相似度和检测复述。情感分析、信息检索分析、口语理解、机器翻译、语境实体链接、文本分类等。
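
As a concrete illustration of these ideas, the following minimal NumPy sketch trains skip-gram word embeddings with negative sampling on a toy corpus. The corpus, vector dimension and hyper-parameters are arbitrary choices made purely for demonstration; the code is not taken from word2vec or any of the systems cited above.

<syntaxhighlight lang="python">
import numpy as np

# Toy corpus; in practice word2vec-style models are trained on billions of tokens.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}
ids = np.array([word_to_id[w] for w in corpus])

rng = np.random.default_rng(0)
dim, window, neg_k, lr = 16, 2, 5, 0.05
V = len(vocab)

# Two embedding tables: one for centre words, one for context words.
W_in = rng.normal(scale=0.1, size=(V, dim))
W_out = rng.normal(scale=0.1, size=(V, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(200):
    for pos, centre in enumerate(ids):
        lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos == pos:
                continue
            context = ids[ctx_pos]
            # One positive pair plus neg_k randomly drawn negative words.
            samples = np.concatenate(([context], rng.integers(0, V, size=neg_k)))
            labels = np.zeros(len(samples)); labels[0] = 1.0
            v = W_in[centre]                      # centre-word vector
            u = W_out[samples]                    # context / negative vectors
            scores = sigmoid(u @ v)
            grad = scores - labels                # gradient of the logistic loss
            W_out[samples] -= lr * np.outer(grad, v)
            W_in[centre] -= lr * (grad @ u)

# After training, words used in similar contexts tend to have nearby vectors.
def nearest(word):
    v = W_in[word_to_id[word]]
    sims = W_in @ v / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(v) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[:3]]

print(nearest("cat"))
</syntaxhighlight>

This proximity in the learned vector space is the property exploited when such embeddings are used as the input layer of a larger network.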



Recent developments generalize [[word embedding]] to [[sentence embedding]].

Recent developments generalize word embedding to sentence embedding.

最近的发展将嵌入词概括为嵌入句。



[[Google Translate]] (GT) uses a large [[End-to-end principle|end-to-end]] long short-term memory network.<ref name="GT_Turovsky_2016">{{cite web|url=https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/|title=Found in translation: More accurate, fluent sentences in Google Translate|last=Turovsky|first=Barak|date=November 15, 2016|website=The Keyword Google Blog|accessdate=March 23, 2017}}</ref><ref name="googleblog_GNMT_2016">{{cite web|url=https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html|title=Zero-Shot Translation with Google's Multilingual Neural Machine Translation System|last1=Schuster|first1=Mike|last2=Johnson|first2=Melvin|date=November 22, 2016|website=Google Research Blog|accessdate=March 23, 2017|last3=Thorat|first3=Nikhil}}</ref><ref name="lstm1997">{{Cite journal|author=Sepp Hochreiter|author2=Jürgen Schmidhuber|year=1997|title=Long short-term memory|url=https://www.researchgate.net/publication/13853244|journal=[[Neural Computation (journal)|Neural Computation]]|volume=9|issue=8|pages=1735–1780|doi=10.1162/neco.1997.9.8.1735|pmid=9377276|via=}}</ref><ref name="lstm2000">{{Cite journal|author=Felix A. Gers|author2=Jürgen Schmidhuber|author3=Fred Cummins|year=2000|title=Learning to Forget: Continual Prediction with LSTM|journal=[[Neural Computation (journal)|Neural Computation]]|volume=12|issue=10|pages=2451–2471|doi=10.1162/089976600300015015|pmid=11032042|citeseerx=10.1.1.55.5709|url=https://www.semanticscholar.org/paper/11540131eae85b2e11d53df7f1360eeb6476e7f4}}</ref><ref name="GoogleTranslate">{{cite arXiv |eprint=1609.08144|last1=Wu|first1=Yonghui|title=Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation|last2=Schuster|first2=Mike|last3=Chen|first3=Zhifeng|last4=Le|first4=Quoc V|last5=Norouzi|first5=Mohammad|last6=Macherey|first6=Wolfgang|last7=Krikun|first7=Maxim|last8=Cao|first8=Yuan|last9=Gao|first9=Qin|last10=Macherey|first10=Klaus|last11=Klingner|first11=Jeff|last12=Shah|first12=Apurva|last13=Johnson|first13=Melvin|last14=Liu|first14=Xiaobing|last15=Kaiser|first15=Łukasz|last16=Gouws|first16=Stephan|last17=Kato|first17=Yoshikiyo|last18=Kudo|first18=Taku|last19=Kazawa|first19=Hideto|last20=Stevens|first20=Keith|last21=Kurian|first21=George|last22=Patil|first22=Nishant|last23=Wang|first23=Wei|last24=Young|first24=Cliff|last25=Smith|first25=Jason|last26=Riesa|first26=Jason|last27=Rudnick|first27=Alex|last28=Vinyals|first28=Oriol|last29=Corrado|first29=Greg|last30=Hughes|first30=Macduff|display-authors=29|class=cs.CL|year=2016}}</ref><ref name="WiredGoogleTranslate">"An Infusion of AI Makes Google Translate More Powerful Than Ever." Cade Metz, WIRED, Date of Publication: 09.27.16. https://www.wired.com/2016/09/google-claims-ai-breakthrough-machine-translation/</ref> [[Google Neural Machine Translation|Google Neural Machine Translation (GNMT)]] uses an [[example-based machine translation]] method in which the system "learns from millions of examples."<ref name="googleblog_GNMT_2016" /> It translates "whole sentences at a time, rather than pieces. 
Google Translate supports over one hundred languages.<ref name="googleblog_GNMT_2016" /> The network encodes the "semantics of the sentence rather than simply memorizing phrase-to-phrase translations".<ref name="googleblog_GNMT_2016" /><ref name="Biotet">{{cite web|url=http://www-clips.imag.fr/geta/herve.blanchon/Pdfs/NLP-KE-10.pdf|title=MT on and for the Web|last1=Boitet|first1=Christian|last2=Blanchon|first2=Hervé|date=2010|accessdate=December 1, 2016|last3=Seligman|first3=Mark|last4=Bellynck|first4=Valérie}}</ref> GT uses English as an intermediate between most language pairs.<ref name="Biotet" />

Google Translate (GT) uses a large end-to-end long short-term memory network. Google Neural Machine Translation (GNMT) uses an example-based machine translation method in which the system "learns from millions of examples." It translates "whole sentences at a time, rather than pieces". Google Translate supports over one hundred languages. The network encodes the "semantics of the sentence rather than simply memorizing phrase-to-phrase translations". GT uses English as an intermediate between most language pairs.

谷歌翻译(GT)使用一个大型的端到端长短期记忆网络。Google 神经机器翻译(GNMT)使用了一种基于示例的机器翻译方法,在这种方法中,系统"从数百万个示例中学习"。它一次翻译"整个句子,而不是逐段翻译"。谷歌翻译支持一百多种语言。该网络编码的是"句子的语义,而不是简单地记忆短语到短语的翻译"。GT 在大多数语言对之间使用英语作为中间语。



=== Drug discovery and toxicology ===

{{For|more information|Drug discovery|Toxicology}}

A large percentage of candidate drugs fail to win regulatory approval. These failures are caused by insufficient efficacy (on-target effect), undesired interactions (off-target effects), or unanticipated [[Toxicity|toxic effects]].<ref name="ARROWSMITH2013">{{Cite journal | pmid = 23903212 | year = 2013 | last1 = Arrowsmith | first1 = J | last2 = Miller | first2 = P | title = Trial watch: Phase II and phase III attrition rates 2011-2012 | journal = Nature Reviews Drug Discovery | volume = 12 | issue = 8 | pages = 569 | doi = 10.1038/nrd4090 | url = https://www.semanticscholar.org/paper/9ab0f468a64762ca5069335c776e1ab07fa2b3e2 }}</ref><ref name="VERBIEST2015">{{Cite journal | pmid = 25582842 | year = 2015 | last1 = Verbist | first1 = B | last2 = Klambauer | first2 = G | last3 = Vervoort | first3 = L | last4 = Talloen | first4 = W | last5 = The Qstar | first5 = Consortium | last6 = Shkedy | first6 = Z | last7 = Thas | first7 = O | last8 = Bender | first8 = A | last9 = Göhlmann | first9 = H. W. | last10 = Hochreiter | first10 = S | title = Using transcriptomics to guide lead optimization in drug discovery projects: Lessons learned from the QSTAR project | journal = Drug Discovery Today | volume = 20 | issue = 5 | pages = 505–513 | doi = 10.1016/j.drudis.2014.12.014 }}</ref> Research has explored use of deep learning to predict the [[biomolecular target]]s,<ref name="MERCK2012" /><ref name=":5" /> [[off-target]]s, and [[Toxicity|toxic effects]] of environmental chemicals in nutrients, household products and drugs.<ref name="TOX21" /><ref name="TOX21Data" /><ref name=":11" />

A large percentage of candidate drugs fail to win regulatory approval. These failures are caused by insufficient efficacy (on-target effect), undesired interactions (off-target effects), or unanticipated toxic effects. Research has explored use of deep learning to predict the biomolecular targets, off-targets, and toxic effects of environmental chemicals in nutrients, household products and drugs.

很大一部分候选药物未能获得监管部门的批准。这些失败是由于功效不足(在靶效应)、非预期的相互作用(脱靶效应)或意外的毒性效应造成的。研究已经探索了利用深度学习来预测营养物质、家用产品和药物中环境化学物质的生物分子靶点、脱靶效应和毒性效应。



AtomNet is a deep learning system for structure-based [[Drug design|rational drug design]].<ref>{{cite arXiv|title = AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery|eprint= 1510.02855|date = 2015-10-09|first = Izhar|last = Wallach|first2 = Michael|last2 = Dzamba|first3 = Abraham|last3 = Heifets|class= cs.LG}}</ref> AtomNet was used to predict novel candidate biomolecules for disease targets such as the [[Ebola virus]]<ref>{{Cite news|title = Toronto startup has a faster way to discover effective medicines |url= https://www.theglobeandmail.com/report-on-business/small-business/starting-out/toronto-startup-has-a-faster-way-to-discover-effective-medicines/article25660419/|website = The Globe and Mail |accessdate= 2015-11-09}}</ref> and [[multiple sclerosis]].<ref>{{Cite web|title = Startup Harnesses Supercomputers to Seek Cures |url= http://ww2.kqed.org/futureofyou/2015/05/27/startup-harnesses-supercomputers-to-seek-cures/|website = KQED Future of You|accessdate = 2015-11-09}}</ref><ref>{{cite web|url=https://www.theglobeandmail.com/report-on-business/small-business/starting-out/toronto-startup-has-a-faster-way-to-discover-effective-medicines/article25660419/%5D%20and%20multiple%20sclerosis%20%5B/|title=Toronto startup has a faster way to discover effective medicines}}</ref>

AtomNet is a deep learning system for structure-based rational drug design. AtomNet was used to predict novel candidate biomolecules for disease targets such as the Ebola virus and multiple sclerosis.

Atomnet 是一个基于结构的合理药物设计的深度学习系统。原子网络被用来预测像埃博拉病毒和多发性硬化症这样的疾病靶标的新的候选生物分子。



In 2019 generative neural networks were used to produce molecules that were validated experimentally all the way into mice.<ref>{{cite journal |last1=Zhavoronkov |first1=Alex|date=2019|title=Deep learning enables rapid identification of potent DDR1 kinase inhibitors |journal=Nature Biotechnology |volume=37|issue=9|pages=1038–1040|doi=10.1038/s41587-019-0224-x |pmid=31477924|url=https://www.semanticscholar.org/paper/d44ac0a7fd4734187bccafc4a2771027b8bb595e}}</ref><ref>{{cite journal |last1=Gregory |first1=Barber |title=A Molecule Designed By AI Exhibits 'Druglike' Qualities |url=https://www.wired.com/story/molecule-designed-ai-exhibits-druglike-qualities/ |journal=Wired}}</ref>

In 2019 generative neural networks were used to produce molecules that were validated experimentally all the way into mice.

2019年,生成神经网络被用于制造分子,这些分子在小鼠体内得到了实验验证。



=== Customer relationship management ===

{{Main|Customer relationship management}}

Deep reinforcement learning has been used to approximate the value of possible [[direct marketing]] actions, defined in terms of [[RFM (customer value)|RFM]] variables. The estimated value function was shown to have a natural interpretation as [[customer lifetime value]].<ref>{{cite arxiv|last=Tkachenko |first=Yegor |title=Autonomous CRM Control via CLV Approximation with Deep Reinforcement Learning in Discrete and Continuous Action Space |date=April 8, 2015 |eprint=1504.01840|class=cs.LG }}</ref>

Deep reinforcement learning has been used to approximate the value of possible direct marketing actions, defined in terms of RFM variables. The estimated value function was shown to have a natural interpretation as customer lifetime value.

深度强化学习已被用于近似估计可能的直接营销行动的价值,这些行动以 RFM 变量来定义。估计得到的价值函数可以被自然地解释为客户生命周期价值。
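
The following sketch shows the general idea in miniature; it is not the method of the cited work. A small Q-network is fitted over synthetic (recency, frequency, monetary value) states with two actions, "do nothing" and "send an offer". The customer simulator, reward structure and network size are invented solely for illustration.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n_actions, hidden, gamma, eps, lr = 2, 32, 0.9, 0.1, 0.01

# Q-network: 3 RFM inputs -> one hidden ReLU layer -> one Q-value per action
# (action 0 = do nothing, action 1 = send an offer).
W1 = rng.normal(scale=0.3, size=(3, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.3, size=(hidden, n_actions)); b2 = np.zeros(n_actions)

def q_values(state):
    x = state / 10.0                          # crude feature scaling
    h = np.maximum(x @ W1 + b1, 0.0)
    return x, h, h @ W2 + b2

def simulate_step(state, action):
    """Toy customer simulator: an offer costs 0.2 and raises the purchase
    probability; a purchase resets recency and pays the monetary value."""
    recency, frequency, monetary = state
    p_buy = 1.0 / (1.0 + np.exp(recency - 0.5 * frequency - 2.0 * action))
    bought = rng.random() < p_buy
    reward = (monetary if bought else 0.0) - 0.2 * action
    next_state = np.array([0.0 if bought else min(recency + 1.0, 10.0),
                           frequency + float(bought), monetary])
    return next_state, reward

for episode in range(2000):
    state = np.array([rng.integers(0, 10), rng.integers(0, 5), 1.0], dtype=float)
    for step in range(25):
        x, h, q = q_values(state)
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q))
        next_state, reward = simulate_step(state, a)
        _, _, q_next = q_values(next_state)
        target = reward + gamma * np.max(q_next)     # one-step TD target

        # Gradient of 0.5 * (q[a] - target)^2 through the two-layer network.
        err = q[a] - target
        grad_q = np.zeros(n_actions); grad_q[a] = err
        dh = (W2 @ grad_q) * (h > 0)
        W2 -= lr * np.outer(h, grad_q); b2 -= lr * grad_q
        W1 -= lr * np.outer(x, dh); b1 -= lr * dh
        state = next_state

# The fitted Q-values estimate the discounted future profit (a customer
# lifetime value) of each marketing action in a given RFM state.
_, _, q = q_values(np.array([5.0, 2.0, 1.0]))
print("Q(do nothing), Q(send offer):", q)
</syntaxhighlight>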



=== Recommendation systems ===

{{Main|Recommender system}}

Recommendation systems have used deep learning to extract meaningful features for a latent factor model for content-based music and journal recommendations.<ref>{{Cite book|url=http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf|title=Advances in Neural Information Processing Systems 26|last=van den Oord|first=Aaron|last2=Dieleman|first2=Sander|last3=Schrauwen|first3=Benjamin|date=2013|publisher=Curran Associates, Inc.|editor-last=Burges|editor-first=C. J. C.|pages=2643–2651|editor-last2=Bottou|editor-first2=L.|editor-last3=Welling|editor-first3=M.|editor-last4=Ghahramani|editor-first4=Z.|editor-last5=Weinberger|editor-first5=K. Q.}}</ref><ref>X.Y. Feng, H. Zhang, Y.J. Ren, P.H. Shang, Y. Zhu, Y.C. Liang, R.C. Guan, D. Xu, (2019), "[https://www.jmir.org/2019/5/e12957/ The Deep Learning–Based Recommender System “Pubmender” for Choosing a Biomedical Publication Venue: Development and Validation Study]", ''[[Journal of Medical Internet Research]]'', 21 (5): e12957</ref> Multi-view deep learning has been applied for learning user preferences from multiple domains.<ref>{{Cite journal|last=Elkahky|first=Ali Mamdouh|last2=Song|first2=Yang|last3=He|first3=Xiaodong|date=2015-05-01|title=A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems|url=https://www.microsoft.com/en-us/research/publication/a-multi-view-deep-learning-approach-for-cross-domain-user-modeling-in-recommendation-systems/|journal=Microsoft Research}}</ref> The model uses a hybrid collaborative and content-based approach and enhances recommendations in multiple tasks.

Recommendation systems have used deep learning to extract meaningful features for a latent factor model for content-based music and journal recommendations. Multi-view deep learning has been applied for learning user preferences from multiple domains. The model uses a hybrid collaborative and content-based approach and enhances recommendations in multiple tasks.

推荐系统使用深度学习为基于内容的音乐和期刊推荐的潜在因素模型提取有意义的特征。多视角深度学习已被应用于从多个领域学习用户偏好。该模型采用了一种混合的协作和基于内容的方法,增强了对多项任务的建议。
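
A simplified sketch of the content-to-latent-factor idea is shown below, using synthetic data: user and item factors are first learned from observed interactions, and a small network is then trained to predict an item's factors from its content features so that unseen (cold-start) items can be scored. All data, dimensions and hyper-parameters are illustrative assumptions rather than details of the cited systems.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, n_content, k = 30, 40, 8, 4

# Synthetic data: item content features and a sparse user-item interaction matrix.
content = rng.normal(size=(n_items, n_content))
true_item = content @ rng.normal(size=(n_content, k))       # hidden structure
true_user = rng.normal(size=(n_users, k))
ratings = true_user @ true_item.T + 0.1 * rng.normal(size=(n_users, n_items))
observed = rng.random((n_users, n_items)) < 0.3              # 30% of cells observed

# Step 1: learn user/item latent factors from the observed interactions
# (plain matrix factorisation by stochastic gradient descent).
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
for _ in range(200):
    for u, i in zip(*np.nonzero(observed)):
        err = ratings[u, i] - U[u] @ V[i]
        U[u] += 0.01 * (err * V[i] - 0.01 * U[u])
        V[i] += 0.01 * (err * U[u] - 0.01 * V[i])

# Step 2: train a small network to predict an item's latent factors from its
# content features, so brand-new items can be recommended (cold start).
hidden = 16
W1 = rng.normal(scale=0.2, size=(n_content, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.2, size=(hidden, k)); b2 = np.zeros(k)
for _ in range(500):
    H = np.maximum(content @ W1 + b1, 0.0)
    pred = H @ W2 + b2
    grad = (pred - V) / n_items                  # gradient of mean squared error
    dH = (grad @ W2.T) * (H > 0)
    W2 -= 0.05 * H.T @ grad; b2 -= 0.05 * grad.sum(0)
    W1 -= 0.05 * content.T @ dH; b1 -= 0.05 * dH.sum(0)

# Scoring a new item for every user only needs its content features.
new_item = rng.normal(size=n_content)
h = np.maximum(new_item @ W1 + b1, 0.0)
scores = U @ (h @ W2 + b2)
print("top users for the new item:", np.argsort(-scores)[:5])
</syntaxhighlight>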



=== Bioinformatics ===

{{Main|Bioinformatics}}

An [[autoencoder]] ANN was used in [[bioinformatics]], to predict [[Gene Ontology|gene ontology]] annotations and gene-function relationships.<ref>{{cite book|title=Deep Autoencoder Neural Networks for Gene Ontology Annotation Predictions |first1=Davide |last1=Chicco|first2=Peter|last2=Sadowski|first3=Pierre |last3=Baldi |date=1 January 2014|publisher=ACM|pages=533–540|doi=10.1145/2649387.2649442|journal=Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics - BCB '14|isbn=9781450328944 |hdl = 11311/964622|url=https://www.semanticscholar.org/paper/09f3132fdf103bdef1125ffbccb8b46f921b2ab7 }}</ref>

An autoencoder ANN was used in bioinformatics, to predict gene ontology annotations and gene-function relationships.

将自动编码人工神经网络应用于生物信息学,预测基因本体和基因功能关系。
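
The following is a minimal, single-hidden-layer sketch of that autoencoder idea on a synthetic gene–term annotation matrix; the cited work uses deeper architectures and real Gene Ontology data. Entries that are absent from the input but reconstructed with a high score can be read as candidate annotations.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_terms, hidden = 200, 50, 10

# Synthetic binary gene x GO-term annotation matrix with low-rank structure.
latent = rng.random((n_genes, 3))
proto = rng.random((3, n_terms))
X = (latent @ proto > 0.9).astype(float)

# A single-hidden-layer autoencoder; deeper stacks are used in practice.
W_enc = rng.normal(scale=0.1, size=(n_terms, hidden)); b_enc = np.zeros(hidden)
W_dec = rng.normal(scale=0.1, size=(hidden, n_terms)); b_dec = np.zeros(n_terms)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(300):
    H = sigmoid(X @ W_enc + b_enc)            # compressed gene representation
    Y = sigmoid(H @ W_dec + b_dec)            # reconstructed annotation profile
    grad_out = (Y - X) / n_genes              # cross-entropy gradient at the output
    grad_H = (grad_out @ W_dec.T) * H * (1 - H)
    W_dec -= lr * H.T @ grad_out; b_dec -= lr * grad_out.sum(0)
    W_enc -= lr * X.T @ grad_H;  b_enc -= lr * grad_H.sum(0)

# Entries that are 0 in X but reconstructed with a high score are candidate
# (missing) annotations for that gene.
recon = sigmoid(sigmoid(X @ W_enc + b_enc) @ W_dec + b_dec)
candidates = np.argwhere((X == 0) & (recon > 0.5))
print("candidate new gene-term annotations:", candidates[:5])
</syntaxhighlight>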



In medical informatics, deep learning was used to predict sleep quality based on data from wearables<ref>{{Cite journal|last=Sathyanarayana|first=Aarti|date=2016-01-01|title=Sleep Quality Prediction From Wearable Data Using Deep Learning|journal=JMIR mHealth and uHealth|volume=4|issue=4|doi=10.2196/mhealth.6562|pmid=27815231|pmc=5116102|pages=e125|url=https://www.semanticscholar.org/paper/c82884f9d6d39c8a89ac46b8f688669fb2931144}}</ref> and predictions of health complications from [[electronic health record]] data.<ref>{{Cite journal|last=Choi|first=Edward|last2=Schuetz|first2=Andy|last3=Stewart|first3=Walter F.|last4=Sun|first4=Jimeng|date=2016-08-13|title=Using recurrent neural network models for early detection of heart failure onset|url=http://jamia.oxfordjournals.org/content/early/2016/08/13/jamia.ocw112|journal=Journal of the American Medical Informatics Association|volume=24|issue=2|pages=361–370|doi=10.1093/jamia/ocw112|issn=1067-5027|pmid=27521897|pmc=5391725}}</ref> Deep learning has also showed efficacy in [[Artificial intelligence in healthcare|healthcare]].<ref>{{Cite web|url=https://medium.com/the-mission/deep-learning-in-healthcare-challenges-and-opportunities-d2eee7e2545|title=Deep Learning in Healthcare: Challenges and Opportunities|date=2016-08-12|website=Medium|access-date=2018-04-10}}</ref>

In medical informatics, deep learning was used to predict sleep quality based on data from wearables and predictions of health complications from electronic health record data. Deep learning has also showed efficacy in healthcare.

在医学信息学中,深度学习被用来根据可穿戴设备的数据和电子健康记录数据中健康并发症的预测来预测睡眠质量。深度学习在医疗保健方面也显示出了效果。



=== Medical Image Analysis ===

Deep learning has been shown to produce competitive results in medical applications such as cancer cell classification, lesion detection, organ segmentation and image enhancement.<ref>{{Cite journal|last=Litjens|first=Geert|last2=Kooi|first2=Thijs|last3=Bejnordi|first3=Babak Ehteshami|last4=Setio|first4=Arnaud Arindra Adiyoso|last5=Ciompi|first5=Francesco|last6=Ghafoorian|first6=Mohsen|last7=van der Laak|first7=Jeroen A.W.M.|last8=van Ginneken|first8=Bram|last9=Sánchez|first9=Clara I.|date=December 2017|title=A survey on deep learning in medical image analysis|journal=Medical Image Analysis|volume=42|pages=60–88|doi=10.1016/j.media.2017.07.005|pmid=28778026|arxiv=1702.05747|bibcode=2017arXiv170205747L|url=https://www.semanticscholar.org/paper/2abde28f75a9135c8ed7c50ea16b7b9e49da0c09}}</ref><ref>{{Cite book |doi=10.1109/ICCVW.2017.18|isbn=9781538610343|chapter=Deep Convolutional Neural Networks for Detecting Cellular Changes Due to Malignancy|title=2017 IEEE International Conference on Computer Vision Workshops (ICCVW)|pages=82–89|year=2017|last1=Forslid|first1=Gustav|last2=Wieslander|first2=Hakan|last3=Bengtsson|first3=Ewert|last4=Wahlby|first4=Carolina|last5=Hirsch|first5=Jan-Michael|last6=Stark|first6=Christina Runow|last7=Sadanandan|first7=Sajith Kecheril|chapter-url=http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-326160|url=https://www.semanticscholar.org/paper/6ae67bb4528bd5d922fd5a0c1a180ff1940f803c}}</ref>

Deep learning has been shown to produce competitive results in medical applications such as cancer cell classification, lesion detection, organ segmentation and image enhancement.

深度学习已被证明在癌细胞分类、病变检测、器官分割和图像增强等医学应用中能够取得具有竞争力的结果。



=== Mobile advertising ===

Finding the appropriate mobile audience for [[mobile advertising]] is always challenging, since many data points must be considered and analyzed before a target segment can be created and used in ad serving by any ad server.<ref>{{cite book |doi=10.1109/CSCITA.2017.8066548 |isbn=978-1-5090-4381-1|chapter=Predicting the popularity of instagram posts for a lifestyle magazine using deep learning|title=2017 2nd International Conference on Communication Systems, Computing and IT Applications (CSCITA)|pages=174–177|year=2017|last1=De|first1=Shaunak|last2=Maity|first2=Abhishek|last3=Goel|first3=Vritti|last4=Shitole|first4=Sanjay|last5=Bhattacharya|first5=Avik|chapter-url=https://www.semanticscholar.org/paper/c4389f8a63a7be58e007c183a49e491141f9e204}}</ref> Deep learning has been used to interpret large, many-dimensioned advertising datasets. Many data points are collected during the request/serve/click internet advertising cycle. This information can form the basis of machine learning to improve ad selection.

Finding the appropriate mobile audience for mobile advertising is always challenging, since many data points must be considered and analyzed before a target segment can be created and used in ad serving by any ad server. Deep learning has been used to interpret large, many-dimensioned advertising datasets. Many data points are collected during the request/serve/click internet advertising cycle. This information can form the basis of machine learning to improve ad selection.

为移动广告寻找合适的移动受众总是具有挑战性的,因为在任何广告服务器创建和使用目标细分之前,必须考虑和分析许多数据点。深度学习已经被用来解释大型的、多维的广告数据集。许多数据点是在请求 / 服务 / 点击互联网广告周期中收集的。这些信息可以作为机器学习改进广告选择的基础。



=== Image restoration ===

Deep learning has been successfully applied to [[inverse problems]] such as [[denoising]], [[super-resolution]], [[inpainting]], and [[film colorization]].<ref>{{Cite web|url=https://blog.floydhub.com/colorizing-and-restoring-old-images-with-deep-learning/|title=Colorizing and Restoring Old Images with Deep Learning|date=2018-11-13|website=FloydHub Blog|language=en|access-date=2019-10-11}}</ref> These applications include learning methods such as "Shrinkage Fields for Effective Image Restoration"<ref>{{cite conference | url= http://research.uweschmidt.org/pubs/cvpr14schmidt.pdf |first1= Uwe |last1= Schmidt |first2= Stefan |last2= Roth |conference= Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on |title= Shrinkage Fields for Effective Image Restoration }}</ref> which trains on an image dataset, and [[Deep Image Prior]], which trains on the image that needs restoration.

Deep learning has been successfully applied to inverse problems such as denoising, super-resolution, inpainting, and film colorization. These applications include learning methods such as "Shrinkage Fields for Effective Image Restoration" which trains on an image dataset, and Deep Image Prior, which trains on the image that needs restoration.

深度学习已经成功地应用于反问题,如去噪、超分辨率、图像修复和电影着色。这些应用包括在图像数据集上进行训练的学习方法,例如"Shrinkage Fields for Effective Image Restoration",以及在需要复原的图像本身上进行训练的深度图像先验(Deep Image Prior)。
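
As a generic illustration of learning-based restoration (it implements neither Shrinkage Fields nor Deep Image Prior), the sketch below trains a small fully connected network on synthetic pairs of noisy and clean patches and then applies it to unseen noisy patches; the patch generator, noise level and network size are assumptions made for the example.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
patch, hidden, sigma = 8, 64, 0.3

def make_clean_patches(n):
    """Synthetic 'images': random smooth gradients plus a step edge."""
    x = np.linspace(0, 1, patch)
    xx, yy = np.meshgrid(x, x)
    out = []
    for _ in range(n):
        a, b = rng.normal(size=2)
        img = a * xx + b * yy + (xx > rng.random()) * rng.normal()
        out.append(img.ravel())
    return np.array(out)

clean = make_clean_patches(2000)
noisy = clean + sigma * rng.normal(size=clean.shape)

d = patch * patch
W1 = rng.normal(scale=0.1, size=(d, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, d)); b2 = np.zeros(d)

lr = 0.05
for epoch in range(300):
    H = np.maximum(noisy @ W1 + b1, 0.0)       # denoiser: noisy patch -> clean patch
    pred = H @ W2 + b2
    grad = (pred - clean) / len(clean)         # mean-squared-error gradient
    dH = (grad @ W2.T) * (H > 0)
    W2 -= lr * H.T @ grad; b2 -= lr * grad.sum(0)
    W1 -= lr * noisy.T @ dH; b1 -= lr * dH.sum(0)

# Denoise unseen patches and compare the error before and after.
test_clean = make_clean_patches(200)
test_noisy = test_clean + sigma * rng.normal(size=test_clean.shape)
denoised = np.maximum(test_noisy @ W1 + b1, 0.0) @ W2 + b2
print("noisy MSE:   ", np.mean((test_noisy - test_clean) ** 2))
print("denoised MSE:", np.mean((denoised - test_clean) ** 2))
</syntaxhighlight>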



=== Financial fraud detection ===

Deep learning is being successfully applied to financial [[fraud detection]] and anti-money laundering. "Deep anti-money laundering detection system can spot and recognize relationships and similarities between data and, further down the road, learn to detect anomalies or classify and predict specific events". The solution leverages both supervised learning techniques, such as the classification of suspicious transactions, and unsupervised learning, e.g. anomaly detection.<ref>{{cite journal |first=Tomasz |last=Czech |title=Deep learning: the next frontier for money laundering detection |url=https://www.globalbankingandfinance.com/deep-learning-the-next-frontier-for-money-laundering-detection/ |journal=Global Banking and Finance Review }}</ref>

Deep learning is being successfully applied to financial fraud detection and anti-money laundering. "Deep anti-money laundering detection system can spot and recognize relationships and similarities between data and, further down the road, learn to detect anomalies or classify and predict specific events". The solution leverages both supervised learning techniques, such as the classification of suspicious transactions, and unsupervised learning, e.g. anomaly detection.

深度学习正被成功地应用于金融欺诈侦查和反洗钱。"深度反洗钱侦测系统能够发现和识别数据之间的关系和相似之处,并在今后学会侦测异常现象或对具体事件进行分类和预测"。该解决方案同时利用了监督式学习技术(例如可疑交易的分类)和非监督式学习技术(例如异常检测)。



=== Military ===



The United States Department of Defense applied deep learning to train robots in new tasks through observation.<ref name=":12">{{Cite web|url=https://www.eurekalert.org/pub_releases/2018-02/uarl-ard020218.php|title=Army researchers develop new algorithms to train robots|website=EurekAlert!|access-date=2018-08-29}}</ref>

The United States Department of Defense applied deep learning to train robots in new tasks through observation.

美国国防部应用深度学习技术通过观察训练机器人执行新任务。



== Relation to human cognitive and brain development ==

Deep learning is closely related to a class of theories of [[brain development]] (specifically, neocortical development) proposed by [[cognitive neuroscientist]]s in the early 1990s.<ref name="UTGOFF">{{cite journal | last1 = Utgoff | first1 = P. E. | last2 = Stracuzzi | first2 = D. J. | year = 2002 | title = Many-layered learning | url= https://www.semanticscholar.org/paper/398c477f674b228fec7f3f418a8cec047e2dafe5| journal = Neural Computation | volume = 14 | issue = 10| pages = 2497–2529 | doi=10.1162/08997660260293319| pmid = 12396572 }}</ref><ref name="ELMAN">{{cite book|url={{google books |plainurl=y |id=vELaRu_MrwoC}}|title=Rethinking Innateness: A Connectionist Perspective on Development|last=Elman|first=Jeffrey L.|publisher=MIT Press|year=1998|isbn=978-0-262-55030-7}}</ref><ref name="SHRAGER">{{cite journal | last1 = Shrager | first1 = J. | last2 = Johnson | first2 = MH | year = 1996 | title = Dynamic plasticity influences the emergence of function in a simple cortical array | url= | journal = Neural Networks | volume = 9 | issue = 7| pages = 1119–1129 | doi=10.1016/0893-6080(96)00033-0| pmid = 12662587 }}</ref><ref name="QUARTZ">{{cite journal | last1 = Quartz | first1 = SR | last2 = Sejnowski | first2 = TJ | year = 1997 | title = The neural basis of cognitive development: A constructivist manifesto | url= | journal = Behavioral and Brain Sciences | volume = 20 | issue = 4| pages = 537–556 | doi=10.1017/s0140525x97001581| pmid = 10097006 | citeseerx = 10.1.1.41.7854 }}</ref> These developmental theories were instantiated in computational models, making them predecessors of deep learning systems. These developmental models share the property that various proposed learning dynamics in the brain (e.g., a wave of [[nerve growth factor]]) support the [[self-organization]] somewhat analogous to the neural networks utilized in deep learning models. Like the [[neocortex]], neural networks employ a hierarchy of layered filters in which each layer considers information from a prior layer (or the operating environment), and then passes its output (and possibly the original input), to other layers. This process yields a self-organizing stack of [[transducer]]s, well-tuned to their operating environment. A 1995 description stated, "...the infant's brain seems to organize itself under the influence of waves of so-called trophic-factors ... different regions of the brain become connected sequentially, with one layer of tissue maturing before another and so on until the whole brain is mature."<ref name="BLAKESLEE">S. Blakeslee., "In brain's early growth, timetable may be critical," ''The New York Times, Science Section'', pp. B5–B6, 1995.</ref>

Deep learning is closely related to a class of theories of brain development (specifically, neocortical development) proposed by cognitive neuroscientists in the early 1990s. These developmental theories were instantiated in computational models, making them predecessors of deep learning systems. These developmental models share the property that various proposed learning dynamics in the brain (e.g., a wave of nerve growth factor) support the self-organization somewhat analogous to the neural networks utilized in deep learning models. Like the neocortex, neural networks employ a hierarchy of layered filters in which each layer considers information from a prior layer (or the operating environment), and then passes its output (and possibly the original input), to other layers. This process yields a self-organizing stack of transducers, well-tuned to their operating environment. A 1995 description stated, "...the infant's brain seems to organize itself under the influence of waves of so-called trophic-factors ... different regions of the brain become connected sequentially, with one layer of tissue maturing before another and so on until the whole brain is mature."

深度学习与上世纪90年代早期认知神经科学家提出的一类大脑发育理论(特别是新皮层发育理论)密切相关。这些发展理论在计算模型中被实例化,使它们成为深度学习系统的前辈。这些发展模型都有一个共同的特性,那就是大脑中各种被提出的学习动力学(例如,神经生长因子的波动)支持着自我组织神经网络,有点类似于深度学习模型中使用的神经网络。与新皮层一样,神经网络采用了一个层次化的过滤器,其中每一层考虑来自前一层(或操作环境)的信息,然后将其输出(可能还有原始输入)传递到其他层。这个过程产生一个自组织堆栈的传感器,很好地调整到他们的操作环境。一份1995年的描述说,“ ... 婴儿的大脑似乎在所谓的营养因子波的影响下自我组织... 大脑的不同区域依次连接起来,一层组织先于另一层组织成熟,以此类推,直到整个大脑成熟。”



A variety of approaches have been used to investigate the plausibility of deep learning models from a neurobiological perspective. On the one hand, several variants of the [[backpropagation]] algorithm have been proposed in order to increase its processing realism.<ref>{{Cite journal|last=Mazzoni|first=P.|last2=Andersen|first2=R. A.|last3=Jordan|first3=M. I.|date=1991-05-15|title=A more biologically plausible learning rule for neural networks.|journal=Proceedings of the National Academy of Sciences|volume=88|issue=10|pages=4433–4437|doi=10.1073/pnas.88.10.4433|issn=0027-8424|pmid=1903542|pmc=51674|bibcode=1991PNAS...88.4433M}}</ref><ref>{{Cite journal|last=O'Reilly|first=Randall C.|date=1996-07-01|title=Biologically Plausible Error-Driven Learning Using Local Activation Differences: The Generalized Recirculation Algorithm|journal=Neural Computation|volume=8|issue=5|pages=895–938|doi=10.1162/neco.1996.8.5.895|issn=0899-7667|url=https://www.semanticscholar.org/paper/ed9133009dd451bd64215cca7deba6e0b8d7c7b1}}</ref> Other researchers have argued that unsupervised forms of deep learning, such as those based on hierarchical [[generative model]]s and [[deep belief network]]s, may be closer to biological reality.<ref>{{Cite journal|last=Testolin|first=Alberto|last2=Zorzi|first2=Marco|date=2016|title=Probabilistic Models and Generative Neural Networks: Towards an Unified Framework for Modeling Normal and Impaired Neurocognitive Functions|journal=Frontiers in Computational Neuroscience|volume=10|pages=73|doi=10.3389/fncom.2016.00073|pmid=27468262|pmc=4943066|issn=1662-5188|url=https://www.semanticscholar.org/paper/9ff36a621ee2c831fbbda5b719942f9ed8ac844f}}</ref><ref>{{Cite journal|last=Testolin|first=Alberto|last2=Stoianov|first2=Ivilin|last3=Zorzi|first3=Marco|date=September 2017|title=Letter perception emerges from unsupervised deep learning and recycling of natural image features|journal=Nature Human Behaviour|volume=1|issue=9|pages=657–664|doi=10.1038/s41562-017-0186-2|pmid=31024135|issn=2397-3374|url=https://www.semanticscholar.org/paper/ec2463bd610dcb30d67681160e895761e2dde482}}</ref> In this respect, generative neural network models have been related to neurobiological evidence about sampling-based processing in the cerebral cortex.<ref>{{Cite journal|last=Buesing|first=Lars|last2=Bill|first2=Johannes|last3=Nessler|first3=Bernhard|last4=Maass|first4=Wolfgang|date=2011-11-03|title=Neural Dynamics as Sampling: A Model for Stochastic Computation in Recurrent Networks of Spiking Neurons|journal=PLOS Computational Biology|volume=7|issue=11|pages=e1002211|doi=10.1371/journal.pcbi.1002211|pmid=22096452|pmc=3207943|issn=1553-7358|bibcode=2011PLSCB...7E2211B|url=https://www.semanticscholar.org/paper/e4e100e44bf7618c7d96188605fd9870012bdb50}}</ref>

A variety of approaches have been used to investigate the plausibility of deep learning models from a neurobiological perspective. On the one hand, several variants of the backpropagation algorithm have been proposed in order to increase its processing realism. Other researchers have argued that unsupervised forms of deep learning, such as those based on hierarchical generative models and deep belief networks, may be closer to biological reality. In this respect, generative neural network models have been related to neurobiological evidence about sampling-based processing in the cerebral cortex.

人们已经使用多种方法从神经生物学的角度考察深度学习模型的合理性。一方面,为了提高其处理过程的真实性,人们提出了反向传播算法的若干变体。另一些研究者则认为,无监督形式的深度学习,例如基于层次生成模型和深度信念网络的形式,可能更接近生物学现实。在这方面,生成神经网络模型已经与关于大脑皮层中基于采样的处理过程的神经生物学证据联系起来。



Although a systematic comparison between the human brain organization and the neuronal encoding in deep networks has not yet been established, several analogies have been reported. For example, the computations performed by deep learning units could be similar to those of actual neurons<ref>{{Cite journal|last=Morel|first=Danielle|last2=Singh|first2=Chandan|last3=Levy|first3=William B.|date=2018-01-25|title=Linearization of excitatory synaptic integration at no extra cost|journal=Journal of Computational Neuroscience|volume=44|issue=2|pages=173–188|doi=10.1007/s10827-017-0673-5|pmid=29372434|issn=0929-5313|url=https://www.semanticscholar.org/paper/3a528f2cde957d4e6417651f8005ca2ee81ca367}}</ref><ref>{{Cite journal|last=Cash|first=S.|last2=Yuste|first2=R.|date=February 1999|title=Linear summation of excitatory inputs by CA1 pyramidal neurons|journal=Neuron|volume=22|issue=2|pages=383–394|issn=0896-6273|pmid=10069343|doi=10.1016/s0896-6273(00)81098-3}}</ref> and neural populations.<ref>{{Cite journal|date=2004-08-01|title=Sparse coding of sensory inputs|journal=Current Opinion in Neurobiology|volume=14|issue=4|pages=481–487|doi=10.1016/j.conb.2004.07.007|pmid=15321069|issn=0959-4388 | last1 = Olshausen | first1 = B | last2 = Field | first2 = D|url=https://www.semanticscholar.org/paper/0dd289358b14f8176adb7b62bf2fb53ea62b3818}}</ref> Similarly, the representations developed by deep learning models are similar to those measured in the primate visual system<ref>{{Cite journal|last=Yamins|first=Daniel L K|last2=DiCarlo|first2=James J|date=March 2016|title=Using goal-driven deep learning models to understand sensory cortex|journal=Nature Neuroscience|volume=19|issue=3|pages=356–365|doi=10.1038/nn.4244|pmid=26906502|issn=1546-1726|url=https://www.semanticscholar.org/paper/94c4ba7246f781632aa68ca5b1acff0fdbb2d92f}}</ref> both at the single-unit<ref>{{Cite journal|last=Zorzi|first=Marco|last2=Testolin|first2=Alberto|date=2018-02-19|title=An emergentist perspective on the origin of number sense|journal=Phil. Trans. R. Soc. B|volume=373|issue=1740|pages=20170043|doi=10.1098/rstb.2017.0043|issn=0962-8436|pmid=29292348|pmc=5784047|url=https://www.semanticscholar.org/paper/c91db0c8349a78384f54c6a9a98370f5c9381b6c}}</ref> and at the population<ref>{{Cite journal|last=Güçlü|first=Umut|last2=van Gerven|first2=Marcel A. J.|date=2015-07-08|title=Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream|journal=Journal of Neuroscience|volume=35|issue=27|pages=10005–10014|doi=10.1523/jneurosci.5023-14.2015|pmid=26157000|pmc=6605414|arxiv=1411.6422}}</ref> levels.

Although a systematic comparison between the human brain organization and the neuronal encoding in deep networks has not yet been established, several analogies have been reported. For example, the computations performed by deep learning units could be similar to those of actual neurons and neural populations. Similarly, the representations developed by deep learning models are similar to those measured in the primate visual system both at the single-unit and at the population levels.

虽然人脑组织与深度网络中的神经元编码之间尚未建立系统性的比较,但已有若干类比的报道。例如,深度学习单元所执行的计算可能与真实神经元和神经元群体的计算类似。类似地,深度学习模型所形成的表征,在单个神经元水平和群体水平上都与灵长类视觉系统中测得的表征相似。



== Commercial activity ==

[[Facebook]]'s AI lab performs tasks such as [[Automatic image annotation|automatically tagging uploaded pictures]] with the names of the people in them.<ref name="METZ2013">{{cite magazine|first=C. |last=Metz |title=Facebook's 'Deep Learning' Guru Reveals the Future of AI |url=https://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/ |magazine=Wired |date=12 December 2013}}</ref>

Facebook's AI lab performs tasks such as automatically tagging uploaded pictures with the names of the people in them.

Facebook 的人工智能实验室执行的任务包括自动为上传的图片加上人员姓名。



Google's [[DeepMind Technologies]] developed a system capable of learning how to play [[Atari]] video games using only pixels as data input. In 2015 they demonstrated their [[AlphaGo]] system, which learned the game of [[Go (game)|Go]] well enough to beat a professional Go player.<ref>{{Cite web|title = Google AI algorithm masters ancient game of Go |url= http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234|website = Nature News & Comment|accessdate = 2016-01-30}}</ref><ref>{{Cite journal|title = Mastering the game of Go with deep neural networks and tree search|journal = [[Nature (journal)|Nature]]| issn= 0028-0836|pages = 484–489|volume = 529|issue = 7587|doi = 10.1038/nature16961|pmid = 26819042|first1 = David|last1 = Silver|author-link1=David Silver (programmer)|first2 = Aja|last2 = Huang|author-link2=Aja Huang|first3 = Chris J.|last3 = Maddison|first4 = Arthur|last4 = Guez|first5 = Laurent|last5 = Sifre|first6 = George van den|last6 = Driessche|first7 = Julian|last7 = Schrittwieser|first8 = Ioannis|last8 = Antonoglou|first9 = Veda|last9 = Panneershelvam|first10= Marc|last10= Lanctot|first11= Sander|last11= Dieleman|first12=Dominik|last12= Grewe|first13= John|last13= Nham|first14= Nal|last14= Kalchbrenner|first15= Ilya|last15= Sutskever|author-link15=Ilya Sutskever|first16= Timothy|last16= Lillicrap|first17= Madeleine|last17= Leach|first18= Koray|last18= Kavukcuoglu|first19= Thore|last19= Graepel|first20= Demis |last20=Hassabis|author-link20=Demis Hassabis|date= 28 January 2016|bibcode = 2016Natur.529..484S|url = https://www.semanticscholar.org/paper/846aedd869a00c09b40f1f1f35673cb22bc87490}}{{closed access}}</ref><ref>{{Cite web|title = A Google DeepMind Algorithm Uses Deep Learning and More to Master the Game of Go {{!}} MIT Technology Review |url= http://www.technologyreview.com/news/546066/googles-ai-masters-the-game-of-go-a-decade-earlier-than-expected/|website = MIT Technology Review|accessdate = 2016-01-30}}</ref> [[Google Translate]] uses a neural network to translate between more than 100 languages.

Google's DeepMind Technologies developed a system capable of learning how to play Atari video games using only pixels as data input. In 2015 they demonstrated their AlphaGo system, which learned the game of Go well enough to beat a professional Go player. Google Translate uses a neural network to translate between more than 100 languages.

谷歌 Google DeepMind 开发了一个系统,能够学习如何使用像素作为数据输入来玩雅达利的视频游戏。2015年,他们展示了自己的 AlphaGo 系统,这个系统学得很好,足以击败一个职业围棋选手。谷歌翻译使用一个神经网络来翻译100多种语言。



In 2015, [[Blippar]] demonstrated a mobile [[augmented reality]] application that uses deep learning to recognize objects in real time.<ref>{{Cite web|title=Blippar Demonstrates New Real-Time Augmented Reality App|url=https://techcrunch.com/2015/12/08/blippar-demonstrates-new-real-time-augmented-reality-app/|website=TechCrunch}}</ref>

In 2015, Blippar demonstrated a mobile augmented reality application that uses deep learning to recognize objects in real time.

在2015年,Blippar 展示了一个移动扩增实境应用程序,它使用深度学习来实时识别物体。



In 2017, Covariant.ai was launched, which focuses on integrating deep learning into factories.<ref>[https://www.nytimes.com/2017/11/06/technology/artificial-intelligence-start-up.html A.I. Researchers Leave Elon Musk Lab to Begin Robotics Start-Up]</ref>

In 2017, Covariant.ai was launched, which focuses on integrating deep learning into factories.

2017年,Covariant.ai 推出,致力于将深度学习融入工厂。



As of 2008,<ref>{{Cite document|title=TAMER: Training an Agent Manually via Evaluative Reinforcement - IEEE Conference Publication|doi=10.1109/DEVLRN.2008.4640845}}</ref> researchers at [[University of Texas at Austin|The University of Texas at Austin]] (UT) developed a machine learning framework called Training an Agent Manually via Evaluative Reinforcement, or TAMER, which proposed new methods for robots or computer programs to learn how to perform tasks by interacting with a human instructor.<ref name=":12" /> First developed as TAMER, a new algorithm called Deep TAMER was later introduced in 2018 during a collaboration between [[U.S. Army Research Laboratory]] (ARL) and UT researchers. Deep TAMER used deep learning to provide a robot the ability to learn new tasks through observation.<ref name=":12" /> Using Deep TAMER, a robot learned a task with a human trainer, watching video streams or observing a human perform a task in-person. The robot later practiced the task with the help of some coaching from the trainer, who provided feedback such as “good job” and “bad job.”<ref>{{Cite web|url=https://governmentciomedia.com/talk-algorithms-ai-becomes-faster-learner|title=Talk to the Algorithms: AI Becomes a Faster Learner|website=governmentciomedia.com|access-date=2018-08-29}}</ref>

As of 2008, researchers at The University of Texas at Austin (UT) developed a machine learning framework called Training an Agent Manually via Evaluative Reinforcement, or TAMER, which proposed new methods for robots or computer programs to learn how to perform tasks by interacting with a human instructor. First developed as TAMER, a new algorithm called Deep TAMER was later introduced in 2018 during a collaboration between U.S. Army Research Laboratory (ARL) and UT researchers. Deep TAMER used deep learning to provide a robot the ability to learn new tasks through observation. Using Deep TAMER, a robot learned a task with a human trainer, watching video streams or observing a human perform a task in-person. The robot later practiced the task with the help of some coaching from the trainer, who provided feedback such as “good job” and “bad job.”

2008年前后,德克萨斯大学奥斯汀分校(UT)的研究人员开发了一个名为"通过评价性强化手动训练智能体"(TAMER)的机器学习框架,为机器人或计算机程序通过与人类教师互动来学习如何执行任务提出了新的方法。



== Criticism and comment ==

Deep learning has attracted both criticism and comment, in some cases from outside the field of computer science.

Deep learning has attracted both criticism and comment, in some cases from outside the field of computer science.

深度学习已经招致了批评和评论,在某些情况下来自计算机科学领域之外。



=== Theory ===

{{see also|Explainable AI}}

A main criticism concerns the lack of theory surrounding some methods.<ref>{{Cite web|url=https://medium.com/@GaryMarcus/in-defense-of-skepticism-about-deep-learning-6e8bfd5ae0f1|title=In defense of skepticism about deep learning|last=Marcus|first=Gary|date=2018-01-14|website=Gary Marcus|access-date=2018-10-11}}</ref> Learning in the most common deep architectures is implemented using well-understood gradient descent. However, the theory surrounding other algorithms, such as contrastive divergence is less clear.{{citation needed|date=July 2016}} (e.g., Does it converge? If so, how fast? What is it approximating?) Deep learning methods are often looked at as a [[black box]], with most confirmations done empirically, rather than theoretically.<ref name="Knight 2017">{{cite web | last=Knight | first=Will | title=DARPA is funding projects that will try to open up AI's black boxes | website=MIT Technology Review | date=2017-03-14 | url=https://www.technologyreview.com/s/603795/the-us-military-wants-its-autonomous-machines-to-explain-themselves/ | accessdate=2017-11-02}}</ref>

A main criticism concerns the lack of theory surrounding some methods. Learning in the most common deep architectures is implemented using well-understood gradient descent. However, the theory surrounding other algorithms, such as contrastive divergence is less clear. (e.g., Does it converge? If so, how fast? What is it approximating?) Deep learning methods are often looked at as a black box, with most confirmations done empirically, rather than theoretically.

一个主要的批评是关于围绕某些方法的理论缺乏。在最常见的深层架构中学习是通过使用易于理解的梯度下降法实现的。然而,围绕其他算法的理论,比如对比发散,还不是很清楚。(例如,它会聚吗?如果是这样,有多快?它的近似值是多少?)深度学习方法常常被看作是一个黑盒子,大多数的证实都是根据经验进行的,而不是理论上的。



Others point out that deep learning should be looked at as a step towards realizing strong AI, not as an all-encompassing solution. Despite the power of deep learning methods, they still lack much of the functionality needed for realizing this goal entirely. Research psychologist Gary Marcus noted:<blockquote>"Realistically, deep learning is only part of the larger challenge of building intelligent machines. Such techniques lack ways of representing [[causality|causal relationships]] (...) have no obvious ways of performing [[inference|logical inferences]], and they are also still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used. The most powerful A.I. systems, like [[Watson (computer)|Watson]] (...) use techniques like deep learning as just one element in a very complicated ensemble of techniques, ranging from the statistical technique of [[Bayesian inference]] to [[deductive reasoning]]."<ref>{{cite magazine|url=https://www.newyorker.com/|title=Is "Deep Learning" a Revolution in Artificial Intelligence?|last=Marcus|first=Gary|date=November 25, 2012|magazine=The New Yorker|accessdate=2017-06-14}}</ref></blockquote>As an alternative to this emphasis on the limits of deep learning, one author speculated that it might be possible to train a machine vision stack to perform the sophisticated task of discriminating between "old master" and amateur figure drawings, and hypothesized that such a sensitivity might represent the rudiments of a non-trivial machine empathy.<ref>{{cite web|url=http://artent.net/2015/03/27/art-and-artificial-intelligence-by-g-w-smith/|title=Art and Artificial Intelligence|date=March 27, 2015|publisher=ArtEnt|author=Smith, G. W.|accessdate=March 27, 2015|url-status=bot: unknown|archiveurl=https://web.archive.org/web/20170625075845/http://artent.net/2015/03/27/art-and-artificial-intelligence-by-g-w-smith/|archivedate=June 25, 2017}}</ref> This same author proposed that this would be in line with anthropology, which identifies a concern with aesthetics as a key element of [[behavioral modernity]].<ref>{{cite web |url=http://repositriodeficheiros.yolasite.com/resources/Texto%2028.pdf |author=Mellars, Paul |date=February 1, 2005 |title=The Impossible Coincidence: A Single-Species Model for the Origins of Modern Human Behavior in Europe|publisher=Evolutionary Anthropology: Issues, News, and Reviews |accessdate=April 5, 2017}}</ref>

Others point out that deep learning should be looked at as a step towards realizing strong AI, not as an all-encompassing solution. Despite the power of deep learning methods, they still lack much of the functionality needed for realizing this goal entirely. Research psychologist Gary Marcus noted:<blockquote>"Realistically, deep learning is only part of the larger challenge of building intelligent machines. Such techniques lack ways of representing causal relationships (...) have no obvious ways of performing logical inferences, and they are also still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used. The most powerful A.I. systems, like Watson (...) use techniques like deep learning as just one element in a very complicated ensemble of techniques, ranging from the statistical technique of Bayesian inference to deductive reasoning."</blockquote>As an alternative to this emphasis on the limits of deep learning, one author speculated that it might be possible to train a machine vision stack to perform the sophisticated task of discriminating between "old master" and amateur figure drawings, and hypothesized that such a sensitivity might represent the rudiments of a non-trivial machine empathy. This same author proposed that this would be in line with anthropology, which identifies a concern with aesthetics as a key element of behavioral modernity.

其他人指出,深度学习应该被看作是实现强人工智能的一个步骤,而不是一个包罗万象的解决方案。尽管深度学习方法很强大,但它们仍然缺乏完全实现这一目标所需的大量功能。研究心理学家加里·马库斯指出:"实际上,深度学习只是构建智能机器这一更大挑战的一部分。这些技术缺乏表示因果关系的方法(...),没有明显的方法进行逻辑推理,而且它们距离整合抽象知识还有很长的路要走,比如关于物体是什么、它们是用来做什么的、以及它们通常如何被使用的信息。最强大的人工智能系统,比如 Watson (...),只是把深度学习这样的技术当作一个非常复杂的技术集合中的一个元素,这个集合的范围从贝叶斯推断的统计技术到演绎推理。"作为对深度学习局限性这种强调的替代,一位作者推测,也许可以训练机器视觉系统来执行区分"老大师"与业余人物画的复杂任务,并假设这种敏感性可能代表了某种非平凡的机器同理心的雏形。这位作者还提出,这与人类学的观点是一致的,人类学将对美学的关注视为行为现代性的一个关键要素。



In further reference to the idea that artistic sensitivity might inhere within relatively low levels of the cognitive hierarchy, a published series of graphic representations of the internal states of deep (20-30 layers) neural networks attempting to discern within essentially random data the images on which they were trained<ref>{{cite web|url=http://googleresearch.blogspot.co.uk/2015/06/inceptionism-going-deeper-into-neural.html |author1=Alexander Mordvintsev |author2=Christopher Olah |author3=Mike Tyka |date=June 17, 2015 |title=Inceptionism: Going Deeper into Neural Networks |publisher=Google Research Blog |accessdate=June 20, 2015}}</ref> demonstrate a visual appeal: the original research notice received well over 1,000 comments, and was the subject of what was for a time the most frequently accessed article on ''[[The Guardian]]'s''<ref>{{cite news|url=https://www.theguardian.com/technology/2015/jun/18/google-image-recognition-neural-network-androids-dream-electric-sheep|title=Yes, androids do dream of electric sheep|date=June 18, 2015|newspaper=The Guardian|author=Alex Hern|accessdate=June 20, 2015}}</ref> website.

In further reference to the idea that artistic sensitivity might inhere within relatively low levels of the cognitive hierarchy, a published series of graphic representations of the internal states of deep (20-30 layers) neural networks attempting to discern within essentially random data the images on which they were trained demonstrate a visual appeal: the original research notice received well over 1,000 comments, and was the subject of what was for a time the most frequently accessed article on The Guardian's website.

为了进一步说明艺术敏感性可能存在于认知层级相对较低的层次中,一系列已发表的关于深层(20-30层)神经网络内部状态的图形表示试图在本质上是随机的数据中辨别他们所受训练的图像展示了一种视觉吸引力: 最初的研究通知收到了超过1000条评论,并且一度是《卫报》网站上最常被访问的文章的主题。



=== Errors ===

Some deep learning architectures display problematic behaviors,<ref name=goertzel>{{cite web|first=Ben |last=Goertzel |title=Are there Deep Reasons Underlying the Pathologies of Today's Deep Learning Algorithms? |year=2015 |url=http://goertzel.org/DeepLearning_v1.pdf}}</ref> such as confidently classifying unrecognizable images as belonging to a familiar category of ordinary images<ref>{{cite arxiv |eprint=1412.1897|last1=Nguyen|first1=Anh|title=Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images|last2=Yosinski|first2=Jason|last3=Clune|first3=Jeff|class=cs.CV|year=2014}}</ref> and misclassifying minuscule perturbations of correctly classified images.<ref>{{cite arxiv |eprint=1312.6199|last1=Szegedy|first1=Christian|title=Intriguing properties of neural networks|last2=Zaremba|first2=Wojciech|last3=Sutskever|first3=Ilya|last4=Bruna|first4=Joan|last5=Erhan|first5=Dumitru|last6=Goodfellow|first6=Ian|last7=Fergus|first7=Rob|class=cs.CV|year=2013}}</ref> [[Ben Goertzel|Goertzel]] hypothesized that these behaviors are due to limitations in their internal representations and that these limitations would inhibit integration into heterogeneous multi-component [[artificial general intelligence]] (AGI) architectures.<ref name="goertzel" /> These issues may possibly be addressed by deep learning architectures that internally form states homologous to image-grammar<ref>{{cite journal | last1 = Zhu | first1 = S.C. | last2 = Mumford | first2 = D. | year = 2006| title = A stochastic grammar of images | url= | journal = Found. Trends Comput. Graph. Vis. | volume = 2 | issue = 4| pages = 259–362 | doi = 10.1561/0600000018| citeseerx = 10.1.1.681.2190 }}</ref> decompositions of observed entities and events.<ref name="goertzel"/> [[Grammar induction|Learning a grammar]] (visual or linguistic) from training data would be equivalent to restricting the system to [[commonsense reasoning]] that operates on concepts in terms of grammatical [[Production (computer science)|production rules]] and is a basic goal of both human language acquisition<ref>Miller, G. A., and N. Chomsky. "Pattern conception." Paper for Conference on pattern detection, University of Michigan. 1957.</ref> and [[artificial intelligence]] (AI).<ref>{{cite web|first=Jason |last=Eisner |title=Deep Learning of Recursive Structure: Grammar Induction |url=http://techtalks.tv/talks/deep-learning-of-recursive-structure-grammar-induction/58089/}}</ref>

Some deep learning architectures display problematic behaviors, such as confidently classifying unrecognizable images as belonging to a familiar category of ordinary images and misclassifying minuscule perturbations of correctly classified images. Goertzel hypothesized that these behaviors are due to limitations in their internal representations and that these limitations would inhibit integration into heterogeneous multi-component artificial general intelligence (AGI) architectures. These issues may possibly be addressed by deep learning architectures that internally form states homologous to image-grammar decompositions of observed entities and events. Learning a grammar (visual or linguistic) from training data would be equivalent to restricting the system to commonsense reasoning that operates on concepts in terms of grammatical production rules and is a basic goal of both human language acquisition and artificial intelligence (AI).

一些深度学习架构表现出有问题的行为,例如自信地将无法识别的图像归入某个熟悉的普通图像类别,以及对已被正确分类图像的微小扰动作出错误分类。Goertzel 假设这些行为源于其内部表征的局限,而这些局限会妨碍其融入异构的多组件人工通用智能(AGI)架构。这些问题或许可以通过在内部形成与所观察实体和事件的图像文法分解同源状态的深度学习架构来解决。从训练数据中学习(视觉或语言的)文法,相当于把系统限制在以文法产生式规则对概念进行操作的常识推理上,而这也是人类语言习得和人工智能(AI)的一个基本目标。



=== Cyber threat ===

As deep learning moves from the lab into the world, research and experience shows that artificial neural networks are vulnerable to hacks and deception.<ref>{{Cite web|url=https://gizmodo.com/hackers-have-already-started-to-weaponize-artificial-in-1797688425|title=Hackers Have Already Started to Weaponize Artificial Intelligence|website=Gizmodo|access-date=2019-10-11}}</ref> By identifying patterns that these systems use to function, attackers can modify inputs to ANNs in such a way that the ANN finds a match that human observers would not recognize. For example, an attacker can make subtle changes to an image such that the ANN finds a match even though the image looks to a human nothing like the search target. Such a manipulation is termed an “adversarial attack.”<ref>{{Cite web|url=https://www.dailydot.com/debug/adversarial-attacks-ai-mistakes/|title=How hackers can force AI to make dumb mistakes|date=2018-06-18|website=The Daily Dot|language=en|access-date=2019-10-11}}</ref> In 2016 researchers used one ANN to doctor images in trial and error fashion, identify another's focal points and thereby generate images that deceived it. The modified images looked no different to human eyes. Another group showed that printouts of doctored images then photographed successfully tricked an image classification system.<ref name=":4">{{Cite news|url=https://singularityhub.com/2017/10/10/ai-is-easy-to-fool-why-that-needs-to-change|title=AI Is Easy to Fool—Why That Needs to Change|last=|first=|date=2017-10-10|work=Singularity Hub|accessdate=2017-10-11}}</ref> One defense is reverse image search, in which a possible fake image is submitted to a site such as [[TinEye]] that can then find other instances of it. A refinement is to search using only parts of the image, to identify images from which that piece may have been taken'''.'''<ref>{{Cite journal|last=Gibney|first=Elizabeth|title=The scientist who spots fake videos|url=https://www.nature.com/news/the-scientist-who-spots-fake-videos-1.22784|journal=Nature|pages=|doi=10.1038/nature.2017.22784|via=|year=2017}}</ref>

As deep learning moves from the lab into the world, research and experience shows that artificial neural networks are vulnerable to hacks and deception. By identifying patterns that these systems use to function, attackers can modify inputs to ANNs in such a way that the ANN finds a match that human observers would not recognize. For example, an attacker can make subtle changes to an image such that the ANN finds a match even though the image looks to a human nothing like the search target. Such a manipulation is termed an “adversarial attack.” In 2016 researchers used one ANN to doctor images in trial and error fashion, identify another's focal points and thereby generate images that deceived it. The modified images looked no different to human eyes. Another group showed that printouts of doctored images then photographed successfully tricked an image classification system. One defense is reverse image search, in which a possible fake image is submitted to a site such as TinEye that can then find other instances of it. A refinement is to search using only parts of the image, to identify images from which that piece may have been taken.

随着深度学习从实验室走向世界,研究和经验表明,人工神经网络很容易受到黑客和欺骗的攻击。通过识别这些系统使用的模式,攻击者可以修改人工神经网络的输入,使人工神经网络找到人类观察者无法识别的匹配。例如,攻击者可以对图像进行细微的修改,使 ANN 找到匹配的图像,即使图像看起来与搜索目标完全不同。这种操纵被称为“对抗性攻击” 2016年,研究人员使用一种人工神经网络,以试错的方式对图像进行修改,确定另一个人的焦点,从而生成欺骗它的图像。修改后的图像与人眼看到的没有什么不同。另一组研究表明,打印出的经过修改的图片然后被拍摄成功地欺骗了一张图片分类方案。其中一个防御措施是反向图像搜索,即将一个可能的假图像提交给 TinEye 这样的网站,然后该网站可以找到其他的实例。一个改进是搜索只使用图像的一部分,以确定图像可能是从哪一部分采取。
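
One widely studied way of constructing such adversarial inputs, not described above, is the fast gradient sign method: perturb the input by a small step in the direction of the sign of the loss gradient. The sketch below applies it to a simple logistic-regression classifier on synthetic data; the data, step size and model are assumptions for illustration, and in practice the same principle is applied to deep networks and images.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(5)

# Synthetic two-class data in 20 dimensions.
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)

# Train a logistic-regression "network" (the same attack applies to deep nets).
w = np.zeros(d); b = 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y) / n)
    b -= 0.5 * np.mean(p - y)

# Fast-gradient-sign attack: move each input a small step epsilon in the
# direction that increases the classification loss the most.
def fgsm(x, label, epsilon):
    p = sigmoid(x @ w + b)
    grad_x = (p - label) * w       # gradient of the logistic loss w.r.t. the input
    return x + epsilon * np.sign(grad_x)

epsilon = 0.25
X_adv = np.array([fgsm(x, label, epsilon) for x, label in zip(X, y)])

acc = lambda data: np.mean((sigmoid(data @ w + b) > 0.5) == y.astype(bool))
print("accuracy on clean inputs:    ", acc(X))
print("accuracy on perturbed inputs:", acc(X_adv))
</syntaxhighlight>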



Another group showed that certain [[Psychedelic art|psychedelic]] spectacles could fool a [[facial recognition system]] into thinking ordinary people were celebrities, potentially allowing one person to impersonate another. In 2017 researchers added stickers to [[stop sign]]s and caused an ANN to misclassify them.<ref name=":4" />




ANNs can, however, be further trained to detect attempts at deception, potentially leading attackers and defenders into an arms race similar to the one that already defines the [[malware]] defense industry. ANNs have been trained to defeat ANN-based anti-malware software by repeatedly attacking a defense with malware that a [[genetic algorithm]] continually altered until it fooled the anti-malware while retaining its ability to damage the target.<ref name=":4" />

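The genetic-algorithm loop described above can be sketched in the abstract. The toy example below evolves plain bit vectors against a made-up scoring function; <code>detector_score</code> is a hypothetical stand-in with no connection to any real anti-malware system, and the sketch only illustrates the selection, crossover and mutation steps of a genetic algorithm.

<syntaxhighlight lang="python">
# Abstract genetic-algorithm sketch: evolve bit vectors toward a lower "detection" score.
import random

def detector_score(bits):
    # Hypothetical stand-in: pretend the detector keys on the density of 1-bits.
    return sum(bits) / len(bits)

def evolve(n_bits=64, pop_size=30, generations=50, mutation_rate=0.05):
    population = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=detector_score)      # lower score = better "evasion"
        parents = population[: pop_size // 2]    # keep the least-detected half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_bits)
            child = a[:cut] + b[cut:]            # single-point crossover
            child = [bit ^ (random.random() < mutation_rate) for bit in child]  # mutation
            children.append(child)
        population = parents + children
    return min(population, key=detector_score)

best = evolve()
print("best score:", detector_score(best))
</syntaxhighlight>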



Another group demonstrated that certain sounds could make the [[Google Now]] voice command system open a particular web address that would download malware.<ref name=":4" />




In “data poisoning,” false data is continually smuggled into a machine learning system's training set to prevent it from achieving mastery.<ref name=":4" />

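One simple form of data poisoning is label flipping, sketched below under the assumption that scikit-learn is available; the synthetic dataset, logistic-regression model, and 30% poisoning rate are arbitrary illustrative choices, not drawn from the cited work.

<syntaxhighlight lang="python">
# Illustrative label-flipping sketch: compare a model trained on clean vs. poisoned labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
poisoned = y_train.copy()
idx = rng.choice(len(poisoned), size=int(0.3 * len(poisoned)), replace=False)
poisoned[idx] = 1 - poisoned[idx]  # flip 30% of the training labels

clean_acc = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)
poisoned_acc = LogisticRegression(max_iter=1000).fit(X_train, poisoned).score(X_test, y_test)
print(f"clean: {clean_acc:.3f}  poisoned: {poisoned_acc:.3f}")  # poisoned accuracy is typically lower
</syntaxhighlight>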



=== Reliance on human [[microwork]] ===

Most deep learning systems rely on training and verification data that is generated and/or annotated by humans. It has been argued in [[Media studies|media philosophy]] that this purpose is served not only by low-paid [[Clickworkers|clickwork]] (e.g. on [[Amazon Mechanical Turk]]) but also by implicit forms of human [[microwork]] that are often not recognized as such.<ref name=":13">{{Cite journal|last=Mühlhoff|first=Rainer|date=2019-11-06|title=Human-aided artificial intelligence: Or, how to run large computations in human brains? Toward a media sociology of machine learning|journal=New Media & Society|language=en|pages=146144481988533|doi=10.1177/1461444819885334|issn=1461-4448}}</ref> The philosopher Rainer Mühlhoff distinguishes five types of "machinic capture" of human microwork to generate training data: (1) [[gamification]] (the embedding of annotation or computation tasks in the flow of a game), (2) "trapping and tracking" (e.g. [[CAPTCHA]]s for image recognition or click-tracking on Google [[Search engine results page|search results pages]]), (3) exploitation of social motivations (e.g. [[Tag (Facebook)|tagging faces]] on [[Facebook]] to obtain labeled facial images), (4) [[information mining]] (e.g. by leveraging [[Quantified self|quantified-self]] devices such as [[activity tracker]]s) and (5) [[Clickworkers|clickwork]].<ref name=":13" /> Mühlhoff argues that in most commercial end-user applications of deep learning, such as [[DeepFace|Facebook's face recognition system]], the need for training data does not stop once an ANN is trained. Rather, there is a continued demand for human-generated verification data to constantly calibrate and update the ANN. For this purpose, Facebook introduced a feature whereby a user receives a notification once they are automatically recognized in an image. They can then choose whether or not they want to be publicly labeled on the image, or tell Facebook that it is not them in the picture.<ref>{{Cite news|url=https://www.wired.com/story/facebook-will-find-your-face-even-when-its-not-tagged/|title=Facebook Can Now Find Your Face, Even When It's Not Tagged|work=Wired|access-date=2019-11-22|language=en|issn=1059-1028}}</ref> This user interface is a mechanism to generate "a constant stream of verification data"<ref name=":13" /> to further train the network in real time. As Mühlhoff argues, the involvement of human users in generating training and verification data is so typical of most commercial end-user applications of deep learning that such systems may be referred to as "human-aided artificial intelligence".<ref name=":13" />




== See also ==

* [[Applications of artificial intelligence]]

* [[Comparison of deep learning software]]

* [[Compressed sensing]]

* [[Echo state network]]

* [[List of artificial intelligence projects]]

* [[Liquid state machine]]

* [[List of datasets for machine learning research]]

* [[Reservoir computing]]

* [[Sparse coding]]



== References ==

{{Reflist|30em}}



== Further reading ==

{{refbegin}}

* {{cite book |title=Deep Learning |year=2016
|first1=Ian |last1=Goodfellow |authorlink1=Ian Goodfellow
|first2=Yoshua |last2=Bengio |authorlink2=Yoshua Bengio
|first3=Aaron |last3=Courville
|publisher=MIT Press
|url=http://www.deeplearningbook.org
|isbn=978-0-26203561-3
|postscript=, introductory textbook.
}}
{{refend}}



{{Prone to spam|date=June 2015}}{{Z148}}<!-- {{No more links}}

Please be cautious adding more external links.

Wikipedia is not a collection of links and should not be used for advertising.

Excessive or inappropriate links will be removed.

See [[Wikipedia:External links]] and [[Wikipedia:Spam]] for details.

If there are already suitable links, propose additions or replacements on
the article's talk page, or submit your link to the relevant category at
DMOZ (dmoz.org) and link there using {{Dmoz}}.

-->



[[Category:Deep learning| ]]

[[Category:Artificial neural networks]]


[[Category:Artificial intelligence]]


[[Category:Emerging technologies]]


<noinclude>

<small>This page was moved from [[wikipedia:en:Deep learning]]. Its edit history can be viewed at [[深度学习/edithistory]]</small></noinclude>

[[Category:待整理页面]]