Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of [https://en.wikipedia.org/wiki/Machine_learning machine learning] methods based on learning data representations, as opposed to task-specific algorithms. Learning can be [https://en.wikipedia.org/wiki/Supervised_learning supervised], [https://en.wikipedia.org/wiki/Semi-supervised_learning semi-supervised] or [https://en.wikipedia.org/wiki/Unsupervised_learning unsupervised].<ref name="BENGIO2012" /><ref name="SCHIDHUB" /><ref name="NatureBengio">{{cite journal |last1=Bengio |first1=Yoshua |last2=LeCun |first2= Yann| last3=Hinton | first3= Geoffrey|year=2015 |title=Deep Learning |journal=Nature |volume=521 |issue=7553 |pages=436–444 |doi=10.1038/nature14539 |pmid=26017442|bibcode=2015Natur.521..436L }}</ref>

Deep learning architectures such as [https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks deep neural networks], [https://en.wikipedia.org/wiki/Deep_belief_network deep belief networks] and [https://en.wikipedia.org/wiki/Recurrent_neural_network recurrent neural networks] have been applied to fields including [https://en.wikipedia.org/wiki/Computer_vision computer vision], [https://en.wikipedia.org/wiki/Speech_recognition speech recognition], [https://en.wikipedia.org/wiki/Natural_language_processing natural language processing], social network filtering, [https://en.wikipedia.org/wiki/Machine_translation machine translation], [https://en.wikipedia.org/wiki/Bioinformatics bioinformatics], [https://en.wikipedia.org/wiki/Drug_design drug design] and [https://en.wikipedia.org/wiki/Board_game board game] programs, where they have produced results comparable to, and in some cases surpassing, human expert performance.<ref name=":9">{{Cite journal|last=Ciresan|first=Dan|last2=Meier|first2=U.|last3=Schmidhuber|first3=J.|date=June 2012|title=Multi-column deep neural networks for image classification|url=http://ieeexplore.ieee.org/document/6248110/|journal=2012 IEEE Conference on Computer Vision and Pattern Recognition|pages=3642–3649|doi=10.1109/cvpr.2012.6248110|via=|isbn=978-1-4673-1228-8|arxiv=1202.2745}}</ref><ref name="krizhevsky2012">{{cite journal|last1=Krizhevsky|first1=Alex|last2=Sutskever|first2=Ilya|last3=Hinton|first3=Geoffry|date=2012|title=ImageNet Classification with Deep Convolutional Neural Networks|url=https://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf|journal=NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada}}</ref><ref>{{cite web |title=Google's AlphaGo AI wins three-match series against the world's best Go player |url=https://techcrunch.com/2017/05/24/alphago-beats-planets-best-human-go-player-ke-jie/amp/ |website=TechCrunch |date=25 May 2017}}</ref>

Deep learning models are loosely inspired by information processing and communication in biological [https://en.wikipedia.org/wiki/Nervous_system nervous systems], yet they have structural and functional differences from biological brains, which make them incompatible with some findings in [https://en.wikipedia.org/wiki/Neuroscience neuroscience].<ref>{{Cite journal|last=Marblestone|first=Adam H.|last2=Wayne|first2=Greg|last3=Kording|first3=Konrad P.|date=2016|title=Toward an Integration of Deep Learning and Neuroscience |journal=Frontiers in Computational Neuroscience |volume=10|pages=94|doi=10.3389/fncom.2016.00094 |pmc=5021692|pmid=27683554}}</ref><ref>{{cite journal|last1=Olshausen|first1=B. A.|year=1996|title=Emergence of simple-cell receptive field properties by learning a sparse code for natural images|journal=Nature|volume=381|issue=6583|pages=607–609|bibcode=1996Natur.381..607O|doi=10.1038/381607a0|pmid=8637596}}</ref><ref>{{cite arxiv|last=Bengio|first=Yoshua|last2=Lee|first2=Dong-Hyun|last3=Bornschein|first3=Jorg|last4=Mesnard|first4=Thomas|last5=Lin|first5=Zhouhan|date=2015-02-13|title=Towards Biologically Plausible Deep Learning|eprint=1502.04156|class=cs.LG}}</ref>

Deep learning is a class of [https://en.wikipedia.org/wiki/Machine_learning machine learning] algorithms that:<ref name="BOOK2014">{{cite journal|last2=Yu|first2=D.|year=2014|title=Deep Learning: Methods and Applications|url=http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf|journal=Foundations and Trends in Signal Processing|volume=7|issue=3–4|pages=1–199|doi=10.1561/2000000039|last1=Deng|first1=L.}}</ref>
*use a cascade of multiple layers of [https://en.wikipedia.org/wiki/Nonlinear_filter nonlinear processing units] for [https://en.wikipedia.org/wiki/Feature_extraction feature extraction] and transformation, where each successive layer uses the output of the previous layer as its input (a minimal sketch of such a cascade follows this list);
*learn in a [https://en.wikipedia.org/wiki/Supervised_learning supervised] (e.g., classification) and/or [https://en.wikipedia.org/wiki/Unsupervised_learning unsupervised] (e.g., pattern analysis) manner;
*learn multiple levels of representation corresponding to different levels of abstraction, with the levels forming a hierarchy.

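As a minimal illustration of the first point, the sketch below (all layer sizes, the random weights and the tanh nonlinearity are illustrative assumptions, not a particular published system) chains several nonlinear processing layers so that each layer consumes the previous layer's output:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w):
    """One nonlinear processing layer: an affine map followed by a nonlinearity."""
    return np.tanh(x @ w)

# A cascade of three layers: each layer's output is the next layer's input,
# producing successively more abstract representations of the raw input.
x = rng.standard_normal((8, 64))                # a batch of 8 raw input vectors
widths = [64, 32, 16, 4]                        # illustrative layer sizes
weights = [0.1 * rng.standard_normal((a, b)) for a, b in zip(widths, widths[1:])]

h = x
for w in weights:
    h = layer(h, w)                             # extract features, layer by layer
print(h.shape)                                  # (8, 4): the top-level representation
</syntaxhighlight>
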
=== Overview ===

Most modern deep learning models are based on [https://en.wikipedia.org/wiki/Artificial_neural_network artificial neural networks], although they can also include [https://en.wikipedia.org/wiki/Propositional_formula propositional formulas] or latent variables organized layer-wise in deep [https://en.wikipedia.org/wiki/Generative_model generative models], such as the nodes in deep belief networks and deep [https://en.wikipedia.org/wiki/Boltzmann_machine Boltzmann machines].<ref name="BENGIODEEP">{{cite journal|last=Bengio|first=Yoshua|year=2009|title=Learning Deep Architectures for AI|url=http://sanghv.com/download/soft/machine%20learning,%20artificial%20intelligence,%20mathematics%20ebooks/ML/learning%20deep%20architectures%20for%20AI%20%282009%29.pdf|journal=Foundations and Trends in Machine Learning|volume=2|issue=1|pages=1–127|doi=10.1561/2200000006}}</ref>

In deep learning, each level learns to transform its input data into a slightly more abstract and composite representation. In an image recognition application, the raw input may be a [https://en.wikipedia.org/wiki/Matrix_(mathematics) matrix] of pixels; the first representational layer may abstract the pixels and encode edges; the second layer may compose and encode arrangements of edges; the third layer may encode a nose and eyes; and the fourth layer may recognize that the image contains a face. Importantly, a deep learning process can learn on its own which features to place optimally at which level. (Of course, this does not completely eliminate the need for hand-tuning; for example, varying the number of layers and the layer sizes can provide different degrees of abstraction.)<ref name="BENGIO2012">{{cite journal|last2=Courville|first2=A.|last3=Vincent|first3=P.|year=2013|title=Representation Learning: A Review and New Perspectives|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=35|issue=8|pages=1798–1828|arxiv=1206.5538|doi=10.1109/tpami.2013.50|last1=Bengio|first1=Y.}}</ref><ref>{{cite journal|last1=LeCun|first1=Yann|last2=Bengio|first2=Yoshua|last3=Hinton|first3=Geoffrey|title=Deep learning|journal=Nature|date=28 May 2015|volume=521|issue=7553|pages=436–444|doi=10.1038/nature14539|bibcode=2015Natur.521..436L}}</ref>

The "deep" in "deep learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial ''credit assignment path'' (CAP) depth. The CAP is the chain of transformations from input to output, and describes potentially causal connections between input and output.<ref name="SCHIDHUB" /> For a [https://en.wikipedia.org/wiki/Feedforward_neural_network feedforward neural network], the CAP depth is the number of hidden layers plus one (as the output layer is also parameterized); a network with two hidden layers, for example, has a CAP depth of three. For [https://en.wikipedia.org/wiki/Recurrent_neural_network recurrent neural networks], in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited. There is no universally agreed-upon threshold of depth dividing shallow learning from deep learning, but most researchers agree that deep learning involves CAP depth greater than 2, since a CAP of depth 2 has been shown to be a universal approximator in the sense that it can emulate any function. Beyond that, more layers do not add to the network's ability to approximate functions, but the extra layers help it learn features effectively.

Deep learning architectures can be constructed with a [https://en.wikipedia.org/wiki/Greedy_algorithm greedy] layer-by-layer method.<ref name=BENGIO2007>{{cite conference | first1=Yoshua | last1=Bengio | first2=Pascal | last2=Lamblin | first3=Dan|last3=Popovici |first4=Hugo|last4=Larochelle | title=Greedy layer-wise training of deep networks| year=2007 | url=http://papers.nips.cc/paper/3048-greedy-layer-wise-training-of-deep-networks.pdf| conference = Advances in neural information processing systems | pages= 153–160}}</ref> Deep learning helps to disentangle these abstractions and pick out which features improve performance.<ref name="BENGIO2012" />

For [https://en.wikipedia.org/wiki/Supervised_learning supervised learning] tasks, deep learning methods avoid [https://en.wikipedia.org/wiki/Feature_engineering feature engineering] by translating the data into compact intermediate representations akin to [https://en.wikipedia.org/wiki/Principal_component_analysis principal components], and derive layered structures that remove redundancy in the representation.

Deep learning algorithms can also be applied to unsupervised learning tasks. This is an important benefit because unlabeled data are more abundant than labeled data. Examples of deep structures that can be trained in an unsupervised manner are neural history compressors<ref name="scholarpedia">Jürgen Schmidhuber (2015). Deep Learning. Scholarpedia, 10(11):32832. [http://www.scholarpedia.org/article/Deep_Learning Online]</ref> and [https://en.wikipedia.org/wiki/Deep_belief_network deep belief networks].<ref name="BENGIO2012" /><ref name="SCHOLARDBNS">{{cite journal | last1 = Hinton | first1 = G.E. | year = 2009| title = Deep belief networks | url= | journal = Scholarpedia | volume = 4 | issue = 5| page = 5947 | doi=10.4249/scholarpedia.5947| bibcode = 2009SchpJ...4.5947H}}</ref>

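A hedged sketch of the greedy layer-by-layer idea, here using stacked single-layer autoencoders as the unsupervised building block (one common choice; the sizes, learning rate, tied weights and squared-error objective are illustrative assumptions, not a prescribed recipe):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(data, n_hidden, epochs=20, lr=0.01):
    """Fit one encoder layer by reconstructing its own input (tied weights)."""
    W = 0.1 * rng.standard_normal((data.shape[1], n_hidden))
    for _ in range(epochs):
        h = np.tanh(data @ W)                    # encode
        recon = h @ W.T                          # decode with tied weights
        err = recon - data
        dh = (err @ W) * (1 - h ** 2)            # backprop through the encoder
        W -= lr * (data.T @ dh + err.T @ h) / len(data)
    return W

# Greedy construction: train one layer at a time on unlabeled data; each
# trained layer's hidden activities become the next layer's training input.
X = rng.standard_normal((128, 32))
encoders, inp = [], X
for n_hidden in (16, 8):
    W = train_autoencoder(inp, n_hidden)
    encoders.append(W)
    inp = np.tanh(inp @ W)
</syntaxhighlight>
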
=== Interpretations ===

Deep neural networks are generally interpreted in terms of the [https://en.wikipedia.org/wiki/Universal_approximation_theorem universal approximation theorem]<ref name="ReferenceB">Balázs Csanád Csáji (2001). Approximation with Artificial Neural Networks; Faculty of Sciences; Eötvös Loránd University, Hungary</ref><ref name=cyb>{{cite journal | last1 = Cybenko | year = 1989 | title = Approximations by superpositions of sigmoidal functions | url = http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf | format = PDF | journal = [[Mathematics of Control, Signals, and Systems]] | volume = 2 | issue = 4 | pages = 303–314 | doi = 10.1007/bf02551274 | deadurl = yes | archiveurl = https://web.archive.org/web/20151010204407/http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf | archivedate = 2015-10-10 | df = }}</ref><ref name=horn>{{cite journal | last1 = Hornik | first1 = Kurt | year = 1991 | title = Approximation Capabilities of Multilayer Feedforward Networks | url= | journal = Neural Networks | volume = 4 | issue = 2| pages = 251–257 | doi=10.1016/0893-6080(91)90009-t}}</ref><ref name="Haykin, Simon 1998">{{cite book|first=Simon S. |last=Haykin|title=Neural Networks: A Comprehensive Foundation|url={{google books |plainurl=y |id=bX4pAQAAMAAJ}}|year=1999|publisher=Prentice Hall|isbn=978-0-13-273350-2}}</ref><ref name="Hassoun, M. 1995 p. 48">{{cite book|first=Mohamad H. |last=Hassoun|title=Fundamentals of Artificial Neural Networks|url={{google books |plainurl=y |id=Otk32Y3QkxQC|page=48}}|year=1995|publisher=MIT Press|isbn=978-0-262-08239-6|p=48}}</ref> or [https://en.wikipedia.org/wiki/Bayesian_inference probabilistic inference].<ref name="BOOK2014" /><ref name="BENGIODEEP" /><ref name="BENGIO2012" /><ref name="SCHIDHUB">{{cite journal|last=Schmidhuber|first=J.|year=2015|title=Deep Learning in Neural Networks: An Overview|journal=Neural Networks|volume=61|pages=85–117|arxiv=1404.7828|doi=10.1016/j.neunet.2014.09.003|pmid=25462637}}</ref><ref name="SCHOLARDBNS" /><ref name = MURPHY>{{cite book|first=Kevin P. |last=Murphy|title=Machine Learning: A Probabilistic Perspective|url={{google books |plainurl=y |id=NZP6AQAAQBAJ}}|date=24 August 2012|publisher=MIT Press|isbn=978-0-262-01802-9}}</ref><ref name= "Patel NIPS 2016">{{Cite journal|url=https://papers.nips.cc/paper/6231-a-probabilistic-framework-for-deep-learning.pdf|title=A Probabilistic Framework for Deep Learning|last=Patel|first=Ankit|last2=Nguyen|first2=Tan|last3=Baraniuk|first3=Richard|date=2016|journal=Advances in Neural Information Processing Systems|pages=}}</ref>

The classic universal approximation theorem concerns the capacity of [https://en.wikipedia.org/wiki/Feedforward_neural_network feedforward neural networks] with a single hidden layer of finite size to approximate [https://en.wikipedia.org/wiki/Continuous_function continuous functions].<ref name="ReferenceB"/><ref name="cyb"/><ref name="horn"/><ref name="Haykin, Simon 1998"/><ref name="Hassoun, M. 1995 p. 48"/> In 1989, the first proof was published by [https://en.wikipedia.org/wiki/George_Cybenko George Cybenko] for [https://en.wikipedia.org/wiki/Sigmoid_function sigmoid] activation functions,<ref name="cyb" /> and it was generalised to feed-forward multi-layer architectures in 1991 by Kurt Hornik.<ref name="horn" />

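Stated informally for the sigmoid case proved by Cybenko: for every continuous function <math>f</math> on a compact set and every tolerance <math>\varepsilon > 0</math>, there exist a finite <math>N</math>, weights <math>w_i</math>, biases <math>b_i</math> and coefficients <math>\alpha_i</math> such that the single-hidden-layer sum

<math>G(x) = \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^{\mathsf{T}} x + b_i\right)</math>

satisfies <math>|G(x) - f(x)| < \varepsilon</math> for all <math>x</math> in the domain, where <math>\sigma</math> is a sigmoidal activation function.
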
The [https://en.wikipedia.org/wiki/Probability probabilistic] interpretation<ref name="MURPHY" /> derives from the field of [https://en.wikipedia.org/wiki/Machine_learning machine learning]. It features inference,<ref name="BOOK2014" /><ref name="BENGIODEEP" /><ref name="BENGIO2012" /><ref name="SCHIDHUB" /><ref name="SCHOLARDBNS" /><ref name="MURPHY" /> as well as the [https://en.wikipedia.org/wiki/Mathematical_optimization optimization] concepts of [https://en.wikipedia.org/wiki/Training training] and [https://en.wikipedia.org/wiki/Test_(assessment) testing], related to fitting and generalization respectively. More specifically, the probabilistic interpretation considers the activation nonlinearity as a [https://en.wikipedia.org/wiki/Cumulative_distribution_function cumulative distribution function]. This interpretation led to the introduction of [https://en.wikipedia.org/wiki/Dropout_(neural_networks) dropout] as a [https://en.wikipedia.org/wiki/Regularization_(mathematics) regularizer] in neural networks.<ref name="DROPOUT">{{cite arXiv |last1=Hinton |first1=G. E. |last2=Srivastava| first2 =N.|last3=Krizhevsky| first3=A.| last4 =Sutskever| first4=I.| last5=Salakhutdinov| first5=R.R.|eprint=1207.0580 |class=math.LG |title=Improving neural networks by preventing co-adaptation of feature detectors |date=2012}}</ref> The probabilistic interpretation was introduced by researchers including [https://en.wikipedia.org/wiki/John_Hopfield Hopfield], [https://en.wikipedia.org/wiki/Bernard_Widrow Widrow] and [https://en.wikipedia.org/wiki/Kumpati_S._Narendra Narendra], and popularized in surveys such as the one by [https://en.wikipedia.org/wiki/Christopher_Bishop Bishop].<ref name="prml">{{cite book|title=Pattern Recognition and Machine Learning|author=Bishop, Christopher M.|year=2006|publisher=Springer|url=http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf|isbn=978-0-387-31073-2}}</ref>

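A minimal NumPy sketch of dropout used as a regularizer (the "inverted" rescaling shown here is one common convention, and the drop probability and activations are illustrative assumptions):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_drop=0.5, train=True):
    """Inverted dropout: randomly zero units at train time and rescale the
    survivors so the expected activation is unchanged at test time."""
    if not train:
        return h                                # no-op at test time
    mask = rng.random(h.shape) >= p_drop        # keep each unit with prob 1 - p_drop
    return h * mask / (1.0 - p_drop)

h = np.ones((2, 4))                             # illustrative layer activations
print(dropout(h, p_drop=0.5))
</syntaxhighlight>
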
=== History ===

The term ''deep learning'' was introduced to the machine learning community by [https://en.wikipedia.org/wiki/Rina_Dechter Rina Dechter] in 1986,<ref name="dechter1986">[[Rina Dechter]] (1986). Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory.[https://www.researchgate.net/publication/221605378_Learning_While_Searching_in_Constraint-Satisfaction-Problems Online]</ref><ref name="scholarpedia" /> and to [https://en.wikipedia.org/wiki/Artificial_neural_network artificial neural networks] by Igor Aizenberg and colleagues in 2000, in the context of Boolean threshold neurons.<ref name="aizenberg2000">Igor Aizenberg, Naum N. Aizenberg, Joos P.L. Vandewalle (2000). Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Springer Science & Business Media.</ref><ref>Co-evolving recurrent neurons learn deep memory POMDPs. Proc. GECCO, Washington, D. C., pp. 1795-1802, ACM Press, New York, NY, USA, 2005.</ref>

In 1965, [https://en.wikipedia.org/wiki/Alexey_Ivakhnenko Alexey Ivakhnenko] and Lapa published the first general, working learning algorithm for supervised, deep, feedforward, multilayer [https://en.wikipedia.org/wiki/Perceptron perceptrons].<ref name="ivak1965">{{cite book|first=A. G. |last=Ivakhnenko|title=Cybernetic Predicting Devices|url={{google books |plainurl=y |id=FhwVNQAACAAJ}}|year=1973|publisher=CCM Information Corporation}}</ref> A 1971 paper described a deep network with 8 layers trained by the [https://en.wikipedia.org/wiki/Group_method_of_data_handling group method of data handling] algorithm.<ref name="ivak1971">{{Cite journal|last=Ivakhnenko|first=Alexey|date=1971|title=Polynomial theory of complex systems|url=|journal=IEEE Transactions on Systems, Man and Cybernetics |pages=364–378|doi=10.1109/TSMC.1971.4308320|pmid=|accessdate=|volume=1|issue=4}}</ref>

Other deep learning working architectures, specifically those built for [https://en.wikipedia.org/wiki/Computer_vision computer vision], began with the [https://en.wikipedia.org/wiki/Neocognitron Neocognitron] introduced by Kunihiko Fukushima in 1980.<ref name="FUKU1980">{{cite journal | last1 = Fukushima | first1 = K. | year = 1980 | title = Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position | url= | journal = Biol. Cybern. | volume = 36 | issue = 4| pages = 193–202 | doi=10.1007/bf00344251 | pmid=7370364}}</ref> In 1989, [https://en.wikipedia.org/wiki/Yann_LeCun Yann LeCun] et al. applied the standard backpropagation algorithm, which had been around since 1970<ref name="lin1970">[[Seppo Linnainmaa]] (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6-7.</ref><ref name="grie2012">{{Cite journal|last=Griewank|first=Andreas|date=2012|title=Who Invented the Reverse Mode of Differentiation?|url=http://www.math.uiuc.edu/documenta/vol-ismp/52_griewank-andreas-b.pdf|journal=Documenta Matematica, Extra Volume ISMP|pages=389–400|via=}}</ref><ref name="WERBOS1974">{{Cite journal|last=Werbos|first=P.|date=1974|title=Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences |url=https://www.researchgate.net/publication/35657389_Beyond_regression_new_tools_for_prediction_and_analysis_in_the_behavioral_sciences |journal=Harvard University |accessdate=12 June 2017}}</ref><ref name="werbos1982">{{Cite book|url=ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf|title=System modeling and optimization|last=Werbos|first=Paul|publisher=Springer|year=1982|isbn=|location=|pages=762–770|chapter=Applications of advances in nonlinear sensitivity analysis}}</ref> as the reverse mode of [https://en.wikipedia.org/wiki/Automatic_differentiation automatic differentiation], to a deep neural network whose purpose was to recognize handwritten ZIP codes on mail. While the algorithm worked, training required 3 days.<ref name="LECUN1989">LeCun ''et al.'', "Backpropagation Applied to Handwritten Zip Code Recognition," ''Neural Computation'', 1, pp. 541–551, 1989.</ref>

By 1991 such systems were used for recognizing isolated 2-D hand-written digits, while 3-D objects were recognized by matching 2-D images against a handcrafted 3-D object model.<ref name="Weng1992">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCNN1992.pdf Cresceptron: a self-organizing neural network which grows adaptively]," ''Proc. International Joint Conference on Neural Networks'', Baltimore, Maryland, vol I, pp. 576-581, June, 1992.</ref><ref name="Weng1993">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronICCV1993.pdf Learning recognition and segmentation of 3-D objects from 2-D images]," ''Proc. 4th International Conf. Computer Vision'', Berlin, Germany, pp. 121-128, May, 1993.</ref><ref name="Weng1997">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCV.pdf Learning recognition and segmentation using the Cresceptron]," ''International Journal of Computer Vision'', vol. 25, no. 2, pp. 105-139, Nov. 1997.</ref> Weng et al. suggested that a human brain does not use a monolithic 3-D object model, and in 1992 they published Cresceptron, a method for performing 3-D object recognition in cluttered scenes. Cresceptron is a [https://en.wikipedia.org/wiki/Convolution cascade of layers] similar to the Neocognitron. But while the Neocognitron required a human programmer to hand-merge features, Cresceptron learned an open number of features in each layer without supervision, where each feature is represented by a convolution kernel. Cresceptron segmented each learned object from a cluttered scene through back-analysis through the network. [https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer Max pooling], now often adopted by deep neural networks (e.g., in [https://en.wikipedia.org/wiki/ImageNet ImageNet] tests), was first used in Cresceptron, reducing the position resolution of each 2x2 block to 1 through the cascade for better generalization.

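Max pooling itself is a simple operation; below is a minimal NumPy sketch of the 2x2-to-1 reduction just described (the function name and test array are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def max_pool_2x2(a):
    """Reduce position resolution by keeping the maximum of each 2x2 block."""
    h, w = a.shape
    trimmed = a[:h - h % 2, :w - w % 2]          # drop odd edge rows/columns
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(a))                           # [[ 5.  7.] [13. 15.]]
</syntaxhighlight>
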
In 1994, André de Carvalho, together with Mike Fairhurst and David Bisset, published experimental results of a multi-layer Boolean neural network, also known as a weightless neural network, composed of a 3-layer self-organising feature extraction neural network module (SOFT) followed by a multi-layer classification neural network module (GSN), which were independently trained. Each layer in the feature extraction module extracted features of growing complexity relative to the previous layer.<ref>{{Cite journal |title=An integrated Boolean neural network for pattern classification |journal=Pattern Recognition Letters |date=1994-08-08 |pages=807–813 |volume=15 |issue=8 |doi=10.1016/0167-8655(94)90009-4 |first=Andre C. L. F. |last1=de Carvalho |first2 = Mike C. |last2=Fairhurst |first3=David |last3 = Bisset}}</ref>

In 1995, [https://en.wikipedia.org/wiki/Brendan_Frey Brendan Frey], [https://en.wikipedia.org/wiki/Peter_Dayan Peter Dayan] and [https://en.wikipedia.org/wiki/Geoffrey_Hinton Hinton] demonstrated that it was possible to train a network containing six fully connected layers and several hundred hidden units using the [https://en.wikipedia.org/wiki/Wake-sleep_algorithm wake-sleep algorithm].<ref>{{Cite journal|title = The wake-sleep algorithm for unsupervised neural networks |journal = Science|date = 1995-05-26|pages = 1158–1161|volume = 268|issue = 5214|doi = 10.1126/science.7761831|first = Geoffrey E.|last = Hinton|first2 = Peter|last2 = Dayan|first3 = Brendan J.|last3 = Frey|first4 = Radford|last4 = Neal|bibcode = 1995Sci...268.1158H}}</ref> Many factors contribute to slow training, including the [https://en.wikipedia.org/wiki/Vanishing_gradient_problem vanishing gradient problem], analyzed in 1991 by [https://en.wikipedia.org/wiki/Sepp_Hochreiter Sepp Hochreiter].<ref name="HOCH1991">S. Hochreiter., "[http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf Untersuchungen zu dynamischen neuronalen Netzen]," ''Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber'', 1991.</ref><ref name="HOCH2001">{{cite book|url={{google books |plainurl=y |id=NWOcMVA64aAC}}|title=A Field Guide to Dynamical Recurrent Networks|last=Hochreiter|first=S.|display-authors=etal|date=15 January 2001|publisher=John Wiley & Sons|year=|isbn=978-0-7803-5369-5|location=|pages=|chapter=Gradient flow in recurrent nets: the difficulty of learning long-term dependencies|editor-last2=Kremer|editor-first2=Stefan C.|editor-first1=John F.|editor-last1=Kolen}}</ref>

During the 1990s and 2000s, because of the computational cost of ANNs and a lack of understanding of how the brain wires its biological networks, simpler models that use task-specific handcrafted features such as [https://en.wikipedia.org/wiki/Gabor_filter Gabor filters] and [https://en.wikipedia.org/wiki/Support_vector_machine support vector machines] (SVMs) were often a popular choice.

Both shallow and deep learning of ANNs have been explored for many years.<ref>{{Cite journal|last=Morgan|first=Nelson|last2=Bourlard |first2=Hervé |last3=Renals |first3=Steve |last4=Cohen |first4=Michael|last5=Franco |first5=Horacio |date=1993-08-01 |title=Hybrid neural network/hidden markov model systems for continuous speech recognition |url=http://www.worldscientific.com/doi/abs/10.1142/S0218001493000455|journal=International Journal of Pattern Recognition and Artificial Intelligence|volume=07|issue=4|pages=899–916|doi=10.1142/s0218001493000455|issn=0218-0014}}</ref><ref name="Robinson1992">{{Cite journal|last=Robinson|first=T.|date=1992|title=A real-time recurrent error propagation network word recognition system|url=http://dl.acm.org/citation.cfm?id=1895720|journal=ICASSP|pages=|via=}}</ref><ref>{{Cite journal|last=Waibel|first=A.|last2=Hanazawa|first2=T.|last3=Hinton|first3=G.|last4=Shikano|first4=K.|last5=Lang|first5=K. J.|date=March 1989|title=Phoneme recognition using time-delay neural networks|url=http://ieeexplore.ieee.org/document/21701/|journal=IEEE Transactions on Acoustics, Speech, and Signal Processing|volume=37|issue=3|pages=328–339|doi=10.1109/29.21701|issn=0096-3518}}</ref> In generative modeling of speech, these methods never outperformed the internally handcrafted, non-uniform Gaussian [https://en.wikipedia.org/wiki/Mixture_model mixture model]/[https://en.wikipedia.org/wiki/Hidden_Markov_model hidden Markov model] technology. Key difficulties have been analyzed, including gradient diminishing and the weak temporal correlation structure of neural predictive models.<ref name="Baker2009">{{cite journal | last1 = Baker | first1 = J. | last2 = Deng | first2 = Li | last3 = Glass | first3 = Jim | last4 = Khudanpur | first4 = S. | last5 = Lee | first5 = C.-H. | last6 = Morgan | first6 = N. | last7 = O'Shaughnessy | first7 = D. | year = 2009 | title = Research Developments and Directions in Speech Recognition and Understanding, Part 1 | url= | journal = IEEE Signal Processing Magazine | volume = 26 | issue = 3| pages = 75–80 | doi=10.1109/msp.2009.932166| bibcode = 2009ISPM...26...75B }}</ref><ref name="HOCH1991" /><ref name="Bengio1991">{{Cite web|url=https://www.researchgate.net/publication/41229141_Artificial_neural_networks_and_their_application_to_sequence_recognition|title=Artificial Neural Networks and their Application to Speech/Sequence Recognition|last=Bengio|first=Y.|date=1991|website=|publisher=McGill University Ph.D. thesis|accessdate=}}</ref><ref name="Deng1994">{{cite journal | last1 = Deng | first1 = L. | last2 = Hassanein | first2 = K. | last3 = Elmasry | first3 = M. | year = 1994 | title = Analysis of correlation structure for a neural predictive model with applications to speech recognition | url= | journal = Neural Networks | volume = 7 | issue = 2| pages = 331–339 | doi=10.1016/0893-6080(94)90027-2}}</ref> Additional difficulties were the lack of training data and limited computing power.

Most [https://en.wikipedia.org/wiki/Speech_recognition speech recognition] researchers moved away from neural networks to pursue generative modeling. An exception was at [https://en.wikipedia.org/wiki/SRI_International SRI International] in the late 1990s.<ref name="Heck2000">{{cite journal | last1 = Heck | first1 = L. | last2 = Konig | first2 = Y. | last3 = Sonmez | first3 = M. | last4 = Weintraub | first4 = M. | year = 2000 | title = Robustness to Telephone Handset Distortion in Speaker Recognition by Discriminative Feature Design | url= | journal = Speech Communication | volume = 31 | issue = 2| pages = 181–192 | doi=10.1016/s0167-6393(99)00077-1}}</ref>

Funded by the [https://en.wikipedia.org/wiki/National_Security_Agency NSA] and [https://en.wikipedia.org/wiki/DARPA DARPA], SRI studied deep neural networks for speech and speaker recognition. Heck's speaker recognition team achieved the first major success with deep neural networks in speech processing in the 1998 [https://en.wikipedia.org/wiki/National_Institute_of_Standards_and_Technology National Institute of Standards and Technology] Speaker Recognition evaluation. While SRI experienced success with deep neural networks in speaker recognition, it was unsuccessful in demonstrating similar success in speech recognition. The principle of elevating "raw" features over hand-crafted optimization was first explored successfully in the architecture of deep autoencoders on "raw" spectrograms and linear filter-bank features in the late 1990s, showing its superiority over the Mel-Cepstral features that contain stages of fixed transformation from spectrograms. The raw features of speech, [https://en.wikipedia.org/wiki/Waveform waveforms], later produced excellent larger-scale results.<ref>{{Cite web|url=https://www.researchgate.net/publication/266030526_Acoustic_Modeling_with_Deep_Neural_Networks_Using_Raw_Time_Signal_for_LVCSR|title=Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR (PDF Download Available)|website=ResearchGate|accessdate=2017-06-14}}</ref>

Many aspects of speech recognition were taken over by a deep learning method called [https://en.wikipedia.org/wiki/Long_short-term_memory long short-term memory] (LSTM), a recurrent neural network (RNN) architecture published by Hochreiter and Schmidhuber in 1997.<ref name=":0">{{Cite journal|last=Hochreiter|first=Sepp|author-link=Sepp Hochreiter|last2=Schmidhuber|first2=Jürgen|author-link2=Jürgen Schmidhuber|date=1997-11-01|title=Long Short-Term Memory|url=http://www.mitpressjournals.org/doi/10.1162/neco.1997.9.8.1735|journal=Neural Computation|volume=9|issue=8|pages=1735–1780|doi=10.1162/neco.1997.9.8.1735|issn=0899-7667|via=|pmid=9377276}}</ref> LSTM RNNs avoid the vanishing gradient problem and can learn very deep tasks that require remembering events that happened thousands of discrete time steps before, which is important for speech.<ref name="graves2003">{{Cite web|url=Ftp://ftp.idsia.ch/pub/juergen/bioadit2004.pdf|title=Biologically Plausible Speech Recognition with LSTM Neural Nets|last=Graves|first=Alex|last2=Eck|first2=Douglas|date=2003|website=1st Intl. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland|pages=175–184|accessdate=|last3=Beringer|first3=Nicole|last4=Schmidhuber|first4=Jürgen|authorlink4=Jürgen Schmidhuber}}</ref> In 2003, LSTM started to become competitive with traditional speech recognizers on certain tasks. Later it was combined with connectionist temporal classification (CTC). In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of about 49% through CTC-trained LSTM, which is now available through [https://en.wikipedia.org/wiki/Google_Voice_Search Google Voice Search].<ref name="sak2015">{{Cite web|url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html|title=Google voice search: faster and more accurate|last=Sak|first=Haşim|last2=Senior|first2=Andrew|date=September 2015|website=|accessdate=|last3=Rao|first3=Kanishka|last4=Beaufays|first4=Françoise|last5=Schalkwyk|first5=Johan}}</ref>

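The paragraph above describes CTC-trained LSTM only at a high level; the following sketch (assuming PyTorch, with every shape, size and hyperparameter an illustrative assumption rather than anything from the production systems mentioned) shows how an LSTM acoustic model can be trained with the CTC loss:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

T, N, F, C = 50, 4, 40, 29               # time steps, batch, features, classes (blank = 0)

lstm = nn.LSTM(input_size=F, hidden_size=128, num_layers=2)
proj = nn.Linear(128, C)
ctc = nn.CTCLoss(blank=0)

x = torch.randn(T, N, F)                 # a batch of acoustic feature sequences
targets = torch.randint(1, C, (N, 12))   # label sequences (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

h, _ = lstm(x)                           # (T, N, 128) hidden states over time
log_probs = proj(h).log_softmax(dim=-1)  # (T, N, C) per-frame class log-probabilities
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                          # gradients flow across all T time steps
print(float(loss))
</syntaxhighlight>
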
In 2006, publications by [https://en.wikipedia.org/wiki/Geoffrey_Hinton Geoff Hinton], Ruslan Salakhutdinov, Osindero and Teh showed how a many-layered [https://en.wikipedia.org/wiki/Feedforward_neural_network feedforward neural network] could be effectively pre-trained one layer at a time,<ref>{{Cite journal|last=Hinton|first=Geoffrey E.|date=2007-10-01|title=Learning multiple layers of representation|url=http://www.cell.com/trends/cognitive-sciences/abstract/S1364-6613(07)00217-3|journal=Trends in Cognitive Sciences|volume=11|issue=10|pages=428–434|doi=10.1016/j.tics.2007.09.004|issn=1364-6613|pmid=17921042}}</ref><ref name=hinton06>{{Cite journal | last1 = Hinton | first1 = G. E. |authorlink1=Geoff Hinton| last2 = Osindero | first2 = S. | last3 = Teh | first3 = Y. W. | doi = 10.1162/neco.2006.18.7.1527 | title = A Fast Learning Algorithm for Deep Belief Nets | journal = [[Neural Computation (journal)|Neural Computation]]| volume = 18 | issue = 7 | pages = 1527–1554 | year = 2006 | pmid = 16764513| pmc = | url = http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf}}</ref><ref name=bengio2012>{{cite arXiv |last=Bengio |first=Yoshua |author-link=Yoshua Bengio |eprint=1206.5533 |title=Practical recommendations for gradient-based training of deep architectures |class=cs.LG|year=2012 }}</ref> treating each layer in turn as an unsupervised [https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine restricted Boltzmann machine], then fine-tuning it using supervised [https://en.wikipedia.org/wiki/Backpropagation backpropagation].<ref name="HINTON2007">G. E. Hinton., "[http://www.csri.utoronto.ca/~hinton/absps/ticsdraft.pdf Learning multiple layers of representation]," ''Trends in Cognitive Sciences'', 11, pp. 428–434, 2007.</ref> The papers also referred to deep belief networks.

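A hedged sketch of that layer-by-layer recipe (binary units, one-step contrastive divergence, biases omitted for brevity; all sizes and the learning rate are illustrative assumptions, not Hinton's exact implementation):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """CD-1 for a binary restricted Boltzmann machine (biases omitted)."""
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    for _ in range(epochs):
        v0 = data
        h0 = sigmoid(v0 @ W)                                  # hidden probabilities
        h_sample = (h0 > rng.random(h0.shape)).astype(float)  # stochastic hidden states
        v1 = sigmoid(h_sample @ W.T)                          # reconstruction
        h1 = sigmoid(v1 @ W)
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)         # CD-1 weight update
    return W

# Greedy stacking: each RBM's hidden activities become the next layer's data.
X = (rng.random((256, 32)) > 0.5).astype(float)
weights, layer_input = [], X
for n_hidden in (24, 16):
    W = train_rbm(layer_input, n_hidden)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)
# `weights` would then initialize a feedforward net for supervised fine-tuning.
</syntaxhighlight>
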
Deep learning is part of state-of-the-art systems in various disciplines, particularly computer vision and [https://en.wikipedia.org/wiki/Speech_recognition automatic speech recognition] (ASR). Results on commonly used evaluation sets such as [https://en.wikipedia.org/wiki/TIMIT TIMIT] and [https://en.wikipedia.org/wiki/MNIST_database MNIST], as well as on a range of large-vocabulary speech recognition tasks, have steadily improved.<ref name="HintonDengYu2012" /><ref>{{cite journal|url=https://www.microsoft.com/en-us/research/publication/new-types-of-deep-neural-network-learning-for-speech-recognition-and-related-applications-an-overview/|title=New types of deep neural network learning for speech recognition and related applications: An overview|first1=Li|last1=Deng|first2=Geoffrey|last2=Hinton|first3=Brian|last3=Kingsbury|date=1 May 2013|publisher=|via=research.microsoft.com}}</ref><ref>{{Cite journal|last=Deng|first=L.|last2=Li|first2=J.|last3=Huang|first3=J. T.|last4=Yao|first4=K.|last5=Yu|first5=D.|last6=Seide|first6=F.|last7=Seltzer|first7=M.|last8=Zweig|first8=G.|last9=He|first9=X.|date=May 2013|title=Recent advances in deep learning for speech research at Microsoft|url=http://ieeexplore.ieee.org/document/6639345/|journal=2013 IEEE International Conference on Acoustics, Speech and Signal Processing|pages=8604–8608|doi=10.1109/icassp.2013.6639345|isbn=978-1-4799-0356-6}}</ref> For ASR, [https://en.wikipedia.org/wiki/Convolutional_neural_network convolutional neural networks] (CNNs) were superseded by CTC for LSTM,<ref name=":0" /><ref name="sak2015" /><ref name="sak2014">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf|title=Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling|last=Sak|first=Hasim|last2=Senior|first2=Andrew|date=2014|website=|accessdate=|last3=Beaufays|first3=Francoise}}</ref><ref name="liwu2015">{{cite arxiv |eprint=1410.4281|last1=Li|first1=Xiangang|title=Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition|last2=Wu|first2=Xihong|class=cs.CL|year=2014}}</ref><ref name="zen2015">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf|title=Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis|last=Zen|first=Heiga|last2=Sak|first2=Hasim|date=2015|website=Google.com|publisher=ICASSP|pages=4470–4474|accessdate=}}</ref><ref name="CNNspeech2013">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf|title=A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion|last=Deng|first=L.|last2=Abdel-Hamid|first2=O.|date=2013|website=Google.com|publisher=ICASSP|accessdate=|last3=Yu|first3=D.}}</ref><ref name=":2">{{Cite journal|last=Sainath|first=T. N.|last2=Mohamed|first2=A. r|last3=Kingsbury|first3=B.|last4=Ramabhadran|first4=B.|date=May 2013|title=Deep convolutional neural networks for LVCSR|url=http://ieeexplore.ieee.org/document/6639347/|journal=2013 IEEE International Conference on Acoustics, Speech and Signal Processing|pages=8614–8618|doi=10.1109/icassp.2013.6639347|isbn=978-1-4799-0356-6}}</ref> but CNNs are more successful in computer vision.

According to Yann LeCun,<ref name="lecun2016slides">[[Yann LeCun]] (2016). Slides on Deep Learning [https://indico.cern.ch/event/510372/ Online]</ref> the industrial impact of deep learning began in the early 2000s, when CNNs already processed an estimated 10% to 20% of all the checks written in the US. Industrial applications of deep learning to large-scale speech recognition started around 2010.

The 2009 NIPS Workshop on Deep Learning for Speech Recognition was motivated by the limitations of deep generative models of speech,<ref name="HintonDengYu2012">{{cite journal | last1 = Hinton | first1 = G. | last2 = Deng | first2 = L. | last3 = Yu | first3 = D. | last4 = Dahl | first4 = G. | last5 = Mohamed | first5 = A. | last6 = Jaitly | first6 = N. | last7 = Senior | first7 = A. | last8 = Vanhoucke | first8 = V. | last9 = Nguyen | first9 = P. | last10 = Sainath | first10 = T. | last11 = Kingsbury | first11 = B. | year = 2012 | title = Deep Neural Networks for Acoustic Modeling in Speech Recognition --- The shared views of four research groups | url= | journal = IEEE Signal Processing Magazine | volume = 29 | issue = 6| pages = 82–97 | doi=10.1109/msp.2012.2205597}}</ref><ref name="patent2011">D. Yu, L. Deng, G. Li, and F. Seide (2011). "Discriminative pretraining of deep neural networks," U.S. Patent Filing.</ref> and by the possibility that, given more capable hardware and large-scale data sets, deep neural networks might become practical. However, it was later discovered that replacing pre-training with large amounts of training data for straightforward backpropagation, when using deep neural networks with large, context-dependent output layers, produced error rates dramatically lower than the then-state-of-the-art Gaussian mixture model/hidden Markov model systems, and also lower than more advanced generative-model-based systems. The nature of the recognition errors produced by the two types of systems was characteristically different,<ref name="ReferenceICASSP2013" /><ref name="NIPS2009">NIPS Workshop: Deep Learning for Speech Recognition and Related Applications, Whistler, BC, Canada, Dec. 2009 (Organizers: Li Deng, Geoff Hinton, D. Yu).</ref> offering technical insights into how to integrate deep learning into the highly efficient, run-time speech decoding systems deployed by all major speech recognition systems.<ref>{{cite web|title=Deng receives prestigious IEEE Technical Achievement Award - Microsoft Research|url=https://www.microsoft.com/en-us/research/blog/deng-receives-prestigious-ieee-technical-achievement-award/|website=Microsoft Research|date=3 December 2015}}</ref> Analysis around 2009-2010, contrasting the GMM (and other generative speech models) with DNN models, stimulated early industrial investment in deep learning for speech recognition, eventually leading to pervasive and dominant use in that industry. That analysis found comparable performance (a difference of less than 1.5% in error rate) between discriminative DNNs and generative models.<ref name="HintonDengYu2012" /><ref name="ReferenceICASSP2013">{{cite journal|last2=Hinton|first2=G.|last3=Kingsbury|first3=B.|date=2013|title=New types of deep neural network learning for speech recognition and related applications: An overview (ICASSP)|url=https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ICASSP-2013-DengHintonKingsbury-revised.pdf|journal=|pages=|via=|last1=Deng|first1=L.}}</ref><ref name="HintonKeynoteICASSP2013">Keynote talk: Recent Developments in Deep Neural Networks. ICASSP, 2013 (by Geoff Hinton).</ref><ref name="interspeech2014Keynote">{{Cite web|url=https://www.superlectures.com/interspeech2014/downloadFile?id=6&type=slides&filename=achievements-and-challenges-of-deep-learning-from-speech-analysis-and-recognition-to-language-and-multimodal-processing|title=Keynote talk: 'Achievements and Challenges of Deep Learning - From Speech Analysis and Recognition To Language and Multimodal Processing'|last=Li|first=Deng|date=September 2014|website=Interspeech|accessdate=}}</ref>

In 2010, researchers extended deep learning from TIMIT to large-vocabulary speech recognition, by adopting a deep neural network with large output layers based on context-dependent HMM states constructed by [https://en.wikipedia.org/wiki/Decision_tree decision trees].<ref name="Roles2010">{{cite journal|last1=Yu|first1=D.|last2=Deng|first2=L.|date=2010|title=Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition|url=https://www.microsoft.com/en-us/research/publication/roles-of-pre-training-and-fine-tuning-in-context-dependent-dbn-hmms-for-real-world-speech-recognition/|journal=NIPS Workshop on Deep Learning and Unsupervised Feature Learning|pages=|via=}}</ref><ref>{{Cite journal|last=Seide|first=F.|last2=Li|first2=G.|last3=Yu|first3=D.|date=2011|title=Conversational speech transcription using context-dependent deep neural networks|url=https://www.microsoft.com/en-us/research/publication/conversational-speech-transcription-using-context-dependent-deep-neural-networks|journal=Interspeech|pages=|via=}}</ref><ref>{{Cite journal|last=Deng|first=Li|last2=Li|first2=Jinyu|last3=Huang|first3=Jui-Ting|last4=Yao|first4=Kaisheng|last5=Yu|first5=Dong|last6=Seide|first6=Frank|last7=Seltzer|first7=Mike|last8=Zweig|first8=Geoff|last9=He|first9=Xiaodong|date=2013-05-01|title=Recent Advances in Deep Learning for Speech Research at Microsoft|url=https://www.microsoft.com/en-us/research/publication/recent-advances-in-deep-learning-for-speech-research-at-microsoft/|journal=Microsoft Research}}</ref><ref name="ReferenceA" />
− | | |
− | | |
− | | |
− | {{toclimit|3}} | |
− | | |
− | | |
− | | |
− | == Definition == | |
− | | |
− | [[File:Deep Learning.jpg|alt=Representing Images on Multiple Layers of Abstraction in Deep Learning|thumb|Representing Images on Multiple Layers of Abstraction in Deep Learning <ref>{{Cite journal|last=Schulz|first=Hannes|last2=Behnke|first2=Sven|date=2012-11-01|title=Deep Learning|journal=KI - Künstliche Intelligenz|language=en|volume=26|issue=4|pages=357–363|doi=10.1007/s13218-012-0198-z|issn=1610-1987|url=https://www.semanticscholar.org/paper/51a80649d16a38d41dbd20472deb3bc9b61b59a0}}</ref>]]
| |
− | | |
− | Representing Images on Multiple Layers of Abstraction in Deep Learning
| |
− | | |
− | 深度学习中基于多层抽象的图像表示
| |
− | | |
− | Deep learning is a class of [[machine learning]] [[algorithm]]s that<ref name="BOOK2014">{{cite journal|last2=Yu|first2=D.|year=2014|title=Deep Learning: Methods and Applications|url=http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf|journal=Foundations and Trends in Signal Processing|volume=7|issue=3–4|pages=1–199|doi=10.1561/2000000039|last1=Deng|first1=L.}}</ref>{{rp|pages=199–200}} uses multiple layers to progressively extract higher level features from the raw input. For example, in [[image processing]], lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits or letters or faces.
| |
− | | |
− | Deep learning is a class of machine learning algorithms that uses multiple layers to progressively extract higher level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits or letters or faces.
| |
− | | |
− | 深度学习是一类机器学习算法,它使用多层逐步从原始输入中提取更高层次的特征。例如,在图像处理中,较低的图层可以识别边缘,而较高的图层可以识别与人相关的概念,如数字、字母或面孔。
| |
− | | |
− | | |
− | | |
− | == Overview == | |
− | | |
− | Most modern deep learning models are based on artificial neural networks, specifically, [[Convolutional Neural Network]]s (CNN)s, although they can also include [[propositional formula]]s or latent variables organized layer-wise in deep [[generative model]]s such as the nodes in [[deep belief network]]s and deep [[Boltzmann machine]]s.<ref name="BENGIODEEP">{{cite journal|last=Bengio|first=Yoshua|year=2009|title=Learning Deep Architectures for AI|url=http://sanghv.com/download/soft/machine%20learning,%20artificial%20intelligence,%20mathematics%20ebooks/ML/learning%20deep%20architectures%20for%20AI%20%282009%29.pdf|journal=Foundations and Trends in Machine Learning|volume=2|issue=1|pages=1–127|doi=10.1561/2200000006|citeseerx=10.1.1.701.9550|access-date=2015-09-03|archive-url=https://web.archive.org/web/20160304084250/http://sanghv.com/download/soft/machine%20learning,%20artificial%20intelligence,%20mathematics%20ebooks/ML/learning%20deep%20architectures%20for%20AI%20(2009).pdf|archive-date=2016-03-04|url-status=dead}}</ref>
| |
− | | |
− | Most modern deep learning models are based on artificial neural networks, specifically, Convolutional Neural Networks (CNN)s, although they can also include propositional formulas or latent variables organized layer-wise in deep generative models such as the nodes in deep belief networks and deep Boltzmann machines.
| |
− | | |
− | 大多数现代深度学习模型都是基于人工神经网络,特别是基于卷积神经网络,虽然它们也可以包括命题公式或潜在变量组织在深度生成模型,如深度信念网络和深度玻尔兹曼机器的节点。
| |
− | | |
− | | |
− | | |
− | In deep learning, each level learns to transform its input data into a slightly more abstract and composite representation. In an image recognition application, the raw input may be a [[Matrix (mathematics)|matrix]] of pixels; the first representational layer may abstract the pixels and encode edges; the second layer may compose and encode arrangements of edges; the third layer may encode a nose and eyes; and the fourth layer may recognize that the image contains a face. Importantly, a deep learning process can learn which features to optimally place in which level ''on its own''. (Of course, this does not completely eliminate the need for hand-tuning; for example, varying numbers of layers and layer sizes can provide different degrees of abstraction.)<ref name="BENGIO2012">{{cite journal|last2=Courville|first2=A.|last3=Vincent|first3=P.|year=2013|title=Representation Learning: A Review and New Perspectives|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=35|issue=8|pages=1798–1828|arxiv=1206.5538|doi=10.1109/tpami.2013.50|pmid=23787338|last1=Bengio|first1=Y.}}</ref><ref>{{cite journal|last1=LeCun|first1=Yann|last2=Bengio|first2=Yoshua|last3=Hinton|first3=Geoffrey|title=Deep learning|journal=Nature|date=28 May 2015|volume=521|issue=7553|pages=436–444|doi=10.1038/nature14539|pmid=26017442|bibcode=2015Natur.521..436L|url=https://www.semanticscholar.org/paper/a4cec122a08216fe8a3bc19b22e78fbaea096256}}</ref>
| |
− | | |
− | In deep learning, each level learns to transform its input data into a slightly more abstract and composite representation. In an image recognition application, the raw input may be a matrix of pixels; the first representational layer may abstract the pixels and encode edges; the second layer may compose and encode arrangements of edges; the third layer may encode a nose and eyes; and the fourth layer may recognize that the image contains a face. Importantly, a deep learning process can learn which features to optimally place in which level on its own. (Of course, this does not completely eliminate the need for hand-tuning; for example, varying numbers of layers and layer sizes can provide different degrees of abstraction.)
| |
− | | |
− | 在深度学习中,每个级别都学习将其输入数据转换成稍微更抽象和更复杂的表示。在图像识别应用中,原始输入可以是像素矩阵; 第一代表层可以提取像素并编码边缘; 第二层可以编码边缘排列; 第三层可以编码鼻子和眼睛; 第四层可以识别图像包含人脸。重要的是,深度学习过程可以了解哪些特性可以最佳地放置在哪个级别本身。(当然,这并不能完全消除手工调优的需要; 例如,不同的层数和层大小可以提供不同程度的抽象。)
| |
− | | |
− | | |
− | | |
− | The word "deep" in "deep learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial ''credit assignment path'' (CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe potentially causal connections between input and output. For a [[feedforward neural network]], the depth of the CAPs is that of the network and is the number of hidden layers plus one (as the output layer is also parameterized). For [[recurrent neural network]]s, in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited.<ref name="SCHIDHUB" /> No universally agreed upon threshold of depth divides shallow learning from deep learning, but most researchers agree that deep learning involves CAP depth higher than 2. CAP of depth 2 has been shown to be a universal approximator in the sense that it can emulate any function.<ref>{{Cite book|url=https://books.google.com/books?id=9CqQDwAAQBAJ&pg=PA15&dq#v=onepage&q&f=false|title=Human Behavior and Another Kind in Consciousness: Emerging Research and Opportunities: Emerging Research and Opportunities|last=Shigeki|first=Sugiyama|date=2019-04-12|publisher=IGI Global|isbn=978-1-5225-8218-2|language=en}}</ref> Beyond that, more layers do not add to the function approximator ability of the network. Deep models (CAP > 2) are able to extract better features than shallow models and hence, extra layers help in learning the features effectively.
| |
− | | |
− | The word "deep" in "deep learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial credit assignment path (CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe potentially causal connections between input and output. For a feedforward neural network, the depth of the CAPs is that of the network and is the number of hidden layers plus one (as the output layer is also parameterized). For recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited. Beyond that, more layers do not add to the function approximator ability of the network. Deep models (CAP > 2) are able to extract better features than shallow models and hence, extra layers help in learning the features effectively.
| |
− | | |
− | “深度学习”中的“深度”一词指的是数据转换所经过的层数。更准确地说,深度学习系统有一个实质性的学分分配路径(CAP)深度。Cap 是从输入到输出的转换链。Caps 描述了输入和输出之间潜在的因果关系。对于一个前馈神经网络,CAPs 的深度是网络的深度,是隐藏层的数量加上1(因为输出层也是参数化的)。对于回归神经网络,其中一个信号可以通过一个层传播多次,CAP 的深度是潜在的无限的。除此之外,更多的层不会增加网络的函数逼近能力。深层模型(CAP 2)能够比浅层模型更好地提取特征,因此,额外的层有助于有效地学习特征。
| |
− | | |
− | | |
− | | |
− | Deep learning architectures can be constructed with a [[greedy algorithm|greedy]] layer-by-layer method.<ref name=BENGIO2007>{{cite conference | first1=Yoshua | last1=Bengio | first2=Pascal | last2=Lamblin | first3=Dan|last3=Popovici |first4=Hugo|last4=Larochelle | title=Greedy layer-wise training of deep networks| year=2007 | url=http://papers.nips.cc/paper/3048-greedy-layer-wise-training-of-deep-networks.pdf| conference = Advances in neural information processing systems | pages= 153–160}}</ref> Deep learning helps to disentangle these abstractions and pick out which features improve performance.<ref name="BENGIO2012" />
| |
− | | |
− | Deep learning architectures can be constructed with a greedy layer-by-layer method. Deep learning helps to disentangle these abstractions and pick out which features improve performance.
| |
− | | |
− | 深度学习架构可以用贪婪的层层方法构建。深度学习有助于理清这些抽象概念,并找出哪些特性可以提高性能。
| |
− | | |
− | | |
− | | |
− | For [[supervised learning]] tasks, deep learning methods eliminate [[feature engineering]], by translating the data into compact intermediate representations akin to [[Principal Component Analysis|principal components]], and derive layered structures that remove redundancy in representation.
| |
− | | |
− | For supervised learning tasks, deep learning methods eliminate feature engineering, by translating the data into compact intermediate representations akin to principal components, and derive layered structures that remove redundancy in representation.
| |
− | | |
− | 对于监督式学习任务,深度学习方法通过将数据转换成类似于主成分的紧凑的中间表示形式,消除了特征工程,并派生出层次结构,消除了表示中的冗余。
| |
− | | |
− | | |
− | | |
− | Deep learning algorithms can be applied to unsupervised learning tasks. This is an important benefit because unlabeled data are more abundant than the labeled data. Examples of deep structures that can be trained in an unsupervised manner are neural history compressors<ref name="scholarpedia">Jürgen Schmidhuber (2015). Deep Learning. Scholarpedia, 10(11):32832. [http://www.scholarpedia.org/article/Deep_Learning Online]</ref> and [[deep belief network]]s.<ref name="BENGIO2012" /><ref name="SCHOLARDBNS">{{cite journal | last1 = Hinton | first1 = G.E. | year = 2009| title = Deep belief networks | url= | journal = Scholarpedia | volume = 4 | issue = 5| page = 5947 | doi=10.4249/scholarpedia.5947| bibcode = 2009SchpJ...4.5947H}}</ref>
| |
− | | |
− | Deep learning algorithms can be applied to unsupervised learning tasks. This is an important benefit because unlabeled data are more abundant than the labeled data. Examples of deep structures that can be trained in an unsupervised manner are neural history compressors and deep belief networks.
| |
− | | |
− | 深度学习算法可以应用于非监督式学习任务。这是一个重要的好处,因为未标记的数据比标记的数据更加丰富。可以用无监督方式训练的深层结构的例子有神经历史压缩器和深度信念网络。
| |
− | | |
− | | |
− | | |
− | == Interpretations ==
| |
− | | |
− | Deep neural networks are generally interpreted in terms of the [[universal approximation theorem]]<ref name="ReferenceB">Balázs Csanád Csáji (2001). Approximation with Artificial Neural Networks; Faculty of Sciences; Eötvös Loránd University, Hungary</ref><ref name=cyb>{{cite journal | last1 = Cybenko | year = 1989 | title = Approximations by superpositions of sigmoidal functions | url = http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf | journal = [[Mathematics of Control, Signals, and Systems]] | volume = 2 | issue = 4 | pages = 303–314 | doi = 10.1007/bf02551274 | url-status = dead | archiveurl = https://web.archive.org/web/20151010204407/http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf | archivedate = 2015-10-10 }}</ref><ref name=horn>{{cite journal | last1 = Hornik | first1 = Kurt | year = 1991 | title = Approximation Capabilities of Multilayer Feedforward Networks | url= | journal = Neural Networks | volume = 4 | issue = 2| pages = 251–257 | doi=10.1016/0893-6080(91)90009-t}}</ref><ref name="Haykin, Simon 1998">{{cite book|first=Simon S. |last=Haykin|title=Neural Networks: A Comprehensive Foundation|url={{google books |plainurl=y |id=bX4pAQAAMAAJ}}|year=1999|publisher=Prentice Hall|isbn=978-0-13-273350-2}}</ref><ref name="Hassoun, M. 1995 p. 48">{{cite book|first=Mohamad H. |last=Hassoun|title=Fundamentals of Artificial Neural Networks|url={{google books |plainurl=y |id=Otk32Y3QkxQC|page=48}}|year=1995|publisher=MIT Press|isbn=978-0-262-08239-6|p=48}}</ref><ref name=ZhouLu>Lu, Z., Pu, H., Wang, F., Hu, Z., & Wang, L. (2017). [http://papers.nips.cc/paper/7203-the-expressive-power-of-neural-networks-a-view-from-the-width The Expressive Power of Neural Networks: A View from the Width]. Neural Information Processing Systems, 6231-6239.
| |
− | | |
− | Deep neural networks are generally interpreted in terms of the universal approximation theorem<ref name=ZhouLu>Lu, Z., Pu, H., Wang, F., Hu, Z., & Wang, L. (2017). [http://papers.nips.cc/paper/7203-the-expressive-power-of-neural-networks-a-view-from-the-width The Expressive Power of Neural Networks: A View from the Width]. Neural Information Processing Systems, 6231-6239.
| |
− | | |
− | 深层神经网络通常是根据泛近似定理来解释的,该定理的名称为: [[[] ,[] ,[]] ,[] ,[]] ,[] ,[]] ,[] ,[] ,[] ,[]]。Http://papers.nips.cc/paper/7203-The-Expressive-Power-of-Neural-Networks-a-View-from-The-Width : 神经网络的表达能力: 从宽度来看。神经信息处理系统,6231-6239。
| |
− | | |
− | </ref> or [[Bayesian inference|probabilistic inference]].<ref name="BOOK2014" /><ref name="BENGIODEEP" /><ref name="BENGIO2012" /><ref name="SCHIDHUB">{{cite journal|last=Schmidhuber|first=J.|year=2015|title=Deep Learning in Neural Networks: An Overview|journal=Neural Networks|volume=61|pages=85–117|arxiv=1404.7828|doi=10.1016/j.neunet.2014.09.003|pmid=25462637|url=https://www.semanticscholar.org/paper/126df9f24e29feee6e49e135da102fbbd9154a48}}</ref><ref name="SCHOLARDBNS" /><ref name = MURPHY>{{cite book|first=Kevin P. |last=Murphy|title=Machine Learning: A Probabilistic Perspective|url={{google books |plainurl=y |id=NZP6AQAAQBAJ}}|date=24 August 2012|publisher=MIT Press|isbn=978-0-262-01802-9}}</ref><ref name= "Patel NIPS 2016">{{Cite journal|url=https://papers.nips.cc/paper/6231-a-probabilistic-framework-for-deep-learning.pdf|title=A Probabilistic Framework for Deep Learning|last=Patel|first=Ankit|last2=Nguyen|first2=Tan|last3=Baraniuk|first3=Richard|date=2016|journal=Advances in Neural Information Processing Systems|pages=|bibcode=2016arXiv161201936P|arxiv=1612.01936}}</ref>
| |
− | | |
− | </ref> or probabilistic inference.
| |
− | | |
− | / ref 或概率推断。
| |
− | | |
− | | |
− | | |
− | The classic universal approximation theorem concerns the capacity of [[feedforward neural networks]] with a single hidden layer of finite size to approximate [[continuous functions]].<ref name="ReferenceB"/><ref name="cyb"/><ref name="horn"/><ref name="Haykin, Simon 1998"/><ref name="Hassoun, M. 1995 p. 48"/> In 1989, the first proof was published by [[George Cybenko]] for [[sigmoid function|sigmoid]] activation functions<ref name="cyb" /> and was generalised to feed-forward multi-layer architectures in 1991 by Kurt Hornik.<ref name="horn" /> Recent work also showed that universal approximation also holds for non-bounded activation functions such as the rectified linear unit.<ref name=sonoda17>{{cite journal | last1 = Sonoda | first1 = Sho | last2=Murata | first2=Noboru | year = 2017 | title = Neural network with unbounded activation functions is universal approximator | journal = Applied and Computational Harmonic Analysis | volume = 43 | issue = 2 | pages = 233–268 | doi = 10.1016/j.acha.2015.12.005| arxiv = 1505.03654 | url = https://www.semanticscholar.org/paper/d0e48a4d5d6d0b4aa2dbab2c50560945e62a3817 }}</ref>
| |
− | The universal approximation theorem for [[deep neural network]]s concerns the capacity of networks whose width is bounded but whose depth is allowed to grow. Lu et al.<ref name=ZhouLu/> proved that if the width of a [[deep neural network]] with [[ReLU]] activation is strictly larger than the input dimension, the network can approximate any [[Lebesgue integration|Lebesgue integrable function]]; if the width is less than or equal to the input dimension, a [[deep neural network]] is not a universal approximator.
| |
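| + | The single-hidden-layer statement can be illustrated numerically. The following is a minimal sketch (our illustration, not code from the cited papers): it fits only the linear output layer of a one-hidden-layer ReLU network to a continuous target function; the target, the width and the interval are arbitrary choices. <syntaxhighlight lang="python">
import numpy as np

# Single hidden layer of random ReLU features plus a least-squares
# output layer: enough to approximate a continuous function closely.
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)
y = np.sin(3 * x)                      # continuous target function

width = 200                            # hidden-layer size (illustrative)
W = rng.normal(size=(1, width))        # hidden weights, fixed at random
b = rng.uniform(-np.pi, np.pi, width)  # hidden biases
H = np.maximum(0.0, x @ W + b)         # ReLU hidden activations

coef, *_ = np.linalg.lstsq(H, y, rcond=None)   # fit output layer only
print(f"max |error|: {np.max(np.abs(H @ coef - y)):.4f}")
</syntaxhighlight> Widening the hidden layer typically drives the error down further, in line with the theorem.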
− | The [[probabilistic]] interpretation<ref name="MURPHY" /> derives from the field of [[machine learning]]. It features inference,<ref name="BOOK2014" /><ref name="BENGIODEEP" /><ref name="BENGIO2012" /><ref name="SCHIDHUB" /><ref name="SCHOLARDBNS" /><ref name="MURPHY" /> as well as the [[optimization]] concepts of [[training]] and [[test (assessment)|testing]], related to fitting and [[generalization]], respectively. More specifically, the probabilistic interpretation considers the activation nonlinearity as a [[cumulative distribution function]].<ref name="MURPHY" /> The probabilistic interpretation led to the introduction of [[dropout (neural networks)|dropout]] as [[Regularization (mathematics)|regularizer]] in neural networks.<ref name="DROPOUT">{{cite arXiv |last1=Hinton |first1=G. E. |last2=Srivastava| first2 =N.|last3=Krizhevsky| first3=A.| last4 =Sutskever| first4=I.| last5=Salakhutdinov| first5=R.R.|eprint=1207.0580 |class=math.LG |title=Improving neural networks by preventing co-adaptation of feature detectors |date=2012}}</ref> The probabilistic interpretation was introduced by researchers including [[John Hopfield|Hopfield]], [[Bernard Widrow|Widrow]] and [[Kumpati S. Narendra|Narendra]] and popularized in surveys such as the one by [[Christopher Bishop|Bishop]].<ref name="prml">{{cite book|title=Pattern Recognition and Machine Learning|author=Bishop, Christopher M.|year=2006|publisher=Springer|url=http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf|isbn=978-0-387-31073-2}}</ref>
| |
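| + | As a concrete illustration of dropout used as a regularizer, here is a minimal "inverted dropout" sketch under our own assumptions (keep-probability and shapes are arbitrary); it is not the code of the cited paper. <syntaxhighlight lang="python">
import numpy as np

def dropout(activations, p_keep=0.8, seed=0):
    """Inverted dropout: randomly zero hidden units during training and
    rescale the survivors so the expected activation is unchanged."""
    rng = np.random.default_rng(seed)
    mask = rng.random(activations.shape) < p_keep
    return activations * mask / p_keep

h = np.ones((4, 5))   # a batch of hidden activations
print(dropout(h))     # about 20% of entries zeroed, the rest scaled to 1.25
</syntaxhighlight>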
− | == History ==
| |
− | | |
− | The first general, working learning algorithm for supervised, deep, feedforward, multilayer [[perceptron]]s was published by [[Alexey Ivakhnenko]] and Lapa in 1967.<ref name="ivak1965">{{cite book|first1=A. G. |last1=Ivakhnenko |first2=V. G. |last2=Lapa |title=Cybernetics and Forecasting Techniques|url={{google books |plainurl=y |id=rGFgAAAAMAAJ}}|year=1967|publisher=American Elsevier Publishing Co.|isbn=978-0-444-00020-0}}</ref> A 1971 paper already described a deep network with 8 layers trained by the [[group method of data handling]] algorithm.<ref name="ivak1971">{{Cite journal|last=Ivakhnenko|first=Alexey|date=1971|title=Polynomial theory of complex systems|url=http://gmdh.net/articles/history/polynomial.pdf |journal=IEEE Transactions on Systems, Man and Cybernetics |pages=364–378|doi=10.1109/TSMC.1971.4308320|pmid=|accessdate=|volume=SMC-1|issue=4}}</ref> Other working deep learning architectures, specifically those built for [[computer vision]], began with the [[Neocognitron]] introduced by [[Kunihiko Fukushima]] in 1980.<ref name="FUKU1980">{{cite journal | last1 = Fukushima | first1 = K. | year = 1980 | title = Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position | url= | journal = Biol. Cybern. | volume = 36 | issue = 4| pages = 193–202 | doi=10.1007/bf00344251 | pmid=7370364}}</ref>
| |
− | The term ''Deep Learning'' was introduced to the machine learning community by [[Rina Dechter]] in 1986,<ref name="dechter1986">[[Rina Dechter]] (1986). Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory.[https://www.researchgate.net/publication/221605378_Learning_While_Searching_in_Constraint-Satisfaction-Problems Online]</ref><ref name="scholarpedia" /> and to [[Artificial Neural Networks|artificial neural networks]] by Igor Aizenberg and colleagues in 2000, in the context of [[Boolean network|Boolean]] threshold neurons.<ref name="aizenberg2000">Igor Aizenberg, Naum N. Aizenberg, Joos P.L. Vandewalle (2000). Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Springer Science & Business Media.</ref><ref>Co-evolving recurrent neurons learn deep memory POMDPs. Proc. GECCO, Washington, D. C., pp. 1795-1802, ACM Press, New York, NY, USA, 2005.</ref>
| |
− | In 1989, [[Yann LeCun]] et al. applied the standard backpropagation algorithm, which had been around as the reverse mode of [[automatic differentiation]] since 1970,<ref name="lin1970">[[Seppo Linnainmaa]] (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6-7.</ref><ref name="grie2012">{{Cite journal|last=Griewank|first=Andreas|date=2012|title=Who Invented the Reverse Mode of Differentiation?|url=http://www.math.uiuc.edu/documenta/vol-ismp/52_griewank-andreas-b.pdf|journal=Documenta Mathematica|issue=Extra Volume ISMP|pages=389–400|access-date=2017-06-11|archive-url=https://web.archive.org/web/20170721211929/http://www.math.uiuc.edu/documenta/vol-ismp/52_griewank-andreas-b.pdf|archive-date=2017-07-21|url-status=dead}}</ref><ref name="WERBOS1974">{{Cite journal|last=Werbos|first=P.|date=1974|title=Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences |url=https://www.researchgate.net/publication/35657389 |journal=Harvard University |accessdate=12 June 2017}}</ref><ref name="werbos1982">{{Cite book|chapter-url=ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf|title=System modeling and optimization|last=Werbos|first=Paul|publisher=Springer|year=1982|isbn=|location=|pages=762–770|chapter=Applications of advances in nonlinear sensitivity analysis}}</ref> to a deep neural network with the purpose of [[Handwriting_recognition|recognizing handwritten ZIP code]]s on mail. While the algorithm worked, training required 3 days.<ref name="LECUN1989">LeCun ''et al.'', "Backpropagation Applied to Handwritten Zip Code Recognition," ''Neural Computation'', 1, pp. 541–551, 1989.</ref>
| |
− | By 1991 such systems were used for recognizing isolated 2-D hand-written digits, while [[3D object recognition|recognizing 3-D objects]] was done by matching 2-D images with a handcrafted 3-D object model. Weng ''et al.'' suggested that a human brain does not use a monolithic 3-D object model and in 1992 they published Cresceptron,<ref name="Weng1992">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCNN1992.pdf Cresceptron: a self-organizing neural network which grows adaptively]," ''Proc. International Joint Conference on Neural Networks'', Baltimore, Maryland, vol I, pp. 576-581, June, 1992.</ref><ref name="Weng1993">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronICCV1993.pdf Learning recognition and segmentation of 3-D objects from 2-D images]," ''Proc. 4th International Conf. Computer Vision'', Berlin, Germany, pp. 121-128, May, 1993.</ref><ref name="Weng1997">J. Weng, N. Ahuja and T. S. Huang, "[http://www.cse.msu.edu/~weng/research/CresceptronIJCV.pdf Learning recognition and segmentation using the Cresceptron]," ''International Journal of Computer Vision'', vol. 25, no. 2, pp. 105-139, Nov. 1997.</ref> a method for performing 3-D object recognition in cluttered scenes. Because it directly used natural images, Cresceptron marked the beginning of general-purpose visual learning for natural 3D worlds. Cresceptron is a cascade of layers similar to Neocognitron. But while Neocognitron required a human programmer to hand-merge features, Cresceptron learned an open number of features in each layer without supervision, where each feature is represented by a [[Convolution|convolution kernel]]. Cresceptron segmented each learned object from a cluttered scene by back-analysis through the network. [[Max pooling]], now often adopted by deep neural networks (e.g. [[ImageNet]] tests), was first used in Cresceptron to reduce the positional resolution by a factor of (2x2) through the cascade for better generalization.
| |
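| + | The (2x2) max-pooling operation mentioned above can be written in a few lines; this sketch (our illustration) keeps the strongest response in each 2x2 block, halving the positional resolution of a feature map. <syntaxhighlight lang="python">
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each 2x2 block of a 2-D feature map."""
    h, w = feature_map.shape
    blocks = feature_map[: h // 2 * 2, : w // 2 * 2]   # trim odd edges
    blocks = blocks.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(fm))   # [[ 5.  7.] [13. 15.]]
</syntaxhighlight>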
− | In 1994, André de Carvalho, together with Mike Fairhurst and David Bisset, published experimental results of a multi-layer boolean neural network, also known as a weightless neural network, composed of a 3-layer self-organising feature extraction neural network module (SOFT) followed by a multi-layer classification neural network module (GSN); the two modules were independently trained. Each layer in the feature extraction module extracted features of increasing complexity relative to the previous layer.<ref>{{Cite journal |title=An integrated Boolean neural network for pattern classification |journal=Pattern Recognition Letters |date=1994-08-08 |pages=807–813 |volume=15 |issue=8 |doi=10.1016/0167-8655(94)90009-4 |first=Andre C. L. F. |last1=de Carvalho |first2 = Mike C. |last2=Fairhurst |first3=David |last3 = Bisset}}</ref>
| |
− | In 1995, [[Brendan Frey]] demonstrated that it was possible to train (over two days) a network containing six fully connected layers and several hundred hidden units using the [[wake-sleep algorithm]], co-developed with [[Peter Dayan]] and [[Geoffrey Hinton|Hinton]].<ref>{{Cite journal|title = The wake-sleep algorithm for unsupervised neural networks |journal = Science|date = 1995-05-26|pages = 1158–1161|volume = 268|issue = 5214|doi = 10.1126/science.7761831|pmid = 7761831|first = Geoffrey E.|last = Hinton|first2 = Peter|last2 = Dayan|first3 = Brendan J.|last3 = Frey|first4 = Radford|last4 = Neal|bibcode = 1995Sci...268.1158H}}</ref> Many factors contribute to the slow speed, including the [[vanishing gradient problem]] analyzed in 1991 by [[Sepp Hochreiter]].<ref name="HOCH1991">S. Hochreiter., "[http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf Untersuchungen zu dynamischen neuronalen Netzen]," ''Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber'', 1991.</ref><ref name="HOCH2001">{{cite book|chapter-url={{google books |plainurl=y |id=NWOcMVA64aAC}}|title=A Field Guide to Dynamical Recurrent Networks|last=Hochreiter|first=S.|display-authors=etal|date=15 January 2001|publisher=John Wiley & Sons|isbn=978-0-7803-5369-5|location=|pages=|chapter=Gradient flow in recurrent nets: the difficulty of learning long-term dependencies|editor-last2=Kremer|editor-first2=Stefan C.|editor-first1=John F.|editor-last1=Kolen}}</ref>
| |
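| + | The vanishing gradient problem can be made concrete with a toy calculation (an illustration, not Hochreiter's analysis): backpropagation contributes one Jacobian factor per layer, and since the sigmoid derivative is at most 0.25, the product shrinks geometrically with depth. <syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Chain of scalar sigmoid layers with unit weights: the backpropagated
# gradient picks up a factor sigmoid'(z) <= 0.25 per layer.
z = 0.0                                   # sigmoid'(0) = 0.25, the maximum
for depth in (1, 5, 10, 20):
    factor = (sigmoid(z) * (1 - sigmoid(z))) ** depth
    print(f"depth {depth:2d}: gradient factor {factor:.2e}")
# depth 20 gives ~9.1e-13: early layers barely receive a learning signal.
</syntaxhighlight>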
− | Simpler models that use task-specific handcrafted features such as [[Gabor filter]]s and [[support vector machine]]s (SVMs) were a popular choice in the 1990s and 2000s, because of the computational cost of [[artificial neural network]]s (ANNs) and a lack of understanding of how the brain wires its biological networks.
| |
− | Both shallow and deep learning (e.g., recurrent nets) of ANNs have been explored for many years.<ref>{{Cite journal|last=Morgan|first=Nelson|last2=Bourlard |first2=Hervé |last3=Renals |first3=Steve |last4=Cohen |first4=Michael|last5=Franco |first5=Horacio |date=1993-08-01 |title=Hybrid neural network/hidden markov model systems for continuous speech recognition |journal=International Journal of Pattern Recognition and Artificial Intelligence|volume=07|issue=4|pages=899–916|doi=10.1142/s0218001493000455|issn=0218-0014}}</ref><ref name="Robinson1992">{{Cite journal|last=Robinson|first=T.|authorlink=Tony Robinson (speech recognition)|date=1992|title=A real-time recurrent error propagation network word recognition system|url=http://dl.acm.org/citation.cfm?id=1895720|journal=ICASSP|pages=617–620|via=|isbn=9780780305328|series=Icassp'92}}</ref><ref>{{Cite journal|last=Waibel|first=A.|last2=Hanazawa|first2=T.|last3=Hinton|first3=G.|last4=Shikano|first4=K.|last5=Lang|first5=K. J.|date=March 1989|title=Phoneme recognition using time-delay neural networks|journal=IEEE Transactions on Acoustics, Speech, and Signal Processing|volume=37|issue=3|pages=328–339|doi=10.1109/29.21701|issn=0096-3518|hdl=10338.dmlcz/135496|url=http://dml.cz/bitstream/handle/10338.dmlcz/135496/Kybernetika_38-2002-6_2.pdf}}</ref> These methods never outperformed non-uniform internal-handcrafting Gaussian [[mixture model]]/[[Hidden Markov model]] (GMM-HMM) technology based on generative models of speech trained discriminatively.<ref name="Baker2009">{{cite journal | last1 = Baker | first1 = J. | last2 = Deng | first2 = Li | last3 = Glass | first3 = Jim | last4 = Khudanpur | first4 = S. | last5 = Lee | first5 = C.-H. | last6 = Morgan | first6 = N. | last7 = O'Shaughnessy | first7 = D. | year = 2009 | title = Research Developments and Directions in Speech Recognition and Understanding, Part 1 | url= | journal = IEEE Signal Processing Magazine | volume = 26 | issue = 3| pages = 75–80 | doi=10.1109/msp.2009.932166| bibcode = 2009ISPM...26...75B }}</ref> Key difficulties have been analyzed, including gradient diminishing<ref name="HOCH1991" /> and weak temporal correlation structure in neural predictive models.<ref name="Bengio1991">{{Cite web|url=https://www.researchgate.net/publication/41229141|title=Artificial Neural Networks and their Application to Speech/Sequence Recognition|last=Bengio|first=Y.|date=1991|website=|publisher=McGill University Ph.D. thesis|accessdate=}}</ref><ref name="Deng1994">{{cite journal | last1 = Deng | first1 = L. | last2 = Hassanein | first2 = K. | last3 = Elmasry | first3 = M. | year = 1994 | title = Analysis of correlation structure for a neural predictive model with applications to speech recognition | url= | journal = Neural Networks | volume = 7 | issue = 2| pages = 331–339 | doi=10.1016/0893-6080(94)90027-2}}</ref> Additional difficulties were the lack of training data and limited computing power.
| |
− | Most [[speech recognition]] researchers moved away from neural nets to pursue generative modeling. An exception was at [[SRI International]] in the late 1990s. Funded by the US government's [[National Security Agency|NSA]] and [[DARPA]], SRI studied deep neural networks in speech and speaker recognition. The speaker recognition team led by [[Larry Heck]] reported significant success with deep neural networks in speech processing in the 1998 [[National Institute of Standards and Technology]] Speaker Recognition evaluation.<ref name="Doddington2000">{{cite journal | last1 = Doddington | first1 = G. | last2 = Przybocki | first2 = M. | last3 = Martin | first3 = A. | last4 = Reynolds | first4 = D. | year = 2000 | title = The NIST speaker recognition evaluation ± Overview, methodology, systems, results, perspective | url= | journal = Speech Communication | volume = 31 | issue = 2| pages = 225–254 | doi=10.1016/S0167-6393(99)00080-1}}</ref> The SRI deep neural network was then deployed in the Nuance Verifier, representing the first major industrial application of deep learning.<ref name="Heck2000">{{cite journal | last1 = Heck | first1 = L. | last2 = Konig | first2 = Y. | last3 = Sonmez | first3 = M. | last4 = Weintraub | first4 = M. | year = 2000 | title = Robustness to Telephone Handset Distortion in Speaker Recognition by Discriminative Feature Design | url= | journal = Speech Communication | volume = 31 | issue = 2| pages = 181–192 | doi=10.1016/s0167-6393(99)00077-1}}</ref>
| |
− | The principle of elevating "raw" features over hand-crafted optimization was first explored successfully in the deep autoencoder architecture, applied to "raw" spectrogram or linear filter-bank features in the late 1990s,<ref name="Heck2000" /> showing its superiority over the Mel-Cepstral features that contain stages of fixed transformation from spectrograms. The raw features of speech, [[waveform]]s, later produced excellent larger-scale results.<ref>{{Cite web|url=https://www.researchgate.net/publication/266030526|title=Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR (PDF Download Available)|website=ResearchGate|accessdate=2017-06-14}}</ref>
| |
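| + | For intuition, the contrast between "raw" spectral features and the fixed Mel-cepstral transform can be sketched as follows (our illustration, assuming the librosa library is available and using a synthetic tone in place of real speech): <syntaxhighlight lang="python">
import numpy as np
import librosa  # assumed available; any STFT/MFCC implementation would do

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * 440 * t).astype(np.float32)  # stand-in for speech

# "Raw" features: log-magnitude spectrogram, no fixed cepstral transform.
spec = np.log(np.abs(librosa.stft(y, n_fft=512, hop_length=160)) + 1e-6)

# Classical pipeline: Mel filter bank plus DCT, a fixed transformation.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=160)

print(spec.shape, mfcc.shape)  # (257, 101) vs (13, 101): far fewer coefficients
</syntaxhighlight>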
− | Many aspects of speech recognition were taken over by a deep learning method called [[long short-term memory]] (LSTM), a recurrent neural network published by Hochreiter and [[Jürgen Schmidhuber|Schmidhuber]] in 1997.<ref name=":0">{{Cite journal|last=Hochreiter|first=Sepp|last2=Schmidhuber|first2=Jürgen|date=1997-11-01|title=Long Short-Term Memory|journal=Neural Computation|volume=9|issue=8|pages=1735–1780|doi=10.1162/neco.1997.9.8.1735|issn=0899-7667|pmid=9377276|url=https://www.semanticscholar.org/paper/44d2abe2175df8153f465f6c39b68b76a0d40ab9}}</ref> LSTM RNNs avoid the vanishing gradient problem and can learn "Very Deep Learning" tasks<ref name="SCHIDHUB" /> that require memories of events that happened thousands of discrete time steps before, which is important for speech. In 2003, LSTM started to become competitive with traditional speech recognizers on certain tasks.<ref name="graves2003">{{Cite web|url=Ftp://ftp.idsia.ch/pub/juergen/bioadit2004.pdf|title=Biologically Plausible Speech Recognition with LSTM Neural Nets|last=Graves|first=Alex|last2=Eck|first2=Douglas|date=2003|website=1st Intl. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland|pages=175–184|last3=Beringer|first3=Nicole|last4=Schmidhuber|first4=Jürgen}}</ref> Later it was combined with connectionist temporal classification (CTC)<ref name=":1">{{Cite journal|last=Graves|first=Alex|last2=Fernández|first2=Santiago|last3=Gomez|first3=Faustino|date=2006|title=Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks|journal=Proceedings of the International Conference on Machine Learning, ICML 2006|pages=369–376|citeseerx=10.1.1.75.6306}}</ref> in stacks of LSTM RNNs.<ref name="fernandez2007keyword">Santiago Fernandez, Alex Graves, and Jürgen Schmidhuber (2007). [https://mediatum.ub.tum.de/doc/1289941/file.pdf An application of recurrent neural networks to discriminative keyword spotting]. Proceedings of ICANN (2), pp. 220–229.</ref> In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of 49% through CTC-trained LSTM, which they made available through [[Google Voice Search]].<ref name="sak2015">{{Cite web|url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html|title=Google voice search: faster and more accurate|last=Sak|first=Haşim|last2=Senior|first2=Andrew|date=September 2015|website=|accessdate=|last3=Rao|first3=Kanishka|last4=Beaufays|first4=Françoise|last5=Schalkwyk|first5=Johan}}</ref>
| |
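| + | A minimal sketch of the LSTM-plus-CTC combination in PyTorch (our illustration; the feature, hidden and label sizes are invented): CTC lets the network be trained on unsegmented label sequences, without frame-level alignments. <syntaxhighlight lang="python">
import torch
import torch.nn as nn

n_feats, n_hidden, n_classes = 40, 128, 29   # 28 labels + CTC blank (index 0)
lstm = nn.LSTM(n_feats, n_hidden)            # input shape (T, N, n_feats)
proj = nn.Linear(n_hidden, n_classes)
ctc = nn.CTCLoss(blank=0)

T, N, S = 50, 4, 10                          # frames, batch size, label length
x = torch.randn(T, N, n_feats)               # acoustic feature frames
out, _ = lstm(x)
log_probs = proj(out).log_softmax(dim=-1)    # (T, N, C), as CTCLoss expects

targets = torch.randint(1, n_classes, (N, S))
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                              # alignment-free training signal
print(float(loss))
</syntaxhighlight>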
− | In 2006, publications by [[Geoffrey Hinton|Geoff Hinton]], [[Russ Salakhutdinov|Ruslan Salakhutdinov]], Osindero and [[Yee Whye Teh|Teh]]<ref>{{Cite journal|last=Hinton|first=Geoffrey E.|date=2007-10-01|title=Learning multiple layers of representation|url=http://www.cell.com/trends/cognitive-sciences/abstract/S1364-6613(07)00217-3|journal=Trends in Cognitive Sciences|volume=11|issue=10|pages=428–434|doi=10.1016/j.tics.2007.09.004|issn=1364-6613|pmid=17921042}}</ref>
| |
− | <ref name=hinton06>{{Cite journal | last1 = Hinton | first1 = G. E. |authorlink1=Geoff Hinton| last2 = Osindero | first2 = S. | last3 = Teh | first3 = Y. W. | doi = 10.1162/neco.2006.18.7.1527 | title = A Fast Learning Algorithm for Deep Belief Nets | journal = [[Neural Computation (journal)|Neural Computation]]| volume = 18 | issue = 7 | pages = 1527–1554 | year = 2006 | pmid = 16764513| pmc = | url = http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf}}</ref><ref name=bengio2012>{{cite arXiv |last=Bengio |first=Yoshua |author-link=Yoshua Bengio |eprint=1206.5533 |title=Practical recommendations for gradient-based training of deep architectures |class=cs.LG|year=2012 }}</ref> showed how a many-layered [[feedforward neural network]] could be effectively pre-trained one layer at a time, treating each layer in turn as an unsupervised [[restricted Boltzmann machine]], then fine-tuning it using supervised [[backpropagation]].<ref name="HINTON2007">G. E. Hinton., "[http://www.csri.utoronto.ca/~hinton/absps/ticsdraft.pdf Learning multiple layers of representation]," ''Trends in Cognitive Sciences'', 11, pp. 428–434, 2007.</ref> The papers referred to ''learning'' for ''deep belief nets.''
| |
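| + | The greedy layer-wise recipe can be sketched compactly (a simplified illustration with bias terms omitted and toy data, not the authors' code): each layer is trained as an RBM with one step of contrastive divergence on the activations of the layer below, and the resulting weights would then initialise a feedforward network for supervised backpropagation. <syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def rbm_cd1(v0, W, lr=0.1):
    """One CD-1 update for a binary RBM with weight matrix W."""
    h0 = sigmoid(v0 @ W)
    sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(sample @ W.T)               # one reconstruction step
    h1 = sigmoid(v1 @ W)
    W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
    return W

data = rng.random((100, 20))                 # toy "training set"
sizes = [20, 15, 10]
weights, layer_input = [], data
for n_in, n_out in zip(sizes, sizes[1:]):
    W = 0.01 * rng.normal(size=(n_in, n_out))
    for _ in range(50):                      # unsupervised pre-training
        W = rbm_cd1(layer_input, W)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)   # feed activations upward
print([w.shape for w in weights])            # [(20, 15), (15, 10)]
</syntaxhighlight>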
− | Deep learning is part of state-of-the-art systems in various disciplines, particularly computer vision and [[automatic speech recognition]] (ASR). Results on commonly used evaluation sets such as [[TIMIT]] (ASR) and [[MNIST database|MNIST]] ([[image classification]]), as well as a range of large-vocabulary speech recognition tasks have steadily improved.<ref name="HintonDengYu2012" /><ref>{{cite journal|url=https://www.microsoft.com/en-us/research/publication/new-types-of-deep-neural-network-learning-for-speech-recognition-and-related-applications-an-overview/|title=New types of deep neural network learning for speech recognition and related applications: An overview|journal=Microsoft Research|first1=Li|last1=Deng|first2=Geoffrey|last2=Hinton|first3=Brian|last3=Kingsbury|date=1 May 2013|via=research.microsoft.com|citeseerx=10.1.1.368.1123}}</ref><ref>{{Cite book |doi=10.1109/icassp.2013.6639345|isbn=978-1-4799-0356-6|chapter=Recent advances in deep learning for speech research at Microsoft|title=2013 IEEE International Conference on Acoustics, Speech and Signal Processing|pages=8604–8608|year=2013|last1=Deng|first1=Li|last2=Li|first2=Jinyu|last3=Huang|first3=Jui-Ting|last4=Yao|first4=Kaisheng|last5=Yu|first5=Dong|last6=Seide|first6=Frank|last7=Seltzer|first7=Michael|last8=Zweig|first8=Geoff|last9=He|first9=Xiaodong|last10=Williams|first10=Jason|last11=Gong|first11=Yifan|last12=Acero|first12=Alex}}</ref> [[Convolutional neural network]]s (CNNs) were superseded for ASR by CTC<ref name=":1" /> for LSTM.<ref name=":0" /><ref name="sak2015" /><ref name="sak2014">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf|title=Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling|last=Sak|first=Hasim|last2=Senior|first2=Andrew|date=2014|website=|accessdate=|last3=Beaufays|first3=Francoise|archive-url=https://web.archive.org/web/20180424203806/https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf|archive-date=2018-04-24|url-status=dead}}</ref><ref name="liwu2015">{{cite arxiv |eprint=1410.4281|last1=Li|first1=Xiangang|title=Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition|last2=Wu|first2=Xihong|class=cs.CL|year=2014}}</ref><ref name="zen2015">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf|title=Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis|last=Zen|first=Heiga|last2=Sak|first2=Hasim|date=2015|website=Google.com|publisher=ICASSP|pages=4470–4474|accessdate=}}</ref><ref name="CNNspeech2013">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf|title=A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion|last=Deng|first=L.|last2=Abdel-Hamid|first2=O.|date=2013|website=Google.com|publisher=ICASSP|accessdate=|last3=Yu|first3=D.}}</ref><ref name=":2">{{Cite book |doi=10.1109/icassp.2013.6639347|isbn=978-1-4799-0356-6|chapter=Deep convolutional neural networks for LVCSR|title=2013 IEEE International Conference on Acoustics, Speech and Signal Processing|pages=8614–8618|year=2013|last1=Sainath|first1=Tara N.|last2=Mohamed|first2=Abdel-Rahman|last3=Kingsbury|first3=Brian|last4=Ramabhadran|first4=Bhuvana}}</ref> but are more successful in computer vision.
| |
− | The impact of deep learning in industry began in the early 2000s, when CNNs already processed an estimated 10% to 20% of all the checks written in the US, according to Yann LeCun.<ref name="lecun2016slides">[[Yann LeCun]] (2016). Slides on Deep Learning [https://indico.cern.ch/event/510372/ Online]</ref> Industrial applications of deep learning to large-scale speech recognition started around 2010.
| |
− | The 2009 NIPS Workshop on Deep Learning for Speech Recognition<ref name="NIPS2009" /> was motivated by the limitations of deep generative models of speech, and the possibility that given more capable hardware and large-scale data sets that deep neural nets (DNN) might become practical. It was believed that pre-training DNNs using generative models of deep belief nets (DBN) would overcome the main difficulties of neural nets.<ref name="HintonKeynoteICASSP2013" /> However, it was discovered that replacing pre-training with large amounts of training data for straightforward backpropagation when using DNNs with large, context-dependent output layers produced error rates dramatically lower than then-state-of-the-art Gaussian mixture model (GMM)/Hidden Markov Model (HMM) and also than more-advanced generative model-based systems.<ref name="HintonDengYu2012">{{cite journal | last1 = Hinton | first1 = G. | last2 = Deng | first2 = L. | last3 = Yu | first3 = D. | last4 = Dahl | first4 = G. | last5 = Mohamed | first5 = A. | last6 = Jaitly | first6 = N. | last7 = Senior | first7 = A. | last8 = Vanhoucke | first8 = V. | last9 = Nguyen | first9 = P. | last10 = Sainath | first10 = T. | last11 = Kingsbury | first11 = B. | year = 2012 | title = Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups| url= | journal = IEEE Signal Processing Magazine | volume = 29 | issue = 6| pages = 82–97 | doi=10.1109/msp.2012.2205597}}</ref><ref name="patent2011">D. Yu, L. Deng, G. Li, and F. Seide (2011). "Discriminative pretraining of deep neural networks," U.S. Patent Filing.</ref> The nature of the recognition errors produced by the two types of systems was characteristically different,<ref name="ReferenceICASSP2013" /><ref name="NIPS2009">NIPS Workshop: Deep Learning for Speech Recognition and Related Applications, Whistler, BC, Canada, Dec. 2009 (Organizers: Li Deng, Geoff Hinton, D. Yu).</ref> offering technical insights into how to integrate deep learning into the existing highly efficient, run-time speech decoding system deployed by all major speech recognition systems.<ref name="BOOK2014" /><ref name="ReferenceA">{{cite book|last2=Deng|first2=L.|date=2014|title=Automatic Speech Recognition: A Deep Learning Approach (Publisher: Springer)|url={{google books |plainurl=y |id=rUBTBQAAQBAJ}}|pages=|isbn=978-1-4471-5779-3|via=|last1=Yu|first1=D.}}</ref><ref>{{cite web|title=Deng receives prestigious IEEE Technical Achievement Award - Microsoft Research|url=https://www.microsoft.com/en-us/research/blog/deng-receives-prestigious-ieee-technical-achievement-award/|website=Microsoft Research|date=3 December 2015}}</ref> Analysis around 2009–2010, contrasted the GMM (and other generative speech models) vs. DNN models, stimulated early industrial investment in deep learning for speech recognition,<ref name="ReferenceICASSP2013" /><ref name="NIPS2009" /> eventually leading to pervasive and dominant use in that industry. 
That analysis was done with comparable performance (less than 1.5% in error rate) between discriminative DNNs and generative models.<ref name="HintonDengYu2012" /><ref name="ReferenceICASSP2013">{{cite journal|last2=Hinton|first2=G.|last3=Kingsbury|first3=B.|date=2013|title=New types of deep neural network learning for speech recognition and related applications: An overview (ICASSP)|url=https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ICASSP-2013-DengHintonKingsbury-revised.pdf|journal=|pages=|via=|last1=Deng|first1=L.}}</ref><ref name="HintonKeynoteICASSP2013">Keynote talk: Recent Developments in Deep Neural Networks. ICASSP, 2013 (by Geoff Hinton).</ref><ref name="interspeech2014Keynote">{{Cite web|url=https://www.superlectures.com/interspeech2014/downloadFile?id=6&type=slides&filename=achievements-and-challenges-of-deep-learning-from-speech-analysis-and-recognition-to-language-and-multimodal-processing|title=Keynote talk: 'Achievements and Challenges of Deep Learning - From Speech Analysis and Recognition To Language and Multimodal Processing'|last=Li|first=Deng|date=September 2014|website=Interspeech|accessdate=}}</ref>
| |
− | In 2010, researchers extended deep learning from TIMIT to large vocabulary speech recognition, by adopting large output layers of the DNN based on context-dependent HMM states constructed by [[decision tree]]s.<ref name="Roles2010">{{cite journal|last1=Yu|first1=D.|last2=Deng|first2=L.|date=2010|title=Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition|url=https://www.microsoft.com/en-us/research/publication/roles-of-pre-training-and-fine-tuning-in-context-dependent-dbn-hmms-for-real-world-speech-recognition/|journal=NIPS Workshop on Deep Learning and Unsupervised Feature Learning|pages=|via=}}</ref><ref>{{Cite journal|last=Seide|first=F.|last2=Li|first2=G.|last3=Yu|first3=D.|date=2011|title=Conversational speech transcription using context-dependent deep neural networks|url=https://www.microsoft.com/en-us/research/publication/conversational-speech-transcription-using-context-dependent-deep-neural-networks|journal=Interspeech|pages=|via=}}</ref><ref>{{Cite journal|last=Deng|first=Li|last2=Li|first2=Jinyu|last3=Huang|first3=Jui-Ting|last4=Yao|first4=Kaisheng|last5=Yu|first5=Dong|last6=Seide|first6=Frank|last7=Seltzer|first7=Mike|last8=Zweig|first8=Geoff|last9=He|first9=Xiaodong|date=2013-05-01|title=Recent Advances in Deep Learning for Speech Research at Microsoft|url=https://www.microsoft.com/en-us/research/publication/recent-advances-in-deep-learning-for-speech-research-at-microsoft/|journal=Microsoft Research}}</ref><ref name="ReferenceA" />
| |
− | Advances in hardware have enabled renewed interest in deep learning. In 2009, [[Nvidia]] was involved in what was called the “big bang” of deep learning, “as deep-learning neural networks were trained with Nvidia [[graphics processing unit]]s (GPUs).”<ref>{{cite web|url=https://venturebeat.com/2016/04/05/nvidia-ceo-bets-big-on-deep-learning-and-vr/|title=Nvidia CEO bets big on deep learning and VR|date=April 5, 2016|publisher=[[Venture Beat]]}}</ref> That year, [[Google Brain]] used Nvidia GPUs to create capable DNNs. While there, [[Andrew Ng]] determined that GPUs could increase the speed of deep-learning systems by about 100 times.<ref>{{cite news|url=https://www.economist.com/news/special-report/21700756-artificial-intelligence-boom-based-old-idea-modern-twist-not|title=From not working to neural networking|newspaper=[[The Economist]]}}</ref> In particular, GPUs are well-suited for the matrix/vector computations involved in machine learning.<ref name="jung2004">{{cite journal | last1 = Oh | first1 = K.-S. | last2 = Jung | first2 = K. | year = 2004 | title = GPU implementation of neural networks | url= | journal = Pattern Recognition | volume = 37 | issue = 6| pages = 1311–1314 | doi=10.1016/j.patcog.2004.01.013}}</ref><ref>"[https://www.academia.edu/40135801 A Survey of Techniques for Optimizing Deep Learning on GPUs]", S. Mittal and S. Vaishay, Journal of Systems Architecture, 2019</ref><ref name="chellapilla2006">Chellapilla, K., Puri, S., and Simard, P. (2006). High performance convolutional neural networks for document processing. International Workshop on Frontiers in Handwriting Recognition.</ref> GPUs speed up training algorithms by orders of magnitude, reducing running times from weeks to days.<ref name=":3">{{Cite journal|last=Cireşan|first=Dan Claudiu|last2=Meier|first2=Ueli|last3=Gambardella|first3=Luca Maria|last4=Schmidhuber|first4=Jürgen|date=2010-09-21|title=Deep, Big, Simple Neural Nets for Handwritten Digit Recognition|journal=Neural Computation|volume=22|issue=12|pages=3207–3220|doi=10.1162/neco_a_00052|pmid=20858131|issn=0899-7667|arxiv=1003.0358}}</ref><ref>{{Cite journal|last=Raina|first=Rajat|last2=Madhavan|first2=Anand|last3=Ng|first3=Andrew Y.|date=2009|title=Large-scale Deep Unsupervised Learning Using Graphics Processors|journal=Proceedings of the 26th Annual International Conference on Machine Learning|series=ICML '09|location=New York, NY, USA|publisher=ACM|pages=873–880|doi=10.1145/1553374.1553486|isbn=9781605585161|citeseerx=10.1.1.154.372|url=https://www.semanticscholar.org/paper/e337c5e4c23999c36f64bcb33ebe6b284e1bcbf1}}</ref> Further, specialized hardware and algorithm optimizations can be used for efficient processing of deep learning models.<ref name="sze2017">{{cite arXiv
| |
| | | |
| + | Advances in hardware have renewed interest in deep learning. In 2009, as deep-learning neural networks were trained with Nvidia [https://en.wikipedia.org/wiki/Graphics_processing_unit graphics processing units] (GPUs),<ref>{{cite web|url=https://venturebeat.com/2016/04/05/nvidia-ceo-bets-big-on-deep-learning-and-vr/|title=Nvidia CEO bets big on deep learning and VR|date=April 5, 2016|publisher=''[[Venture Beat]]''}}</ref> Nvidia was involved in what was called the "big bang" of deep learning. That year, [https://en.wikipedia.org/wiki/Google_Brain Google Brain] used Nvidia GPUs to build capable DNNs, and [https://en.wikipedia.org/wiki/Andrew_Ng Ng] determined that GPUs could speed up deep-learning systems by about 100 times.<ref>{{cite web|url=https://www.economist.com/news/special-report/21700756-artificial-intelligence-boom-based-old-idea-modern-twist-not|title=From not working to neural networking|publisher=''[[The Economist]]''}}</ref> GPUs are well suited to the matrix/vector computations involved in machine learning.<ref name="jung2004">{{cite journal | last1 = Oh | first1 = K.-S. | last2 = Jung | first2 = K. | year = 2004 | title = GPU implementation of neural networks | url= | journal = Pattern Recognition | volume = 37 | issue = 6| pages = 1311–1314 | doi=10.1016/j.patcog.2004.01.013}}</ref><ref name="chellapilla2006">Chellapilla, K., Puri, S., and Simard, P. (2006). High performance convolutional neural networks for document processing. International Workshop on Frontiers in Handwriting Recognition.</ref> GPUs sped up training algorithms by an order of magnitude, so that jobs of weeks could finish within days.<ref name=":3">{{Cite journal|last=Cireşan|first=Dan Claudiu|last2=Meier|first2=Ueli|last3=Gambardella|first3=Luca Maria|last4=Schmidhuber|first4=Jürgen|date=2010-09-21|title=Deep, Big, Simple Neural Nets for Handwritten Digit Recognition|url=http://www.mitpressjournals.org/doi/10.1162/NECO_a_00052|journal=Neural Computation|volume=22|issue=12|pages=3207–3220|doi=10.1162/neco_a_00052|issn=0899-7667}}</ref><ref>{{Cite journal|last=Raina|first=Rajat|last2=Madhavan|first2=Anand|last3=Ng|first3=Andrew Y.|date=2009|title=Large-scale Deep Unsupervised Learning Using Graphics Processors|url=http://doi.acm.org/10.1145/1553374.1553486|journal=Proceedings of the 26th Annual International Conference on Machine Learning|series=ICML '09|location=New York, NY, USA|publisher=ACM|pages=873–880|doi=10.1145/1553374.1553486|isbn=9781605585161|citeseerx=10.1.1.154.372}}</ref> Specialized hardware and algorithmic optimizations can also be used for efficient processing.<ref name="sze2017">{{cite arXiv
| |title= Efficient Processing of Deep Neural Networks: A Tutorial and Survey | | |title= Efficient Processing of Deep Neural Networks: A Tutorial and Survey |
| |last1=Sze |first1=Vivienne | | |last1=Sze |first1=Vivienne |
| |last2=Chen |first2=Yu-Hsin | | |last2=Chen |first2=Yu-Hsin |
| |last3=Yang |first3=Tien-Ju | | |last3=Yang |first3=Tien-Ju |
| |last4=Emer |first4=Joel | | |last4=Emer |first4=Joel |
| |eprint=1703.09039 | | |eprint=1703.09039 |
| |year=2017 | | |year=2017 |
| |class=cs.CV }}</ref> | | |class=cs.CV }}</ref> |
| | | |
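| + | The GPU advantage comes from dense matrix arithmetic. A hedged PyTorch sketch follows (our illustration; timings depend entirely on the hardware present): <syntaxhighlight lang="python">
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

t0 = time.time()
a @ b                                   # matrix product on the CPU
cpu_s = time.time() - t0

if torch.cuda.is_available():           # only meaningful if a GPU is present
    a_g, b_g = a.cuda(), b.cuda()
    torch.cuda.synchronize()
    t0 = time.time()
    a_g @ b_g                           # the same product on the GPU
    torch.cuda.synchronize()            # kernels launch asynchronously
    print(f"CPU {cpu_s:.3f}s vs GPU {time.time() - t0:.3f}s")
</syntaxhighlight>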
| + | === Deep learning revolution ===
− | | |
− | | |
− | | |
− | === Deep learning revolution === | |
− | | |
− | [[File:AI-ML-DL.png|thumb|How deep learning is a subset of machine learning and how machine learning is a subset of artificial intelligence (AI).]]
| |
− | In 2012, a team led by George E. Dahl won the "Merck Molecular Activity Challenge" using multi-task deep neural networks to predict the [[biomolecular target]] of one drug.<ref name="MERCK2012">{{cite web|url=https://www.kaggle.com/c/MerckActivity/details/winners|title=Announcement of the winners of the Merck Molecular Activity Challenge}}</ref><ref name=":5">{{Cite web|url=http://www.datascienceassn.org/content/multi-task-neural-networks-qsar-predictions|title=Multi-task Neural Networks for QSAR Predictions {{!}} Data Science Association|website=www.datascienceassn.org|accessdate=2017-06-14}}</ref> In 2014, Hochreiter's group used deep learning to detect off-target and toxic effects of environmental chemicals in nutrients, household products and drugs and won the "Tox21 Data Challenge" of [[NIH]], [[FDA]] and [[National Center for Advancing Translational Sciences|NCATS]].<ref name="TOX21">"Toxicology in the 21st century Data Challenge"</ref><ref name="TOX21Data">{{cite web|url=https://tripod.nih.gov/tox21/challenge/leaderboard.jsp|title=NCATS Announces Tox21 Data Challenge Winners}}</ref><ref name=":11">{{cite web|url=http://www.ncats.nih.gov/news-and-events/features/tox21-challenge-winners.html|title=Archived copy|archiveurl=https://web.archive.org/web/20150228225709/http://www.ncats.nih.gov/news-and-events/features/tox21-challenge-winners.html|archivedate=2015-02-28|url-status=dead|accessdate=2015-03-05}}</ref>
| |
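| + | The multi-task idea behind such challenge entries can be sketched as a shared trunk with one output head per prediction target (a hedged PyTorch illustration with invented sizes, not the winning system): <syntaxhighlight lang="python">
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared representation trunk with one prediction head per task."""
    def __init__(self, n_features=1024, n_tasks=15):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
        )
        self.heads = nn.ModuleList(nn.Linear(128, 1) for _ in range(n_tasks))

    def forward(self, x):
        shared = self.trunk(x)                 # features reused by every task
        return torch.cat([h(shared) for h in self.heads], dim=1)

x = torch.randn(8, 1024)        # e.g. molecular descriptors (invented size)
print(MultiTaskNet()(x).shape)  # torch.Size([8, 15]): one score per task
</syntaxhighlight>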
− | Significant additional impacts in image or object recognition were felt from 2011 to 2012. Although CNNs trained by backpropagation had been around for decades, and GPU implementations of NNs for years, including CNNs, fast implementations of CNNs with max-pooling on GPUs in the style of Ciresan and colleagues were needed to progress on computer vision.<ref name="jung2004" /><ref name="chellapilla2006" /><ref name="LECUN1989" /><ref name=":6">{{Cite journal|last=Ciresan|first=D. C.|last2=Meier|first2=U.|last3=Masci|first3=J.|last4=Gambardella|first4=L. M.|last5=Schmidhuber|first5=J.|date=2011|title=Flexible, High Performance Convolutional Neural Networks for Image Classification|url=http://ijcai.org/papers11/Papers/IJCAI11-210.pdf|journal=International Joint Conference on Artificial Intelligence|pages=|doi=10.5591/978-1-57735-516-8/ijcai11-210|via=}}</ref><ref name="SCHIDHUB" /> In 2011, this approach achieved for the first time superhuman performance in a visual pattern recognition contest. Also in 2011, it won the ICDAR Chinese handwriting contest, and in May 2012, it won the ISBI image segmentation contest.<ref name=":8">{{Cite book|url=http://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images.pdf|title=Advances in Neural Information Processing Systems 25|last=Ciresan|first=Dan|last2=Giusti|first2=Alessandro|last3=Gambardella|first3=Luca M.|last4=Schmidhuber|first4=Juergen|date=2012|publisher=Curran Associates, Inc.|editor-last=Pereira|editor-first=F.|pages=2843–2851|editor-last2=Burges|editor-first2=C. J. C.|editor-last3=Bottou|editor-first3=L.|editor-last4=Weinberger|editor-first4=K. Q.}}</ref> Until 2011, CNNs did not play a major role at computer vision conferences, but in June 2012, a paper by Ciresan et al. at the leading conference CVPR<ref name=":9" /> showed how max-pooling CNNs on GPU can dramatically improve many vision benchmark records. In October 2012, a similar system by Krizhevsky et al.<ref name="krizhevsky2012" /> won the large-scale [[ImageNet competition]] by a significant margin over shallow machine learning methods. In November 2012, Ciresan et al.'s system also won the ICPR contest on analysis of large medical images for cancer detection, and in the following year also the MICCAI Grand Challenge on the same topic.<ref name="ciresan2013miccai">{{Cite journal|last=Ciresan|first=D.|last2=Giusti|first2=A.|last3=Gambardella|first3=L.M.|last4=Schmidhuber|first4=J.|date=2013|title=Mitosis Detection in Breast Cancer Histology Images using Deep Neural Networks|journal=Proceedings MICCAI|volume=7908|issue=Pt 2|pages=411–418|doi=10.1007/978-3-642-40763-5_51|pmid=24579167|series=Lecture Notes in Computer Science|isbn=978-3-642-38708-1}}</ref> In 2013 and 2014, the error rate on the ImageNet task using deep learning was further reduced, following a similar trend in large-scale speech recognition. The [[Stephen Wolfram|Wolfram]] Image Identification project publicized these improvements.<ref>{{Cite web|url=https://www.imageidentify.com/|title=The Wolfram Language Image Identification Project|website=www.imageidentify.com|accessdate=2017-03-22}}</ref>
| |
− | Image classification was then extended to the more challenging task of [[Automatic image annotation|generating descriptions]] (captions) for images, often as a combination of CNNs and LSTMs.<ref name="1411.4555">{{cite arxiv |eprint=1411.4555|last1=Vinyals|first1=Oriol|title=Show and Tell: A Neural Image Caption Generator|last2=Toshev|first2=Alexander|last3=Bengio|first3=Samy|last4=Erhan|first4=Dumitru|class=cs.CV|year=2014}}.</ref><ref name="1411.4952">{{cite arxiv |eprint=1411.4952|last1=Fang|first1=Hao|title=From Captions to Visual Concepts and Back|last2=Gupta|first2=Saurabh|last3=Iandola|first3=Forrest|last4=Srivastava|first4=Rupesh|last5=Deng|first5=Li|last6=Dollár|first6=Piotr|last7=Gao|first7=Jianfeng|last8=He|first8=Xiaodong|last9=Mitchell|first9=Margaret|last10=Platt|first10=John C|last11=Lawrence Zitnick|first11=C|last12=Zweig|first12=Geoffrey|class=cs.CV|year=2014}}.</ref><ref name="1411.2539">{{cite arxiv |eprint=1411.2539|last1=Kiros|first1=Ryan|title=Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models|last2=Salakhutdinov|first2=Ruslan|last3=Zemel|first3=Richard S|class=cs.LG|year=2014}}.</ref><ref>{{Cite journal|last=Zhong|first=Sheng-hua|last2=Liu|first2=Yan|last3=Liu|first3=Yang|date=2011|title=Bilinear Deep Learning for Image Classification|journal=Proceedings of the 19th ACM International Conference on Multimedia|series=MM '11|location=New York, NY, USA|publisher=ACM|pages=343–352|doi=10.1145/2072298.2072344|isbn=9781450306164|url=https://www.semanticscholar.org/paper/e1bbfb2c7ef74445b4fad9199b727464129df582}}</ref>
| |
− | | |
− | Some researchers assess that the October 2012 ImageNet victory anchored the start of a "deep learning revolution" that has transformed the AI industry.<ref>{{cite news|title=Why Deep Learning Is Suddenly Changing Your Life|url=http://fortune.com/ai-artificial-intelligence-deep-machine-learning/|accessdate=13 April 2018|work=Fortune|date=2016}}</ref>
| |
− | | |
− | In March 2019, [[Yoshua Bengio]], [[Geoffrey Hinton]] and [[Yann LeCun]] were awarded the [[Turing Award]] for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.
| |
− | | |
− | == Neural networks ==
| |
− | | |
− | | |
− | | |
− | === Artificial neural networks ===
| |
− | | |
− | {{Main|Artificial neural network}}
| |
− | | |
− | '''Artificial neural networks''' ('''ANNs''') or '''[[Connectionism|connectionist]] systems''' are computing systems inspired by the [[biological neural network]]s that constitute animal brains. Such systems learn (progressively improve their ability) to do tasks by considering examples, generally without task-specific programming. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually [[Labeled data|labeled]] as "cat" or "no cat" and using the analytic results to identify cats in other images. They have found most use in applications difficult to express with a traditional computer algorithm using [[rule-based programming]].
| |
− | | |
− | An ANN is based on a collection of connected units called [[artificial neuron]]s, (analogous to biological neurons in a [[Brain|biological brain]]). Each connection ([[synapse]]) between neurons can transmit a signal to another neuron. The receiving (postsynaptic) neuron can process the signal(s) and then signal downstream neurons connected to it. Neurons may have state, generally represented by [[real numbers]], typically between 0 and 1. Neurons and synapses may also have a weight that varies as learning proceeds, which can increase or decrease the strength of the signal that it sends downstream.
| |
− | | |
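− | The weighted-signal behaviour just described can be made concrete with a short sketch. This is a minimal illustrative example rather than material from the article: the sigmoid squashing function and the specific signal and weight values are assumptions chosen so that the neuron's state stays between 0 and 1.
− | <syntaxhighlight lang="python">
− | import numpy as np
− |
− | def sigmoid(x):
− |     # Squashes any real value into (0, 1), matching the usual neuron "state".
− |     return 1.0 / (1.0 + np.exp(-x))
− |
− | def artificial_neuron(inputs, weights, bias):
− |     # Each connection scales its incoming signal by a weight; the neuron
− |     # sums the weighted signals and applies an activation function.
− |     return sigmoid(np.dot(weights, inputs) + bias)
− |
− | # Hypothetical values: three incoming signals and their connection weights.
− | signals = np.array([0.5, 0.1, 0.9])
− | weights = np.array([0.4, -0.6, 0.2])
− | print(artificial_neuron(signals, weights, bias=0.1))  # a value in (0, 1)
− | </syntaxhighlight>
− | |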
− | Typically, neurons are organized in layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first (input), to the last (output) layer, possibly after traversing the layers multiple times.
| |
− | | |
− | The original goal of the neural network approach was to solve problems in the same way that a human brain would. Over time, attention focused on matching specific mental abilities, leading to deviations from biology such as backpropagation, or passing information in the reverse direction and adjusting the network to reflect that information.
| |
− | | |
− | Neural networks have been used on a variety of tasks, including computer vision, [[speech recognition]], [[machine translation]], [[social network]] filtering, [[general game playing|playing board and video games]] and medical diagnosis.
| |
− | | |
− | As of 2017, neural networks typically have a few thousand to a few million units and millions of connections. Despite this number being several orders of magnitude less than the number of neurons in a human brain, these networks can perform many tasks at a level beyond that of humans (e.g., recognizing faces, playing "Go"<ref>{{Cite journal|last=Silver|first=David|last2=Huang|first2=Aja|last3=Maddison|first3=Chris J.|last4=Guez|first4=Arthur|last5=Sifre|first5=Laurent|last6=Driessche|first6=George van den|last7=Schrittwieser|first7=Julian|last8=Antonoglou|first8=Ioannis|last9=Panneershelvam|first9=Veda|date=January 2016|title=Mastering the game of Go with deep neural networks and tree search|journal=Nature|volume=529|issue=7587|pages=484–489|doi=10.1038/nature16961|issn=1476-4687|pmid=26819042|bibcode=2016Natur.529..484S|url=https://www.semanticscholar.org/paper/846aedd869a00c09b40f1f1f35673cb22bc87490}}</ref>).
| |
− | | |
− | === Deep neural networks ===
| |
− | | |
− | {{technical|section|date=July 2016}}
| |
− | | |
− | A deep neural network (DNN) is an [[artificial neural network]] (ANN) with multiple layers between the input and output layers.<ref name="BENGIODEEP" /><ref name="SCHIDHUB" /> The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a [[linear relationship]] or a non-linear relationship. The network moves through the layers calculating the probability of each output. For example, a DNN that is trained to recognize dog breeds will go over the given image and calculate the probability that the dog in the image is a certain breed. The user can review the results and select which probabilities the network should display (above a certain threshold, etc.) and return the proposed label. Each mathematical manipulation as such is considered a layer, and complex DNNs have many layers, hence the name "deep" networks.
| |
− | | |
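− | As a toy illustration of the paragraph above, the sketch below pushes an input through two hidden layers and converts the final layer into per-class probabilities with a softmax. The layer sizes, the breed labels and the random weights are all hypothetical; a real network would learn its weights from data.
− | <syntaxhighlight lang="python">
− | import numpy as np
− |
− | rng = np.random.default_rng(0)
− |
− | def layer(x, w, b):
− |     # One mathematical manipulation = one layer: affine map plus non-linearity.
− |     return np.tanh(w @ x + b)
− |
− | def softmax(z):
− |     z = z - z.max()  # subtract the max for numerical stability
− |     e = np.exp(z)
− |     return e / e.sum()
− |
− | breeds = ["beagle", "collie", "poodle"]           # hypothetical labels
− | x = rng.random(8)                                 # e.g. 8 image features
− | w1, b1 = rng.normal(size=(16, 8)), np.zeros(16)   # input -> hidden 1
− | w2, b2 = rng.normal(size=(16, 16)), np.zeros(16)  # hidden 1 -> hidden 2
− | w3, b3 = rng.normal(size=(3, 16)), np.zeros(3)    # hidden 2 -> output
− |
− | h = layer(layer(x, w1, b1), w2, b2)
− | probs = softmax(w3 @ h + b3)
− | for name, p in zip(breeds, probs):
− |     print(f"{name}: {p:.3f}")  # probability that the dog is this breed
− | </syntaxhighlight>
− | |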
− | DNNs can model complex non-linear relationships. DNN architectures generate compositional models where the object is expressed as a layered composition of [[Primitive data type|primitives]].<ref>{{Cite journal|last=Szegedy|first=Christian|last2=Toshev|first2=Alexander|last3=Erhan|first3=Dumitru|date=2013|title=Deep neural networks for object detection|url=https://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection|journal=Advances in Neural Information Processing Systems|pages=2553–2561|via=}}</ref> The extra layers enable composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network.<ref name="BENGIODEEP" />
| |
− | | |
− | Deep architectures include many variants of a few basic approaches. Each architecture has found success in specific domains. It is not always possible to compare the performance of multiple architectures, unless they have been evaluated on the same data sets.
| |
− | | |
− | DNNs are typically feedforward networks in which data flows from the input layer to the output layer without looping back. At first, the DNN creates a map of virtual neurons and assigns random numerical values, or "weights", to connections between them. The weights and inputs are multiplied and return an output between 0 and 1. If the network did not accurately recognize a particular pattern, an algorithm would adjust the weights.<ref>{{Cite news|url=https://www.technologyreview.com/s/513696/deep-learning/|title=Is Artificial Intelligence Finally Coming into Its Own?|last=Hof|first=Robert D.|work=MIT Technology Review|access-date=2018-07-10}}</ref> That way the algorithm can make certain parameters more influential, until it determines the correct mathematical manipulation to fully process the data.
| |
− | | |
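− | A minimal sketch of the train-by-adjusting-weights idea follows, using a single sigmoid unit on a made-up binary pattern task; real DNNs propagate such corrections backwards through many layers, which this example deliberately omits. The learning rate and iteration count are assumed values.
− | <syntaxhighlight lang="python">
− | import numpy as np
− |
− | rng = np.random.default_rng(1)
− |
− | def sigmoid(x):
− |     return 1.0 / (1.0 + np.exp(-x))
− |
− | # Hypothetical task: the target output simply copies the first input.
− | X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
− | y = np.array([0, 0, 1, 1], dtype=float)
− |
− | w = rng.normal(size=2)  # random initial weights, as described above
− | b = 0.0
− | lr = 0.5                # learning rate (an assumed hyperparameter)
− |
− | for _ in range(1000):
− |     out = sigmoid(X @ w + b)        # weights times inputs -> output in (0, 1)
− |     err = out - y                   # how wrong the network currently is
− |     w -= lr * (X.T @ err) / len(y)  # nudge the weights against the error
− |     b -= lr * err.mean()
− |
− | print(np.round(sigmoid(X @ w + b), 2))  # approaches [0, 0, 1, 1]
− | </syntaxhighlight>
− | |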
− | [[Recurrent neural networks]] (RNNs), in which data can flow in any direction, are used for applications such as [[language model]]ing.<ref name="gers2001">{{cite journal|last1=Gers|first1=Felix A.|last2=Schmidhuber|first2=Jürgen|year=2001|title=LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages|url=http://elartu.tntu.edu.ua/handle/lib/30719|journal= IEEE Transactions on Neural Networks|volume=12|issue=6|pages=1333–1340|doi=10.1109/72.963769|pmid=18249962}}</ref><ref name="NIPS2014"/><ref name="vinyals2016">{{cite arxiv |eprint=1602.02410|last1=Jozefowicz|first1=Rafal|title=Exploring the Limits of Language Modeling|last2=Vinyals|first2=Oriol|last3=Schuster|first3=Mike|last4=Shazeer|first4=Noam|last5=Wu|first5=Yonghui|class=cs.CL|year=2016}}</ref><ref name="gillick2015">{{cite arxiv |eprint=1512.00103|last1=Gillick|first1=Dan|title=Multilingual Language Processing from Bytes|last2=Brunk|first2=Cliff|last3=Vinyals|first3=Oriol|last4=Subramanya|first4=Amarnag|class=cs.CL|year=2015}}</ref><ref name="MIKO2010">{{Cite journal|last=Mikolov|first=T.|display-authors=etal|date=2010|title=Recurrent neural network based language model|url=http://www.fit.vutbr.cz/research/groups/speech/servite/2010/rnnlm_mikolov.pdf|journal=Interspeech|pages=|via=}}</ref> Long short-term memory is particularly effective for this use.<ref name=":0" /><ref name=":10">{{Cite web|url=https://www.researchgate.net/publication/220320057|title=Learning Precise Timing with LSTM Recurrent Networks (PDF Download Available)|website=ResearchGate|accessdate=2017-06-13}}</ref>
| |
− | | |
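− | To make the recurrent data flow concrete, here is a minimal plain recurrent step of the form h_t = tanh(W·x_t + U·h_(t-1)); the sizes and random weights are hypothetical, and the forget gates of an LSTM are deliberately left out.
− | <syntaxhighlight lang="python">
− | import numpy as np
− |
− | rng = np.random.default_rng(2)
− |
− | # Hypothetical sizes: 4-dimensional inputs, 3-dimensional hidden state.
− | W = rng.normal(size=(3, 4))    # input -> hidden
− | U = rng.normal(size=(3, 3))    # hidden -> hidden (the recurrent loop)
− | h = np.zeros(3)                # initial hidden state
− |
− | sequence = rng.random((5, 4))  # five time steps of input
− | for x_t in sequence:
− |     # The hidden state carries information forward across time steps,
− |     # which is what lets RNNs model sequences such as language.
− |     h = np.tanh(W @ x_t + U @ h)
− |
− | print(h)
− | </syntaxhighlight>
− | |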
− | [[Convolutional neural network|Convolutional deep neural networks (CNNs)]] are used in computer vision.<ref name="LECUN86">{{cite journal |last1=LeCun |first1=Y. |display-authors=etal |year= 1998|title=Gradient-based learning applied to document recognition |url= |journal=Proceedings of the IEEE |volume=86 |issue=11 |pages=2278–2324 |doi=10.1109/5.726791}}</ref> CNNs also have been applied to [[acoustic model]]ing for automatic speech recognition (ASR).<ref name=":2" />
| |
− | | |
− | ==== Challenges ====
| |
− | | |
− | As with ANNs, many issues can arise with naively trained DNNs. Two common issues are [[overfitting]] and computation time.
| |
− | | |
− | DNNs are prone to overfitting because of the added layers of abstraction, which allow them to model rare dependencies in the training data. [[Regularization (mathematics)|Regularization]] methods such as Ivakhnenko's unit pruning<ref name="ivak1971"/> or [[weight decay]] (<math> \ell_2 </math>-regularization) or [[sparse matrix|sparsity]] (<math> \ell_1 </math>-regularization) can be applied during training to combat overfitting.<ref>{{Cite book |doi=10.1109/icassp.2013.6639349|isbn=978-1-4799-0356-6|arxiv=1212.0901|citeseerx=10.1.1.752.9151|chapter=Advances in optimizing recurrent networks|title=2013 IEEE International Conference on Acoustics, Speech and Signal Processing|pages=8624–8628|year=2013|last1=Bengio|first1=Yoshua|last2=Boulanger-Lewandowski|first2=Nicolas|last3=Pascanu|first3=Razvan}}</ref> Alternatively dropout regularization randomly omits units from the hidden layers during training. This helps to exclude rare dependencies.<ref name="DAHL2013">{{Cite journal|last=Dahl|first=G.|display-authors=etal|date=2013|title=Improving DNNs for LVCSR using rectified linear units and dropout|url=http://www.cs.toronto.edu/~gdahl/papers/reluDropoutBN_icassp2013.pdf|journal=ICASSP|pages=|via=}}</ref> Finally, data can be augmented via methods such as cropping and rotating such that smaller training sets can be increased in size to reduce the chances of overfitting.<ref>{{Cite web|url=https://www.coursera.org/learn/convolutional-neural-networks/lecture/AYzbX/data-augmentation|title=Data Augmentation - deeplearning.ai {{!}} Coursera|website=Coursera|accessdate=2017-11-30}}</ref>
| |
− | | |
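− | The sketch below shows two of the regularizers named above in isolation: an L2 penalty that would be added to the loss, and a dropout mask that randomly omits hidden units during training. The penalty strength, dropout rate and array shapes are illustrative assumptions.
− | <syntaxhighlight lang="python">
− | import numpy as np
− |
− | rng = np.random.default_rng(3)
− |
− | def l2_penalty(weights, lam=1e-3):
− |     # Weight decay: discourages large weights by adding lam * ||w||^2
− |     # to the training loss.
− |     return lam * np.sum(weights ** 2)
− |
− | def dropout(hidden, rate=0.5, training=True):
− |     # Randomly omits units from a hidden layer during training; survivors
− |     # are rescaled so the expected activation stays unchanged.
− |     if not training:
− |         return hidden
− |     mask = rng.random(hidden.shape) >= rate
− |     return hidden * mask / (1.0 - rate)
− |
− | w = rng.normal(size=(16, 8))
− | h = rng.random(16)
− | print("l2 term:", l2_penalty(w))
− | print("dropped:", dropout(h))
− | </syntaxhighlight>
− | |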
− | DNNs must consider many training parameters, such as the size (number of layers and number of units per layer), the [[learning rate]], and initial weights. [[Hyperparameter optimization#Grid search|Sweeping through the parameter space]] for optimal parameters may not be feasible due to the cost in time and computational resources. Various tricks, such as batching (computing the gradient on several training examples at once rather than individual examples)<ref name="RBMTRAIN">{{Cite journal|last=Hinton|first=G. E.|date=2010|title=A Practical Guide to Training Restricted Boltzmann Machines|url=https://www.researchgate.net/publication/221166159|journal=Tech. Rep. UTML TR 2010-003|pages=|via=}}</ref> speed up computation. Large processing capabilities of many-core architectures (such as GPUs or the Intel Xeon Phi) have produced significant speedups in training, because of the suitability of such processing architectures for the matrix and vector computations.<ref>{{cite book|last1=You|first1=Yang|title=Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17|pages=1–12|last2=Buluç|first2=Aydın|last3=Demmel|first3=James|chapter=Scaling deep learning on GPU and knights landing clusters|chapter-url=https://dl.acm.org/citation.cfm?doid=3126908.3126912|publisher=SC '17, ACM|date=November 2017|accessdate=5 March 2018|doi=10.1145/3126908.3126912|isbn=9781450351140|url=http://www.escholarship.org/uc/item/6ch40821}}</ref><ref>{{cite journal|last1=Viebke|first1=André|last2=Memeti|first2=Suejb|last3=Pllana|first3=Sabri|last4=Abraham|first4=Ajith|title=CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi|journal=The Journal of Supercomputing|volume=75|pages=197–227|doi=10.1007/s11227-017-1994-x|accessdate=|arxiv=1702.07908|bibcode=2017arXiv170207908V|url=https://www.semanticscholar.org/paper/aa8a4d2de94cc0a8ccff21f651c005613e8ec0e8|year=2019}}</ref>
| |
− | | |
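− | Batching, as mentioned above, simply means computing the gradient over several training examples at once. The sketch iterates over shuffled mini-batches of a made-up data set; the batch size is an assumed hyperparameter.
− | <syntaxhighlight lang="python">
− | import numpy as np
− |
− | rng = np.random.default_rng(4)
− |
− | X = rng.random((1000, 8))  # hypothetical data set: 1000 examples, 8 features
− | y = rng.random(1000)
− | batch_size = 32            # an assumed hyperparameter
− |
− | def batches(X, y, size):
− |     idx = rng.permutation(len(X))  # shuffle once per epoch
− |     for start in range(0, len(X), size):
− |         take = idx[start:start + size]
− |         yield X[take], y[take]     # the gradient is computed on this slice
− |
− | for xb, yb in batches(X, y, batch_size):
− |     pass  # compute the gradient on xb, yb rather than one example at a time
− |
− | print((len(X) + batch_size - 1) // batch_size, "batches per epoch")
− | </syntaxhighlight>
− | |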
− | Alternatively, engineers may look for other types of neural networks with more straightforward and convergent training algorithms. CMAC ([[cerebellar model articulation controller]]) is one such kind of neural network. It doesn't require learning rates or randomized initial weights for CMAC. The training process can be guaranteed to converge in one step with a new batch of data, and the computational complexity of the training algorithm is linear with respect to the number of neurons involved.<ref name=Qin1>Ting Qin, et al. "A learning algorithm of CMAC based on RLS." Neural Processing Letters 19.1 (2004): 49-61.</ref><ref name=Qin2>Ting Qin, et al. "[http://www-control.eng.cam.ac.uk/Homepage/papers/cued_control_997.pdf Continuous CMAC-QRLS and its systolic array]." Neural Processing Letters 22.1 (2005): 1-16.</ref>
| |
− | | |
− | == Applications ==
| |
− | | |
− | | |
− | | |
− | === Automatic speech recognition ===
| |
− | | |
− | {{Main|Speech recognition}}
| |
− | | |
− | | |
− | | |
− | Large-scale automatic speech recognition is the first and most convincing successful case of deep learning. LSTM RNNs can learn "Very Deep Learning" tasks<ref name="SCHIDHUB"/> that involve multi-second intervals containing speech events separated by thousands of discrete time steps, where one time step corresponds to about 10 ms. LSTM with forget gates<ref name=":10" /> is competitive with traditional speech recognizers on certain tasks.<ref name="graves2003"/>
| |
− | | |
− | The initial success in speech recognition was based on small-scale recognition tasks based on TIMIT. The data set contains 630 speakers from eight major [[dialect]]s of [[American English]], where each speaker reads 10 sentences.<ref name="LDCTIMIT">''TIMIT Acoustic-Phonetic Continuous Speech Corpus'' Linguistic Data Consortium, Philadelphia.</ref> Its small size lets many configurations be tried. More importantly, the TIMIT task concerns phone-sequence recognition, which, unlike word-sequence recognition, allows weak phone [[bigram]] language models. This lets the strength of the acoustic modeling aspects of speech recognition be more easily analyzed. The error rates listed below, including these early results and measured as percent phone error rates (PER), have been summarized since 1991.
| |
− | | |
− | {| class="wikitable"
− | |-
− | ! Method !! Percent phone<br>error rate (PER) (%)
− | |-
− | | Randomly Initialized RNN<ref>{{cite journal |last1=Robinson |first1=Tony |authorlink=Tony Robinson (speech recognition)|title=Several Improvements to a Recurrent Error Propagation Network Phone Recognition System |journal=Cambridge University Engineering Department Technical Report |date=30 September 1991 |volume=CUED/F-INFENG/TR82 |doi=10.13140/RG.2.2.15418.90567 }}</ref> || 26.1
− | |-
− | | Bayesian Triphone GMM-HMM || 25.6
− | |-
− | | Hidden Trajectory (Generative) Model || 24.8
− | |-
− | | Monophone Randomly Initialized DNN || 23.4
− | |-
− | | Monophone DBN-DNN || 22.4
− | |-
− | | Triphone GMM-HMM with BMMI Training || 21.7
− | |-
− | | Monophone DBN-DNN on fbank || 20.7
− | |-
− | | Convolutional DNN<ref name="CNN-2014">{{cite journal|last1=Abdel-Hamid|first1=O.|title=Convolutional Neural Networks for Speech Recognition|journal=IEEE/ACM Transactions on Audio, Speech, and Language Processing|date=2014|volume=22|issue=10|pages=1533–1545|doi=10.1109/taslp.2014.2339736|display-authors=etal|url=https://zenodo.org/record/891433}}</ref> || 20.0
− | |-
− | | Convolutional DNN w. Heterogeneous Pooling || 18.7
− | |-
− | | Ensemble DNN/CNN/RNN<ref name="EnsembleDL">{{cite journal|last2=Platt|first2=J.|date=2014|title=Ensemble Deep Learning for Speech Recognition|url=https://pdfs.semanticscholar.org/8201/55ecb57325503183253b8796de5f4535eb16.pdf|journal=Proc. Interspeech|pages=|via=|last1=Deng|first1=L.}}</ref> || 18.3
− | |-
− | | Bidirectional LSTM || 17.9
− | |-
− | | Hierarchical Convolutional Deep Maxout Network<ref name="HCDMM">{{cite journal|last1=Tóth|first1=Laszló|date=2015|title=Phone Recognition with Hierarchical Convolutional Deep Maxout Networks|journal=EURASIP Journal on Audio, Speech, and Music Processing|volume=2015|doi=10.1186/s13636-015-0068-3|url=http://publicatio.bibl.u-szeged.hu/5976/1/EURASIP2015.pdf}}</ref> || 16.5
− | |}
− | | |
− | | |
− | | |
− | The debut of DNNs for speaker recognition in the late 1990s and speech recognition around 2009–2011, and of LSTM around 2003–2007, accelerated progress in eight major areas:<ref name="BOOK2014" /><ref name="interspeech2014Keynote" /><ref name="ReferenceA" />
| |
− | | |
− | | |
− | * Scale-up/out and accelerated DNN training and decoding
| |
| | | |
− | * Sequence discriminative training
| + | In 2012, a team led by Dahl won the "Merck Molecular Activity Challenge" by using multi-task deep neural networks to predict the [https://en.wikipedia.org/wiki/Biomolecule biomolecular] target of one drug.<ref name="MERCK2012">{{cite web|url=https://www.kaggle.com/c/MerckActivity/details/winners|title=Announcement of the winners of the Merck Molecular Activity Challenge}}</ref><ref name=":5">{{Cite web|url=http://www.datascienceassn.org/content/multi-task-neural-networks-qsar-predictions|title=Multi-task Neural Networks for QSAR Predictions {{!}} Data Science Association|website=www.datascienceassn.org|accessdate=2017-06-14}}</ref> In 2014, Hochreiter's group used deep learning to detect off-target and toxic effects of environmental chemicals in nutrients, household products and drugs, and won the "Tox21 Data Challenge" of the [https://en.wikipedia.org/wiki/National_Institutes_of_Health NIH], the [https://en.wikipedia.org/wiki/Food_and_Drug_Administration FDA] and [https://en.wikipedia.org/wiki/National_Center_for_Advancing_Translational_Sciences NCATS].<ref name="TOX21">"Toxicology in the 21st century Data Challenge"</ref><ref name="TOX21Data">{{cite web|url=https://tripod.nih.gov/tox21/challenge/leaderboard.jsp|title=NCATS Announces Tox21 Data Challenge Winners|publisher=}}</ref><ref name=":11">{{cite web|url=http://www.ncats.nih.gov/news-and-events/features/tox21-challenge-winners.html|title=Archived copy|archiveurl=https://web.archive.org/web/20150228225709/http://www.ncats.nih.gov/news-and-events/features/tox21-challenge-winners.html|archivedate=2015-02-28|deadurl=yes|accessdate=2015-03-05|df=}}</ref>
| | | |
− | * Feature processing by deep models with solid understanding of the underlying mechanisms
| + | Between 2011 and 2012, deep learning had a significant additional impact on image and object recognition. Although CNNs trained by backpropagation had existed for decades, and GPU implementations of neural networks such as CNNs had been available for years, fast implementations of CNNs with max-pooling on GPUs in the style of Ciresan were still needed for progress in computer vision.<ref name="jung2004" /><ref name="chellapilla2006" /><ref name="LECUN1989" /><ref name=":6">{{Cite journal|last=Ciresan|first=D. C.|last2=Meier|first2=U.|last3=Masci|first3=J.|last4=Gambardella|first4=L. M.|last5=Schmidhuber|first5=J.|date=2011|title=Flexible, High Performance Convolutional Neural Networks for Image Classification|url=http://ijcai.org/papers11/Papers/IJCAI11-210.pdf|journal=International Joint Conference on Artificial Intelligence|pages=|doi=10.5591/978-1-57735-516-8/ijcai11-210|via=}}</ref><ref name="SCHIDHUB" /> In 2011, this approach achieved for the first time superhuman performance in a visual pattern recognition contest.<ref name=":8">{{Cite book|url=http://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images.pdf|title=Advances in Neural Information Processing Systems 25|last=Ciresan|first=Dan|last2=Giusti|first2=Alessandro|last3=Gambardella|first3=Luca M.|last4=Schmidhuber|first4=Juergen|date=2012|publisher=Curran Associates, Inc.|editor-last=Pereira|editor-first=F.|pages=2843–2851|editor-last2=Burges|editor-first2=C. J. C.|editor-last3=Bottou|editor-first3=L.|editor-last4=Weinberger|editor-first4=K. Q.}}</ref> Also in 2011 it won the ICDAR Chinese handwriting recognition contest, and in May 2012 the ISBI image segmentation contest. Until 2011, CNNs did not play a major role at computer vision conferences, but in June 2012 a paper by Ciresan et al., presented at the leading conference CVPR, showed how dramatically max-pooling CNNs on GPU can improve computer vision benchmark records. In October 2012, a similar system by Krizhevsky et al. won the large-scale [https://en.wikipedia.org/wiki/ImageNet#ImageNet_Challenge ImageNet competition] by a wide margin over shallow machine learning methods. In November 2012, the system of Ciresan et al. also won the ICPR contest on the analysis of large medical images for cancer detection, and in the following year the MICCAI Grand Challenge on the same topic.<ref name="ciresan2013miccai">{{Cite journal|last=Ciresan|first=D.|last2=Giusti|first2=A.|last3=Gambardella|first3=L.M.|last4=Schmidhuber|first4=J.|date=2013|title=Mitosis Detection in Breast Cancer Histology Images using Deep Neural Networks|url=http://people.idsia.ch/~ciresan/data/cvpr2012.pdf|journal=Proceedings MICCAI|pages=|via=}}</ref> In 2013 and 2014, the error rate on the ImageNet task using deep learning fell further, following a similar trend in large-scale speech recognition. The [https://en.wikipedia.org/wiki/Stephen_Wolfram Wolfram] Image Identification project publicized these improvements.<ref>{{Cite web|url=https://www.imageidentify.com/|title=The Wolfram Language Image Identification Project|website=www.imageidentify.com|accessdate=2017-03-22}}</ref>
| + | Image classification was then extended to the more challenging task of generating descriptions (captions) for images, often as a combination of CNNs and LSTMs.<ref name="1411.4555">{{cite arxiv |eprint=1411.4555|last1=Vinyals|first1=Oriol|title=Show and Tell: A Neural Image Caption Generator|last2=Toshev|first2=Alexander|last3=Bengio|first3=Samy|last4=Erhan|first4=Dumitru|class=cs.CV|year=2014}}.</ref><ref name="1411.4952">{{cite arxiv |eprint=1411.4952|last1=Fang|first1=Hao|title=From Captions to Visual Concepts and Back|last2=Gupta|first2=Saurabh|last3=Iandola|first3=Forrest|last4=Srivastava|first4=Rupesh|last5=Deng|first5=Li|last6=Dollár|first6=Piotr|last7=Gao|first7=Jianfeng|last8=He|first8=Xiaodong|last9=Mitchell|first9=Margaret|last10=Platt|first10=John C|last11=Lawrence Zitnick|first11=C|last12=Zweig|first12=Geoffrey|class=cs.CV|year=2014}}.</ref><ref name="1411.2539">{{cite arxiv |eprint=1411.2539|last1=Kiros|first1=Ryan|title=Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models|last2=Salakhutdinov|first2=Ruslan|last3=Zemel|first3=Richard S|class=cs.LG|year=2014}}.</ref><ref>{{Cite journal|last=Zhong|first=Sheng-hua|last2=Liu|first2=Yan|last3=Liu|first3=Yang|date=2011|title=Bilinear Deep Learning for Image Classification|url=http://doi.acm.org/10.1145/2072298.2072344|journal=Proceedings of the 19th ACM International Conference on Multimedia|series=MM '11|location=New York, NY, USA|publisher=ACM|pages=343–352|doi=10.1145/2072298.2072344|isbn=9781450306164}}</ref>
| | | |
− | * Adaptation of DNNs and related deep models
| + | Some researchers assess that the October 2012 ImageNet victory anchored the start of a "deep learning revolution" that has transformed the AI industry.<ref>{{cite news|title=Why Deep Learning Is Suddenly Changing Your Life|url=http://fortune.com/ai-artificial-intelligence-deep-machine-learning/|accessdate=13 April 2018|work=Fortune|date=2016}}</ref>
| | | |
− | * [[Multi-task learning|Multi-task]] and [[Inductive transfer|transfer learning]] by DNNs and related deep models
| + | === Neural networks ===
| | | |
− | * CNNs and how to design them to best exploit [[domain knowledge]] of speech
| + | ==== Artificial neural networks ====
| | | |
− | * RNN and its rich LSTM variants
| + | Artificial neural networks (ANNs), or [https://en.wikipedia.org/wiki/Connectionism connectionist] systems, are computing systems inspired by the biological neural networks that constitute animal brains. Such systems learn (progressively improve their ability) to do tasks by considering examples, generally without task-specific programming. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually [https://en.wikipedia.org/wiki/Labeled_data labeled] as "cat" or "not cat" and by using the results to identify cats in other images. They have found most use in applications that are difficult to express with a traditional, rule-based computer algorithm.
| | | |
− | * Other types of deep models including tensor-based models and integrated deep generative/discriminative models.
| + | An ANN is based on a collection of connected units called [https://en.wikipedia.org/wiki/Artificial_neuron artificial neurons] (analogous to the biological neurons in a [https://en.wikipedia.org/wiki/Brain biological brain]). Each connection ([https://en.wikipedia.org/wiki/Synapse synapse]) between neurons can transmit a signal to another neuron. The receiving (postsynaptic) neuron can process the signal(s) and then signal the neurons downstream of it. Neurons may have state, generally represented by a [https://en.wikipedia.org/wiki/Real_number real number], typically between 0 and 1. Neurons and synapses may also have weights that vary as learning proceeds, which increase or decrease the strength of the signal sent downstream.
| | | |
| + | Typically, neurons are organized in layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input) to the last layer (the output), possibly after traversing the layers multiple times.
| | | |
| + | The original goal of the neural network approach was to solve problems in the same way that a human brain would. Over time, attention focused on matching specific mental abilities, leading to deviations from biology such as backpropagation, i.e. passing information in the reverse direction and adjusting the network to reflect that information.
| | | |
− | All major commercial speech recognition systems (e.g., Microsoft [[Cortana (software)|Cortana]], [[Xbox]], [[Skype Translator]], [[Amazon Alexa]], [[Google Now]], [[Siri|Apple Siri]], [[Baidu]] and [[IFlytek|iFlyTek]] voice search, and a range of [[Nuance Communications|Nuance]] speech products, etc.) are based on deep learning.<ref name=BOOK2014 /><ref>{{Cite journal|url=https://www.wired.com/2014/12/skype-used-ai-build-amazing-new-language-translator/|title=How Skype Used AI to Build Its Amazing New Language Translator {{!}} WIRED|journal=Wired|accessdate=2017-06-14|date=2014-12-17|last1=McMillan|first1=Robert}}</ref><ref name="Baidu">{{cite arxiv |eprint=1412.5567|last1=Hannun|first1=Awni|title=Deep Speech: Scaling up end-to-end speech recognition|last2=Case|first2=Carl|last3=Casper|first3=Jared|last4=Catanzaro|first4=Bryan|last5=Diamos|first5=Greg|last6=Elsen|first6=Erich|last7=Prenger|first7=Ryan|last8=Satheesh|first8=Sanjeev|last9=Sengupta|first9=Shubho|last10=Coates|first10=Adam|last11=Ng|first11=Andrew Y|class=cs.CL|year=2014}}</ref><ref>{{Cite web|url=http://research.microsoft.com/en-US/people/deng/ieee-icassp-plenary-2016-mar24-lideng-posted.pdf|title=Plenary presentation at ICASSP-2016|date=|website=|accessdate=}}</ref>
| + | Neural networks have been used on a variety of tasks, including computer vision, [https://en.wikipedia.org/wiki/Speech_recognition speech recognition], [https://en.wikipedia.org/wiki/Machine_translation machine translation], [https://en.wikipedia.org/wiki/Social_network social network] filtering, playing board and video games, and medical diagnosis.
| | | |
| + | As of 2017, neural networks typically had a few thousand to a few million units and millions of connections. Although this number is several orders of magnitude less than the number of neurons in a human brain, these networks can already outperform humans on specific tasks (such as image recognition and playing Go<ref>{{Cite journal|last=Silver|first=David|last2=Huang|first2=Aja|last3=Maddison|first3=Chris J.|last4=Guez|first4=Arthur|last5=Sifre|first5=Laurent|last6=Driessche|first6=George van den|last7=Schrittwieser|first7=Julian|last8=Antonoglou|first8=Ioannis|last9=Panneershelvam|first9=Veda|date=January 2016|title=Mastering the game of Go with deep neural networks and tree search|url=https://www.nature.com/articles/nature16961|journal=Nature|volume=529|issue=7587|pages=484–489|doi=10.1038/nature16961|issn=1476-4687|pmid=26819042|bibcode=2016Natur.529..484S}}</ref>).
| | | |
| + | ==== Deep neural networks ====
| | | |
| + | Note: this section may be too technical for some readers.
| | | |
| + | A deep neural network (DNN) is an artificial neural network with multiple layers between the input and output layers. Whether the [https://en.wikipedia.org/wiki/Correlation_and_dependence relationship] is linear or non-linear, the DNN finds the correct mathematical manipulation to turn the input into the output. The network moves through the layers, calculating the probability of each output. For example, a DNN trained to recognize dog breeds will go over the given image and calculate the probability that the dog in the image is a certain breed. The user can review the results, select which probabilities the network should display (above a certain threshold, etc.) and return the proposed label. Each such mathematical manipulation is considered a layer, and complex DNNs have many layers, hence the name "deep" networks.
| | | |
− | === Image recognition ===
| + | DNNs can model complex non-linear relationships. DNN architectures generate compositional models in which the object is expressed as a layered [https://en.wikipedia.org/wiki/Primitive_data_type composition of primitives]. The extra layers enable the composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network.
| | | |
− | {{Main|Computer vision}}
| + | Deep architectures include many variants of a few basic approaches. Each architecture has found success in specific domains. It is not always possible to compare the performance of multiple architectures unless they have been evaluated on the same data sets.
| | | |
| + | DNNs are typically feedforward networks, in which data flows from the input layer to the output layer without looping back. At first, the DNN creates a map of virtual neurons and assigns random numerical values, or "weights", to the connections between them. The weights and inputs are multiplied and return an output between 0 and 1. If the network does not accurately recognize a particular pattern, the algorithm adjusts the weights. That way the algorithm can make certain parameters more influential, until it determines the correct mathematical manipulation to fully process the data.
| | | |
| + | [https://en.wikipedia.org/wiki/Recurrent_neural_network Recurrent neural networks] (RNNs), in which data can flow in any direction, are often used for applications such as [https://en.wikipedia.org/wiki/Language_model language modeling]. LSTM is particularly effective for this use.
| | | |
− | A common evaluation set for image classification is the MNIST database data set. MNIST is composed of handwritten digits and includes 60,000 training examples and 10,000 test examples. As with TIMIT, its small size lets users test multiple configurations. A comprehensive list of results on this set is available.<ref name="YANNMNIST">{{cite web|url=http://yann.lecun.com/exdb/mnist/.|title=MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges|website=yann.lecun.com}}</ref>
| + | [https://en.wikipedia.org/wiki/Convolutional_neural_network Convolutional neural networks] (CNNs) are typically used in computer vision. CNNs have also been applied to [https://en.wikipedia.org/wiki/Acoustic_model acoustic modeling] for automatic speech recognition (ASR).
| | | |
| + | ==== Challenges ====
| | | |
| + | As with ANNs, many issues can arise with naively trained DNNs. Two common issues are overfitting and computation time.<ref>{{Cite journal|last=Bengio|first=Y.|last2=Boulanger-Lewandowski|first2=N.|last3=Pascanu|first3=R.|date=May 2013|title=Advances in optimizing recurrent networks|url=http://ieeexplore.ieee.org/document/6639349/|journal=2013 IEEE International Conference on Acoustics, Speech and Signal Processing|pages=8624–8628|doi=10.1109/icassp.2013.6639349|isbn=978-1-4799-0356-6|arxiv=1212.0901}}</ref>
| | | |
| + | DNNs are prone to overfitting because of the added layers of abstraction, which allow them to model rare dependencies in the training data. Regularization methods such as Ivakhnenko's unit pruning, [https://en.wikipedia.org/wiki/Regularization_(mathematics)#Regularization_in_statistics_and_machine_learning weight decay] (L2 regularization) or [https://en.wikipedia.org/wiki/Sparse_matrix sparsity] (L1 regularization) can be applied during training to combat overfitting. Alternatively, dropout regularization randomly omits units from the hidden layers during training; this helps to exclude rare dependencies. Finally, data can be augmented via methods such as cropping and rotation, so that smaller training sets can be increased in size to reduce the chances of overfitting.
| | | |
| + | DNNs must consider many training parameters, such as the size (number of layers and number of units per layer), the learning rate, and the initial weights. [https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search Sweeping through the parameter space] for optimal parameters may not be feasible because of the cost in time and computational resources. Various tricks, such as batching (computing the gradient on several training examples at once rather than on individual examples), speed up computation. The large processing capabilities of many-core architectures (such as GPUs or the Intel Xeon Phi) have produced significant speedups in training, because such processing architectures are well suited to matrix and vector computations.
| + | Alternatively, engineers may look for other types of neural networks with more straightforward and convergent training algorithms.<ref name="DAHL2013">{{Cite journal|last=Dahl|first=G.|display-authors=etal|date=2013|title=Improving DNNs for LVCSR using rectified linear units and dropout|url=http://www.cs.toronto.edu/~gdahl/papers/reluDropoutBN_icassp2013.pdf|journal=ICASSP|pages=|via=}}</ref> CMAC ([https://en.wikipedia.org/wiki/Cerebellar_model_articulation_controller cerebellar model articulation controller]) is one such neural network; it requires neither learning rates nor randomized initial weights. The training process can be guaranteed to converge in one step with a new batch of data, and the computational complexity of the training algorithm is linear with respect to the number of neurons involved.<ref>{{cite web|last1=You|first1=Yang|last2=Buluç|first2=Aydın|last3=Demmel|first3=James|title=Scaling deep learning on GPU and knights landing clusters|url=https://dl.acm.org/citation.cfm?doid=3126908.3126912|publisher=SC '17, ACM|date=November 2017|accessdate=5 March 2018}}</ref><ref>{{cite journal|last1=Viebke|first1=André|last2=Memeti|first2=Suejb|last3=Pllana|first3=Sabri|last4=Abraham|first4=Ajith|title=CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi|journal=The Journal of Supercomputing|date=March 2017|pages=1–31|doi=10.1007/s11227-017-1994-x|accessdate=}}</ref>
| | | |
− | Deep learning-based image recognition has become "superhuman", producing more accurate results than human contestants. This first occurred in 2011.<ref name=":7">{{Cite journal|last=Cireşan|first=Dan|last2=Meier|first2=Ueli|last3=Masci|first3=Jonathan|last4=Schmidhuber|first4=Jürgen|date=August 2012|title=Multi-column deep neural network for traffic sign classification|journal=Neural Networks|series=Selected Papers from IJCNN 2011|volume=32|pages=333–338|doi=10.1016/j.neunet.2012.02.023|pmid=22386783|citeseerx=10.1.1.226.8219}}</ref>
| + | === Applications ===
| | | |
| + | ==== Automatic [https://en.wikipedia.org/wiki/Speech_recognition speech recognition] ====
| | | |
| + | Large-scale automatic speech recognition is the first and most convincing successful case of deep learning.<ref name="SCHIDHUB"/> LSTM RNNs can learn "very deep learning" tasks that involve multi-second intervals containing speech events separated by thousands of discrete time steps, where one time step corresponds to about 10 ms. LSTM with forget gates is competitive with traditional speech recognizers on certain tasks.<ref name="graves2003"/>
| | | |
| + | The initial success in speech recognition was based on small-scale recognition tasks built on TIMIT. The data set contains 630 speakers from eight major [https://en.wikipedia.org/wiki/Dialect dialects] of [https://en.wikipedia.org/wiki/American_English American English], where each speaker reads 10 sentences. Its small size lets many configurations be tried. More importantly, the TIMIT task concerns phone-sequence recognition, which, unlike word-sequence recognition, allows weak phone [https://en.wikipedia.org/wiki/Bigram bigram] language models. This lets the strength of the acoustic-modeling aspects of speech recognition be analyzed more easily. The error rates listed below, including these early results and measured as percent phone error rates (PER), have been summarized since 1991.
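| |
| + | For concreteness, the percent phone error rate (PER) reported in the table below can be read as an edit-distance measure. The sketch assumes the standard definition (substitutions, insertions and deletions counted against the reference length); the phone sequences themselves are made up.
| + | <syntaxhighlight lang="python">
| + | def phone_error_rate(reference, hypothesis):
| + |     # Levenshtein distance between phone sequences, as a percentage of
| + |     # the reference length (substitutions + insertions + deletions).
| + |     m, n = len(reference), len(hypothesis)
| + |     d = [[0] * (n + 1) for _ in range(m + 1)]
| + |     for i in range(m + 1):
| + |         d[i][0] = i
| + |     for j in range(n + 1):
| + |         d[0][j] = j
| + |     for i in range(1, m + 1):
| + |         for j in range(1, n + 1):
| + |             cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
| + |             d[i][j] = min(d[i - 1][j] + 1,         # deletion
| + |                           d[i][j - 1] + 1,         # insertion
| + |                           d[i - 1][j - 1] + cost)  # substitution
| + |     return 100.0 * d[m][n] / m
| + |
| + | # Hypothetical phone sequences for one TIMIT-style utterance.
| + | ref = ["sh", "ix", "hh", "eh", "d"]
| + | hyp = ["sh", "ix", "eh", "d"]
| + | print(f"{phone_error_rate(ref, hyp):.1f}%")  # one deletion -> 20.0%
| + | </syntaxhighlight>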
| | | |
| + | [Table: percent phone error rates, as above]
| + | The debut of DNNs for speaker recognition in the late 1990s and for speech recognition around 2009–2011, and of LSTM around 2003–2007, accelerated progress in the following eight major areas:<ref name="BOOK2014" /><ref name="interspeech2014Keynote" /><ref name="ReferenceA" />
| + | *Scaled-up/out and accelerated DNN training and decoding.
| + | *Sequence discriminative training.
| + | *Feature processing by deep models with solid understanding of the underlying mechanisms.
| + | *[https://en.wikipedia.org/wiki/Multi-task_learning Multi-task] and [https://en.wikipedia.org/wiki/Transfer_learning transfer learning] by DNNs and related deep models.
| + | *CNNs and how to design them to best exploit [https://en.wikipedia.org/wiki/Domain_knowledge domain knowledge] of speech.
| + | *RNNs and their rich LSTM variants.
| + | *Other types of deep models, including tensor-based models and integrated deep generative/discriminative models.
| + | All major commercial speech recognition systems (e.g., Microsoft [https://en.wikipedia.org/wiki/Cortana Cortana], [https://en.wikipedia.org/wiki/Xbox Xbox], [https://en.wikipedia.org/wiki/Skype_Translator Skype Translator], [https://en.wikipedia.org/wiki/Amazon_Alexa Amazon Alexa], [https://en.wikipedia.org/wiki/Google_Now Google Now], [https://en.wikipedia.org/wiki/Siri Apple Siri], [https://en.wikipedia.org/wiki/Baidu Baidu] and [https://en.wikipedia.org/wiki/IFlytek iFlyTek] voice search, and a range of [https://en.wikipedia.org/wiki/Nuance_Communications Nuance] speech products, etc.) are based on deep learning.<ref name=BOOK2014 /><ref>{{Cite web|url=https://www.wired.com/2014/12/skype-used-ai-build-amazing-new-language-translator/|title=How Skype Used AI to Build Its Amazing New Language Translator {{!}} WIRED|website=www.wired.com|accessdate=2017-06-14}}</ref><ref name="Baidu">{{cite arxiv |eprint=1412.5567|last1=Hannun|first1=Awni|title=Deep Speech: Scaling up end-to-end speech recognition|last2=Case|first2=Carl|last3=Casper|first3=Jared|last4=Catanzaro|first4=Bryan|last5=Diamos|first5=Greg|last6=Elsen|first6=Erich|last7=Prenger|first7=Ryan|last8=Satheesh|first8=Sanjeev|last9=Sengupta|first9=Shubho|last10=Coates|first10=Adam|last11=Ng|first11=Andrew Y|class=cs.CL|year=2014}}</ref><ref>{{Cite web|url=http://research.microsoft.com/en-US/people/deng/ieee-icassp-plenary-2016-mar24-lideng-posted.pdf|title=Plenary presentation at ICASSP-2016|date=|website=|publisher=|accessdate=}}</ref>
| | | |
− | Deep learning-trained vehicles now interpret 360° camera views.<ref>[http://www.technologyreview.com/news/533936/nvidia-demos-a-car-computer-trained-with-deep-learning/ Nvidia Demos a Car Computer Trained with "Deep Learning"] (2015-01-06), David Talbot, ''[[MIT Technology Review]]''</ref> Another example is Facial Dysmorphology Novel Analysis (FDNA) used to analyze cases of human malformation connected to a large database of genetic syndromes.
| |
| | | |
| + | ==== Image recognition ====
| | | |
| + | A common evaluation set for image classification is the MNIST data set. MNIST is composed of handwritten digits and includes 60,000 training examples and 10,000 test examples. As with TIMIT, its small size lets users try many configurations. A comprehensive list of results on this set is available.<ref name="YANNMNIST">{{cite web|url=http://yann.lecun.com/exdb/mnist/.|title=MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges|website=yann.lecun.com}}</ref>
| | | |
| + | Deep learning-based image recognition has become "superhuman", producing more accurate results than human contestants. This first occurred in 2011.<ref name=":7">{{Cite journal|last=Cireşan|first=Dan|last2=Meier|first2=Ueli|last3=Masci|first3=Jonathan|last4=Schmidhuber|first4=Jürgen|date=August 2012|title=Multi-column deep neural network for traffic sign classification|url=http://www.sciencedirect.com/science/article/pii/S0893608012000524|journal=Neural Networks|series=Selected Papers from IJCNN 2011|volume=32|pages=333–338|doi=10.1016/j.neunet.2012.02.023}}</ref>
| | | |
| + | Deep learning-trained vehicles now interpret 360° camera views.<ref>[http://www.technologyreview.com/news/533936/nvidia-demos-a-car-computer-trained-with-deep-learning/ Nvidia Demos a Car Computer Trained with "Deep Learning"] (2015-01-06), David Talbot, ''[[MIT Technology Review]]''</ref> Another example is Facial Dysmorphology Novel Analysis (FDNA), which is used to analyze the relations between human malformations and genes.
| | | |
− | === Visual art processing === | + | ==== Visual art processing ====
| | | |
− | Closely related to the progress that has been made in image recognition is the increasing application of deep learning techniques to various visual art tasks. DNNs have proven themselves capable, for example, of a) identifying the style period of a given painting, b) [[Neural Style Transfer]] - capturing the style of a given artwork and applying it in a visually pleasing manner to an arbitrary photograph or video, and c) generating striking imagery based on random visual input fields.<ref>{{cite journal |author1=G. W. Smith|author2=Frederic Fol Leymarie|date=10 April 2017|title=The Machine as Artist: An Introduction|journal=Arts|volume=6|issue=4|pages=5|doi=10.3390/arts6020005}}</ref><ref>{{cite journal |author=Blaise Agüera y Arcas|date=29 September 2017|title=Art in the Age of Machine Intelligence|journal=Arts|volume=6|issue=4|pages=18|doi=10.3390/arts6040018}}</ref>
| + | Closely related to the progress made in image recognition is the increasing application of deep learning techniques to various visual art tasks. For example, DNNs have proven capable of (1) identifying the style period of a given painting, (2) capturing the style of a given artwork and applying it to other arbitrary photographs in a visually pleasing manner, and (3) generating complete imagery from random doodles.<ref>{{cite web |url=http://www.mdpi.com/2076-0752/6/2/5|author1=G. W. Smith|author2=Frederic Fol Leymarie|date=10 April 2017|title=The Machine as Artist: An Introduction|publisher=Arts|accessdate=4 October 2017}}</ref><ref>{{cite web |url=http://www.mdpi.com/2076-0752/6/4/18|author=Blaise Agüera y Arcas|date=29 September 2017|title=Art in the Age of Machine Intelligence|publisher=Arts|accessdate=4 October 2017}}</ref>
| | | |
| + | ==== [https://en.wikipedia.org/wiki/Natural_language_processing Natural language processing] ====
| | | |
| + | Neural networks have been used for implementing language models since the early 2000s.<ref name="gers2001" /><ref>{{Cite journal|last=Bengio|first=Yoshua|last2=Ducharme|first2=Réjean|last3=Vincent|first3=Pascal|last4=Janvin|first4=Christian|date=March 2003|title=A Neural Probabilistic Language Model|url=http://dl.acm.org/citation.cfm?id=944919.944966|journal=J. Mach. Learn. Res.|volume=3|pages=1137–1155|issn=1532-4435}}</ref> LSTM helped to improve machine translation and language modeling.<ref name="NIPS2014" /><ref name="vinyals2016" /><ref name="gillick2015" />
| | | |
| + | Other key techniques in this field are negative sampling<ref name="GoldbergLevy2014">{{cite arXiv|last1=Goldberg|first1=Yoav|last2=Levy|first2=Omar|title=word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method|eprint=1402.3722|class=cs.CL|year=2014}}</ref> and [https://en.wikipedia.org/wiki/Word_embedding word embedding]. Word embedding, such as [https://en.wikipedia.org/wiki/Word2vec word2vec], can be thought of as a representational layer in a deep learning architecture that transforms an atomic word into a positional representation of the word relative to other words in the data set; the position is represented as a point in a [https://en.wikipedia.org/wiki/Vector_space vector space].<ref name="SocherManning2014">{{cite web|last1=Socher|first1=Richard|last2=Manning|first2=Christopher|title=Deep Learning for NLP|url=http://nlp.stanford.edu/courses/NAACL2013/NAACL2013-Socher-Manning-DeepLearning.pdf|accessdate=26 October 2014}}</ref> Using word embedding as an RNN input layer allows the network to parse sentences and phrases with an effective compositional vector grammar. A compositional vector grammar can be thought of as a [https://en.wikipedia.org/wiki/Probabilistic_context-free_grammar probabilistic context-free grammar] (PCFG) implemented by an RNN.<ref>{{Cite paper |url= http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf|title = Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank|last = Socher|first = Richard|date = 2013 |accessdate = |doi = |pmid =}}</ref> Recursive auto-encoders built on top of word embeddings can assess sentence similarity and detect paraphrasing. Deep neural architectures have achieved state-of-the-art results in [https://en.wikipedia.org/wiki/Statistical_parsing statistical parsing], [https://en.wikipedia.org/wiki/Sentiment_analysis sentiment analysis], information retrieval,<ref>{{Cite journal|last=Shen|first=Yelong|last2=He|first2=Xiaodong|last3=Gao|first3=Jianfeng|last4=Deng|first4=Li|last5=Mesnil|first5=Gregoire|date=2014-11-01|title=A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval|url=https://www.microsoft.com/en-us/research/publication/a-latent-semantic-model-with-convolutional-pooling-structure-for-information-retrieval/|journal=Microsoft Research}}</ref><ref>{{Cite journal|last=Huang|first=Po-Sen|last2=He|first2=Xiaodong|last3=Gao|first3=Jianfeng|last4=Deng|first4=Li|last5=Acero|first5=Alex|last6=Heck|first6=Larry|date=2013-10-01|title=Learning Deep Structured Semantic Models for Web Search using Clickthrough Data|url=https://www.microsoft.com/en-us/research/publication/learning-deep-structured-semantic-models-for-web-search-using-clickthrough-data/|journal=Microsoft Research}}</ref> spoken language understanding,<ref name="IEEE-TASL2015">{{cite journal | last1 = Mesnil | first1 = G. | last2 = Dauphin | first2 = Y. | last3 = Yao | first3 = K. | last4 = Bengio | first4 = Y. | last5 = Deng | first5 = L. | last6 = Hakkani-Tur | first6 = D. | last7 = He | first7 = X. | last8 = Heck | first8 = L. | last9 = Tur | first9 = G. | last10 = Yu | first10 = D. | last11 = Zweig | first11 = G. | year = 2015 | title = Using recurrent neural networks for slot filling in spoken language understanding | url= | journal = IEEE Transactions on Audio, Speech, and Language Processing | volume = 23 | issue = 3| pages = 530–539 | doi=10.1109/taslp.2014.2383614}}</ref> machine translation,<ref name="NIPS2014">{{Cite journal|last=Sutskever|first=L.|last2=Vinyals|first2=O.|last3=Le|first3=Q.|date=2014|title=Sequence to Sequence Learning with Neural Networks|url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf|journal=Proc. NIPS|pages=|via=}}</ref><ref name="auto">{{Cite journal|last=Gao|first=Jianfeng|last2=He|first2=Xiaodong|last3=Yih|first3=Scott Wen-tau|last4=Deng|first4=Li|date=2014-06-01|title=Learning Continuous Phrase Representations for Translation Modeling|url=https://www.microsoft.com/en-us/research/publication/learning-continuous-phrase-representations-for-translation-modeling/|journal=Microsoft Research}}</ref> contextual entity linking,<ref name="auto"/> writing style recognition,<ref name="BROC2017">Brocardo ML, Traore I, Woungang I, Obaidat MS. "[http://onlinelibrary.wiley.com/doi/10.1002/dac.3259/full Authorship verification using deep belief network systems]". Int J Commun Syst. 2017. doi:10.1002/dac.3259</ref> text classification and others.<ref>{{Cite news|url=https://www.microsoft.com/en-us/research/project/deep-learning-for-natural-language-processing-theory-and-practice-cikm2014-tutorial/|title=Deep Learning for Natural Language Processing: Theory and Practice (CIKM2014 Tutorial) - Microsoft Research|work=Microsoft Research|accessdate=2017-06-14}}</ref>
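| |
| + | To picture "a word becomes a point in a vector space", the sketch below builds a tiny random embedding table and looks words up by index, as an input layer to an RNN would. The vocabulary and dimensionality are hypothetical, and real word2vec vectors are learned rather than drawn at random.
| + | <syntaxhighlight lang="python">
| + | import numpy as np
| + |
| + | rng = np.random.default_rng(5)
| + |
| + | vocab = ["the", "cat", "sat", "on", "mat"]       # hypothetical vocabulary
| + | word_to_id = {w: i for i, w in enumerate(vocab)}
| + | dim = 4                                          # assumed embedding size
| + | embeddings = rng.normal(size=(len(vocab), dim))  # one point in R^4 per word
| + |
| + | def embed(sentence):
| + |     # The representational layer: each atomic word becomes a dense vector,
| + |     # positioned relative to the other words in the data set.
| + |     return np.stack([embeddings[word_to_id[w]] for w in sentence])
| + |
| + | print(embed(["the", "cat", "sat"]).shape)  # (3, 4): three words, four dims
| + | </syntaxhighlight>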
| | | |
| + | Google Translate (GT) uses a large end-to-end long short-term memory (LSTM) network.<ref name="GT_Turovsky_2016">{{cite web|url=https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/|title=Found in translation: More accurate, fluent sentences in Google Translate|last=Turovsky|first=Barak|date=November 15, 2016|website=The Keyword Google Blog|accessdate=March 23, 2017}}</ref> Google Neural Machine Translation (GNMT) uses an [https://en.wikipedia.org/wiki/Example-based_machine_translation example-based machine translation] method in which the system "learns from millions of examples."<ref name="googleblog_GNMT_2016">{{cite web|url=https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html|title=Zero-Shot Translation with Google's Multilingual Neural Machine Translation System|last1=Schuster|first1=Mike|last2=Johnson|first2=Melvin|last3=Thorat|first3=Nikhil|date=November 22, 2016|website=Google Research Blog|accessdate=March 23, 2017}}</ref> It translates whole sentences at a time, rather than piece by piece, and Google Translate supports more than one hundred languages.<ref name="googleblog_GNMT_2016" /> The network encodes the semantics of the sentence rather than simply memorizing phrase-to-phrase translations,<ref name="googleblog_GNMT_2016" /><ref name="Biotet">{{cite web|url=http://www-clips.imag.fr/geta/herve.blanchon/Pdfs/NLP-KE-10.pdf|title=MT on and for the Web|last1=Boitet|first1=Christian|last2=Blanchon|first2=Hervé|last3=Seligman|first3=Mark|last4=Bellynck|first4=Valérie|date=2010|accessdate=December 1, 2016}}</ref> and GT uses English as an intermediate between most language pairs.<ref name="Biotet" />
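| + | As an editorial sketch of the end-to-end encoder-decoder pattern such systems build on (vocabulary sizes, dimensions and data here are invented; production networks are vastly larger), consider:
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1200, 64, 128

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src, tgt):
        # Encode the whole source sentence into a state; the decoder starts
        # from it, so the sentence is handled as a whole, not in pieces.
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # per-position scores over target vocabulary

model = Seq2Seq()
src = torch.randint(0, SRC_VOCAB, (2, 7))  # batch of 2 toy source sentences
tgt = torch.randint(0, TGT_VOCAB, (2, 9))  # shifted toy target tokens
print(model(src, tgt).shape)               # torch.Size([2, 9, 1200])
</syntaxhighlight>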
| | | |
− | === Natural language processing === | + | ==== Drug discovery and toxicology ====
− | {{Main|Natural language processing}}
− | Neural networks have been used for implementing language models since the early 2000s.<ref name="gers2001" /><ref>{{Cite journal|last=Bengio|first=Yoshua|last2=Ducharme|first2=Réjean|last3=Vincent|first3=Pascal|last4=Janvin|first4=Christian|date=March 2003|title=A Neural Probabilistic Language Model|url=http://dl.acm.org/citation.cfm?id=944919.944966|journal=J. Mach. Learn. Res.|volume=3|pages=1137–1155|issn=1532-4435}}</ref> LSTM helped to improve machine translation and language modeling.<ref name="NIPS2014" /><ref name="vinyals2016" /><ref name="gillick2015" />
− | Recent developments generalize [[word embedding]] to [[sentence embedding]].
− | === Drug discovery and toxicology ===
− | {{For|more information|Drug discovery|Toxicology}}
− | A large percentage of candidate drugs fail to win regulatory approval. These failures are caused by insufficient efficacy (on-target effect), undesired interactions (off-target effects), or unanticipated [[Toxicity|toxic effects]].<ref name="ARROWSMITH2013">{{Cite journal | pmid = 23903212 | year = 2013 | last1 = Arrowsmith | first1 = J | title = Trial watch: Phase II and phase III attrition rates 2011-2012 | journal = Nature Reviews Drug Discovery | volume = 12 | issue = 8 | pages = 569 | last2 = Miller | first2 = P | doi = 10.1038/nrd4090 }}</ref><ref name="VERBIEST2015">{{Cite journal | pmid = 25582842 | year = 2015 | last1 = Verbist | first1 = B | title = Using transcriptomics to guide lead optimization in drug discovery projects: Lessons learned from the QSTAR project | journal = Drug Discovery Today | last2 = Klambauer | first2 = G | last3 = Vervoort | first3 = L | last4 = Talloen | first4 = W | last5 = The Qstar | first5 = Consortium | last6 = Shkedy | first6 = Z | last7 = Thas | first7 = O | last8 = Bender | first8 = A | last9 = Göhlmann | first9 = H. W. | last10 = Hochreiter | first10 = S | doi = 10.1016/j.drudis.2014.12.014 | volume = 20 | issue = 5 | pages = 505–513 }}</ref> Research has explored use of deep learning to predict the [[biomolecular target]]s,<ref name="MERCK2012" /><ref name=":5" /> [[off-target]]s, and [[Toxicity|toxic effects]] of environmental chemicals in nutrients, household products and drugs.<ref name="TOX21" /><ref name="TOX21Data" /><ref name=":11" />
| + | A large percentage of candidate drugs fail to win regulatory approval. These failures are caused by insufficient efficacy (on-target effect), undesired interactions (off-target effects), or unanticipated toxic effects.<ref name="ARROWSMITH2013">{{Cite journal | pmid = 23903212 | year = 2013 | author1 = Arrowsmith | first1 = J | title = Trial watch: Phase II and phase III attrition rates 2011-2012 | journal = Nature Reviews Drug Discovery | volume = 12 | issue = 8 | pages = 569 | last2 = Miller | first2 = P | doi = 10.1038/nrd4090 }}</ref><ref name="VERBIEST2015">{{Cite journal | pmid = 25582842 | year = 2015 | author1 = Verbist | first1 = B | title = Using transcriptomics to guide lead optimization in drug discovery projects: Lessons learned from the QSTAR project | journal = Drug Discovery Today | last2 = Klambauer | first2 = G | last3 = Vervoort | first3 = L | last4 = Talloen | first4 = W | last5 = The Qstar | first5 = Consortium | last6 = Shkedy | first6 = Z | last7 = Thas | first7 = O | last8 = Bender | first8 = A | last9 = Göhlmann | first9 = H. W. | last10 = Hochreiter | first10 = S | doi = 10.1016/j.drudis.2014.12.014 | volume = 20 | issue = 5 | pages = 505–513 }}</ref> Research has explored the use of deep learning to predict the biomolecular targets, off-targets, and toxic effects of environmental chemicals in nutrients, household products and drugs.<ref name="TOX21" /><ref name="TOX21Data" /><ref name=":11" />
| + | AtomNet is a deep learning system for structure-based rational drug design.<ref>{{cite arXiv|title = AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery|eprint= 1510.02855|date = 2015-10-09|first = Izhar|last = Wallach|first2 = Michael|last2 = Dzamba|first3 = Abraham|last3 = Heifets|class= cs.LG}}</ref>
| + | AtomNet was used to predict novel candidate biomolecules for disease targets such as the Ebola virus<ref>{{Cite web|title = Toronto startup has a faster way to discover effective medicines |url= https://www.theglobeandmail.com/report-on-business/small-business/starting-out/toronto-startup-has-a-faster-way-to-discover-effective-medicines/article25660419/|website = The Globe and Mail |accessdate= 2015-11-09}}</ref> and multiple sclerosis.<ref>{{Cite web|title = Startup Harnesses Supercomputers to Seek Cures |url= http://ww2.kqed.org/futureofyou/2015/05/27/startup-harnesses-supercomputers-to-seek-cures/|website = KQED Future of You|accessdate = 2015-11-09}}</ref>
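| + | An editorial sketch of the toxicity-prediction setting described above; the fingerprints and labels are random stand-ins, and this is not AtomNet or any cited system:
<syntaxhighlight lang="python">
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 256))  # 500 molecules, 256 fingerprint bits
y = rng.integers(0, 2, size=500)         # random stand-in toxicity labels

# Small feed-forward network mapping a fingerprint to a toxic/non-toxic label.
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300)
clf.fit(X, y)
print(clf.predict_proba(X[:3]))          # predicted toxicity probabilities
</syntaxhighlight>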
− | In 2019 generative neural networks were used to produce molecules that were validated experimentally all the way into mice.<ref>{{cite journal |last1=Zhavoronkov |first1=Alex |date=2019 |title=Deep learning enables rapid identification of potent DDR1 kinase inhibitors |journal=Nature Biotechnology |volume=37 |issue=9 |pages=1038–1040 |doi=10.1038/s41587-019-0224-x |pmid=31477924}}</ref><ref>{{cite journal |last1=Gregory |first1=Barber |title=A Molecule Designed By AI Exhibits 'Druglike' Qualities |url=https://www.wired.com/story/molecule-designed-ai-exhibits-druglike-qualities/ |journal=Wired}}</ref>
− | Recommendation systems have used deep learning to extract meaningful features for a latent factor model for content-based music and journal recommendations.<ref>X.Y. Feng, H. Zhang, Y.J. Ren, P.H. Shang, Y. Zhu, Y.C. Liang, R.C. Guan, D. Xu, (2019), "[https://www.jmir.org/2019/5/e12957/ The Deep Learning–Based Recommender System “Pubmender” for Choosing a Biomedical Publication Venue: Development and Validation Study]", ''[[Journal of Medical Internet Research]]'', 21 (5): e12957</ref> Multi-view deep learning has been applied for learning user preferences from multiple domains. The model uses a hybrid collaborative and content-based approach and enhances recommendations in multiple tasks.
| + | ==== Customer relationship management ====
| + | Deep reinforcement learning has been used to approximate the value of possible direct marketing actions, defined in terms of RFM (recency, frequency, monetary) variables; the estimated value function was shown to have a natural interpretation as customer lifetime value.<ref>{{cite arxiv|last=Tkachenko |first=Yegor |title=Autonomous CRM Control via CLV Approximation with Deep Reinforcement Learning in Discrete and Continuous Action Space |date=April 8, 2015 |eprint=1504.01840|class=cs.LG }}</ref>
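| + | A tabular stand-in (editorial illustration with synthetic states, actions and rewards; the cited work uses a deep network as the value approximator) for estimating action values over discretized RFM states:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 27, 3          # 3x3x3 discretized RFM grid, 3 actions
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9

for _ in range(10000):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)
    reward = rng.normal()            # stand-in for observed revenue
    s_next = rng.integers(n_states)  # stand-in for the next RFM state
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

# Q[s].max() estimates the long-run value of state s, which is the sense
# in which the learned value function resembles customer lifetime value.
print(Q[0])
</syntaxhighlight>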
| | | |
| + | ==== Recommendation systems ====
| | | |
| + | Recommendation systems have used deep learning to extract meaningful features for a latent factor model for content-based music recommendations.<ref>{{Cite book|url=http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf|title=Advances in Neural Information Processing Systems 26|last=van den Oord|first=Aaron|last2=Dieleman|first2=Sander|last3=Schrauwen|first3=Benjamin|date=2013|publisher=Curran Associates, Inc.|editor-last=Burges|editor-first=C. J. C.|pages=2643–2651|editor-last2=Bottou|editor-first2=L.|editor-last3=Welling|editor-first3=M.|editor-last4=Ghahramani|editor-first4=Z.|editor-last5=Weinberger|editor-first5=K. Q.}}</ref> Multi-view deep learning has been applied for learning user preferences from multiple domains.<ref>{{Cite journal|last=Elkahky|first=Ali Mamdouh|last2=Song|first2=Yang|last3=He|first3=Xiaodong|date=2015-05-01|title=A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems|url=https://www.microsoft.com/en-us/research/publication/a-multi-view-deep-learning-approach-for-cross-domain-user-modeling-in-recommendation-systems/|journal=Microsoft Research}}</ref> The model uses a hybrid collaborative and content-based approach and enhances recommendations in multiple tasks.
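| + | An editorial sketch of the underlying latent factor idea, with a synthetic ratings matrix rather than the cited models' learned features:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
R = rng.integers(1, 6, size=(6, 8)).astype(float)  # 6 users x 8 items
k = 3                                              # latent dimensions
U = rng.normal(scale=0.1, size=(6, k))             # user factors
V = rng.normal(scale=0.1, size=(8, k))             # item factors

for _ in range(2000):
    E = R - U @ V.T                # residual error on every rating
    U += 0.01 * (E @ V)            # gradient steps on both factor matrices
    V += 0.01 * (E.T @ U)

print(np.abs(R - U @ V.T).mean())  # reconstruction error shrinks
</syntaxhighlight>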
| | | |
| + | ==== Bioinformatics ====
| | | |
− | === Bioinformatics === | + | An [https://en.wikipedia.org/wiki/Autoencoder autoencoder] ANN was used in bioinformatics to predict gene ontology annotations and gene-function relationships.<ref>{{cite journal|url=http://doi.acm.org/10.1145/2649387.2649442|title=Deep Autoencoder Neural Networks for Gene Ontology Annotation Predictions |first1=Davide |last1=Chicco|first2=Peter|last2=Sadowski|first3=Pierre |last3=Baldi |date=1 January 2014|publisher=ACM|pages=533–540|via=ACM Digital Library |doi=10.1145/2649387.2649442|journal=Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics - BCB '14|isbn=9781450328944 }}</ref>
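| + | A minimal autoencoder sketch in the spirit of that work; the binary annotation profiles and layer sizes below are invented stand-ins:
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

x = (torch.rand(64, 200) < 0.05).float()   # 64 fake sparse annotation profiles

model = nn.Sequential(
    nn.Linear(200, 32), nn.ReLU(),         # encoder: 200 -> 32
    nn.Linear(32, 200), nn.Sigmoid(),      # decoder: 32 -> 200
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), x)            # reconstruct the input itself
    loss.backward()
    opt.step()
print(float(loss))
</syntaxhighlight>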
| | | |
− | {{Main|Bioinformatics}} | + | In medical informatics, deep learning has been used to predict sleep quality from wearable-device data<ref>{{Cite journal|last=Sathyanarayana|first=Aarti|date=2016-01-01|title=Sleep Quality Prediction From Wearable Data Using Deep Learning|url=http://doi.org/10.2196/mhealth.6562|journal=JMIR mHealth and uHealth|volume=4|issue=4|doi=10.2196/mhealth.6562|pmid=27815231|pmc=5116102|pages=e125}}</ref> and to predict health complications from electronic health record data.<ref>{{Cite journal|last=Choi|first=Edward|last2=Schuetz|first2=Andy|last3=Stewart|first3=Walter F.|last4=Sun|first4=Jimeng|date=2016-08-13|title=Using recurrent neural network models for early detection of heart failure onset|url=http://jamia.oxfordjournals.org/content/early/2016/08/13/jamia.ocw112|journal=Journal of the American Medical Informatics Association|volume=24|issue=2|pages=361–370|doi=10.1093/jamia/ocw112|issn=1067-5027|pmid=27521897|pmc=5391725}}</ref> Deep learning has also shown efficacy in healthcare.<ref>{{Cite web|url=https://medium.com/the-mission/deep-learning-in-healthcare-challenges-and-opportunities-d2eee7e2545|title=Deep Learning in Healthcare: Challenges and Opportunities|date=2016-08-12|website=Medium|access-date=2018-04-10}}</ref>
| | | |
| + | ==== Mobile advertising ====
| + | Finding the appropriate mobile audience for mobile advertising is always challenging, since many data points must be considered and assimilated before a target segment can be created and used in ad serving.<ref>{{cite journal |title=Using Deep Learning Neural Networks To Find Best Performing Audience Segments |url=http://www.ijstr.org/final-print/apr2016/Using-Deep-Learning-Neural-Networks-To-Find-Best-Performing-Audience-Segments.pdf |journal=IJSTR |volume=5 |issue=4}}</ref><ref>{{cite journal|last1=De|first1=Shaunak|last2=Maity|first2=Abhishek|last3=Goel|first3=Vritti|last4=Shitole|first4=Sanjay|last5=Bhattacharya|first5=Avik|title=Predicting the popularity of instagram posts for a lifestyle magazine using deep learning|journal=2nd IEEE Conference on Communication Systems, Computing and IT Applications|pages=174–177|doi=10.1109/CSCITA.2017.8066548|url=https://ieeexplore.ieee.org/document/8066548|year=2017|isbn=978-1-5090-4381-1}}</ref>
| + | Deep learning has been used to interpret large, many-dimensioned advertising datasets. Many data points are collected during the request/serve/click internet advertising cycle, and this information can form the basis of machine learning to improve ad selection.
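| + | An editorial sketch of learning ad selection from request/serve/click logs; the features and click labels are synthetic:
<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.random((1000, 8))            # 8 made-up audience/context features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.3, 1000) > 1.0).astype(int)

# A simple click-through-rate model; its probabilities can rank candidate ads.
ctr_model = LogisticRegression().fit(X, y)
print(ctr_model.predict_proba(X[:2])[:, 1])  # predicted click probabilities
</syntaxhighlight>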
| |
| + | ==== Image restoration ====
| | | |
| + | Deep learning has been successfully applied to inverse problems such as denoising, super-resolution, inpainting, and film colorization.<ref>{{Cite web|url=https://blog.floydhub.com/colorizing-and-restoring-old-images-with-deep-learning/|title=Colorizing and Restoring Old Images with Deep Learning|date=2018-11-13|website=FloydHub Blog|language=en|access-date=2019-10-11}}</ref> These applications include learning methods such as "Shrinkage Fields for Effective Image Restoration",<ref>{{cite conference | url= http://research.uweschmidt.org/pubs/cvpr14schmidt.pdf |first1= Uwe |last1= Schmidt |first2= Stefan |last2= Roth |conference= Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on |title= Shrinkage Fields for Effective Image Restoration }}</ref> which trains on an image dataset, and Deep Image Prior, which trains on the single image that needs restoration.
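| + | An editorial sketch of the train-on-a-dataset setting, with random tensors standing in for images:
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

clean = torch.rand(8, 1, 32, 32)               # stand-in "clean" images
noisy = clean + 0.1 * torch.randn_like(clean)  # additive Gaussian noise

net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(100):
    opt.zero_grad()
    loss = ((net(noisy) - clean) ** 2).mean()  # pixel-wise MSE to the target
    loss.backward()
    opt.step()
print(float(loss))
</syntaxhighlight>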
| | | |
| + | ==== Financial fraud detection ====
| | | |
| + | Deep learning is being successfully applied to financial fraud detection and anti-money laundering. "Deep anti-money laundering detection system can spot and recognize relationships and similarities between data and, further down the road, learn to detect anomalies or classify and predict specific events". The solution leverages both supervised learning techniques, such as the classification of suspicious transactions, and unsupervised learning, e.g. anomaly detection.<ref>{{cite journal |first=Tomasz |last=Czech |title=Deep learning: the next frontier for money laundering detection |url=https://www.globalbankingandfinance.com/deep-learning-the-next-frontier-for-money-laundering-detection/ |journal=Global Banking and Finance Review }}</ref><ref name=":12">{{Cite web|url=https://www.eurekalert.org/pub_releases/2018-02/uarl-ard020218.php|title=Army researchers develop new algorithms to train robots|website=EurekAlert!|language=en|access-date=2018-08-29}}</ref>
− | === Medical Image Analysis ===
− | Deep learning has been shown to produce competitive results in medical applications such as cancer cell classification, lesion detection, organ segmentation and image enhancement.<ref>{{Cite journal|last=Litjens|first=Geert|last2=Kooi|first2=Thijs|last3=Bejnordi|first3=Babak Ehteshami|last4=Setio|first4=Arnaud Arindra Adiyoso|last5=Ciompi|first5=Francesco|last6=Ghafoorian|first6=Mohsen|last7=van der Laak|first7=Jeroen A.W.M.|last8=van Ginneken|first8=Bram|last9=Sánchez|first9=Clara I.|date=December 2017|title=A survey on deep learning in medical image analysis|journal=Medical Image Analysis|volume=42|pages=60–88|doi=10.1016/j.media.2017.07.005|pmid=28778026|arxiv=1702.05747}}</ref><ref>{{Cite book|doi=10.1109/ICCVW.2017.18|isbn=9781538610343|chapter=Deep Convolutional Neural Networks for Detecting Cellular Changes Due to Malignancy|title=2017 IEEE International Conference on Computer Vision Workshops (ICCVW)|pages=82–89|year=2017|last1=Forslid|first1=Gustav|last2=Wieslander|first2=Hakan|last3=Bengtsson|first3=Ewert|last4=Wahlby|first4=Carolina|last5=Hirsch|first5=Jan-Michael|last6=Stark|first6=Christina Runow|last7=Sadanandan|first7=Sajith Kecheril|chapter-url=http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-326160}}</ref>
− | 全球银行和金融评论
| |
− | | |
− | }}</ref> | |
− | | |
− | }}</ref>
| |
− | | |
− | {} / ref | |
− | | |
− | | |
| | | |
| === Military === | | === Military === |
Line 1,265: |
Line 263: |
| | | |
| | | |
− | == Relation to human cognitive and brain development == | + | === 商业活动 === |
− | | |
− | Deep learning is closely related to a class of theories of [[brain development]] (specifically, neocortical development) proposed by [[cognitive neuroscientist]]s in the early 1990s.<ref name="UTGOFF">{{cite journal | last1 = Utgoff | first1 = P. E. | last2 = Stracuzzi | first2 = D. J. | year = 2002 | title = Many-layered learning | url= https://www.semanticscholar.org/paper/398c477f674b228fec7f3f418a8cec047e2dafe5| journal = Neural Computation | volume = 14 | issue = 10| pages = 2497–2529 | doi=10.1162/08997660260293319| pmid = 12396572 }}</ref><ref name="ELMAN">{{cite book|url={{google books |plainurl=y |id=vELaRu_MrwoC}}|title=Rethinking Innateness: A Connectionist Perspective on Development|last=Elman|first=Jeffrey L.|publisher=MIT Press|year=1998|isbn=978-0-262-55030-7}}</ref><ref name="SHRAGER">{{cite journal | last1 = Shrager | first1 = J. | last2 = Johnson | first2 = MH | year = 1996 | title = Dynamic plasticity influences the emergence of function in a simple cortical array | url= | journal = Neural Networks | volume = 9 | issue = 7| pages = 1119–1129 | doi=10.1016/0893-6080(96)00033-0| pmid = 12662587 }}</ref><ref name="QUARTZ">{{cite journal | last1 = Quartz | first1 = SR | last2 = Sejnowski | first2 = TJ | year = 1997 | title = The neural basis of cognitive development: A constructivist manifesto | url= | journal = Behavioral and Brain Sciences | volume = 20 | issue = 4| pages = 537–556 | doi=10.1017/s0140525x97001581| pmid = 10097006 | citeseerx = 10.1.1.41.7854 }}</ref> These developmental theories were instantiated in computational models, making them predecessors of deep learning systems. These developmental models share the property that various proposed learning dynamics in the brain (e.g., a wave of [[nerve growth factor]]) support the [[self-organization]] somewhat analogous to the neural networks utilized in deep learning models. Like the [[neocortex]], neural networks employ a hierarchy of layered filters in which each layer considers information from a prior layer (or the operating environment), and then passes its output (and possibly the original input), to other layers. This process yields a self-organizing stack of [[transducer]]s, well-tuned to their operating environment. A 1995 description stated, "...the infant's brain seems to organize itself under the influence of waves of so-called trophic-factors ... different regions of the brain become connected sequentially, with one layer of tissue maturing before another and so on until the whole brain is mature."<ref name="BLAKESLEE">S. Blakeslee., "In brain's early growth, timetable may be critical," ''The New York Times, Science Section'', pp. B5–B6, 1995.</ref>
| |
− | | |
− | Deep learning is closely related to a class of theories of brain development (specifically, neocortical development) proposed by cognitive neuroscientists in the early 1990s. These developmental theories were instantiated in computational models, making them predecessors of deep learning systems. These developmental models share the property that various proposed learning dynamics in the brain (e.g., a wave of nerve growth factor) support the self-organization somewhat analogous to the neural networks utilized in deep learning models. Like the neocortex, neural networks employ a hierarchy of layered filters in which each layer considers information from a prior layer (or the operating environment), and then passes its output (and possibly the original input), to other layers. This process yields a self-organizing stack of transducers, well-tuned to their operating environment. A 1995 description stated, "...the infant's brain seems to organize itself under the influence of waves of so-called trophic-factors ... different regions of the brain become connected sequentially, with one layer of tissue maturing before another and so on until the whole brain is mature."
| |
− | | |
− | 深度学习与上世纪90年代早期认知神经科学家提出的一类大脑发育理论(特别是新皮层发育理论)密切相关。这些发展理论在计算模型中被实例化,使它们成为深度学习系统的前辈。这些发展模型都有一个共同的特性,那就是大脑中各种被提出的学习动力学(例如,神经生长因子的波动)支持着自我组织神经网络,有点类似于深度学习模型中使用的神经网络。与新皮层一样,神经网络采用了一个层次化的过滤器,其中每一层考虑来自前一层(或操作环境)的信息,然后将其输出(可能还有原始输入)传递到其他层。这个过程产生一个自组织堆栈的传感器,很好地调整到他们的操作环境。一份1995年的描述说,“ ... 婴儿的大脑似乎在所谓的营养因子波的影响下自我组织... 大脑的不同区域依次连接起来,一层组织先于另一层组织成熟,以此类推,直到整个大脑成熟。”
| |
− | | |
− | | |
− | | |
− | A variety of approaches have been used to investigate the plausibility of deep learning models from a neurobiological perspective. On the one hand, several variants of the [[backpropagation]] algorithm have been proposed in order to increase its processing realism.<ref>{{Cite journal|last=Mazzoni|first=P.|last2=Andersen|first2=R. A.|last3=Jordan|first3=M. I.|date=1991-05-15|title=A more biologically plausible learning rule for neural networks.|journal=Proceedings of the National Academy of Sciences|volume=88|issue=10|pages=4433–4437|doi=10.1073/pnas.88.10.4433|issn=0027-8424|pmid=1903542|pmc=51674|bibcode=1991PNAS...88.4433M}}</ref><ref>{{Cite journal|last=O'Reilly|first=Randall C.|date=1996-07-01|title=Biologically Plausible Error-Driven Learning Using Local Activation Differences: The Generalized Recirculation Algorithm|journal=Neural Computation|volume=8|issue=5|pages=895–938|doi=10.1162/neco.1996.8.5.895|issn=0899-7667|url=https://www.semanticscholar.org/paper/ed9133009dd451bd64215cca7deba6e0b8d7c7b1}}</ref> Other researchers have argued that unsupervised forms of deep learning, such as those based on hierarchical [[generative model]]s and [[deep belief network]]s, may be closer to biological reality.<ref>{{Cite journal|last=Testolin|first=Alberto|last2=Zorzi|first2=Marco|date=2016|title=Probabilistic Models and Generative Neural Networks: Towards an Unified Framework for Modeling Normal and Impaired Neurocognitive Functions|journal=Frontiers in Computational Neuroscience|volume=10|pages=73|doi=10.3389/fncom.2016.00073|pmid=27468262|pmc=4943066|issn=1662-5188|url=https://www.semanticscholar.org/paper/9ff36a621ee2c831fbbda5b719942f9ed8ac844f}}</ref><ref>{{Cite journal|last=Testolin|first=Alberto|last2=Stoianov|first2=Ivilin|last3=Zorzi|first3=Marco|date=September 2017|title=Letter perception emerges from unsupervised deep learning and recycling of natural image features|journal=Nature Human Behaviour|volume=1|issue=9|pages=657–664|doi=10.1038/s41562-017-0186-2|pmid=31024135|issn=2397-3374|url=https://www.semanticscholar.org/paper/ec2463bd610dcb30d67681160e895761e2dde482}}</ref> In this respect, generative neural network models have been related to neurobiological evidence about sampling-based processing in the cerebral cortex.<ref>{{Cite journal|last=Buesing|first=Lars|last2=Bill|first2=Johannes|last3=Nessler|first3=Bernhard|last4=Maass|first4=Wolfgang|date=2011-11-03|title=Neural Dynamics as Sampling: A Model for Stochastic Computation in Recurrent Networks of Spiking Neurons|journal=PLOS Computational Biology|volume=7|issue=11|pages=e1002211|doi=10.1371/journal.pcbi.1002211|pmid=22096452|pmc=3207943|issn=1553-7358|bibcode=2011PLSCB...7E2211B|url=https://www.semanticscholar.org/paper/e4e100e44bf7618c7d96188605fd9870012bdb50}}</ref>
| |
− | | |
− | A variety of approaches have been used to investigate the plausibility of deep learning models from a neurobiological perspective. On the one hand, several variants of the backpropagation algorithm have been proposed in order to increase its processing realism. Other researchers have argued that unsupervised forms of deep learning, such as those based on hierarchical generative models and deep belief networks, may be closer to biological reality. In this respect, generative neural network models have been related to neurobiological evidence about sampling-based processing in the cerebral cortex.
| |
− | | |
− | 从神经生物学的角度来研究深度学习模型的可行性已经被使用了各种各样的方法。一方面,提出了几种不同的反向传播算法,以增加其处理的真实性。其他研究人员认为,无监督的深度学习形式,比如基于层次生成模型和深度信念网络的深度学习形式,可能更接近生物现实。在这方面,生成神经网络模型已经与神经生物学证据的采样为基础的处理在大脑皮层。
| |
− | | |
− | | |
− | | |
− | Although a systematic comparison between the human brain organization and the neuronal encoding in deep networks has not yet been established, several analogies have been reported. For example, the computations performed by deep learning units could be similar to those of actual neurons<ref>{{Cite journal|last=Morel|first=Danielle|last2=Singh|first2=Chandan|last3=Levy|first3=William B.|date=2018-01-25|title=Linearization of excitatory synaptic integration at no extra cost|journal=Journal of Computational Neuroscience|volume=44|issue=2|pages=173–188|doi=10.1007/s10827-017-0673-5|pmid=29372434|issn=0929-5313|url=https://www.semanticscholar.org/paper/3a528f2cde957d4e6417651f8005ca2ee81ca367}}</ref><ref>{{Cite journal|last=Cash|first=S.|last2=Yuste|first2=R.|date=February 1999|title=Linear summation of excitatory inputs by CA1 pyramidal neurons|journal=Neuron|volume=22|issue=2|pages=383–394|issn=0896-6273|pmid=10069343|doi=10.1016/s0896-6273(00)81098-3}}</ref> and neural populations.<ref>{{Cite journal|date=2004-08-01|title=Sparse coding of sensory inputs|journal=Current Opinion in Neurobiology|volume=14|issue=4|pages=481–487|doi=10.1016/j.conb.2004.07.007|pmid=15321069|issn=0959-4388 | last1 = Olshausen | first1 = B | last2 = Field | first2 = D|url=https://www.semanticscholar.org/paper/0dd289358b14f8176adb7b62bf2fb53ea62b3818}}</ref> Similarly, the representations developed by deep learning models are similar to those measured in the primate visual system<ref>{{Cite journal|last=Yamins|first=Daniel L K|last2=DiCarlo|first2=James J|date=March 2016|title=Using goal-driven deep learning models to understand sensory cortex|journal=Nature Neuroscience|volume=19|issue=3|pages=356–365|doi=10.1038/nn.4244|pmid=26906502|issn=1546-1726|url=https://www.semanticscholar.org/paper/94c4ba7246f781632aa68ca5b1acff0fdbb2d92f}}</ref> both at the single-unit<ref>{{Cite journal|last=Zorzi|first=Marco|last2=Testolin|first2=Alberto|date=2018-02-19|title=An emergentist perspective on the origin of number sense|journal=Phil. Trans. R. Soc. B|volume=373|issue=1740|pages=20170043|doi=10.1098/rstb.2017.0043|issn=0962-8436|pmid=29292348|pmc=5784047|url=https://www.semanticscholar.org/paper/c91db0c8349a78384f54c6a9a98370f5c9381b6c}}</ref> and at the population<ref>{{Cite journal|last=Güçlü|first=Umut|last2=van Gerven|first2=Marcel A. J.|date=2015-07-08|title=Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream|journal=Journal of Neuroscience|volume=35|issue=27|pages=10005–10014|doi=10.1523/jneurosci.5023-14.2015|pmid=26157000|pmc=6605414|arxiv=1411.6422}}</ref> levels.
| |
− | | |
− | Although a systematic comparison between the human brain organization and the neuronal encoding in deep networks has not yet been established, several analogies have been reported. For example, the computations performed by deep learning units could be similar to those of actual neurons and neural populations. Similarly, the representations developed by deep learning models are similar to those measured in the primate visual system both at the single-unit and at the population levels.
| |
− | | |
− | 虽然人类大脑组织和神经元编码的深层网络之间的系统性比较还没有建立,但已经有几个类似的报道。例如,由深度学习单元执行的计算可能与实际的神经元和神经元群体的计算相似。类似地,深度学习模型在单个单位和人群水平上的表征与灵长类动物视觉系统的测量结果相似。
| |
− | | |
− | | |
− | | |
− | == Commercial activity ==
| |
− | | |
− | [[Facebook]]'s AI lab performs tasks such as [[Automatic image annotation|automatically tagging uploaded pictures]] with the names of the people in them.<ref name="METZ2013">{{cite magazine|first=C. |last=Metz |title=Facebook's 'Deep Learning' Guru Reveals the Future of AI |url=https://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/ |magazine=Wired |date=12 December 2013}}</ref>
| |
− | | |
− | Facebook's AI lab performs tasks such as automatically tagging uploaded pictures with the names of the people in them.
| |
− | | |
− | Facebook 的人工智能实验室执行的任务包括自动为上传的图片加上人员姓名。
| |
− | | |
− | | |
− | | |
− | Google's [[DeepMind Technologies]] developed a system capable of learning how to play [[Atari]] video games using only pixels as data input. In 2015 they demonstrated their [[AlphaGo]] system, which learned the game of [[Go (game)|Go]] well enough to beat a professional Go player.<ref>{{Cite web|title = Google AI algorithm masters ancient game of Go |url= http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234|website = Nature News & Comment|accessdate = 2016-01-30}}</ref><ref>{{Cite journal|title = Mastering the game of Go with deep neural networks and tree search|journal = [[Nature (journal)|Nature]]| issn= 0028-0836|pages = 484–489|volume = 529|issue = 7587|doi = 10.1038/nature16961|pmid = 26819042|first1 = David|last1 = Silver|author-link1=David Silver (programmer)|first2 = Aja|last2 = Huang|author-link2=Aja Huang|first3 = Chris J.|last3 = Maddison|first4 = Arthur|last4 = Guez|first5 = Laurent|last5 = Sifre|first6 = George van den|last6 = Driessche|first7 = Julian|last7 = Schrittwieser|first8 = Ioannis|last8 = Antonoglou|first9 = Veda|last9 = Panneershelvam|first10= Marc|last10= Lanctot|first11= Sander|last11= Dieleman|first12=Dominik|last12= Grewe|first13= John|last13= Nham|first14= Nal|last14= Kalchbrenner|first15= Ilya|last15= Sutskever|author-link15=Ilya Sutskever|first16= Timothy|last16= Lillicrap|first17= Madeleine|last17= Leach|first18= Koray|last18= Kavukcuoglu|first19= Thore|last19= Graepel|first20= Demis |last20=Hassabis|author-link20=Demis Hassabis|date= 28 January 2016|bibcode = 2016Natur.529..484S|url = https://www.semanticscholar.org/paper/846aedd869a00c09b40f1f1f35673cb22bc87490}}{{closed access}}</ref><ref>{{Cite web|title = A Google DeepMind Algorithm Uses Deep Learning and More to Master the Game of Go {{!}} MIT Technology Review |url= http://www.technologyreview.com/news/546066/googles-ai-masters-the-game-of-go-a-decade-earlier-than-expected/|website = MIT Technology Review|accessdate = 2016-01-30}}</ref> [[Google Translate]] uses a neural network to translate between more than 100 languages.
| |
− | |
− | 谷歌的 DeepMind Technologies 开发了一个仅以像素作为数据输入、能够学会玩雅达利电子游戏的系统。2015年,他们展示了 AlphaGo 系统,其围棋水平足以击败职业围棋选手。谷歌翻译使用神经网络在100多种语言之间进行翻译。
| |
− | | |
− | | |
− | | |
− | In 2015, [[Blippar]] demonstrated a mobile [[augmented reality]] application that uses deep learning to recognize objects in real time.<ref>{{Cite web|title=Blippar Demonstrates New Real-Time Augmented Reality App|url=https://techcrunch.com/2015/12/08/blippar-demonstrates-new-real-time-augmented-reality-app/|website=TechCrunch}}</ref>
| |
− | |
− | 2015年,Blippar 展示了一款使用深度学习实时识别物体的移动增强现实应用。
| |
− | | |
− | | |
− | | |
− | In 2017, Covariant.ai was launched, which focuses on integrating deep learning into factories.<ref>[https://www.nytimes.com/2017/11/06/technology/artificial-intelligence-start-up.html A.I. Researchers Leave Elon Musk Lab to Begin Robotics Start-Up]</ref>
| |
− | |
− | 2017年,专注于将深度学习融入工厂的 Covariant.ai 成立。
| |
− | | |
− | | |
− | | |
− | As of 2008,<ref>{{Cite document|title=TAMER: Training an Agent Manually via Evaluative Reinforcement - IEEE Conference Publication|doi=10.1109/DEVLRN.2008.4640845}}</ref> researchers at [[University of Texas at Austin|The University of Texas at Austin]] (UT) developed a machine learning framework called Training an Agent Manually via Evaluative Reinforcement, or TAMER, which proposed new methods for robots or computer programs to learn how to perform tasks by interacting with a human instructor.<ref name=":12" /> First developed as TAMER, a new algorithm called Deep TAMER was later introduced in 2018 during a collaboration between [[U.S. Army Research Laboratory]] (ARL) and UT researchers. Deep TAMER used deep learning to provide a robot the ability to learn new tasks through observation.<ref name=":12" /> Using Deep TAMER, a robot learned a task with a human trainer, watching video streams or observing a human perform a task in-person. The robot later practiced the task with the help of some coaching from the trainer, who provided feedback such as “good job” and “bad job.”<ref>{{Cite web|url=https://governmentciomedia.com/talk-algorithms-ai-becomes-faster-learner|title=Talk to the Algorithms: AI Becomes a Faster Learner|website=governmentciomedia.com|access-date=2018-08-29}}</ref>
| |
− | |
− | 2008年前后,德克萨斯大学奥斯汀分校(UT)的研究人员开发了一个名为 TAMER(Training an Agent Manually via Evaluative Reinforcement,通过评价式强化手动训练智能体)的机器学习框架,提出了让机器人或计算机程序通过与人类指导者互动来学习执行任务的新方法。2018年,美国陆军研究实验室(ARL)与 UT 的研究人员合作,在 TAMER 的基础上提出了名为 Deep TAMER 的新算法,它利用深度学习使机器人能够通过观察学习新任务。使用 Deep TAMER 时,机器人在人类训练者的帮助下,通过观看视频流或现场观察人类执行任务来学习,随后在训练者“做得好”、“做得不好”之类的反馈下练习该任务。
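| + | 下面用一个极简的 Python 草图示意 TAMER 式的训练循环:智能体学习一个预测人类反馈的模型 H(s, a),并贪心地选择预测反馈最高的动作。其中的函数名、学习率等均为示例性假设,并非原框架的真实接口:
| + | <syntaxhighlight lang="python">
| + | # TAMER 式循环草图:用人类的标量评价(如 +1 / -1)更新反馈模型 H
| + | H = {}          # 以 (状态, 动作) 为键的人类反馈估计表
| + | ALPHA = 0.1     # 学习率(示例取值)
| + | 
| + | def choose_action(state, actions):
| + |     # 贪心:选当前预测人类反馈最高的动作
| + |     return max(actions, key=lambda a: H.get((state, a), 0.0))
| + | 
| + | def update(state, action, feedback):
| + |     # feedback:人类给出的评价,例如 +1(“做得好”)或 -1(“做得不好”)
| + |     key = (state, action)
| + |     H[key] = H.get(key, 0.0) + ALPHA * (feedback - H.get(key, 0.0))
| + | </syntaxhighlight>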
| |
− | | |
− | | |
− | | |
− | == Criticism and comment ==
| |
− | | |
− | Deep learning has attracted both criticism and comment, in some cases from outside the field of computer science.
| |
− | |
− | 深度学习招致了批评,也引发了评论,其中一些来自计算机科学领域之外。
| |
− | | |
− | | |
− | | |
− | === Theory ===
| |
− | | |
− | {{see also|Explainable AI}}
| |
− | | |
− | A main criticism concerns the lack of theory surrounding some methods.<ref>{{Cite web|url=https://medium.com/@GaryMarcus/in-defense-of-skepticism-about-deep-learning-6e8bfd5ae0f1|title=In defense of skepticism about deep learning|last=Marcus|first=Gary|date=2018-01-14|website=Gary Marcus|access-date=2018-10-11}}</ref> Learning in the most common deep architectures is implemented using well-understood gradient descent. However, the theory surrounding other algorithms, such as contrastive divergence is less clear.{{citation needed|date=July 2016}} (e.g., Does it converge? If so, how fast? What is it approximating?) Deep learning methods are often looked at as a [[black box]], with most confirmations done empirically, rather than theoretically.<ref name="Knight 2017">{{cite web | last=Knight | first=Will | title=DARPA is funding projects that will try to open up AI's black boxes | website=MIT Technology Review | date=2017-03-14 | url=https://www.technologyreview.com/s/603795/the-us-military-wants-its-autonomous-machines-to-explain-themselves/ | accessdate=2017-11-02}}</ref>
| |
− | |
− | 一个主要的批评是围绕其中一些方法缺乏相应的理论。最常见的深度架构中的学习是用人们已充分理解的梯度下降实现的;然而,围绕其他算法(如对比散度)的理论则不那么清楚。(例如:它收敛吗?如果收敛,速度有多快?它在近似什么?)深度学习方法常被视为黑箱,大多数验证是经验性的,而非理论性的。
| |
− | | |
− | | |
− | | |
− | Others point out that deep learning should be looked at as a step towards realizing strong AI, not as an all-encompassing solution. Despite the power of deep learning methods, they still lack much of the functionality needed for realizing this goal entirely. Research psychologist Gary Marcus noted:<blockquote>"Realistically, deep learning is only part of the larger challenge of building intelligent machines. Such techniques lack ways of representing [[causality|causal relationships]] (...) have no obvious ways of performing [[inference|logical inferences]], and they are also still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used. The most powerful A.I. systems, like [[Watson (computer)|Watson]] (...) use techniques like deep learning as just one element in a very complicated ensemble of techniques, ranging from the statistical technique of [[Bayesian inference]] to [[deductive reasoning]]."<ref>{{cite magazine|url=https://www.newyorker.com/|title=Is "Deep Learning" a Revolution in Artificial Intelligence?|last=Marcus|first=Gary|date=November 25, 2012|magazine=The New Yorker|accessdate=2017-06-14}}</ref></blockquote>As an alternative to this emphasis on the limits of deep learning, one author speculated that it might be possible to train a machine vision stack to perform the sophisticated task of discriminating between "old master" and amateur figure drawings, and hypothesized that such a sensitivity might represent the rudiments of a non-trivial machine empathy.<ref>{{cite web|url=http://artent.net/2015/03/27/art-and-artificial-intelligence-by-g-w-smith/|title=Art and Artificial Intelligence|date=March 27, 2015|publisher=ArtEnt|author=Smith, G. W.|accessdate=March 27, 2015|url-status=bot: unknown|archiveurl=https://web.archive.org/web/20170625075845/http://artent.net/2015/03/27/art-and-artificial-intelligence-by-g-w-smith/|archivedate=June 25, 2017}}</ref> This same author proposed that this would be in line with anthropology, which identifies a concern with aesthetics as a key element of [[behavioral modernity]].<ref>{{cite web |url=http://repositriodeficheiros.yolasite.com/resources/Texto%2028.pdf |author=Mellars, Paul |date=February 1, 2005 |title=The Impossible Coincidence: A Single-Species Model for the Origins of Modern Human Behavior in Europe|publisher=Evolutionary Anthropology: Issues, News, and Reviews |accessdate=April 5, 2017}}</ref>
| |
− | |
− | 其他人指出,深度学习应被视为迈向实现强人工智能的一步,而不是一个包罗万象的解决方案。尽管深度学习方法很强大,但它们仍然缺乏完全实现这一目标所需的许多功能。研究心理学家加里·马库斯(Gary Marcus)指出:“现实地说,深度学习只是构建智能机器这一更大挑战的一部分。这类技术缺乏表示因果关系的方法……没有明显的进行逻辑推理的方法,而且距离整合抽象知识(例如关于物体是什么、有什么用途、通常如何被使用的信息)还有很长的路要走。最强大的人工智能系统,比如 Watson……只是把深度学习这样的技术用作一个非常复杂的技术组合中的一个元素,这个组合既包括贝叶斯推断这样的统计技术,也包括演绎推理。”作为对这种强调深度学习局限性的回应,一位作者推测,或许可以训练一个机器视觉系统来完成区分“大师”与业余人物画这一复杂任务,并假设这种敏感性可能代表一种并非无足轻重的机器共情的雏形。同一位作者还提出,这与人类学的观点一致,即对美学的关注是行为现代性的关键要素。
| |
− | | |
− | | |
− | | |
− | In further reference to the idea that artistic sensitivity might inhere within relatively low levels of the cognitive hierarchy, a published series of graphic representations of the internal states of deep (20-30 layers) neural networks attempting to discern within essentially random data the images on which they were trained<ref>{{cite web|url=http://googleresearch.blogspot.co.uk/2015/06/inceptionism-going-deeper-into-neural.html |author1=Alexander Mordvintsev |author2=Christopher Olah |author3=Mike Tyka |date=June 17, 2015 |title=Inceptionism: Going Deeper into Neural Networks |publisher=Google Research Blog |accessdate=June 20, 2015}}</ref> demonstrate a visual appeal: the original research notice received well over 1,000 comments, and was the subject of what was for a time the most frequently accessed article on ''[[The Guardian]]'s''<ref>{{cite news|url=https://www.theguardian.com/technology/2015/jun/18/google-image-recognition-neural-network-androids-dream-electric-sheep|title=Yes, androids do dream of electric sheep|date=June 18, 2015|newspaper=The Guardian|author=Alex Hern|accessdate=June 20, 2015}}</ref> website.
| |
− | |
− | 进一步支持“艺术敏感性可能存在于认知层级中相对较低层次”这一想法的是:已发表的一系列图示展示了深度(20–30层)神经网络试图在本质上随机的数据中辨认其训练图像时的内部状态,并显示出一种视觉吸引力。最初的研究通告收到了超过1000条评论,并一度成为《卫报》网站上访问量最高文章的主题。
| |
− | | |
− | | |
− | | |
− | === Errors ===
| |
− | | |
− | Some deep learning architectures display problematic behaviors,<ref name=goertzel>{{cite web|first=Ben |last=Goertzel |title=Are there Deep Reasons Underlying the Pathologies of Today's Deep Learning Algorithms? |year=2015 |url=http://goertzel.org/DeepLearning_v1.pdf}}</ref> such as confidently classifying unrecognizable images as belonging to a familiar category of ordinary images<ref>{{cite arxiv |eprint=1412.1897|last1=Nguyen|first1=Anh|title=Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images|last2=Yosinski|first2=Jason|last3=Clune|first3=Jeff|class=cs.CV|year=2014}}</ref> and misclassifying minuscule perturbations of correctly classified images.<ref>{{cite arxiv |eprint=1312.6199|last1=Szegedy|first1=Christian|title=Intriguing properties of neural networks|last2=Zaremba|first2=Wojciech|last3=Sutskever|first3=Ilya|last4=Bruna|first4=Joan|last5=Erhan|first5=Dumitru|last6=Goodfellow|first6=Ian|last7=Fergus|first7=Rob|class=cs.CV|year=2013}}</ref> [[Ben Goertzel|Goertzel]] hypothesized that these behaviors are due to limitations in their internal representations and that these limitations would inhibit integration into heterogeneous multi-component [[artificial general intelligence]] (AGI) architectures.<ref name="goertzel" /> These issues may possibly be addressed by deep learning architectures that internally form states homologous to image-grammar<ref>{{cite journal | last1 = Zhu | first1 = S.C. | last2 = Mumford | first2 = D. | year = 2006| title = A stochastic grammar of images | url= | journal = Found. Trends Comput. Graph. Vis. | volume = 2 | issue = 4| pages = 259–362 | doi = 10.1561/0600000018| citeseerx = 10.1.1.681.2190 }}</ref> decompositions of observed entities and events.<ref name="goertzel"/> [[Grammar induction|Learning a grammar]] (visual or linguistic) from training data would be equivalent to restricting the system to [[commonsense reasoning]] that operates on concepts in terms of grammatical [[Production (computer science)|production rules]] and is a basic goal of both human language acquisition<ref>Miller, G. A., and N. Chomsky. "Pattern conception." Paper for Conference on pattern detection, University of Michigan. 1957.</ref> and [[artificial intelligence]] (AI).<ref>{{cite web|first=Jason |last=Eisner |title=Deep Learning of Recursive Structure: Grammar Induction |url=http://techtalks.tv/talks/deep-learning-of-recursive-structure-grammar-induction/58089/}}</ref>
| |
− | |
− | 一些深度学习架构表现出有问题的行为,例如自信地把无法辨认的图像归入某个熟悉的普通图像类别,以及在被正确分类的图像加上微小扰动后将其错误分类。Goertzel 假设,这些行为源于其内部表征的局限性,而这些局限会阻碍其集成到异构的多组件通用人工智能(AGI)架构中。这些问题或许可以由这样的深度学习架构来解决:它们在内部形成与对所观察到的实体和事件的图像语法分解同源的状态。从训练数据中学习(视觉或语言的)语法,相当于把系统限制在以语法产生式规则来操作概念的常识推理上,而这是人类语言习得和人工智能(AI)共同的基本目标。
| |
− | | |
− | | |
− | | |
− | === Cyber threat ===
| |
− | | |
− | As deep learning moves from the lab into the world, research and experience shows that artificial neural networks are vulnerable to hacks and deception.<ref>{{Cite web|url=https://gizmodo.com/hackers-have-already-started-to-weaponize-artificial-in-1797688425|title=Hackers Have Already Started to Weaponize Artificial Intelligence|website=Gizmodo|access-date=2019-10-11}}</ref> By identifying patterns that these systems use to function, attackers can modify inputs to ANNs in such a way that the ANN finds a match that human observers would not recognize. For example, an attacker can make subtle changes to an image such that the ANN finds a match even though the image looks to a human nothing like the search target. Such a manipulation is termed an “adversarial attack.”<ref>{{Cite web|url=https://www.dailydot.com/debug/adversarial-attacks-ai-mistakes/|title=How hackers can force AI to make dumb mistakes|date=2018-06-18|website=The Daily Dot|language=en|access-date=2019-10-11}}</ref> In 2016 researchers used one ANN to doctor images in trial and error fashion, identify another's focal points and thereby generate images that deceived it. The modified images looked no different to human eyes. Another group showed that printouts of doctored images then photographed successfully tricked an image classification system.<ref name=":4">{{Cite news|url=https://singularityhub.com/2017/10/10/ai-is-easy-to-fool-why-that-needs-to-change|title=AI Is Easy to Fool—Why That Needs to Change|last=|first=|date=2017-10-10|work=Singularity Hub|accessdate=2017-10-11}}</ref> One defense is reverse image search, in which a possible fake image is submitted to a site such as [[TinEye]] that can then find other instances of it. A refinement is to search using only parts of the image, to identify images from which that piece may have been taken'''.'''<ref>{{Cite journal|last=Gibney|first=Elizabeth|title=The scientist who spots fake videos|url=https://www.nature.com/news/the-scientist-who-spots-fake-videos-1.22784|journal=Nature|pages=|doi=10.1038/nature.2017.22784|via=|year=2017}}</ref>
| |
− | |
− | 随着深度学习从实验室走向现实世界,研究和经验表明,人工神经网络容易受到黑客攻击和欺骗。通过识别这些系统运作所依赖的模式,攻击者可以修改人工神经网络的输入,使其找到人类观察者并不会认可的匹配。例如,攻击者可以对图像做细微改动,使得即便图像在人看来与搜索目标毫无相似之处,人工神经网络也能找到匹配,这种操纵被称为“对抗性攻击”。2016年,研究人员用一个人工神经网络以试错方式篡改图像,找出另一个网络的关注点,从而生成能欺骗后者的图像;修改后的图像在人眼看来没有任何不同。另一组研究表明,把篡改过的图像打印出来再拍照,也能成功骗过图像分类系统。一种防御手段是反向图像搜索,即把可疑的假图像提交给 TinEye 之类的网站,由其找出该图像的其他实例;一种改进是只用图像的一部分进行搜索,以找出该局部可能出自的图像。
| |
− | | |
− | | |
− | | |
− | Another group showed that certain [[Psychedelic art|psychedelic]] spectacles could fool a [[facial recognition system]] into thinking ordinary people were celebrities, potentially allowing one person to impersonate another. In 2017 researchers added stickers to [[stop sign]]s and caused an ANN to misclassify them.<ref name=":4" />
| |
− | |
− | 另一组研究表明,某些迷幻风格的眼镜可以欺骗面部识别系统,使其把普通人认成名人,从而可能让一个人冒充另一个人。2017年,研究人员在停止标志上添加贴纸,导致人工神经网络将其错误分类。
| |
− | | |
− | | |
− | | |
− | ANNs can however be further trained to detect attempts at deception, potentially leading attackers and defenders into an arms race similar to the kind that already defines the [[malware]] defense industry. ANNs have been trained to defeat ANN-based anti-malware software by repeatedly attacking a defense with malware that was continually altered by a [[genetic algorithm]] until it tricked the anti-malware while retaining its ability to damage the target.<ref name=":4" />
| |
− | |
− | 然而,人工神经网络可以被进一步训练来检测欺骗企图,这可能使攻击者与防御者陷入一场类似于已经塑造了恶意软件防御行业的军备竞赛。研究者已训练人工神经网络去击败基于人工神经网络的反恶意软件:用遗传算法不断改变恶意软件并反复攻击防御系统,直到它在保留破坏目标能力的同时骗过反恶意软件。
| |
− | | |
− | | |
− | | |
− | Another group demonstrated that certain sounds could make the [[Google Now]] voice command system open a particular web address that would download malware.<ref name=":4" />
| |
− | | |
| |
− | | |
− | 另一个研究小组证明,某些声音可以使 Google Now 语音命令系统打开一个特定的网址,从而下载恶意软件。
| |
− | | |
− | | |
− | | |
− | In “data poisoning,” false data is continually smuggled into a machine learning system's training set to prevent it from achieving mastery.<ref name=":4" />
| |
− | |
− | 在“数据投毒”中,虚假数据被不断偷偷混入机器学习系统的训练集,以阻止其达到精通。
| |
− | | |
− | | |
− | | |
− | === Reliance on human [[microwork]] ===
| |
− | | |
− | Most Deep Learning systems rely on training and verification data that is generated and/or annotated by humans. It has been argued in [[Media studies|media philosophy]] that not only low-paid [[Clickworkers|clickwork]] (e.g. on [[Amazon Mechanical Turk]]) is regularly deployed for this purpose, but also implicit forms of human [[microwork]] that are often not recognized as such.<ref name=":13">{{Cite journal|last=Mühlhoff|first=Rainer|date=2019-11-06|title=Human-aided artificial intelligence: Or, how to run large computations in human brains? Toward a media sociology of machine learning|journal=New Media & Society|language=en|volume=|pages=146144481988533|doi=10.1177/1461444819885334|issn=1461-4448}}</ref> The philosopher Rainer Mühlhoff distinguishes five types of "machinic capture" of human microwork to generate training data: (1) [[gamification]] (the embedding of annotation or computation tasks in the flow of a game), (2) "trapping and tracking" (e.g. [[CAPTCHA]]s for image recognition or click-tracking on Google [[Search engine results page|search results pages]]), (3) exploitation of social motivations (e.g. [[Tag (Facebook)|tagging faces]] on [[Facebook]] to obtain labeled facial images), (4) [[information mining]] (e.g. by leveraging [[Quantified self|quantified-self]] devices such as [[activity tracker]]s) and (5) [[Clickworkers|clickwork]].<ref name=":13" /> Mühlhoff argues that in most commercial end-user applications of Deep Learning such as [[DeepFace|Facebook's face recognition system]], the need for training data does not stop once an ANN is trained. Rather, there is a continued demand for human-generated verification data to constantly calibrate and update the ANN. For this purpose Facebook introduced the feature that once a user is automatically recognized in an image, they receive a notification. They can choose whether of not they like to be publicly labeled on the image, or tell Facebook that it is not them in the picture.<ref>{{Cite news|url=https://www.wired.com/story/facebook-will-find-your-face-even-when-its-not-tagged/|title=Facebook Can Now Find Your Face, Even When It's Not Tagged|work=Wired|access-date=2019-11-22|language=en|issn=1059-1028}}</ref> This user interface is a mechanism to generate "a constant stream of verification data"<ref name=":13" /> to further train the network in real-time. As Mühlhoff argues, involvement of human users to generate training and verification data is so typical for most commercial end-user applications of Deep Learning that such systems may be referred to as "human-aided artificial intelligence".<ref name=":13" />
| |
− | |
− | 大多数深度学习系统依赖由人类生成和/或标注的训练与验证数据。媒体哲学领域有观点认为,为此不仅经常动用低薪的点击工作(例如在 Amazon Mechanical Turk 上),还动用各种往往不被当作劳动来看待的隐性人类微工作。哲学家 Rainer Mühlhoff 区分了五种“机器捕获”人类微工作以生成训练数据的类型:(1)游戏化(把标注或计算任务嵌入游戏流程);(2)“设陷与追踪”(例如用于图像识别的验证码,或谷歌搜索结果页上的点击追踪);(3)利用社会动机(例如在 Facebook 上给人脸打标签以获得带标注的面部图像);(4)信息挖掘(例如利用活动追踪器等量化自我设备);(5)点击工作。Mühlhoff 认为,在 Facebook 的人脸识别系统等大多数面向终端用户的商业深度学习应用中,对训练数据的需求并不会在网络训练完成后停止:系统持续需要人类生成的验证数据来不断校准和更新网络。为此,Facebook 引入了这样的功能:用户一旦在图像中被自动识别,就会收到通知,并可以选择是否愿意在该图像上被公开标注,或者告诉 Facebook 图中的人并不是自己。这个用户界面是一种生成“源源不断的验证数据流”以进一步实时训练网络的机制。正如 Mühlhoff 所言,让人类用户参与生成训练和验证数据在面向终端用户的商业深度学习应用中如此典型,以至于这类系统可以被称为“人助式人工智能”。
| |
− | | |
− | | |
− | | |
− | == See also ==
| |
− | | |
− | * [[Applications of artificial intelligence]]
| |
− | | |
− | * [[Comparison of deep learning software]]
| |
− | | |
− | * [[Compressed sensing]]
| |
− | | |
− | * [[Echo state network]]
| |
− | | |
− | * [[List of artificial intelligence projects]]
| |
− | | |
− | * [[Liquid state machine]]
| |
− | | |
− | * [[List of datasets for machine learning research]]
| |
− | | |
− | * [[Reservoir computing]]
| |
− | | |
− | * [[Sparse coding]]
| |
− | | |
− | | |
− | | |
− | == References ==
| |
− | | |
− | {{Reflist|30em}}
| |
− | | |
− | | |
− | | |
− | == Further reading ==
| |
− | | |
− | {{refbegin}}
| |
− | | |
− | * {{cite book |title=Deep Learning |year=2016 |first1=Ian |last1=Goodfellow |authorlink1=Ian Goodfellow |first2=Yoshua |last2=Bengio |authorlink2=Yoshua Bengio |first3=Aaron |last3=Courville |publisher=MIT Press |url=http://www.deeplearningbook.org |isbn=978-0-26203561-3 |postscript=, introductory textbook.}}
| |
− | | |
− | | |
− | | |
− | {{Prone to spam|date=June 2015}}{{Z148}}<!-- {{No more links}}
− | |
− | Please be cautious adding more external links. Wikipedia is not a collection of links and should not be used for advertising. Excessive or inappropriate links will be removed. See [[Wikipedia:External links]] and [[Wikipedia:Spam]] for details. If there are already suitable links, propose additions or replacements on the article's talk page, or submit your link to the relevant category at DMOZ (dmoz.org) and link there using {{Dmoz}}.
| |
| | | |
− | --> | + | 许多组织将深度学习用于特定的应用。Facebook 的人工智能实验室承担的任务包括自动用图片中人物的姓名为上传的图片打标签。<ref name="METZ2013">{{cite news|first=C. |last=Metz |title=Facebook's 'Deep Learning' Guru Reveals the Future of AI |url=https://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/ |publisher=Wired |date=12 December 2013}}</ref>
| + | 谷歌的 DeepMind 开发了一个仅以像素作为数据输入、能够学会玩雅达利电子游戏的系统。2015年,他们展示了 AlphaGo 系统,其围棋水平足以击败职业围棋选手。<ref>{{Cite web|title = Google AI algorithm masters ancient game of Go |url= http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234|website = Nature News & Comment|accessdate = 2016-01-30}}</ref><ref>{{Cite journal|title = Mastering the game of Go with deep neural networks and tree search|url = https://www.nature.com/nature/journal/v529/n7587/full/nature16961.html|journal = [[Nature (journal)|Nature]]| issn= 0028-0836|pages = 484–489|volume = 529|issue = 7587|doi = 10.1038/nature16961|pmid = 26819042|first1 = David|last1 = Silver|author-link1=David Silver (programmer)|first2 = Aja|last2 = Huang|author-link2=Aja Huang|first3 = Chris J.|last3 = Maddison|first4 = Arthur|last4 = Guez|first5 = Laurent|last5 = Sifre|first6 = George van den|last6 = Driessche|first7 = Julian|last7 = Schrittwieser|first8 = Ioannis|last8 = Antonoglou|first9 = Veda|last9 = Panneershelvam|first10= Marc|last10= Lanctot|first11= Sander|last11= Dieleman|first12=Dominik|last12= Grewe|first13= John|last13= Nham|first14= Nal|last14= Kalchbrenner|first15= Ilya|last15= Sutskever|author-link15=Ilya Sutskever|first16= Timothy|last16= Lillicrap|first17= Madeleine|last17= Leach|first18= Koray|last18= Kavukcuoglu|first19= Thore|last19= Graepel|first20= Demis |last20=Hassabis|author-link20=Demis Hassabis|date= 28 January 2016|bibcode = 2016Natur.529..484S|accessdate=11 December 2017}}{{closed access}}</ref><ref>{{Cite web|title = A Google DeepMind Algorithm Uses Deep Learning and More to Master the Game of Go {{!}} MIT Technology Review |url= http://www.technologyreview.com/news/546066/googles-ai-masters-the-game-of-go-a-decade-earlier-than-expected/|website = MIT Technology Review|accessdate = 2016-01-30}}</ref>谷歌翻译使用 LSTM 在100多种语言之间进行翻译。<ref>{{Cite web|url=https://governmentciomedia.com/talk-algorithms-ai-becomes-faster-learner|title=Talk to the Algorithms: AI Becomes a Faster Learner|website=governmentciomedia.com|language=en|access-date=2018-08-29}}</ref>2015年,Blippar 展示了一款使用深度学习实时识别物体的移动增强现实应用。<ref>{{Cite web|title=Blippar Demonstrates New Real-Time Augmented Reality App|url=https://techcrunch.com/2015/12/08/blippar-demonstrates-new-real-time-augmented-reality-app/|website=TechCrunch}}</ref>
| | | |
| + | === 批评和评论 === |
| | | |
| + | 深度学习引来了批评,也引发了评论,其中一些来自计算机科学领域之外。
| | | |
| + | ==== 理论 ==== |
| | | |
− | [[Category:Deep learning| ]]
| + | 一个主要的批评是,围绕其中一些方法缺乏相应的理论。最常见的深度架构中的学习是用人们已充分理解的梯度下降实现的。然而,围绕其他算法(比如对比散度)的理论则不那么清楚。(例如,它收敛吗?如果收敛,速度有多快?它在近似什么?)深度学习方法通常被视作黑箱,大多数验证是经验性的,而不是理论性的。<ref name="Knight 2017">{{cite web | last=Knight | first=Will | title=DARPA is funding projects that will try to open up AI’s black boxes | website=MIT Technology Review | date=2017-03-14 | url=https://www.technologyreview.com/s/603795/the-us-military-wants-its-autonomous-machines-to-explain-themselves/ | accessdate=2017-11-02}}</ref>
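| + | 作为说明,下面给出一个极简的梯度下降示意(Python 草图;目标函数、初始点与学习率均为随意选取的示例,仅用来演示上文所说“已被充分理解”的更新规则 x ← x − η·∇f(x)):
| + | <syntaxhighlight lang="python">
| + | # 极简梯度下降草图:最小化 f(x) = (x - 3)^2
| + | def grad(x):
| + |     return 2.0 * (x - 3.0)   # f 的导数
| + | 
| + | x, lr = 0.0, 0.1             # 初始点与学习率(示例取值)
| + | for _ in range(100):
| + |     x -= lr * grad(x)        # 沿负梯度方向更新
| + | print(x)                     # 近似收敛到最小值点 3.0
| + | </syntaxhighlight>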
| | | |
− | [[Category:Artificial neural networks]]
| + | 其他人指出,应该把深度学习视作实现强人工智能道路上的一步,而不是一个包罗万象的解决方案。<ref>{{Cite web|url=http://www.newyorker.com/|title=Is "Deep Learning" a Revolution in Artificial Intelligence?|last=Marcus|first=Gary|date=November 25, 2012|website=|publisher=The New Yorker|accessdate=2017-06-14}}</ref>尽管深度学习方法非常强大,但它们仍缺乏完全实现这一目标所需的大部分功能。研究心理学家 Gary Marcus 指出:“实际上,深度学习只是构建智能机器这一更大挑战的一部分。<ref>{{cite web|url=http://artent.net/2015/03/27/art-and-artificial-intelligence-by-g-w-smith/|title=Art and Artificial Intelligence|date=March 27, 2015|publisher=ArtEnt|author=Smith, G. W.|accessdate=March 27, 2015|deadurl=bot: unknown|archiveurl=https://web.archive.org/web/20170625075845/http://artent.net/2015/03/27/art-and-artificial-intelligence-by-g-w-smith/|archivedate=June 25, 2017|df=}}</ref>这类技术缺乏表示因果关系的方法……没有明显的进行逻辑推理的方法,而且距离整合抽象知识(例如关于对象是什么、有什么用途、以及通常如何被使用的信息)还有很长的路要走。最强大的人工智能系统,比如 Watson,只是把深度学习这样的技术用作一个非常复杂的技术组合中的一个元素,这个组合既包括贝叶斯推断这样的统计技术,也包括演绎推理。”<ref>{{cite web |url=http://repositriodeficheiros.yolasite.com/resources/Texto%2028.pdf |author=Mellars, Paul |date=February 1, 2005 |title=The Impossible Coincidence: A Single-Species Model for the Origins of Modern Human Behavior in Europe|publisher=Evolutionary Anthropology: Issues, News, and Reviews |accessdate=April 5, 2017}}</ref>
| | | |
| + | 作为对这种强调深度学习局限性的回应,一位作者推测,或许可以训练一个机器视觉系统来完成区分“大师”与业余人物画这一复杂任务,并假设这种敏感性可能代表一种并非无足轻重的机器共情的雏形。<ref>{{cite web|url=http://googleresearch.blogspot.co.uk/2015/06/inceptionism-going-deeper-into-neural.html |author1=Alexander Mordvintsev |author2=Christopher Olah |author3=Mike Tyka |date=June 17, 2015 |title=Inceptionism: Going Deeper into Neural Networks |publisher=Google Research Blog |accessdate=June 20, 2015}}</ref>同一位作者还提出,这与人类学的观点一致,即对美学的关注是行为现代性的关键要素。
| | | |
| + | 进一步支持“艺术敏感性可能存在于认知层级中相对较低层次”这一想法的是:已发表的一系列图示展示了深度(20–30层)神经网络试图在本质上随机的数据中辨认其训练图像时的内部状态,并显示出一种视觉吸引力。最初的研究通告收到了超过1000条评论,并一度成为《卫报》网站上访问量最高文章的主题。<ref>{{cite web|url=https://www.theguardian.com/technology/2015/jun/18/google-image-recognition-neural-network-androids-dream-electric-sheep|title=Yes, androids do dream of electric sheep|date=June 18, 2015|publisher=The Guardian|author=Alex Hern|accessdate=June 20, 2015}}</ref>
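| + | 这类可视化的核心操作可以用下面的草图示意:对输入图像做梯度上升,放大网络中某一层的激活(假设性的 PyTorch 示例;这里用一个小型卷积层代替原研究中大得多的网络,层的选择与步数均为示例假设):
| + | <syntaxhighlight lang="python">
| + | import torch
| + | 
| + | # DeepDream 式“放大激活”草图:让输入朝着增大所选层激活范数的方向移动
| + | model = torch.nn.Sequential(
| + |     torch.nn.Conv2d(3, 8, 3, padding=1),
| + |     torch.nn.ReLU(),
| + | )
| + | x = torch.rand(1, 3, 64, 64, requires_grad=True)  # 从近乎随机的图像出发
| + | opt = torch.optim.SGD([x], lr=0.1)
| + | for _ in range(20):
| + |     opt.zero_grad()
| + |     loss = -model(x).norm()   # 取负号:最小化它等价于放大该层激活
| + |     loss.backward()
| + |     opt.step()
| + | </syntaxhighlight>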
| | | |
− | [[Category:Artificial intelligence]]
| + | ==== 错误 ==== |
| | | |
| + | 一些深度学习架构表现出有问题的行为,例如自信地把无法辨认的图像归入某个熟悉的普通图像类别,<ref name=goertzel>{{cite web|first=Ben |last=Goertzel |title=Are there Deep Reasons Underlying the Pathologies of Today's Deep Learning Algorithms? |year=2015 |url=http://goertzel.org/DeepLearning_v1.pdf}}</ref>以及在被正确分类的图像加上细小扰动后将其错误分类。Goertzel 假设,这些行为源于其内部表征的局限性,<ref>{{cite arxiv |eprint=1412.1897|last1=Nguyen|first1=Anh|title=Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images|last2=Yosinski|first2=Jason|last3=Clune|first3=Jeff|class=cs.CV|year=2014}}</ref>而这些局限会阻碍其集成到异构的多组件 AGI 架构中。<ref>{{cite arxiv |eprint=1312.6199|last1=Szegedy|first1=Christian|title=Intriguing properties of neural networks|last2=Zaremba|first2=Wojciech|last3=Sutskever|first3=Ilya|last4=Bruna|first4=Joan|last5=Erhan|first5=Dumitru|last6=Goodfellow|first6=Ian|last7=Fergus|first7=Rob|class=cs.CV|year=2013}}</ref>这些问题也许可以由这样的深度学习架构来解决:它们在内部形成与对所观察到的实体和事件的图像语法分解同源的状态。从训练数据中学习(视觉或语言的)语法,相当于把系统限制在以语法产生式规则来操作概念的常识推理上,而这是人类语言习得<ref>Miller, G. A., and N. Chomsky. "Pattern conception." Paper for Conference on pattern detection, University of Michigan. 1957.</ref>和 AI 的基本目标。<ref>{{cite web|first=Jason |last=Eisner |title=Deep Learning of Recursive Structure: Grammar Induction |url=http://techtalks.tv/talks/deep-learning-of-recursive-structure-grammar-induction/58089/}}</ref>
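| + | 上文所说“微小扰动导致误分类”的脆弱性,常用快速梯度符号法(FGSM)来演示:沿损失函数对输入的梯度符号方向迈出一小步。下面是一个假设性的 PyTorch 草图(模型、输入尺寸与扰动幅度 ε 均为示例假设,并非文中所引研究的原始实现):
| + | <syntaxhighlight lang="python">
| + | import torch
| + | 
| + | # FGSM 草图:x_adv = clip(x + ε · sign(∂loss/∂x))
| + | model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
| + | loss_fn = torch.nn.CrossEntropyLoss()
| + | 
| + | x = torch.rand(1, 3, 32, 32, requires_grad=True)  # 示例输入图像
| + | y = torch.tensor([3])                             # 其正确类别(示例)
| + | 
| + | loss = loss_fn(model(x), y)
| + | loss.backward()
| + | 
| + | eps = 0.01                                            # 扰动幅度(示例取值)
| + | x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()  # 人眼几乎察觉不到的扰动
| + | </syntaxhighlight>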
| | | |
| + | ==== 安全威胁 ==== |
| | | |
− | [[Category:Emerging technologies]]
| + | 随着深度学习从实验室走向现实世界,研究和经验表明,人工神经网络容易受到黑客攻击和欺骗。通过识别这些系统运作所依赖的模式,攻击者可以修改 ANN 的输入,使 ANN 找到一个人类观察者并不会认可的匹配。比如,攻击者可以对图像做细微改动,使得即便图像在人看来与搜索目标完全不同,ANN 也能找到匹配,这种操纵被称为对抗攻击。2016年,研究人员用一个 ANN 以试错方式篡改图像,找出另一个 ANN 的关注点,从而生成能欺骗后者的图像;修改后的图像在人眼看来并没什么区别。另一组研究表明,把篡改过的图像打印出来再拍照,也成功欺骗了图像分类系统。一种防御方式是反向图像搜索,把可疑的假图像提交到 TinEye 这样的网站上,由其找到该图像的其他实例。<ref name=":4">{{Cite news|url=https://singularityhub.com/2017/10/10/ai-is-easy-to-fool-why-that-needs-to-change|title=AI Is Easy to Fool—Why That Needs to Change|last=|first=|date=2017-10-10|work=Singularity Hub|accessdate=2017-10-11}}</ref>一种改进是只使用图像的一部分进行搜索,以识别该局部可能出自的图像。<ref>{{Cite journal|last=Gibney|first=Elizabeth|date=|title=The scientist who spots fake videos|url=https://www.nature.com/news/the-scientist-who-spots-fake-videos-1.22784|journal=Nature|pages=|doi=10.1038/nature.2017.22784|via=|year=2017}}</ref>
| + | 另一组研究表明,某些迷幻风格的眼镜可以欺骗面部识别系统,使其把普通人认成名人,从而可能让一个人冒充另一个人。2017年,研究人员在停止标志上添加贴纸,导致 ANN 将其错误分类。<ref name=":4" />
| + | 然而,ANN 可以被进一步训练来检测欺骗企图,这可能把攻击者和防御者带入一场类似于已塑造恶意软件防御行业的军备竞赛。研究者已训练 ANN 去击败基于 ANN 的反恶意软件:用遗传算法不断改变恶意软件并反复攻击防御系统,直到它骗过反恶意软件,同时保留破坏目标的能力。
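| + | 这种“变异、测试、保留”的循环可以用如下极简草图示意(纯 Python;位串、适应度函数等均为高度简化的示例假设,仅演示遗传算法的骨架,而非真实的恶意软件实验):
| + | <syntaxhighlight lang="python">
| + | import random
| + | 
| + | # 遗传算法骨架草图:反复变异候选个体,保留最能“骗过检测器”的那些
| + | random.seed(1)
| + | TARGET = [1, 0, 1, 1, 0, 1, 0, 0]   # 假想的“可逃避检测”的特征位串
| + | 
| + | def fitness(bits):
| + |     # 与目标位串的匹配程度(示例适应度)
| + |     return sum(a == b for a, b in zip(bits, TARGET))
| + | 
| + | def mutate(bits, rate=0.1):
| + |     return [1 - b if random.random() < rate else b for b in bits]
| + | 
| + | pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(20)]
| + | for _ in range(50):
| + |     pop.sort(key=fitness, reverse=True)
| + |     pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(10)]
| + | pop.sort(key=fitness, reverse=True)
| + | print(fitness(pop[0]))   # 适应度逼近 len(TARGET),即完全“逃避检测”
| + | </syntaxhighlight>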
| + | 另一个小组证明,某些声音可以让Google Now的语音指令系统打开一个特定的网址来下载恶意软件。<ref name=":4" /> |
| + | 在“数据投毒”中,虚假数据被不断偷偷混入机器学习系统的训练集,以阻止其达到精通。<ref name=":4" />
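| + | 数据投毒中最简单的一种是标签翻转:攻击者把一部分训练样本的标签改错后混入训练集。下面用一个自包含的 Python 草图示意其效果(玩具数据与翻转比例均为随意假设):
| + | <syntaxhighlight lang="python">
| + | import random
| + | 
| + | # 标签翻转式数据投毒草图:被投毒的数据会系统性地破坏训练信号
| + | random.seed(0)
| + | 
| + | def make_batch(n=1000):
| + |     # 干净的玩具任务:x > 0 时标签为 1,否则为 0
| + |     xs = [random.uniform(-1, 1) for _ in range(n)]
| + |     return [(x, 1 if x > 0 else 0) for x in xs]
| + | 
| + | def poison(batch, rate=0.3):
| + |     # 以 rate 的比例翻转标签(示例取值)
| + |     return [(x, 1 - y) if random.random() < rate else (x, y) for x, y in batch]
| + | 
| + | def accuracy(threshold, batch):
| + |     # 阈值分类器 x > threshold 在给定数据上的表观准确率
| + |     return sum((x > threshold) == (y == 1) for x, y in batch) / len(batch)
| + | 
| + | print(accuracy(0.0, make_batch()))           # 干净数据上约为 1.0
| + | print(accuracy(0.0, poison(make_batch())))   # 被投毒后明显下降
| + | </syntaxhighlight>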
| | | |
| + | == 另见 ==
| | | |
− | <noinclude>
| + | * [https://en.wikipedia.org/wiki/Applications_of_artificial_intelligence Applications of artificial intelligence] |
| + | * [https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software Comparison of deep learning software] |
| + | * [https://en.wikipedia.org/wiki/Compressed_sensing Compressed sensing]
| + | * [https://en.wikipedia.org/wiki/Echo_state_network Echo state network]
| + | * [https://en.wikipedia.org/wiki/List_of_artificial_intelligence_projects List of artificial intelligence projects] |
| + | * [https://en.wikipedia.org/wiki/Liquid_state_machine Liquid state machine] |
| + | * [https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research List of datasets for machine learning research] |
| + | * [https://en.wikipedia.org/wiki/Reservoir_computing Reservoir computing] |
| + | * [https://en.wikipedia.org/wiki/Sparse_coding Sparse coding] |
| | | |
− | <small>This page was moved from [[wikipedia:en:Deep learning]]. Its edit history can be viewed at [[深度学习/edithistory]]</small></noinclude> | + | == 引用 == |
| + | <references/> |
| | | |
| + | [[Category:旧词条迁移]] |
| [[Category:待整理页面]] | | [[Category:待整理页面]] |