=== Convolutional neural networks ===
A convolutional neural network (CNN) is a class of deep, feed-forward networks, composed of one or more [https://en.wikipedia.org/wiki/Convolution convolutional] layers with fully connected layers (matching those in typical ANNs) on top. It uses tied weights and pooling layers. In particular, max-pooling<ref name="Weng19932"/> is often structured via Fukushima's convolutional architecture.<ref name="FUKU1980">{{cite journal|year=1980|title=Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position|url=|journal=Biol. Cybern.|volume=36|issue=4|pages=193–202|doi=10.1007/bf00344251|pmid=7370364|last1=Fukushima|first1=K.}}</ref> This architecture allows CNNs to take advantage of the 2D structure of input data.
CNNs are suitable for processing visual and other two-dimensional data.<ref name="LECUN1989">LeCun ''et al.'', "Backpropagation Applied to Handwritten Zip Code Recognition," ''Neural Computation'', 1, pp. 541–551, 1989.</ref><ref name="lecun2016slides">[[Yann LeCun]] (2016). Slides on Deep Learning [https://indico.cern.ch/event/510372/ Online]</ref> They have shown superior results in both image and speech applications. They can be trained with standard backpropagation. CNNs are easier to train than other regular, deep, feed-forward neural networks and have many fewer parameters to estimate.<ref name="STANCNN">{{cite web|url=http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/|title=Unsupervised Feature Learning and Deep Learning Tutorial|publisher=}}</ref> Examples of applications in computer vision include [https://en.wikipedia.org/wiki/DeepDream DeepDream]<ref name="deepdream">{{cite journal|last2=Liu|first2=Wei|last3=Jia|first3=Yangqing|last4=Sermanet|first4=Pierre|last5=Reed|first5=Scott|last6=Anguelov|first6=Dragomir|last7=Erhan|first7=Dumitru|last8=Vanhoucke|first8=Vincent|last9=Rabinovich|first9=Andrew|date=|year=2014|title=Going Deeper with Convolutions|url=|journal=Computing Research Repository|volume=|pages=1|arxiv=1409.4842|doi=10.1109/CVPR.2015.7298594|via=|first1=Christian|last1=Szegedy|isbn=978-1-4673-6964-0}}</ref> and [https://en.wikipedia.org/wiki/Robot_navigation robot navigation].<ref>{{cite journal | last=Ran | first=Lingyan | last2=Zhang | first2=Yanning | last3=Zhang | first3=Qilin | last4=Yang | first4=Tao | title=Convolutional Neural Network-Based Robot Navigation Using Uncalibrated Spherical Images | journal=Sensors | publisher=MDPI AG | volume=17 | issue=6 | date=2017-06-12 | issn=1424-8220 | doi=10.3390/s17061341 | page=1341 | url=https://qilin-zhang.github.io/_pages/pdfs/sensors-17-01341.pdf}}</ref>
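As a concrete illustration of the two operations named above, the following minimal NumPy sketch (all sizes and values are toy placeholders, not from any cited system) applies one convolutional filter, whose tied weights are shared across every image position, followed by 2×2 max-pooling:

<syntaxhighlight lang="python">
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: one filter's tied weights slide over all positions."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling over size x size windows."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.rand(8, 8)    # toy 2D input
kernel = np.random.rand(3, 3)   # one convolutional filter (tied weights)
features = max_pool(np.maximum(conv2d(image, kernel), 0))  # conv -> ReLU -> pool
print(features.shape)           # (3, 3)
</syntaxhighlight>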
=== Long short-term memory ===
Long short-term memory (LSTM) networks are recurrent networks that avoid the [https://en.wikipedia.org/wiki/Vanishing_gradient_problem vanishing gradient problem].<ref name=":03">{{Cite journal|last=Hochreiter|first=Sepp|author-link=Sepp Hochreiter|last2=Schmidhuber|first2=Jürgen|author-link2=Jürgen Schmidhuber|date=1997-11-01|title=Long Short-Term Memory|url=http://www.mitpressjournals.org/doi/10.1162/neco.1997.9.8.1735|journal=Neural Computation|volume=9|issue=8|pages=1735–1780|doi=10.1162/neco.1997.9.8.1735|issn=0899-7667|via=}}</ref> LSTM is normally augmented by recurrent gates called forget gates.<ref name=":10">{{Cite web|url=https://www.researchgate.net/publication/220320057_Learning_Precise_Timing_with_LSTM_Recurrent_Networks|title=Learning Precise Timing with LSTM Recurrent Networks (PDF Download Available)|website=ResearchGate|language=en|access-date=2017-06-13|pp=115–143}}</ref> LSTM networks prevent backpropagated errors from vanishing or exploding.<ref name="HOCH19912"/> Instead, errors can flow backwards through unlimited numbers of virtual layers unfolded in space. That is, LSTM can learn "very deep learning" tasks<ref name="SCHIDHUB2" /> that require memories of events that happened thousands or even millions of discrete time steps earlier. Problem-specific LSTM-like topologies can be evolved,<ref>{{Cite journal|last=Bayer|first=Justin|last2=Wierstra|first2=Daan|last3=Togelius|first3=Julian|last4=Schmidhuber|first4=Jürgen|date=2009-09-14|title=Evolving Memory Cell Structures for Sequence Learning|url=https://link.springer.com/chapter/10.1007/978-3-642-04277-5_76|journal=Artificial Neural Networks – ICANN 2009|volume=5769|language=en|publisher=Springer, Berlin, Heidelberg|pages=755–764|doi=10.1007/978-3-642-04277-5_76|series=Lecture Notes in Computer Science|isbn=978-3-642-04276-8}}</ref> and LSTM can handle long delays and signals that mix low- and high-frequency components.
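The gating that protects the backpropagated error can be written compactly. The sketch below is a single LSTM step in plain NumPy (the weight shapes, initialization, and toy loop are illustrative assumptions, not any cited implementation); the forget gate <code>f</code> controls how much of the cell state <code>c</code> is carried forward unchanged, which is the path along which errors can flow without vanishing:

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [x; h_prev] to the four gate pre-activations."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    c = f * c_prev + i * np.tanh(g)               # cell state: additive memory path
    h = o * np.tanh(c)                            # hidden state
    return h, c

n_in, n_hidden = 4, 8
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)
h = c = np.zeros(n_hidden)
for t in range(1000):                             # a long toy sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
print(h.shape)                                    # (8,)
</syntaxhighlight>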
Stacks of LSTM RNNs<ref>{{Cite journal|last=Fernández|first=Santiago|last2=Graves|first2=Alex|last3=Schmidhuber|first3=Jürgen|date=2007|title=Sequence labelling in structured domains with hierarchical recurrent neural networks|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.1887|journal=In Proc. 20th Int. Joint Conf. on Artificial Intelligence, Ijcai 2007|pages=774–779}}</ref> are trained by Connectionist Temporal Classification (CTC)<ref name=":12">{{Cite journal|last=Graves|first=Alex|last2=Fernández|first2=Santiago|last3=Gomez|first3=Faustino|date=2006|title=Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.6306|journal=In Proceedings of the International Conference on Machine Learning, ICML 2006|pages=369–376}}</ref> to find an RNN weight matrix that maximizes the probability of the label sequences in a training set, given the corresponding input sequences. CTC achieves both alignment and recognition.
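For reference, the label-sequence probability that CTC maximizes can be computed with a short dynamic program over the label sequence interleaved with blanks. This NumPy sketch (the symbol count, frame count, and random frame posteriors are placeholder assumptions) computes that probability for one sequence:

<syntaxhighlight lang="python">
import numpy as np

def ctc_forward(probs, labels, blank=0):
    """P(labels | input), summed over all frame-level alignments (CTC forward)."""
    ext = [blank]                          # extended sequence: b, l1, b, l2, b, ...
    for l in labels:
        ext += [l, blank]
    T, S = probs.shape[0], len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s >= 1:
                a += alpha[t - 1, s - 1]
            # skipping a blank is allowed only between distinct labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    return alpha[T - 1, S - 1] + alpha[T - 1, S - 2]

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 5))           # 6 frames, 5 symbols (0 = blank)
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
print(ctc_forward(probs, [2, 3, 3]))       # probability of label sequence 2,3,3
</syntaxhighlight>

In training, the RNN weights are adjusted to raise the log of this quantity over the whole training set.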
In 2003, LSTM started to become competitive with traditional speech recognizers on certain tasks.<ref name="graves2003">{{Cite web|url=Ftp://ftp.idsia.ch/pub/juergen/bioadit2004.pdf|title=Biologically Plausible Speech Recognition with LSTM Neural Nets|last=Graves|first=Alex|last2=Eck|first2=Douglas|date=2003|website=1st Intl. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland|pages=175–184|archive-url=|archive-date=|dead-url=|access-date=|last3=Beringer|first3=Nicole|last4=Schmidhuber|first4=Jürgen|authorlink4=Jürgen Schmidhuber}}</ref> In 2007, the combination with CTC achieved the first good results on speech data.<ref name="fernandez2007keyword">{{Cite journal|last=Fernández|first=Santiago|last2=Graves|first2=Alex|last3=Schmidhuber|first3=Jürgen|date=2007|title=An Application of Recurrent Neural Networks to Discriminative Keyword Spotting|url=http://dl.acm.org/citation.cfm?id=1778066.1778092|journal=Proceedings of the 17th International Conference on Artificial Neural Networks|series=ICANN'07|location=Berlin, Heidelberg|publisher=Springer-Verlag|pages=220–229|isbn=3540746935}}</ref> In 2009, a CTC-trained LSTM became the first RNN to win pattern recognition contests, when it won several competitions in connected [https://en.wikipedia.org/wiki/Handwriting_recognition handwriting recognition].<ref name="SCHIDHUB2" /><ref name="graves20093"/> In 2014, [https://en.wikipedia.org/wiki/Baidu Baidu] used CTC-trained RNNs to break the Switchboard Hub5'00 speech recognition benchmark without using traditional speech processing methods.<ref name="hannun2014">{{cite arxiv|last=Hannun|first=Awni|last2=Case|first2=Carl|last3=Casper|first3=Jared|last4=Catanzaro|first4=Bryan|last5=Diamos|first5=Greg|last6=Elsen|first6=Erich|last7=Prenger|first7=Ryan|last8=Satheesh|first8=Sanjeev|last9=Sengupta|first9=Shubho|date=2014-12-17|title=Deep Speech: Scaling up end-to-end speech recognition|eprint=1412.5567|class=cs.CL}}</ref> LSTM also improved large-vocabulary speech recognition<ref name="sak2014">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf|title=Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling|last=Sak|first=Hasim|last2=Senior|first2=Andrew|date=2014|website=|archive-url=|archive-date=|dead-url=|access-date=|last3=Beaufays|first3=Francoise}}</ref><ref name="liwu2015">{{cite arxiv|last=Li|first=Xiangang|last2=Wu|first2=Xihong|date=2014-10-15|title=Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition|eprint=1410.4281|class=cs.CL}}</ref> and text-to-speech synthesis,<ref>{{Cite web|url=https://www.researchgate.net/publication/287741874_TTS_synthesis_with_bidirectional_LSTM_based_Recurrent_Neural_Networks|title=TTS synthesis with bidirectional LSTM based Recurrent Neural Networks|last=Fan|first=Y.|last2=Qian|first2=Y.|date=2014|website=ResearchGate|language=en|archive-url=|archive-date=|dead-url=|access-date=2017-06-13|last3=Xie|first3=F.|last4=Soong|first4=F. K.}}</ref> also for Google Android,<ref name="scholarpedia2"/><ref name="zen2015">{{Cite web|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf|title=Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis|last=Zen|first=Heiga|last2=Sak|first2=Hasim|date=2015|website=Google.com|publisher=ICASSP|pages=4470–4474|archive-url=|archive-date=|dead-url=|access-date=}}</ref> and photo-real talking heads.<ref name="fan2015">{{Cite journal|last=Fan|first=Bo|last2=Wang|first2=Lijuan|last3=Soong|first3=Frank K.|last4=Xie|first4=Lei|date=2015|title=Photo-Real Talking Head with Deep Bidirectional LSTM|url=https://www.microsoft.com/en-us/research/wp-content/uploads/2015/04/icassp2015_fanbo_1009.pdf|journal=Proceedings of ICASSP|volume=|pages=|via=}}</ref> In 2015, Google's speech recognition improved by a reported 49% through CTC-trained LSTM.<ref name="sak2015">{{Cite web|url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html|title=Google voice search: faster and more accurate|last=Sak|first=Haşim|last2=Senior|first2=Andrew|date=September 2015|website=|archive-url=|archive-date=|dead-url=|access-date=|last3=Rao|first3=Kanishka|last4=Beaufays|first4=Françoise|last5=Schalkwyk|first5=Johan}}</ref>
LSTM became popular in [https://en.wikipedia.org/wiki/Natural_Language_Processing natural language processing]. Unlike previous models based on [https://en.wikipedia.org/wiki/Hidden_Markov_model hidden Markov models] and similar concepts, LSTM can learn to recognize [https://en.wikipedia.org/wiki/Context-sensitive_languages context-sensitive languages].<ref name="gers2001">{{cite journal|last2=Schmidhuber|first2=Jürgen|year=2001|title=LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages|url=|journal=IEEE Transactions on Neural Networks|volume=12|issue=6|pages=1333–1340|doi=10.1109/72.963769|last1=Gers|first1=Felix A.|authorlink2=Jürgen Schmidhuber}}</ref> LSTM improved machine translation,<ref>{{cite web | last=Huang | first=Jie | last2=Zhou | first2=Wengang | last3=Zhang | first3=Qilin | last4=Li | first4=Houqiang | last5=Li | first5=Weiping | title=Video-based Sign Language Recognition without Temporal Segmentation | eprint=1801.10111 | date=2018-01-30 | url=https://arxiv.org/pdf/1801.10111.pdf}}</ref><ref name="NIPS2014">{{Cite journal|last=Sutskever|first=L.|last2=Vinyals|first2=O.|last3=Le|first3=Q.|date=2014|title=Sequence to Sequence Learning with Neural Networks|url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf|journal=NIPS'14 Proceedings of the 27th International Conference on Neural Information Processing Systems |volume=2 |pages=3104–3112 |bibcode=2014arXiv1409.3215S |arxiv=1409.3215 |class=cs.CL}}</ref> [https://en.wikipedia.org/wiki/Language_modeling language modeling]<ref name="vinyals2016">{{cite arxiv|last=Jozefowicz|first=Rafal|last2=Vinyals|first2=Oriol|last3=Schuster|first3=Mike|last4=Shazeer|first4=Noam|last5=Wu|first5=Yonghui|date=2016-02-07|title=Exploring the Limits of Language Modeling|eprint=1602.02410|class=cs.CL}}</ref> and multilingual language processing.<ref name="gillick2015">{{cite arxiv|last=Gillick|first=Dan|last2=Brunk|first2=Cliff|last3=Vinyals|first3=Oriol|last4=Subramanya|first4=Amarnag|date=2015-11-30|title=Multilingual Language Processing From Bytes|eprint=1512.00103|class=cs.CL}}</ref> LSTM combined with CNNs improved automatic image captioning.<ref name="vinyals2015">{{cite arxiv|last=Vinyals|first=Oriol|last2=Toshev|first2=Alexander|last3=Bengio|first3=Samy|last4=Erhan|first4=Dumitru|date=2014-11-17|title=Show and Tell: A Neural Image Caption Generator|eprint=1411.4555|class=cs.CV}}</ref>
=== Deep reservoir computing ===
Deep reservoir computing and deep echo state networks (deepESNs)<ref>{{Cite journal|last=Gallicchio|first=Claudio|last2=Micheli|first2=Alessio|last3=Pedrelli|first3=Luca|title=Deep reservoir computing: A critical experimental analysis|url=http://www.sciencedirect.com/science/article/pii/S0925231217307567|journal=Neurocomputing|volume=268|pages=87|doi=10.1016/j.neucom.2016.12.089|year=2017}}</ref><ref>{{Cite journal|last=Gallicchio|first=Claudio|last2=Micheli|first2=Alessio|date=|title=Echo State Property of Deep Reservoir Computing Networks|url=https://link.springer.com/article/10.1007/s12559-017-9461-9|journal=Cognitive Computation|language=en|volume=9|issue=3|pages=337–350|doi=10.1007/s12559-017-9461-9|issn=1866-9956|via=|year=2017}}</ref> provide a framework for efficiently trained models for the hierarchical processing of temporal data, while enabling investigation of the inherent role of layered composition in RNNs.
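For orientation, the sketch below stacks two fixed random reservoirs in the echo-state style and trains only a linear readout by least squares; the layer sizes, scaling factors, and toy next-step prediction task are illustrative assumptions, not the cited deepESN setup:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def reservoir(n_in, n_res, spectral_radius=0.9):
    """Fixed random input and recurrent weights; only the readout is trained."""
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.normal(size=(n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run(u, W_in, W):
    """Drive the reservoir with input sequence u and collect its states."""
    x = np.zeros(W.shape[0])
    states = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x)
    return np.array(states)

u = np.sin(np.arange(500) * 0.1)              # toy input signal
W_in1, W1 = reservoir(1, 50)                  # layer 1 reads the raw input
W_in2, W2 = reservoir(50, 50)                 # layer 2 reads layer 1's states
X1 = run(u, W_in1, W1)
X2 = run(X1, W_in2, W2)
X = np.hstack([X1, X2])                       # concatenated states of both layers
y = np.roll(u, -1)                            # target: predict the next input
W_out, *_ = np.linalg.lstsq(X[:-1], y[:-1], rcond=None)
print(np.mean((X[:-1] @ W_out - y[:-1]) ** 2))  # training mean squared error
</syntaxhighlight>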
=== Deep belief networks ===
[[File:Restricted_Boltzmann_machine.svg.png|thumb|A [https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine restricted Boltzmann machine] (RBM) with fully connected visible and hidden units. Note there are no hidden–hidden or visible–visible connections.]]
A deep belief network (DBN) is a probabilistic, [https://en.wikipedia.org/wiki/Generative_model generative model] made up of multiple hidden layers. It can be considered a [https://en.wikipedia.org/wiki/Function_composition composition] of the simple learning modules that make up each layer.<ref name="SCHOLARDBNS">{{cite journal|year=2009|title=Deep belief networks|url=|journal=Scholarpedia|volume=4|issue=5|page=5947|doi=10.4249/scholarpedia.5947|last1=Hinton|first1=G.E.|bibcode=2009SchpJ...4.5947H}}</ref>
A DBN can be used to generatively pre-train a DNN by using the learned DBN weights as the initial DNN weights. Backpropagation or other discriminative algorithms can then tune these weights. This is particularly helpful when training data are limited, because poorly initialized weights can significantly hinder model performance. These pre-trained weights end up in a region of the weight space that is closer to the optimal weights than random choices, which allows for both improved modeling and faster convergence of the fine-tuning phase.<ref>{{Cite journal|last=Larochelle|first=Hugo|last2=Erhan|first2=Dumitru|last3=Courville|first3=Aaron|last4=Bergstra|first4=James|last5=Bengio|first5=Yoshua|date=2007|title=An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation|url=http://doi.acm.org/10.1145/1273496.1273556|journal=Proceedings of the 24th International Conference on Machine Learning|series=ICML '07|location=New York, NY, USA|publisher=ACM|pages=473–480|doi=10.1145/1273496.1273556|isbn=9781595937933}}</ref>
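A minimal sketch of one such layer-wise module: a single RBM trained with one-step contrastive divergence (CD-1) on toy binary data, whose learned weights could then seed the corresponding DNN layer. Sizes, learning rate, and data are placeholder assumptions:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
a = np.zeros(n_visible)                         # visible biases
b = np.zeros(n_hidden)                          # hidden biases
data = rng.integers(0, 2, size=(100, n_visible)).astype(float)

for epoch in range(50):
    for v0 in data:
        # positive phase: sample hidden units given the data
        p_h0 = sigmoid(v0 @ W + b)
        h0 = (rng.random(n_hidden) < p_h0).astype(float)
        # negative phase: one Gibbs step (the "CD-1" approximation)
        v1 = (rng.random(n_visible) < sigmoid(h0 @ W.T + a)).astype(float)
        p_h1 = sigmoid(v1 @ W + b)
        # update toward the data statistics, away from the model statistics
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        a += lr * (v0 - v1)
        b += lr * (p_h0 - p_h1)

print(W)   # learned weights; these would initialize one DNN layer
</syntaxhighlight>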
=== Large memory storage and retrieval neural networks ===
Large memory storage and retrieval neural networks (LAMSTAR)<ref name="book2013">{{cite book|url={{google books |plainurl=y |id=W6W6CgAAQBAJ&pg=PP1}}|title=Principles of Artificial Neural Networks|last=Graupe|first=Daniel|publisher=World Scientific|year=2013|isbn=978-981-4522-74-8|location=|pages=1–|ref=harv}}</ref><ref name="GrPatent">{{Patent|US|5920852 A|D. Graupe," Large memory storage and retrieval (LAMSTAR) network, April 1996}}</ref> are fast deep-learning neural networks of many layers that can use many filters simultaneously. These filters may be nonlinear, stochastic, logic, [https://en.wikipedia.org/wiki/Non-stationary non-stationary], or even non-analytical. They are biologically motivated and learn continuously.
A LAMSTAR neural network may serve as a dynamic neural network in the spatial or time domain, or both. Its speed is provided by [https://en.wikipedia.org/wiki/Hebbian Hebbian] link-weights,<ref name=book2013a>D. Graupe, "Principles of Artificial Neural Networks.3rd Edition", World Scientific Publishers, 2013, pp. 203–274.</ref> which integrate the various and usually different filters (preprocessing functions) into its many layers and functions relative to a given learning task. This grossly imitates biological learning, which integrates various preprocessors ([https://en.wikipedia.org/wiki/Cochlea cochlea], [https://en.wikipedia.org/wiki/Retina retina], etc.) and cortexes (auditory, visual, etc.) and their various regions. Its deep-learning capability is further enhanced by using inhibition and correlation, and by its ability to cope with incomplete data, or "lost" neurons or layers, even in the middle of a task. It is fully transparent due to its link weights, which allow dynamic determination of innovation and redundancy and facilitate the ranking of layers, of filters, or of individual neurons relative to a task.
LAMSTAR has been applied to many domains, including medical<ref>{{Cite journal|last=Nigam|first=Vivek Prakash|last2=Graupe|first2=Daniel|date=2004-01-01|title=A neural-network-based detection of epilepsy|journal=Neurological Research|volume=26|issue=1|pages=55–60|doi=10.1179/016164104773026534|issn=0161-6412|pmid=14977058}}</ref><ref name=":11">{{Cite journal|last=Waxman|first=Jonathan A.|last2=Graupe|first2=Daniel|last3=Carley|first3=David W.|date=2010-04-01|title=Automated Prediction of Apnea and Hypopnea, Using a LAMSTAR Artificial Neural Network|url=http://www.atsjournals.org/doi/abs/10.1164/rccm.200907-1146OC|journal=American Journal of Respiratory and Critical Care Medicine|volume=181|issue=7|pages=727–733|doi=10.1164/rccm.200907-1146oc|issn=1073-449X}}</ref><ref name="GrGrZh">{{cite journal|last2=Graupe|first2=M. H.|last3=Zhong|first3=Y.|last4=Jackson|first4=R. K.|year=2008|title=Blind adaptive filtering for non-invasive extraction of the fetal electrocardiogram and its non-stationarities|url=|journal=Proc. Inst. Mech. Eng. H|volume=222|issue=8|pages=1221–1234|doi=10.1243/09544119jeim417|last1=Graupe|first1=D.}}</ref> and financial predictions,<ref name="book2013b">{{harvnb|Graupe|2013|pp=240–253}}</ref> adaptive filtering of noisy speech in unknown noise,<ref name="GrAbon">{{cite journal|last2=Abon|first2=J.|year=2002|title=A Neural Network for Blind Adaptive Filtering of Unknown Noise from Speech|url=https://www.tib.eu/en/search/id/BLCP:CN019373941/Blind-Adaptive-Filtering-of-Speech-from-Noise-of/|journal=Intelligent Engineering Systems Through Artificial Neural Networks|language=en|publisher=Technische Informationsbibliothek (TIB)|volume=12|issue=|pages=683–688|last1=Graupe|first1=D.|accessdate=2017-06-14}}</ref> still-image recognition,<ref name="book2013c">D. Graupe, "Principles of Artificial Neural Networks.3rd Edition", World Scientific Publishers", 2013, pp. 253–274.</ref> video-image recognition,<ref name="Girado">{{cite journal|last2=Sandin|first2=D. J.|last3=DeFanti|first3=T. A.|year=2003|title=Real-time camera-based face detection using a modified LAMSTAR neural network system|url=|journal=Proc. SPIE 5015, Applications of Artificial Neural Networks in Image Processing VIII|volume=5015|issue=|pages=36|page=|doi=10.1117/12.477405|last1=Girado|first1=J. I.|series=Applications of Artificial Neural Networks in Image Processing VIII|bibcode=2003SPIE.5015...36G}}</ref> software security,<ref name="VenkSel">{{cite journal|last2=Selvan|first2=S.|year=2007|title=Intrusion Detection using an Improved Competitive Learning Lamstar Network|url=|journal=International Journal of Computer Science and Network Security|volume=7|issue=2|pages=255–263|last1=Venkatachalam|first1=V}}</ref> and adaptive control of non-linear systems.<ref>{{Cite web|url=https://www.researchgate.net/publication/262316982_Control_of_unstable_nonlinear_and_nonstationary_systems_using_LAMSTAR_neural_networks|title=Control of unstable nonlinear and nonstationary systems using LAMSTAR neural networks|last=Graupe|first=D.|last2=Smollack|first2=M.|date=2007|website=ResearchGate|publisher=Proceedings of 10th IASTED on Intelligent Control, Sect.592,|pages=141–144|language=en|archive-url=|archive-date=|dead-url=|access-date=2017-06-14}}</ref> In 20 comparative studies, LAMSTAR showed a much faster learning speed and a somewhat lower error rate than CNNs based on [https://en.wikipedia.org/wiki/ReLU ReLU]-function filters and max pooling.<ref name="book1016">{{cite book|url={{google books |plainurl=y |id=e5hIDQAAQBAJ|page=57}}|title=Deep Learning Neural Networks: Design and Case Studies|last=Graupe|first=Daniel|date=7 July 2016|publisher=World Scientific Publishing Co Inc|year=|isbn=978-981-314-647-1|location=|pages=57–110}}</ref>
These applications demonstrate delving into aspects of the data that are hidden from shallow learning networks and the human senses, as in the cases of predicting the onset of [https://en.wikipedia.org/wiki/Sleep_apnea sleep apnea] events,<ref name=":11" /> of the electrocardiogram of a fetus recorded from skin-surface electrodes placed on the mother's abdomen early in pregnancy,<ref name="GrGrZh" /> of financial prediction,<ref name="book2013" /> and of blind filtering of noisy speech.<ref name="GrAbon" />
LAMSTAR was proposed in 1996 ([https://www.google.com/patents/US5920852 US Patent 5920852 A]) and was further developed by Graupe and Kordylewski from 1997 to 2002.<ref>{{Cite journal|last=Graupe|first=D.|last2=Kordylewski|first2=H.|date=August 1996|title=Network based on SOM (Self-Organizing-Map) modules combined with statistical decision tools|url=http://ieeexplore.ieee.org/document/594203/|journal=Proceedings of the 39th Midwest Symposium on Circuits and Systems|volume=1|pages=471–474 vol.1|doi=10.1109/mwscas.1996.594203|isbn=0-7803-3636-4}}</ref><ref>{{Cite journal|last=Graupe|first=D.|last2=Kordylewski|first2=H.|date=1998-03-01|title=A Large Memory Storage and Retrieval Neural Network for Adaptive Retrieval and Diagnosis|url=http://www.worldscientific.com/doi/abs/10.1142/S0218194098000091|journal=International Journal of Software Engineering and Knowledge Engineering|volume=08|issue=1|pages=115–138|doi=10.1142/s0218194098000091|issn=0218-1940}}</ref><ref name="Kordylew">{{cite journal|last2=Graupe|first2=D|last3=Liu|first3=K.|year=2001|title=A novel large-memory neural network as an aid in medical diagnosis applications|url=|journal=IEEE Transactions on Information Technology in Biomedicine|volume=5|issue=3|pages=202–209|doi=10.1109/4233.945291|last1=Kordylewski|first1=H.}}</ref> A modified version, known as LAMSTAR 2, was developed by Schneider and Graupe in 2008.<ref name="Schn">{{cite journal|last2=Graupe|year=2008|title=A modified LAMSTAR neural network and its applications|url=|journal=International journal of neural systems|volume=18|issue=4|pages=331–337|doi=10.1142/s0129065708001634|last1=Schneider|first1=N.C.}}</ref><ref name="book2013d">{{harvnb|Graupe|2013|p=217}}</ref>
=== Stacked (de-noising) auto-encoders ===
=== Deep stacking networks ===
A deep stacking network (DSN)<ref name="ref17">{{cite journal|last2=Yu|first2=Dong|last3=Platt|first3=John|date=2012|title=Scalable stacking and learning for building deep architectures|url=http://research-srv.microsoft.com/pubs/157586/DSN-ICASSP2012.pdf|journal=2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)|pages=2133–2136|last1=Deng|first1=Li}}</ref> (deep convex network) is based on a hierarchy of blocks of simplified neural network modules. It was introduced in 2011 by Deng and Dong.<ref name="ref16">{{cite journal|last2=Yu|first2=Dong|date=2011|title=Deep Convex Net: A Scalable Architecture for Speech Pattern Classification|url=http://www.truebluenegotiations.com/files/deepconvexnetwork-interspeech2011-pub.pdf|journal=Proceedings of the Interspeech|pages=2285–2288|last1=Deng|first1=Li}}</ref> It formulates learning as a [https://en.wikipedia.org/wiki/Convex_optimization_problem convex optimization problem] with a [https://en.wikipedia.org/wiki/Closed-form_expression closed-form solution], emphasizing the mechanism's similarity to [https://en.wikipedia.org/wiki/Ensemble_learning stacked generalization].<ref name="ref18">{{cite journal|date=1992|title=Stacked generalization|journal=Neural Networks|volume=5|issue=2|pages=241–259|doi=10.1016/S0893-6080(05)80023-1|last1=David|first1=Wolpert}}</ref> Each DSN block is a simple module that is easy to train by itself in a [https://en.wikipedia.org/wiki/Supervised_learning supervised] fashion, without backpropagation through the entire stack of blocks.<ref>{{Cite journal|last=Bengio|first=Y.|date=2009-11-15|title=Learning Deep Architectures for AI|url=http://www.nowpublishers.com/article/Details/MAL-006|journal=Foundations and Trends® in Machine Learning|language=English|volume=2|issue=1|pages=1–127|doi=10.1561/2200000006|issn=1935-8237}}</ref>
Each block consists of a simplified [https://en.wikipedia.org/wiki/Multi-layer_perceptron multi-layer perceptron] (MLP) with a single hidden layer. The hidden layer '''''h''''' has logistic [https://en.wikipedia.org/wiki/Sigmoid_function sigmoidal] [https://en.wikipedia.org/wiki/Logistic_function units], and the output layer has linear units. Connections between these layers are represented by the weight matrix '''''U'''''; input-to-hidden-layer connections have the weight matrix '''''W'''''. Target vectors '''''t''''' form the columns of matrix '''''T''''', and the input data vectors '''''x''''' form the columns of matrix '''''X'''''. The matrix of hidden units is <math>\boldsymbol{H} = \sigma(\boldsymbol{W}^T\boldsymbol{X})</math>. Modules are trained in order, so the lower-layer weights '''''W''''' are known at each stage. The function performs the element-wise [https://en.wikipedia.org/wiki/Logistic_function logistic sigmoid] operation. Each block estimates the same final label class ''y'', and this estimate is concatenated with the original input '''''X''''' to form the expanded input for the next block. Thus, the input to the first block contains the original data only, while the input of downstream blocks also includes the output of preceding blocks. Learning the upper-layer weight matrix '''''U''''' given the other weights in the network can then be formulated as a convex optimization problem:

:<math>\min_{U^T} f = \| \boldsymbol{U}^T \boldsymbol{H} - \boldsymbol{T} \|^2_F ,</math>

which has a closed-form solution.
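A sketch of a stack of such blocks under these definitions (the random data, block count, and the fact that the lower weights '''''W''''' are left random rather than trained are placeholder assumptions): with '''''W''''' fixed, the convex sub-problem for '''''U''''' is ordinary least squares and is solved in closed form via a pseudo-inverse.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

d, n_hidden, n_classes, n = 10, 20, 3, 200
X = rng.normal(size=(d, n))                            # inputs as columns
T = np.eye(n_classes)[rng.integers(0, n_classes, n)].T # one-hot targets as columns

X_block = X                         # the first block sees the raw input only
for block in range(3):
    W = rng.normal(size=(X_block.shape[0], n_hidden))  # lower weights (random here)
    H = sigmoid(W.T @ X_block)      # hidden representation
    # convex sub-problem min_U ||U^T H - T||_F^2, solved in closed form
    U = np.linalg.pinv(H @ H.T) @ H @ T.T
    Y = U.T @ H                     # this block's estimate of the labels
    X_block = np.vstack([X, Y])     # next block: raw input plus this estimate

print(np.mean(np.argmax(Y, 0) == np.argmax(T, 0)))     # training accuracy
</syntaxhighlight>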
=== Tensor deep stacking networks ===
This architecture is a DSN extension. It offers two important improvements: it uses higher-order information from [https://en.wikipedia.org/wiki/Covariance covariance] statistics, and it transforms the [https://en.wikipedia.org/wiki/Convex_optimization non-convex problem] of a lower layer into a convex sub-problem of an upper layer.<ref name="ref19">{{cite journal|last2=Deng|first2=Li|last3=Yu|first3=Dong|date=2012|title=Tensor deep stacking networks|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=1–15|issue=8|pages=1944–1957|doi=10.1109/tpami.2012.268|last1=Hutchinson|first1=Brian}}</ref> TDSNs use covariance statistics in a [https://en.wikipedia.org/wiki/Bilinear_map bilinear mapping] from each of two distinct sets of hidden units in the same layer to predictions, via a third-order [https://en.wikipedia.org/wiki/Tensor tensor].
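The bilinear mapping itself is compact: with two hidden-unit sets <code>h1</code> and <code>h2</code> and a third-order weight tensor <code>T</code>, each prediction is <math>y_k = \sum_{ij} h^{(1)}_i h^{(2)}_j T_{ijk}</math>. A NumPy illustration with placeholder sizes:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n_out = 5, 6, 3
h1 = rng.normal(size=n1)              # first hidden-unit set
h2 = rng.normal(size=n2)              # second hidden-unit set
T = rng.normal(size=(n1, n2, n_out))  # third-order tensor of weights

# bilinear mapping: y_k = sum_ij h1_i * h2_j * T_ijk
y = np.einsum('i,j,ijk->k', h1, h2, T)
print(y)  # prediction driven by second-order interactions of the two sets
</syntaxhighlight>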
Parallelization and scalability are not considered seriously in conventional DNNs.<ref name="ref26">{{cite journal|last2=Salakhutdinov|first2=Ruslan|date=2006|title=Reducing the Dimensionality of Data with Neural Networks|journal=Science|volume=313|issue=5786|pages=504–507|doi=10.1126/science.1127647|pmid=16873662|last1=Hinton|first1=Geoffrey|bibcode=2006Sci...313..504H}}</ref><ref name="ref27">{{cite journal|last2=Yu|first2=D.|last3=Deng|first3=L.|last4=Acero|first4=A.|date=2012|title=Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition|journal=IEEE Transactions on Audio, Speech, and Language Processing|volume=20|issue=1|pages=30–42|doi=10.1109/tasl.2011.2134090|last1=Dahl|first1=G.}}</ref><ref name="ref28">{{cite journal|last2=Dahl|first2=George|last3=Hinton|first3=Geoffrey|date=2012|title=Acoustic Modeling Using Deep Belief Networks|journal=IEEE Transactions on Audio, Speech, and Language Processing|volume=20|issue=1|pages=14–22|doi=10.1109/tasl.2011.2109382|last1=Mohamed|first1=Abdel-rahman}}</ref> All learning for DSNs and TDSNs is done in batch mode, which allows parallelization.<ref name="ref16" /><ref name="ref17" /> Parallelization allows scaling the design up to larger (deeper) architectures and data sets.
The basic architecture is suitable for diverse tasks such as [https://en.wikipedia.org/wiki/Statistical_classification classification] and [https://en.wikipedia.org/wiki/Regression_analysis regression].
=== Spike-and-slab RBMs ===
The need for deep learning with [https://en.wikipedia.org/wiki/Real_number real-valued] inputs, as in Gaussian restricted Boltzmann machines, led to the spike-and-slab [https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine RBM] (''ss''RBM), which models continuous-valued inputs with strictly [https://en.wikipedia.org/wiki/Binary_variable binary] [https://en.wikipedia.org/wiki/Latent_variable latent variables].<ref name="ref30">{{cite journal|last2=Bergstra|first2=James|last3=Bengio|first3=Yoshua|date=2011|title=A Spike and Slab Restricted Boltzmann Machine|url=http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2011_CourvilleBB11.pdf|journal=JMLR: Workshop and Conference Proceeding|volume=15|pages=233–241|last1=Courville|first1=Aaron}}</ref> Like basic RBMs and their variants, a spike-and-slab RBM is a [https://en.wikipedia.org/wiki/Bipartite_graph bipartite graph], while like GRBMs, the visible units (input) are real-valued.
The difference is in the hidden layer, where each hidden unit has a binary spike variable and a real-valued slab variable. A spike is a discrete [https://en.wikipedia.org/wiki/Probability_mass probability mass] at zero, while a slab is a [https://en.wikipedia.org/wiki/Probability_density probability density] over a continuous domain;<ref name="ref32">{{cite conference|last1=Courville|first1=Aaron|last2=Bergstra|first2=James|last3=Bengio|first3=Yoshua|chapter=Unsupervised Models of Images by Spike-and-Slab RBMs|title=Proceedings of the 28th International Conference on Machine Learning|volume=10|pages=1–8|date=2011|url=http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Courville_591.pdf}}</ref> their mixture forms a [https://en.wikipedia.org/wiki/Prior_probability prior].<ref name="ref31">{{cite journal|last2=Beauchamp|first2=J|date=1988|title=Bayesian Variable Selection in Linear Regression|journal=Journal of the American Statistical Association|volume=83|issue=404|pages=1023–1032|doi=10.1080/01621459.1988.10478694|last1=Mitchell|first1=T}}</ref>
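In sampling terms, the spike-and-slab construction for one hidden unit is a Bernoulli gate on a Gaussian value; a toy NumPy illustration (the mixture weight and slab variance are arbitrary):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
p_spike, slab_std, n = 0.3, 1.0, 10_000

spike = rng.random(n) < p_spike       # binary spike variable
slab = rng.normal(0.0, slab_std, n)   # real-valued slab variable
h = spike * slab                      # exactly zero, or Gaussian-distributed
print((h == 0).mean())                # ~= 1 - p_spike: the point mass at zero
</syntaxhighlight>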
An extension of the ''ss''RBM, called µ-ssRBM, provides extra modeling capacity using additional terms in the [https://en.wikipedia.org/wiki/Energy_function energy function]. One of these terms enables the model to form a [https://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distribution] of the spike variables by [https://en.wikipedia.org/wiki/Marginalizing_out marginalizing out] the slab variables given an observation.
=== Compound hierarchical-deep models ===
Compound hierarchical-deep models compose deep networks with non-parametric [https://en.wikipedia.org/wiki/Bayesian_network Bayesian models]. [https://en.wikipedia.org/wiki/Feature_(machine_learning) Features] can be learned using deep architectures such as DBNs,<ref name="hinton2006" /> DBMs,<ref name="ref3">{{cite journal|last1=Hinton|first1=Geoffrey|last2=Salakhutdinov|first2=Ruslan|date=2009|title=Efficient Learning of Deep Boltzmann Machines|url=http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS09_SalakhutdinovH.pdf|volume=3|pages=448–455}}</ref> deep auto-encoders,<ref name="ref15">{{cite journal|last2=Bengio|first2=Yoshua|last3=Louradour|first3=Jerdme|last4=Lamblin|first4=Pascal|date=2009|title=Exploring Strategies for Training Deep Neural Networks|url=http://dl.acm.org/citation.cfm?id=1577070|journal=The Journal of Machine Learning Research|volume=10|pages=1–40|last1=Larochelle|first1=Hugo}}</ref> convolutional variants,<ref name="ref39">{{cite journal|last2=Carpenter|first2=Blake|date=2011|title=Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning|url=http://www.iapr-tc11.org/archive/icdar2011/fileup/PDF/4520a440.pdf|journal=|volume=|pages=440–445|via=|last1=Coates|first1=Adam}}</ref><ref name="ref40">{{cite journal|last2=Grosse|first2=Roger|date=2009|title=Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations|url=http://portal.acm.org/citation.cfm?doid=1553374.1553453|journal=Proceedings of the 26th Annual International Conference on Machine Learning|pages=1–8|last1=Lee|first1=Honglak}}</ref> ''ss''RBMs,<ref name="ref32" /> deep coding networks,<ref name="ref41">{{cite journal|last2=Zhang|first2=Tong|date=2010|title=Deep Coding Network|url=http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2010_1077.pdf|journal=Advances in Neural . . .|pages=1–9|last1=Lin|first1=Yuanqing}}</ref> DBNs with sparse feature learning,<ref name="ref42">{{cite journal|last2=Boureau|first2=Y-Lan|date=2007|title=Sparse Feature Learning for Deep Belief Networks|url=http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2007_1118.pdf|journal=Advances in Neural Information Processing Systems|volume=23|pages=1–8|last1=Ranzato|first1=Marc Aurelio}}</ref> RNNs,<ref name="ref43">{{cite journal|last2=Lin|first2=Clif|date=2011|title=Parsing Natural Scenes and Natural Language with Recursive Neural Networks|url=http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Socher_125.pdf|journal=Proceedings of the 26th International Conference on Machine Learning|last1=Socher|first1=Richard}}</ref> conditional DBNs,<ref name="ref44">{{cite journal|last2=Hinton|first2=Geoffrey|date=2006|title=Modeling Human Motion Using Binary Latent Variables|url=http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2006_693.pdf|journal=Advances in Neural Information Processing Systems|last1=Taylor|first1=Graham}}</ref> and de-noising auto-encoders.<ref name="ref45">{{cite journal|last2=Larochelle|first2=Hugo|date=2008|title=Extracting and composing robust features with denoising autoencoders|url=http://portal.acm.org/citation.cfm?doid=1390156.1390294|journal=Proceedings of the 25th international conference on Machine learning – ICML '08|pages=1096–1103|last1=Vincent|first1=Pascal}}</ref> This provides a better representation, allowing faster learning and more accurate classification with high-dimensional data. However, these architectures are poor at learning novel classes with few examples, because all network units are involved in representing the input (a distributed representation) and must be adjusted together (a high [https://en.wikipedia.org/wiki/Degree_of_freedom degree of freedom]). Limiting the degrees of freedom reduces the number of parameters to learn, facilitating learning of new classes from few examples. [https://en.wikipedia.org/wiki/Hierarchical_Bayesian_model Hierarchical Bayesian (HB) models] allow learning from few examples, for example<ref name="ref34">{{cite journal|last2=Perfors|first2=Amy|last3=Tenenbaum|first3=Joshua|date=2007|title=Learning overhypotheses with hierarchical Bayesian models|journal=Developmental Science|volume=10|issue=3|pages=307–21|doi=10.1111/j.1467-7687.2007.00585.x|pmid=17444972|last1=Kemp|first1=Charles}}</ref><ref name="ref37">{{cite journal|last2=Tenenbaum|first2=Joshua|date=2007|title=Word learning as Bayesian inference|journal=Psychol. Rev.|volume=114|issue=2|pages=245–72|doi=10.1037/0033-295X.114.2.245|pmid=17500627|last1=Xu|first1=Fei}}</ref><ref name="ref46">{{cite journal|last2=Polatkan|first2=Gungor|date=2011|title=The Hierarchical Beta Process for Convolutional Factor Analysis and Deep Learning|url=http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Chen_251.pdf|journal=Machine Learning . . .|last1=Chen|first1=Bo}}</ref><ref name="ref47">{{cite journal|last2=Fergus|first2=Rob|date=2006|title=One-shot learning of object categories|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=28|issue=4|pages=594–611|doi=10.1109/TPAMI.2006.79|pmid=16566508|last1=Fei-Fei|first1=Li}}</ref><ref name="ref48">{{cite journal|last2=Dunson|first2=David|date=2008|title=The Nested Dirichlet Process|url=http://amstat.tandfonline.com/doi/full/10.1198/016214508000000553|journal=Journal of the American Statistical Association|volume=103|issue=483|pages=1131–1154|doi=10.1198/016214508000000553|last1=Rodriguez|first1=Abel}}</ref> in computer vision, [https://en.wikipedia.org/wiki/Statistics statistics] and cognitive science.
Compound HD architectures aim to integrate the characteristics of both HB and deep networks. The compound HDP-DBM architecture is a [https://en.wikipedia.org/wiki/Hierarchical_Dirichlet_process hierarchical Dirichlet process] (HDP) as a hierarchical model, incorporated with the DBM architecture. It is a full [https://en.wikipedia.org/wiki/Generative_model generative model], generalized from abstract concepts flowing through the model's layers, and it is able to synthesize new examples in novel classes that look "reasonably" natural. All the levels are learned jointly by maximizing a joint [https://en.wikipedia.org/wiki/Log_probability log-probability] [https://en.wikipedia.org/wiki/Score_(statistics) score].<ref name="ref38">{{cite journal|last2=Joshua|first2=Tenenbaum|date=2012|title=Learning with Hierarchical-Deep Models|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=35|issue=8|pages=1958–71|doi=10.1109/TPAMI.2012.269|pmid=23787346|last1=Ruslan|first1=Salakhutdinov}}</ref>
In a DBM with three hidden layers, the probability of a visible input <math>\boldsymbol{\nu}</math> is:
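:<math>p(\boldsymbol{\nu}, \psi) = \frac{1}{Z}\sum_{h} e^{\sum_{ij}W_{ij}^{(1)}\nu_i h_j^{(1)} + \sum_{jl}W_{jl}^{(2)}h_j^{(1)}h_l^{(2)} + \sum_{lm}W_{lm}^{(3)}h_l^{(2)}h_m^{(3)}},</math>
where <math>h = \{\boldsymbol{h}^{(1)}, \boldsymbol{h}^{(2)}, \boldsymbol{h}^{(3)}\}</math> is the set of hidden units, <math>\psi = \{\boldsymbol{W}^{(1)}, \boldsymbol{W}^{(2)}, \boldsymbol{W}^{(3)}\}</math> are the symmetric visible–hidden and hidden–hidden interaction weights, and <math>Z</math> is the partition function; this is the standard DBM parameterization.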
A deep predictive coding network (DPCN) is a [https://en.wikipedia.org/wiki/Predictive_modelling predictive] coding scheme that uses top-down information to empirically adjust the priors needed for a bottom-up [https://en.wikipedia.org/wiki/Inference inference] procedure, by means of a deep, locally connected [https://en.wikipedia.org/wiki/Generative_model generative model]. It works by extracting sparse [https://en.wikipedia.org/wiki/Feature_(machine_learning) features] from time-varying observations using a linear dynamical model. A pooling strategy is then used to learn invariant feature representations. These units compose a deep architecture that is trained by [https://en.wikipedia.org/wiki/Greedy_algorithm greedy] layer-wise [https://en.wikipedia.org/wiki/Unsupervised_learning unsupervised learning]. The layers constitute a kind of [https://en.wikipedia.org/wiki/Markov_chain Markov chain], so that the state of any layer depends only on the preceding and succeeding layers.
A DPCN predicts the representation of a layer by a top-down approach, using the information in the upper layer together with temporal dependencies from previous states.<ref name="ref56">{{cite arXiv|eprint=1301.3541|first2=Jose|last2=Principe|title=Deep Predictive Coding Networks|date=2013|last1=Chalasani|first1=Rakesh|class=cs.LG}}</ref>
DPCNs can be extended to form a [https://en.wikipedia.org/wiki/Convolutional_neural_network convolutional network].<ref name="ref56" />
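The greedy layer-wise stacking can be sketched as follows. This is a minimal illustration assuming sparse coding via scikit-learn's <code>MiniBatchDictionaryLearning</code> plus a toy temporal max-pooling step; it stands in for, but is not, the DPCN's linear dynamical model:
<syntaxhighlight lang="python">
# Hypothetical sketch of greedy layer-wise unsupervised training with
# sparse features and pooling (the stacking idea only, not the full DPCN).
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def train_stack(X, layer_sizes, pool=2):
    """Greedily train one sparse-coding layer at a time; each layer
    consumes the pooled codes of the layer below."""
    layers = []
    for n_atoms in layer_sizes:
        dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0)
        codes = dico.fit(X).transform(X)          # sparse features
        # temporal max-pooling over consecutive samples for invariance
        T = (codes.shape[0] // pool) * pool
        X = codes[:T].reshape(-1, pool, n_atoms).max(axis=1)
        layers.append(dico)
    return layers, X

rng = np.random.default_rng(0)
layers, top_features = train_stack(rng.standard_normal((64, 20)), [32, 16])
</syntaxhighlight>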
Apart from [https://en.wikipedia.org/wiki/Long_short-term_memory long short-term memory] (LSTM), other approaches also add differentiable memory to recurrent functions, for example (a differentiable-stack sketch follows the list):
* Differentiable push and pop actions for alternative memory networks called neural stack machines<ref name="S. Das, C.L. Giles p. 79">S. Das, C.L. Giles, G.Z. Sun, "Learning Context Free Grammars: Limitations of a Recurrent Neural Network with an External Stack Memory," Proc. 14th Annual Conf. of the Cog. Sci. Soc., p. 79, 1992.</ref><ref name="Mozer, M. C. 1993 pp. 863-870">{{Cite web|url=https://papers.nips.cc/paper/626-a-connectionist-symbol-manipulator-that-discovers-the-structure-of-context-free-languages|title=A connectionist symbol manipulator that discovers the structure of context-free languages|last=Mozer|first=M. C.|last2=Das|first2=S.|date=1993|publisher=NIPS 5|pages=863–870}}</ref>
* Memory networks where the control network's external differentiable storage is in the fast weights of another network.<ref name="ReferenceC">{{cite journal|year=1992|title=Learning to control fast-weight memories: An alternative to recurrent nets|journal=Neural Computation|volume=4|issue=1|pages=131–139|doi=10.1162/neco.1992.4.1.131|last1=Schmidhuber|first1=J.}}</ref>
* LSTM forget gates<ref name="F. Gers, N. Schraudolph 2002">{{cite journal|last2=Schraudolph|first2=N.|last3=Schmidhuber|first3=J.|year=2002|title=Learning precise timing with LSTM recurrent networks|url=http://jmlr.org/papers/volume3/gers02a/gers02a.pdf|journal=JMLR|volume=3|pages=115–143|last1=Gers|first1=F.}}</ref>
* Self-referential RNNs with special output units for addressing and rapidly manipulating the RNN's own weights in differentiable fashion (internal storage).<ref name="J. Schmidhuber pages 191-195">{{Cite conference|author=[https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber Jürgen Schmidhuber]|title=An introspective network that can learn to run its own weight change algorithm|booktitle=In Proc. of the Intl. Conf. on Artificial Neural Networks, Brighton|pages=191–195|publisher=IEE|year=1993|url=ftp://ftp.idsia.ch/pub/juergen/iee93self.ps.gz}}</ref><ref name="Hochreiter, Sepp 2001">{{cite journal|last2=Younger|first2=A. Steven|last3=Conwell|first3=Peter R.|year=2001|title=Learning to Learn Using Gradient Descent|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.323|journal=ICANN|volume=2130|pages=87–94|last1=Hochreiter|first1=Sepp}}</ref>
* Learning to transduce with unbounded memory.<ref name="Grefenstette, Edward 1506">Grefenstette, Edward, et al. [https://arxiv.org/pdf/1506.02516.pdf "Learning to Transduce with Unbounded Memory."] {{arxiv|1506.02516}} (2015).</ref>
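As an illustration of the first item above, the following is a minimal continuous-stack sketch in the spirit of Grefenstette et al.: push and pop blend stored vectors with real-valued strengths instead of making discrete moves, which is what makes them amenable to gradient-based training. The class and its simplified update rule are illustrative assumptions, not the published equations:
<syntaxhighlight lang="python">
# A minimal sketch of a continuous ("neural") stack: values are never
# deleted outright; each keeps a real-valued strength that pop erodes.
import numpy as np

class NeuralStack:
    def __init__(self):
        self.values = []     # stored vectors, bottom to top
        self.strengths = []  # how "present" each vector still is

    def step(self, v, push, pop):
        # pop: peel strength `pop` off the top of the stack
        u = pop
        for i in reversed(range(len(self.strengths))):
            taken = min(self.strengths[i], u)
            self.strengths[i] -= taken
            u -= taken
        # push: add the new vector with strength `push`
        self.values.append(v)
        self.strengths.append(push)

    def read(self):
        # weighted sum of the topmost vectors, up to total strength 1
        r, budget = 0.0, 1.0
        for v, s in zip(reversed(self.values), reversed(self.strengths)):
            w = min(s, budget)
            r = r + w * v
            budget -= w
            if budget <= 0:
                break
        return r

stack = NeuralStack()
stack.step(np.array([1.0, 0.0]), push=1.0, pop=0.0)
stack.step(np.array([0.0, 1.0]), push=0.6, pop=0.3)
print(stack.read())  # mostly the newest vector, a residue of the older one
</syntaxhighlight>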
===== Neural Turing machines =====
Neural Turing machines<ref name="Graves, Alex 14102">Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing Machines." {{arxiv|1410.5401}} (2014).</ref> couple LSTM networks to external memory resources, with which they can interact by attentional processes. The combined system is analogous to a [https://en.wikipedia.org/wiki/Turing_machine Turing machine] but is differentiable end-to-end, allowing it to be efficiently trained with [https://en.wikipedia.org/wiki/Gradient_descent gradient descent]. Preliminary results show that neural Turing machines can infer simple algorithms such as copying, sorting and associative recall from input and output examples.
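The attentional read can be sketched as content-based addressing in the style of the NTM paper: attention weights over memory rows come from a softmax of cosine similarities, sharpened by a key strength <code>beta</code> (the toy memory contents below are assumptions):
<syntaxhighlight lang="python">
# Sketch of NTM-style content-based addressing followed by a soft read.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_address(memory, key, beta=5.0):
    """memory: (N, M) rows; key: (M,); returns read weights (N,)."""
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    cos = memory @ key / norms
    return softmax(beta * cos)

memory = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
w = content_address(memory, key=np.array([1.0, 0.1]))
read_vector = w @ memory   # differentiable soft read of the memory
print(w, read_vector)
</syntaxhighlight>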
[https://en.wikipedia.org/wiki/Differentiable_neural_computer Differentiable neural computers] (DNC) are an NTM extension. They out-performed neural Turing machines, long short-term memory systems and memory networks on sequence-processing tasks.<ref name=":02">{{Cite news|url=https://www.wired.co.uk/article/deepmind-ai-tube-london-underground|title=DeepMind's AI learned to ride the London Underground using human-like reason and memory|last=Burgess|first=Matt|newspaper=WIRED UK|language=en-GB|access-date=2016-10-19}}</ref><ref>{{Cite news|url=https://www.pcmag.com/news/348701/deepmind-ai-learns-to-navigate-london-tube|title=DeepMind AI 'Learns' to Navigate London Tube|newspaper=PCMAG|access-date=2016-10-19}}</ref><ref>{{Cite web|url=https://techcrunch.com/2016/10/13/__trashed-2/|title=DeepMind's differentiable neural computer helps you navigate the subway with its memory|last=Mannes|first=John|website=TechCrunch|access-date=2016-10-19}}</ref><ref>{{Cite journal|last=Graves|first=Alex|last2=Wayne|first2=Greg|last3=Reynolds|first3=Malcolm|last4=Harley|first4=Tim|last5=Danihelka|first5=Ivo|last6=Grabska-Barwińska|first6=Agnieszka|last7=Colmenarejo|first7=Sergio Gómez|last8=Grefenstette|first8=Edward|last9=Ramalho|first9=Tiago|date=2016-10-12|title=Hybrid computing using a neural network with dynamic external memory|url=http://www.nature.com/nature/journal/vaop/ncurrent/full/nature20101.html|journal=Nature|language=en|volume=538|issue=7626|doi=10.1038/nature20101|issn=1476-4687|pages=471–476|pmid=27732574|bibcode=2016Natur.538..471G}}</ref><ref>{{Cite web|url=https://deepmind.com/blog/differentiable-neural-computers/|title=Differentiable neural computers {{!}} DeepMind|website=DeepMind|access-date=2016-10-19}}</ref>
==== Semantic hashing ====
Approaches that represent previous experiences directly and [https://en.wikipedia.org/wiki/Instance-based_learning use a similar experience to form a local model] are often called [https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm nearest neighbour] or [https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm k-nearest neighbors] methods.<ref>{{cite journal|last2=Schaal|first2=Stefan|year=1995|title=Memory-based neural networks for robot learning|journal=Neurocomputing|volume=9|issue=3|pages=243–269|doi=10.1016/0925-2312(95)00033-6|last1=Atkeson|first1=Christopher G.}}</ref> Deep learning is useful in semantic hashing,<ref>Salakhutdinov, Ruslan, and Geoffrey Hinton. [http://www.utstat.toronto.edu/~rsalakhu/papers/sdarticle.pdf "Semantic hashing."] International Journal of Approximate Reasoning 50.7 (2009): 969–978.</ref> where a deep [https://en.wikipedia.org/wiki/Graphical_model graphical model] models the word-count vectors obtained from a large set of documents.<ref name="Le 2014">{{Cite arXiv|eprint=1405.4053|first=Quoc V.|last=Le|first2=Tomas|last2=Mikolov|title=Distributed representations of sentences and documents|year=2014|class=cs.CL}}</ref> Documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by accessing all the addresses that differ by only a few bits from the query document's address. Unlike [https://en.wikipedia.org/wiki/Sparse_distributed_memory sparse distributed memory], which operates on 1000-bit addresses, semantic hashing works on the 32- or 64-bit addresses of a conventional computer architecture.
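A lookup of this kind can be sketched as probing every address within a small Hamming ball of the query's address; the hash table below is a made-up toy, since real codes would come from the trained deep model:
<syntaxhighlight lang="python">
# Sketch of the semantic-hashing lookup on short binary addresses.
from itertools import combinations

def neighbors(code, n_bits, radius=2):
    """All addresses within `radius` bit flips of `code`."""
    yield code
    for r in range(1, radius + 1):
        for bits in combinations(range(n_bits), r):
            flipped = code
            for b in bits:
                flipped ^= 1 << b
            yield flipped

# hash table: address -> document ids (illustrative, untrained codes)
table = {0b101100: ["doc_a"], 0b101110: ["doc_b"], 0b010011: ["doc_c"]}

query = 0b101100
hits = [d for addr in neighbors(query, n_bits=6, radius=1)
        for d in table.get(addr, [])]
print(hits)  # doc_a (exact match) and doc_b (one bit away)
</syntaxhighlight>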
==== Memory networks ====
Memory networks<ref name="Weston, Jason 14102">Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." {{arxiv|1410.3916}} (2014).</ref><ref>Sukhbaatar, Sainbayar, et al. "End-To-End Memory Networks." {{arxiv|1503.08895}} (2015).</ref> are another extension of neural networks that incorporates [https://en.wikipedia.org/wiki/Long-term_memory long-term memory]. The long-term memory can be read and written to, with the goal of using it for prediction. These models have been applied to [https://en.wikipedia.org/wiki/Question_answering question answering], where the long-term memory effectively acts as a (dynamic) knowledge base and the output is a textual response.<ref>Bordes, Antoine, et al. "Large-scale Simple Question Answering with Memory Networks." {{arxiv|1506.02075}} (2015).</ref> A team of electrical and computer engineers from the UCLA Samueli School of Engineering created a physical artificial neural network that can analyze large volumes of data and identify objects at the actual speed of light.<ref>{{Cite news|url=https://www.sciencedaily.com/releases/2018/08/180802130750.htm|title=AI device identifies objects at the speed of light: The 3D-printed artificial neural network can be used in medicine, robotics and security|work=ScienceDaily|access-date=2018-08-08|language=en}}</ref>
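The soft memory read at the heart of such models can be sketched as follows, assuming a single hop and random untrained embeddings: the query attends over the stored memory embeddings and returns a weighted response vector:
<syntaxhighlight lang="python">
# Sketch of the single-hop soft read used in end-to-end memory networks.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(memories, outputs, query):
    """memories/outputs: (N, d) key and value embeddings; query: (d,)."""
    p = softmax(memories @ query)   # attention over memory slots
    return p @ outputs              # response vector for the decoder

rng = np.random.default_rng(0)
memories = rng.standard_normal((5, 8))   # embedded facts (keys)
outputs = rng.standard_normal((5, 8))    # embedded facts (values)
query = rng.standard_normal(8)           # embedded question
print(memory_read(memories, outputs, query))
</syntaxhighlight>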
==== Pointer networks ====
Deep neural networks can potentially be improved by deepening and by parameter reduction, while maintaining trainability. While training extremely deep (e.g., one-million-layer) neural networks might not be practical, [https://en.wikipedia.org/wiki/CPU CPU]-like architectures such as pointer networks<ref>Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." {{arxiv|1506.03134}} (2015).</ref> and neural random-access machines<ref>Kurach, Karol, Andrychowicz, Marcin and Sutskever, Ilya. "Neural Random-Access Machines." {{arxiv|1511.06392}} (2015).</ref> overcome this limitation by using external [https://en.wikipedia.org/wiki/Random-access_memory random-access memory] and other components that typically belong to a [https://en.wikipedia.org/wiki/Computer_architecture computer architecture], such as [https://en.wikipedia.org/wiki/Processor_register registers], [https://en.wikipedia.org/wiki/Arithmetic_logic_unit ALUs] and [https://en.wikipedia.org/wiki/Pointer_(computer_programming) pointers]. Such systems operate on [https://en.wikipedia.org/wiki/Probability_distribution probability distribution] vectors stored in memory cells and registers. Thus, the model is fully differentiable and trains end-to-end. The key characteristic of these models is that their depth, the size of their short-term memory, and the number of parameters can be altered independently, unlike models such as LSTM, whose number of parameters grows quadratically with memory size.
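The pointer-network output step can be sketched as additive attention whose distribution over input positions is itself the output; the weights below are random stand-ins for trained parameters:
<syntaxhighlight lang="python">
# Sketch of one pointer-network decoding step: instead of a fixed output
# vocabulary, the decoder emits a distribution over input positions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_step(encoder_states, decoder_state, W1, W2, v):
    """encoder_states: (n, d); decoder_state: (d,). Pointer probabilities
    from scores u_i = v . tanh(W1 e_i + W2 d)."""
    scores = np.tanh(encoder_states @ W1.T + decoder_state @ W2.T) @ v
    return softmax(scores)

rng = np.random.default_rng(0)
n, d = 6, 4
enc = rng.standard_normal((n, d))        # encoded input sequence
dec = rng.standard_normal(d)             # current decoder state
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))
v = rng.standard_normal(d)
probs = pointer_step(enc, dec, W1, W2, v)
print(probs.argmax(), probs)   # index of the input element "pointed to"
</syntaxhighlight>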
==== Encoder–decoder networks ====
Encoder–decoder frameworks are based on neural networks that map highly [https://en.wikipedia.org/wiki/Structured_prediction structured] input to highly structured output. The approach arose in the context of [https://en.wikipedia.org/wiki/Machine_translation machine translation]<ref>{{Cite web|url=http://www.aclweb.org/anthology/D13-1176|title=Recurrent continuous translation models|last=Kalchbrenner|first=N.|last2=Blunsom|first2=P.|date=2013|publisher=EMNLP'2013}}</ref><ref>{{Cite web|url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf|title=Sequence to sequence learning with neural networks|last=Sutskever|first=I.|last2=Vinyals|first2=O.|last3=Le|first3=Q. V.|date=2014|publisher=NIPS'2014}}</ref><ref>{{Cite journal|last=Cho|first=K.|last2=van Merrienboer|first2=B.|last3=Gulcehre|first3=C.|last4=Bougares|first4=F.|last5=Schwenk|first5=H.|last6=Bengio|first6=Y.|date=October 2014|title=Learning phrase representations using RNN encoder-decoder for statistical machine translation|journal=Proceedings of the Empiricial Methods in Natural Language Processing|volume=1406|pages=arXiv:1406.1078|arxiv=1406.1078|bibcode=2014arXiv1406.1078C}}</ref>, where the input and output are sentences written in two natural languages. In that work, an LSTM RNN or CNN was used as an encoder to summarize a source sentence, and the summary was decoded by a conditional RNN [https://en.wikipedia.org/wiki/Language_model language model] to produce the translation.<ref>Cho, Kyunghyun, Aaron Courville, and Yoshua Bengio. "Describing Multimedia Content using Attention-based Encoder–Decoder Networks." {{arxiv|1507.01053}} (2015).</ref> These systems share building blocks: gated RNNs, CNNs, and trained attention mechanisms.
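A minimal encoder–decoder sketch follows, assuming plain tanh RNN cells with random weights in place of the gated RNN or CNN encoders of the cited work: the encoder compresses the source into one fixed-size vector, which then conditions the decoder's unrolling:
<syntaxhighlight lang="python">
# Toy encoder-decoder: encode a source sequence into a summary vector,
# then unroll a decoder conditioned on that summary.
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wxe, Whe = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
Wxd, Whd = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1

def encode(source):
    h = np.zeros(d)
    for x in source:                      # read source tokens one by one
        h = np.tanh(Wxe @ x + Whe @ h)
    return h                              # fixed-size summary of the source

def decode(summary, steps):
    h, y, out = summary, np.zeros(d), []
    for _ in range(steps):                # unroll conditioned on the summary
        h = np.tanh(Wxd @ y + Whd @ h)
        y = h                             # feed the prediction back in
        out.append(y)
    return out

source = [rng.standard_normal(d) for _ in range(5)]
translation_states = decode(encode(source), steps=4)
</syntaxhighlight>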
=== Multilayer kernel machine ===
Multilayer kernel machines (MKM) are a way of learning highly nonlinear functions by iterative application of weakly nonlinear kernels. They use [https://en.wikipedia.org/wiki/Kernel_principal_component_analysis kernel principal component analysis] (KPCA)<ref name="ref60">{{cite journal|last2=Smola|first2=Alexander|date=1998|title=Nonlinear component analysis as a kernel eigenvalue problem|journal=Neural computation|volume=(44)|issue=5|pages=1299–1319|doi=10.1162/089976698300017467|last1=Scholkopf|first1=B|citeseerx=10.1.1.53.8911}}</ref> as a method for the unsupervised greedy layer-wise pre-training step of deep learning.<ref name="ref59">{{cite journal|date=2012|title=Kernel Methods for Deep Learning|url=http://cseweb.ucsd.edu/~yoc002/paper/thesis_youngmincho.pdf|pages=1–9|last1=Cho|first1=Youngmin}}</ref>
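The pre-training loop can be sketched directly with scikit-learn's <code>KernelPCA</code>; the layer sizes and toy data are assumptions, and the supervised feature-selection step described in the next paragraph is left as a comment:
<syntaxhighlight lang="python">
# Sketch of MKM pre-training: apply KPCA layer by layer, each layer
# re-embedding the previous layer's output with a weakly nonlinear kernel.
import numpy as np
from sklearn.decomposition import KernelPCA

def mkm_pretrain(X, components_per_layer):
    layers, H = [], X
    for n_l in components_per_layer:
        kpca = KernelPCA(n_components=n_l, kernel="rbf")
        H = kpca.fit_transform(H)   # the n_l principal components of layer l
        layers.append(kpca)
        # a supervised feature-selection step would prune H here
    return layers, H

rng = np.random.default_rng(0)
layers, features = mkm_pretrain(rng.standard_normal((100, 30)), [20, 10])
print(features.shape)   # (100, 10)
</syntaxhighlight>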
Having learned the features of the previous layer <math>l-1</math>, layer <math>l</math> extracts the <math>n_l</math> [https://en.wikipedia.org/wiki/Principal_component_analysis principal components] (PC) of the projection of the layer <math>l-1</math> output in the feature domain induced by the kernel. To achieve [https://en.wikipedia.org/wiki/Dimensionality_reduction dimensionality reduction] of the updated representation in each layer, a [https://en.wikipedia.org/wiki/Supervised_learning supervised strategy] selects the most informative features among those extracted by KPCA. The process is: