{{Short description|Computational model used in machine learning}}
{{Distinguish|recursive neural network}}
{{Machine learning|Artificial neural network}}
A '''recurrent neural network''' ('''RNN''') is a class of [[artificial neural network]]s where connections between nodes form a [[directed graph|directed]] or [[Graph (discrete mathematics)|undirected graph]] along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from [[feedforward neural networks]], RNNs can use their internal state (memory) to process variable length sequences of inputs.<ref>{{Cite journal|last=Dupond|first=Samuel|date=2019|title=<!-- for sure correct title? not found, nor in archive.org (for 2020-02-13), nor Volume correct? 2019 is vol 47-48 and 41 from 2016--> A thorough review on the current advance of neural network structures.|url=https://www.sciencedirect.com/journal/annual-reviews-in-control|journal=Annual Reviews in Control|volume=14|pages=200–230}}</ref><ref>{{Cite journal|date=2018-11-01|title=State-of-the-art in artificial neural network applications: A survey|journal=Heliyon|language=en|volume=4|issue=11|pages=e00938|doi=10.1016/j.heliyon.2018.e00938|issn=2405-8440|doi-access=free|last1=Abiodun|first1=Oludare Isaac|last2=Jantan|first2=Aman|last3=Omolara|first3=Abiodun Esther|last4=Dada|first4=Kemi Victoria|last5=Mohamed|first5=Nachaat Abdelatif|last6=Arshad|first6=Humaira|pmid=30519653|pmc=6260436}}</ref><ref>{{Cite journal|date=2018-12-01|title=Time series forecasting using artificial neural networks methodologies: A systematic review|journal=Future Computing and Informatics Journal|language=en|volume=3|issue=2|pages=334–340|doi=10.1016/j.fcij.2018.10.003|issn=2314-7288|doi-access=free|last1=Tealab|first1=Ahmed}}</ref> This makes them applicable to tasks such as unsegmented, connected [[handwriting recognition]]<ref>{{cite journal |last1=Graves |first1=Alex |author-link1=Alex Graves (computer scientist) |last2=Liwicki |first2=Marcus |last3=Fernandez |first3=Santiago |last4=Bertolami |first4=Roman |last5=Bunke |first5=Horst |last6=Schmidhuber |first6=Jürgen |author-link6=Jürgen Schmidhuber |title=A Novel Connectionist System for Improved Unconstrained Handwriting Recognition |url=http://www.idsia.ch/~juergen/tpami_2008.pdf |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=31 |issue=5 |pages=855–868 |year=2009 |doi=10.1109/tpami.2008.137 |pmid=19299860 |citeseerx=10.1.1.139.4502 |s2cid=14635907 }}</ref> or [[speech recognition]].<ref name="sak2014">{{Cite web |url=https://research.google.com/pubs/archive/43905.pdf |title=Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling |last1=Sak |first1=Haşim |last2=Senior |first2=Andrew |last3=Beaufays | first3=Françoise |year=2014 }}</ref><ref name="liwu2015">{{cite arXiv |last1=Li |first1=Xiangang |last2=Wu |first2=Xihong |date=2014-10-15 |title=Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition |eprint=1410.4281 |class=cs.CL }}</ref> Recurrent neural networks are theoretically [[Turing complete]] and can run arbitrary programs to process arbitrary sequences of inputs.<ref>{{cite journal|last1=Hyötyniemi|first1=Heikki|date=1996|title=Turing machines are recurrent neural networks|journal=Proceedings of STeP '96/Publications of the Finnish Artificial Intelligence Society|pages=13–24}}</ref>
The term "recurrent neural network" is used to refer to the class of networks with an [[infinite impulse response]], whereas "[[convolutional neural network]]" refers to the class of [[finite impulse response|finite impulse]] response. Both classes of networks exhibit temporal [[dynamic system|dynamic behavior]].<ref>{{Cite journal |last=Miljanovic |first=Milos |date=Feb–Mar 2012 |title=Comparative analysis of Recurrent and Finite Impulse Response Neural Networks in Time Series Prediction |url=http://www.ijcse.com/docs/INDJCSE12-03-01-028.pdf |journal=Indian Journal of Computer and Engineering |volume=3 |issue=1 }}</ref> A finite impulse recurrent network is a [[directed acyclic graph]] that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a [[directed cyclic graph]] that can not be unrolled.
Both finite impulse and infinite impulse recurrent networks can have additional stored states, and the storage can be under direct control by the neural network. The storage can also be replaced by another network or graph if that incorporates time delays or has feedback loops. Such controlled states are referred to as gated state or gated memory, and are part of [[long short-term memory]] networks (LSTMs) and [[gated recurrent unit]]s. This is also called Feedback Neural Network (FNN).
{{toclimit|3}}
==History==
Recurrent neural networks were based on [[David Rumelhart]]'s work in 1986.<ref>{{Cite journal |last1=Williams |first1=Ronald J. |last2=Hinton |first2=Geoffrey E. |last3=Rumelhart |first3=David E. |date=October 1986 |title=Learning representations by back-propagating errors |journal=Nature |volume=323 |issue=6088 |pages=533–536 |doi=10.1038/323533a0 |issn=1476-4687 |bibcode=1986Natur.323..533R |s2cid=205001834 }}</ref>  [[Hopfield network]]s – a special kind of RNN – were (re-)discovered by [[John Hopfield]] in 1982. In 1993, a neural history compressor system solved a "Very Deep Learning" task that required more than 1000 subsequent [[Layer (deep learning)|layers]] in an RNN unfolded in time.<ref name="schmidhuber1993">{{Cite book |url=ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf |title=Habilitation thesis: System modeling and optimization |last=Schmidhuber |first=Jürgen |year=1993 }} Page 150 ff demonstrates credit assignment across the equivalent of 1,200 layers in an unfolded RNN.</ref>
===LSTM===
[[Long short-term memory]] (LSTM) networks were invented by [[Sepp Hochreiter|Hochreiter]] and [[Jürgen Schmidhuber|Schmidhuber]] in 1997 and set accuracy records in multiple applications domains.<ref name="lstm">{{Cite journal |last1=Hochreiter |first1=Sepp |author-link=Sepp Hochreiter |last2=Schmidhuber |first2=Jürgen |date=1997-11-01 |title=Long Short-Term Memory |journal=Neural Computation |volume=9 |issue=8 |pages=1735–1780 |doi=10.1162/neco.1997.9.8.1735|pmid=9377276 |s2cid=1915014 }}</ref>
Around 2007, LSTM started to revolutionize [[speech recognition]], outperforming traditional models in certain speech applications.<ref name="fernandez2007keyword">{{Cite book |last1=Fernández |first1=Santiago |last2=Graves |first2=Alex |last3=Schmidhuber |first3=Jürgen |year=2007 |title=An Application of Recurrent Neural Networks to Discriminative Keyword Spotting |url=http://dl.acm.org/citation.cfm?id=1778066.1778092 |journal=Proceedings of the 17th International Conference on Artificial Neural Networks |series=ICANN'07 |location=Berlin, Heidelberg |publisher=Springer-Verlag |pages=220–229 |isbn=978-3-540-74693-5 }}</ref> In 2009, a [[Connectionist Temporal Classification (CTC)|Connectionist Temporal Classification]] (CTC)-trained LSTM network was the first RNN to win pattern recognition contests when it won several competitions in connected [[handwriting recognition]].<ref name=schmidhuber2015/><ref name=graves20093>{{Cite document |last1=Graves |first1=Alex |last2=Schmidhuber |first2=Jürgen |year=2009 |editor1-last=Koller |editor1-first=D. |editor2-last=Schuurmans |editor2-first=D. |editor2-link=Dale Schuurmans |editor3-last=Bengio |editor3-first=Y. |editor3-link=Yoshua Bengio
|editor4-last=Bottou |editor4-first=L. |title=Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks |work=Advances in Neural Information Processing Systems |publisher=Neural Information Processing Systems (NIPS) Foundation |volume=21 |pages=545–552 |url=https://papers.nips.cc/paper/3449-offline-handwriting-recognition-with-multidimensional-recurrent-neural-networks}}</ref> In 2014, the Chinese company [[Baidu]] used CTC-trained RNNs to break the Switchboard Hub5'00 (LDC2002S09) speech recognition dataset<ref>{{Cite web|url=https://catalog.ldc.upenn.edu/LDC2002S09|title=2000 HUB5 English Evaluation Speech - Linguistic Data Consortium|website=catalog.ldc.upenn.edu}}</ref> benchmark without using any traditional speech processing methods.<ref name="hannun2014">{{cite arXiv |last1=Hannun |first1=Awni |last2=Case |first2=Carl |last3=Casper |first3=Jared |last4=Catanzaro |first4=Bryan |last5=Diamos |first5=Greg |last6=Elsen |first6=Erich |last7=Prenger |first7=Ryan |last8=Satheesh |first8=Sanjeev |last9=Sengupta |first9=Shubho |date=2014-12-17 |title=Deep Speech: Scaling up end-to-end speech recognition |eprint=1412.5567 |class=cs.CL}}</ref>
LSTM also improved large-vocabulary speech recognition<ref name="sak2014"/><ref name="liwu2015"/> and [[text-to-speech]] synthesis<ref name="fan2015">Fan, Bo; Wang, Lijuan; Soong, Frank K.; Xie, Lei (2015) "Photo-Real Talking Head with Deep Bidirectional LSTM", in ''Proceedings of ICASSP 2015''</ref> and was used in [[Google Android]].<ref name="schmidhuber2015" /><ref name="zen2015">{{Cite web |url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf |title=Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis |last1=Zen |first1=Heiga |last2=Sak |first2=Haşim |year=2015 |website=Google.com |publisher=ICASSP |pages=4470–4474 }}</ref> In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of 49%{{Citation needed|date=November 2016}} through CTC-trained LSTM.<ref name="sak2015">{{Cite web |url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html |title=Google voice search: faster and more accurate |last1=Sak |first1=Haşim |last2=Senior |first2=Andrew |date=September 2015 |last3=Rao |first3=Kanishka |last4=Beaufays |first4=Françoise |last5=Schalkwyk |first5=Johan}}</ref>
LSTM broke records for improved [[machine translation]],<ref name="sutskever2014">{{Cite journal |last1=Sutskever |first1=Ilya |last2=Vinyals |first2=Oriol |last3=Le |first3=Quoc V. |year=2014 |title=Sequence to Sequence Learning with Neural Networks |url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf |journal=Electronic Proceedings of the Neural Information Processing Systems Conference |volume=27 |pages=5346 |arxiv=1409.3215 |bibcode=2014arXiv1409.3215S }}</ref> [[Language Modeling]]<ref name="vinyals2016">{{cite arXiv |last1=Jozefowicz |first1=Rafal |last2=Vinyals |first2=Oriol |last3=Schuster |first3=Mike |last4=Shazeer |first4=Noam |last5=Wu |first5=Yonghui |date=2016-02-07 |title=Exploring the Limits of Language Modeling |eprint=1602.02410 |class=cs.CL}}</ref> and Multilingual Language Processing.<ref name="gillick2015">{{cite arXiv |last1=Gillick |first1=Dan |last2=Brunk |first2=Cliff |last3=Vinyals |first3=Oriol |last4=Subramanya |first4=Amarnag |date=2015-11-30 |title=Multilingual Language Processing From Bytes |eprint=1512.00103 |class=cs.CL}}</ref> LSTM combined with [[convolutional neural network]]s (CNNs) improved [[automatic image captioning]].<ref name="vinyals2015">{{cite arXiv |last1=Vinyals |first1=Oriol |last2=Toshev |first2=Alexander |last3=Bengio |first3=Samy |last4=Erhan |first4=Dumitru |date=2014-11-17 |title=Show and Tell: A Neural Image Caption Generator |eprint=1411.4555 |class=cs.CV }}</ref>
==Architectures==
{{main|Layer (deep learning)}}
RNNs come in many variants.
===Fully recurrent ===
[[File:Recurrent neural network unfold.svg|thumb|Compressed (left) and unfolded (right) basic recurrent neural network.]]
Fully recurrent neural networks (FRNN) connect the outputs of all neurons to the inputs of all neurons.  This is the most general neural network topology because all other topologies can be represented by setting some connection weights to zero to simulate the lack of connections between those neurons.  The illustration to the right may be misleading to many because practical neural network topologies are frequently organized in "layers" and the drawing gives that appearance.  However, what appears to be [[Layer (deep learning)|layers]] are, in fact, different steps in time of the same fully recurrent neural network.  The left-most item in the illustration shows the recurrent connections as the arc labeled 'v'.  It is "unfolded" in time to produce the appearance of [[Layer (deep learning)|layers]].
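As an informal illustration of this unfolding (not from any cited implementation; the names <code>W_in</code>, <code>W_rec</code> and the sizes are arbitrary), the sketch below steps a small fully connected recurrent layer through a sequence with NumPy, applying the same weights at every time step:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 5, 4                                 # arbitrary sizes for the sketch

W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))         # input-to-hidden weights
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))    # recurrent (hidden-to-hidden) weights
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)                                      # initial state
inputs = rng.normal(size=(T, n_in))                         # a toy input sequence

# "Unfolding in time": the same weights are applied at every step, so this loop
# is equivalent to a T-layer feedforward network whose layers share W_in, W_rec, b.
for t in range(T):
    h = np.tanh(W_in @ inputs[t] + W_rec @ h + b)
    print(t, h)
</syntaxhighlight>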
==={{Anchor|Elman network|Jordan network}}Elman networks and Jordan networks===
[[File:Elman srnn.png|thumb|right|The Elman network]]
An [[Jeff Elman|Elman]] network is a three-layer network (arranged horizontally as ''x'', ''y'', and ''z'' in the illustration) with the addition of a set of context units (''u'' in the illustration). The middle (hidden) layer is connected to these context units fixed with a weight of one.<ref name="bmm615">Cruse, Holk; [http://www.brains-minds-media.org/archive/615/bmm615.pdf ''Neural Networks as Cybernetic Systems''], 2nd and revised edition</ref> At each time step, the input is fed forward and a [[learning rule]] is applied. The fixed back-connections save a copy of the previous values of the hidden units in the context units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence-prediction that are beyond the power of a standard [[multilayer perceptron]].
[[Michael I. Jordan|Jordan]] networks are similar to Elman networks. The context units are fed from the output layer instead of the hidden layer. The context units in a Jordan network are also referred to as the state layer. They have a recurrent connection to themselves.<ref name="bmm615" />
Elman and Jordan networks are also known as "Simple recurrent networks" (SRN).
;Elman network<ref>{{cite journal
| last=Elman
| first=Jeffrey L.
| title=Finding Structure in Time
| journal=Cognitive Science
| year=1990
| volume=14
| issue=2
| pages=179–211
| doi=10.1016/0364-0213(90)90002-E| doi-access=free
}}</ref>
:<math>
\begin{align}
h_t &= \sigma_h(W_{h} x_t + U_{h} h_{t-1} + b_h) \\
y_t &= \sigma_y(W_{y} h_t + b_y)
\end{align}
</math>
;Jordan network<ref>{{Cite book |last=Jordan |first=Michael I. |title=Neural-Network Models of Cognition - Biobehavioral Foundations |date=1997-01-01 |chapter=Serial Order: A Parallel Distributed Processing Approach |journal=Advances in Psychology |series=Neural-Network Models of Cognition |volume=121 |pages=471–495 |doi=10.1016/s0166-4115(97)80111-2 |isbn=9780444819314}}</ref>
:<math>
\begin{align}
h_t &= \sigma_h(W_{h} x_t + U_{h} y_{t-1} + b_h) \\
y_t &= \sigma_y(W_{y} h_t + b_y)
\end{align}
</math>
Variables and functions
 +
* <math>x_t</math>: input vector
 +
* <math>h_t</math>: hidden layer vector
 +
* <math>y_t</math>: output vector
 +
* <math>W</math>, <math>U</math> and <math>b</math>: parameter matrices and vector
 +
* <math>\sigma_h</math> and <math>\sigma_y</math>: [[Activation function]]s
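For illustration only, the following NumPy sketch implements the two update rules above, assuming <math>\sigma_h = \tanh</math> and taking <math>\sigma_y</math> as the identity; the dimensions and parameter values are arbitrary:
<syntaxhighlight lang="python">
import numpy as np

def elman_step(x_t, h_prev, W_h, U_h, b_h, W_y, b_y):
    # h_t = sigma_h(W_h x_t + U_h h_{t-1} + b_h);  y_t = sigma_y(W_y h_t + b_y)
    h_t = np.tanh(W_h @ x_t + U_h @ h_prev + b_h)
    y_t = W_y @ h_t + b_y                       # sigma_y taken as identity here
    return h_t, y_t

def jordan_step(x_t, y_prev, W_h, U_h, b_h, W_y, b_y):
    # Identical except that the context fed back is the previous *output* y_{t-1}.
    h_t = np.tanh(W_h @ x_t + U_h @ y_prev + b_h)
    y_t = W_y @ h_t + b_y
    return h_t, y_t

n_in, n_h, n_out = 3, 4, 2
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(size=(n_h, n_in)), np.zeros(n_h)
U_h_elman = rng.normal(size=(n_h, n_h))         # context has hidden-layer size
U_h_jordan = rng.normal(size=(n_h, n_out))      # context has output size
W_y, b_y = rng.normal(size=(n_out, n_h)), np.zeros(n_out)

h, y = np.zeros(n_h), np.zeros(n_out)
for x in rng.normal(size=(5, n_in)):
    h, y = elman_step(x, h, W_h, U_h_elman, b_h, W_y, b_y)
    # jordan_step can be driven the same way, feeding back y instead of h.
</syntaxhighlight>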
===Hopfield ===
{{Main|Hopfield network}}
The [[Hopfield network]] is an RNN in which all connections across layers are equally sized. It requires [[Stationary process|stationary]] inputs and is thus not a general RNN, as it does not process sequences of patterns. However, it guarantees that it will converge. If the connections are trained using [[Hebbian learning]] then the Hopfield network can perform as [[Robustness (computer science)|robust]] [[content-addressable memory]], resistant to connection alteration.
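A toy sketch of Hopfield-style content-addressable memory (bipolar patterns, Hebbian outer-product weights, synchronous sign updates; all choices are illustrative rather than drawn from a cited implementation):
<syntaxhighlight lang="python">
import numpy as np

def hebbian_weights(patterns):
    # Outer-product (Hebbian) rule; zero the diagonal so units do not drive themselves.
    W = sum(np.outer(p, p) for p in patterns) / len(patterns)
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, steps=10):
    # Synchronous sign updates; the state settles into a stored pattern (an attractor).
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = hebbian_weights(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # corrupted copy of the first pattern
print(recall(W, noisy))                  # typically recovers the stored pattern
</syntaxhighlight>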
====Bidirectional associative memory====
{{Main|Bidirectional associative memory}}
Introduced by Bart Kosko,<ref>{{cite journal |year=1988 |title=Bidirectional associative memories |journal=IEEE Transactions on Systems, Man, and Cybernetics |volume=18 |issue=1 |pages=49–60 |doi=10.1109/21.87054 |last1=Kosko |first1=Bart |s2cid=59875735 }}</ref> a bidirectional associative memory (BAM) network is a variant of a Hopfield network that stores associative data as a vector. The bi-directionality comes from passing information through a matrix and its [[transpose]]. Typically, bipolar encoding is preferred to binary encoding of the associative pairs. Recently, stochastic BAM models using [[Markov chain|Markov]] stepping were optimized for increased network stability and relevance to real-world applications.<ref>{{cite journal |last1=Rakkiyappan |first1=Rajan |last2=Chandrasekar |first2=Arunachalam |last3=Lakshmanan |first3=Subramanian |last4=Park |first4=Ju H. |date=2 January 2015 |title=Exponential stability for markovian jumping stochastic BAM neural networks with mode-dependent probabilistic time-varying delays and impulse control |journal=Complexity |volume=20 |issue=3 |pages=39–65 |doi=10.1002/cplx.21503 |bibcode=2015Cmplx..20c..39R }}</ref>
A BAM network has two layers, either of which can be driven as an input to recall an association and produce an output on the other layer.<ref>{{cite book
| url = {{google books |plainurl=y |id=txsjjYzFJS4C|page=336}}
| page = 336
| title = Neural networks: a systematic introduction
| first=Raúl |last=Rojas
| publisher = Springer
| isbn = 978-3-540-60505-8
| year = 1996
}}</ref>
===Echo state ===
{{Main|Echo state network}}
The echo state network (ESN) has a sparsely connected random hidden layer. The weights of output neurons are the only part of the network that can change (be trained). ESNs are good at reproducing certain [[time series]].<ref>{{Cite journal |last1=Jaeger |first1=Herbert |last2=Haas |first2=Harald |date=2004-04-02 |title=Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication |journal=Science |volume=304 |issue=5667 |pages=78–80 |doi=10.1126/science.1091277 |pmid=15064413 |bibcode=2004Sci...304...78J|citeseerx=10.1.1.719.2301 |s2cid=2184251 }}</ref> A variant for [[Spiking neural network|spiking neurons]] is known as a [[liquid state machine]].<ref>{{cite journal |first1=Wolfgang |last1=Maass |first2=Thomas |last2=Natschläger |first3=Henry |last3=Markram |title=A fresh look at real-time computation in generic recurrent neural circuits |series=Technical report |publisher=Institute for Theoretical Computer Science, Technische Universität Graz |date=2002-08-20 }}</ref>
===Independently RNN (IndRNN) ===
The Independently recurrent neural network (IndRNN)<ref name="auto">{{cite arXiv |title=  Independently Recurrent Neural Network (IndRNN): Building a Longer and Deeper RNN|last1=Li |first1=Shuai |last2=Li |first2=Wanqing |last3=Cook |first3=Chris |last4=Zhu |first4=Ce |last5=Yanbo |first5=Gao |eprint=1803.04831|class=cs.CV |year=2018 }}</ref> addresses the gradient vanishing and exploding problems in the traditional fully connected RNN. Each neuron in one layer only receives its own past state as context information (instead of full connectivity to all other neurons in this layer) and thus neurons are independent of each other's history. The gradient backpropagation can be regulated to avoid gradient vanishing and exploding in order to keep long or short-term memory. The cross-neuron information is explored in the next layers. IndRNN can be robustly trained with the non-saturated nonlinear functions such as ReLU. Using skip connections, deep networks can be trained.
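An illustrative sketch of the IndRNN recurrence, assuming the formulation <math>h_t = \sigma(W x_t + u \odot h_{t-1} + b)</math> in which each neuron receives only its own previous activation through an elementwise product; names and sizes are arbitrary:
<syntaxhighlight lang="python">
import numpy as np

def indrnn_step(x_t, h_prev, W, u, b):
    # Each hidden unit sees only its *own* previous activation via the elementwise
    # product u * h_prev, instead of a full hidden-to-hidden matrix as in a
    # standard fully connected RNN.  ReLU is a typical non-saturating choice.
    return np.maximum(0.0, W @ x_t + u * h_prev + b)

rng = np.random.default_rng(0)
n_in, n_h = 3, 6
W = rng.normal(scale=0.1, size=(n_h, n_in))
u = rng.uniform(-1.0, 1.0, size=n_h)   # one recurrent weight per neuron
b = np.zeros(n_h)

h = np.zeros(n_h)
for x in rng.normal(size=(8, n_in)):
    h = indrnn_step(x, h, W, u, b)
</syntaxhighlight>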
===Recursive ===
{{Main|Recursive neural network}}
A [[recursive neural network]]<ref>{{cite book |doi=10.1109/ICNN.1996.548916 |title=Learning task-dependent distributed representations by backpropagation through structure |last1=Goller |first1=Christoph |last2=Küchler |first2=Andreas |s2cid=6536466 |journal=IEEE International Conference on Neural Networks |volume=1 |pages=347 |year=1996 |isbn=978-0-7803-3210-2|citeseerx=10.1.1.52.4759 }}</ref> is created by applying the same set of weights [[recursion|recursively]] over a differentiable graph-like structure by traversing the structure in [[topological sort|topological order]]. Such networks are typically also trained by the reverse mode of [[automatic differentiation]].<ref name="lin1970">{{cite book |first=Seppo |last=Linnainmaa |author-link=Seppo Linnainmaa |year=1970 |title=The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors |publisher=M.Sc. thesis (in Finnish), University of Helsinki }}</ref><ref name="grie2008">{{cite book |first1=Andreas  |last1=Griewank |first2=Andrea |last2= Walther |author2-link=Andrea Walther|title=Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation |edition=Second |url={{google books  |plainurl=y  |id=xoiiLaRxcbEC}} |year=2008 |publisher=SIAM |isbn=978-0-89871-776-1}}</ref> They can process [[distributed representation]]s of structure, such as [[mathematical logic|logical terms]]. A special case of recursive neural networks is the RNN whose structure corresponds to a linear chain. Recursive neural networks have been applied to [[natural language processing]].<ref>{{citation |last1=Socher |first1=Richard |last2=Lin |first2=Cliff |last3=Ng |first3=Andrew Y. |last4=Manning |first4=Christopher D. |contribution=Parsing Natural Scenes and Natural Language with Recursive Neural Networks |contribution-url=https://ai.stanford.edu/~ang/papers/icml11-ParsingWithRecursiveNeuralNetworks.pdf |title=28th International Conference on Machine Learning (ICML 2011) }}</ref> The Recursive Neural Tensor Network uses a [[tensor]]-based composition function for all nodes in the tree.<ref>{{cite journal |last1=Socher |first1=Richard |last2=Perelygin |first2=Alex |last3=Wu |first3=Jean Y. |last4=Chuang |first4=Jason |last5=Manning |first5=Christopher D. |last6=Ng |first6=Andrew Y. |last7=Potts |first7=Christopher |title=Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank |journal=Emnlp 2013 |url=http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf}}</ref>
===Neural history compressor===
The neural history compressor is an unsupervised stack of RNNs.<ref name="schmidhuber1992">{{cite journal |last1=Schmidhuber |first1=Jürgen |year=1992 |title=Learning complex, extended sequences using the principle of history compression |url=ftp://ftp.idsia.ch/pub/juergen/chunker.pdf |journal=Neural Computation |volume=4 |issue=2 |pages=234–242 |doi=10.1162/neco.1992.4.2.234 |s2cid=18271205 }}</ref> At the input level, it learns to predict its next input from the previous inputs. Only unpredictable inputs of some RNN in the hierarchy become inputs to the next higher level RNN, which therefore recomputes its internal state only rarely. Each higher level RNN thus studies a compressed representation of the information in the RNN below. This is done such that the input sequence can be precisely reconstructed from the representation at the highest level.
The system effectively minimises the description length or the negative [[logarithm]] of the probability of the data.<ref name="scholarpedia2015pre">{{cite journal |last1=Schmidhuber |first1=Jürgen |year=2015 |title=Deep Learning |journal=Scholarpedia |volume=10 |issue=11 |page=32832 |doi=10.4249/scholarpedia.32832 |bibcode=2015SchpJ..1032832S |doi-access=free }}</ref> Given a lot of learnable predictability in the incoming data sequence, the highest level RNN can use supervised learning to easily classify even deep sequences with long intervals between important events.
It is possible to distill the RNN hierarchy into two RNNs: the "conscious" chunker (higher level) and the "subconscious" automatizer (lower level).<ref name="schmidhuber1992" /> Once the chunker has learned to predict and compress inputs that are unpredictable by the automatizer, then the automatizer can be forced in the next learning phase to predict or imitate through additional units the hidden units of the more slowly changing chunker. This makes it easy for the automatizer to learn appropriate, rarely changing memories across long intervals. In turn, this helps the automatizer to make many of its once unpredictable inputs predictable, such that the chunker can focus on the remaining unpredictable events.<ref name="schmidhuber1992" />
A [[generative model]] partially overcame the [[vanishing gradient problem]]<ref name="hochreiter1991">Hochreiter, Sepp (1991), [http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf Untersuchungen zu dynamischen neuronalen Netzen], Diploma thesis, Institut f. Informatik, Technische Univ. Munich, Advisor Jürgen Schmidhuber</ref> of [[automatic differentiation]] or [[backpropagation]] in neural networks in 1992. In 1993, such a system solved a "Very Deep Learning" task that required more than 1000 subsequent layers in an RNN unfolded in time.<ref name="schmidhuber1993" />
===Second order RNNs===
Second order RNNs use higher order weights <math>w{}_{ijk}</math> instead of the standard <math>w{}_{ij}</math> weights, and states can be a product. This allows a direct mapping to a [[finite-state machine]] both in training, stability, and representation.<ref>{{cite journal |first1=C. Lee |last1=Giles |first2=Clifford B. |last2=Miller |first3=Dong |last3=Chen |first4=Hsing-Hen |last4=Chen |first5=Guo-Zheng |last5=Sun |first6=Yee-Chun |last6=Lee |url=https://clgiles.ist.psu.edu/pubs/NC1992-recurrent-NN.pdf<!-- https://www.semanticscholar.org/paper/Learning-and-Extracting-Finite-State-Automata-with-Giles-Miller/872cdc269f3cb59f8a227818f35041415091545f --> |title=Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks |journal=Neural Computation |volume=4 |issue=3 |pages=393–405 |year=1992 |doi=10.1162/neco.1992.4.3.393 |s2cid=19666035 }}</ref><ref>{{cite journal |first1=Christian W. |last1=Omlin |first2=C. Lee |last2=Giles |title=Constructing Deterministic Finite-State Automata in Recurrent Neural Networks |journal=Journal of the ACM |volume=45 |issue=6 |pages=937–972 |year=1996 |doi=10.1145/235809.235811 |citeseerx=10.1.1.32.2364 |s2cid=228941 }}</ref> Long short-term memory is an example of this but has no such formal mappings or proof of stability.
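As an assumed, simplified formulation (not code from the cited papers), the sketch below combines the input and the previous state through a third-order weight tensor <math>w_{ijk}</math>:
<syntaxhighlight lang="python">
import numpy as np

def second_order_step(x_t, h_prev, W, b):
    # h_t[i] = sigma( sum_{j,k} W[i, j, k] * x_t[j] * h_prev[k] + b[i] )
    # The product of input and state entries is what makes the weights "second order".
    pre = np.einsum('ijk,j,k->i', W, x_t, h_prev) + b
    return 1.0 / (1.0 + np.exp(-pre))    # logistic sigmoid

rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = rng.normal(scale=0.1, size=(n_h, n_in, n_h))
b = np.zeros(n_h)

h = np.full(n_h, 0.5)
for x in rng.normal(size=(5, n_in)):
    h = second_order_step(x, h, W, b)
</syntaxhighlight>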
===Long short-term memory===
{{Main|Long short-term memory}}
[[File:Long Short-Term Memory.svg|thumb|Long short-term memory unit]]
Long short-term memory (LSTM) is a [[deep learning]] system that avoids the [[vanishing gradient problem]]. LSTM is normally augmented by recurrent gates called "forget gates".<ref name="gers2002">{{Cite journal |url=http://www.jmlr.org/papers/volume3/gers02a/gers02a.pdf |title=Learning Precise Timing with LSTM Recurrent Networks |last1=Gers |first1=Felix A. |last2=Schraudolph |first2=Nicol N. |journal=Journal of Machine Learning Research |volume=3 |access-date=2017-06-13 |last3=Schmidhuber |first3=Jürgen |pages=115–143 |year=2002 }}</ref> LSTM prevents backpropagated errors from vanishing or exploding.<ref name="hochreiter1991" /> Instead, errors can flow backwards through unlimited numbers of virtual layers unfolded in space. That is, LSTM can learn tasks<ref name="schmidhuber2015">{{Cite journal |last=Schmidhuber |first=Jürgen |date=January 2015 |title=Deep Learning in Neural Networks: An Overview |journal=Neural Networks |volume=61 |pages=85–117 |doi=10.1016/j.neunet.2014.09.003 |pmid=25462637 |arxiv=1404.7828 |s2cid=11715509 }}</ref> that require memories of events that happened thousands or even millions of discrete time steps earlier. Problem-specific LSTM-like topologies can be evolved.<ref name="bayer2009">{{Cite book |last1=Bayer |first1=Justin |last2=Wierstra |first2=Daan |last3=Togelius |first3=Julian |last4=Schmidhuber |first4=Jürgen |date=2009-09-14 |title=Evolving Memory Cell Structures for Sequence Learning |journal=Artificial Neural Networks – ICANN 2009 |publisher=Springer |location=Berlin, Heidelberg |pages=755–764 |doi=10.1007/978-3-642-04277-5_76 |series=Lecture Notes in Computer Science |volume=5769 |isbn=978-3-642-04276-8|url=http://mediatum.ub.tum.de/doc/1289041/document.pdf }}</ref> LSTM works even given long delays between significant events and can handle signals that mix low and high frequency components.
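A common textbook form of a single LSTM step, written out with NumPy for illustration (parameter names and sizes are arbitrary); the additive update of the cell state <code>c</code> is what allows errors to be carried across many time steps:
<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    # Concatenate previous hidden state and current input, as is commonly done.
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(p['Wf'] @ z + p['bf'])      # forget gate
    i = sigmoid(p['Wi'] @ z + p['bi'])      # input gate
    o = sigmoid(p['Wo'] @ z + p['bo'])      # output gate
    g = np.tanh(p['Wg'] @ z + p['bg'])      # candidate cell update
    c = f * c_prev + i * g                  # additive cell-state update
    h = o * np.tanh(c)
    return h, c

n_in, n_h = 3, 5
rng = np.random.default_rng(0)
p = {k: rng.normal(scale=0.1, size=(n_h, n_h + n_in)) for k in ('Wf', 'Wi', 'Wo', 'Wg')}
p.update({k: np.zeros(n_h) for k in ('bf', 'bi', 'bo', 'bg')})

h, c = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(6, n_in)):
    h, c = lstm_step(x, h, c, p)
</syntaxhighlight>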
Many applications use stacks of LSTM RNNs<ref name="fernandez2007">{{Cite journal |last1=Fernández |first1=Santiago |last2=Graves |first2=Alex |last3=Schmidhuber |first3=Jürgen |year=2007 |title=Sequence labelling in structured domains with hierarchical recurrent neural networks |citeseerx=10.1.1.79.1887 |journal=Proc. 20th International Joint Conference on Artificial Intelligence, Ijcai 2007 |pages=774–779 }}</ref> and train them by [[Connectionist Temporal Classification (CTC)]]<ref name="graves2006">{{Cite journal |last1=Graves |first1=Alex |last2=Fernández |first2=Santiago |last3=Gomez |first3=Faustino J. |year=2006 |title=Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks |citeseerx=10.1.1.75.6306 |journal=Proceedings of the International Conference on Machine Learning  |pages=369–376}}</ref> to find an RNN weight matrix that maximizes the probability of the label sequences in a training set, given the corresponding input sequences. CTC achieves both alignment and recognition.
LSTM can learn to recognize [[context-sensitive languages]] unlike previous models based on [[hidden Markov model]]s (HMM) and similar concepts.<ref>{{Cite journal |last1=Gers |first1=Felix A. |last2=Schmidhuber |first2=Jürgen<!-- the E. is a mistake --> |date=November 2001 |title=LSTM recurrent networks learn simple context-free and context-sensitive languages |journal=IEEE Transactions on Neural Networks |volume=12 |issue=6 |pages=1333–1340 |doi=10.1109/72.963769 |pmid=18249962 |s2cid=10192330 |issn=1045-9227 |url=https://semanticscholar.org/paper/f828b401c86e0f8fddd8e77774e332dfd226cb05<!-- or https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=963769 --> }}</ref>
===Gated recurrent unit===
{{Main|Gated recurrent unit}}
[[File:Gated Recurrent Unit.svg|thumb|Gated recurrent unit]]
Gated recurrent units (GRUs) are a gating mechanism in [[recurrent neural networks]] introduced in 2014. They are used in the full form and several simplified variants.<ref>{{cite arXiv |last1=Heck |first1=Joel |last2=Salem |first2=Fathi M. |date=2017-01-12 |title=Simplified Minimal Gated Unit Variations for Recurrent Neural Networks |eprint=1701.03452 |class=cs.NE }}</ref><ref>{{cite arXiv |last1=Dey |first1=Rahul |last2=Salem |first2=Fathi M. |date=2017-01-20 |title=Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks |eprint=1701.05923 |class=cs.NE }}</ref> Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory.<ref name="MyUser_Arxiv.org_May_18_2016c">{{cite arXiv |class=cs.NE |first2=Caglar |last2=Gulcehre |title=Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling |eprint=1412.3555 |last1=Chung |first1=Junyoung |last3=Cho |first3=KyungHyun |last4=Bengio |first4=Yoshua |year=2014}}</ref> They have fewer parameters than LSTM, as they lack an output gate.<ref name="MyUser_Wildml.com_May_18_2016c">{{cite web |url=http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/ |title=Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML |newspaper=Wildml.com |access-date=May 18, 2016 |date=October 27, 2015 |first=Denny |last=Britz}}</ref>
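For comparison with the LSTM sketch above, one commonly used form of the GRU update, again for illustration only:
<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    z = sigmoid(p['Wz'] @ x_t + p['Uz'] @ h_prev + p['bz'])   # update gate
    r = sigmoid(p['Wr'] @ x_t + p['Ur'] @ h_prev + p['br'])   # reset gate
    h_cand = np.tanh(p['Wh'] @ x_t + p['Uh'] @ (r * h_prev) + p['bh'])
    # No separate output gate and no cell state: the hidden state is the output.
    return (1.0 - z) * h_prev + z * h_cand

n_in, n_h = 3, 5
rng = np.random.default_rng(0)
p = {}
for name in ('z', 'r', 'h'):
    p['W' + name] = rng.normal(scale=0.1, size=(n_h, n_in))
    p['U' + name] = rng.normal(scale=0.1, size=(n_h, n_h))
    p['b' + name] = np.zeros(n_h)

h = np.zeros(n_h)
for x in rng.normal(size=(6, n_in)):
    h = gru_step(x, h, p)
</syntaxhighlight>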
===Bi-directional===
{{Main|Bidirectional recurrent neural networks}}
Bi-directional RNNs use a finite sequence to predict or label each element of the sequence based on the element's past and future contexts. This is done by concatenating the outputs of two RNNs, one processing the sequence from left to right, the other one from right to left. The combined outputs are the predictions of the teacher-given target signals. This technique has been proven to be especially useful when combined with LSTM RNNs.<ref>{{Cite journal |last1=Graves |first1=Alex |last2=Schmidhuber |first2=Jürgen |date=2005-07-01 |title=Framewise phoneme classification with bidirectional LSTM and other neural network architectures |journal=Neural Networks |series=IJCNN 2005 |volume=18 |issue=5 |pages=602–610 |doi=10.1016/j.neunet.2005.06.042|pmid=16112549 |citeseerx=10.1.1.331.5800 }}</ref><ref name="ThireoReczko">{{Cite journal |last1=Thireou |first1=Trias |last2=Reczko |first2=Martin |date=July 2007 |title=Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins  |journal=IEEE/ACM Transactions on Computational Biology and Bioinformatics |volume=4 |issue=3 |pages=441–446 |doi=10.1109/tcbb.2007.1015 |pmid=17666763 |s2cid=11787259 }}</ref>
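A schematic sketch of the idea: run one recurrent pass left to right and another right to left over the same sequence, then concatenate the two hidden states at each position (a plain tanh RNN step is used here only for brevity):
<syntaxhighlight lang="python">
import numpy as np

def rnn_pass(xs, W_in, W_rec, b):
    """Return the hidden state at every position of the sequence xs."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_in @ x + W_rec @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_in, n_h, T = 3, 4, 6
xs = rng.normal(size=(T, n_in))

params_fwd = (rng.normal(scale=0.1, size=(n_h, n_in)), rng.normal(scale=0.1, size=(n_h, n_h)), np.zeros(n_h))
params_bwd = (rng.normal(scale=0.1, size=(n_h, n_in)), rng.normal(scale=0.1, size=(n_h, n_h)), np.zeros(n_h))

fwd = rnn_pass(xs, *params_fwd)                 # left-to-right pass
bwd = rnn_pass(xs[::-1], *params_bwd)[::-1]     # right-to-left pass, re-aligned to positions
features = np.concatenate([fwd, bwd], axis=1)   # shape (T, 2 * n_h): past and future context per element
</syntaxhighlight>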
===Continuous-time===
A continuous-time recurrent neural network (CTRNN) uses a system of [[ordinary differential equations]] to model the effects on a neuron of the incoming inputs.
For a neuron <math>i</math> in the network with activation <math>y_{i}</math>, the rate of change of activation is given by:
:<math>\tau_{i}\dot{y}_{i}=-y_{i}+\sum_{j=1}^{n}w_{ji}\sigma(y_{j}-\Theta_{j})+I_{i}(t)</math>
Where:
* <math>\tau_{i}</math> : Time constant of [[Synapse|postsynaptic]] node
* <math>y_{i}</math> : Activation of postsynaptic node
* <math>\dot{y}_{i}</math> : Rate of change of activation of postsynaptic node
* <math>w{}_{ji}</math> : Weight of connection from pre to postsynaptic node
* <math>\sigma(x)</math> : [[Sigmoid function|Sigmoid]] of x e.g. <math>\sigma(x) = 1/(1+e^{-x})</math>.
* <math>y_{j}</math> : Activation of presynaptic node
* <math>\Theta_{j}</math> : Bias of presynaptic node
* <math>I_{i}(t)</math> : Input (if any) to node
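The differential equation above can be simulated with a simple forward-Euler step, as in the sketch below; the step size, weights and input signal are arbitrary illustrative choices:
<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ctrnn_euler_step(y, t, dt, tau, w, theta, external_input):
    # tau_i * dy_i/dt = -y_i + sum_j w_ji * sigma(y_j - theta_j) + I_i(t)
    I = external_input(t)
    dydt = (-y + w.T @ sigmoid(y - theta) + I) / tau
    return y + dt * dydt

rng = np.random.default_rng(0)
n = 4
tau = np.full(n, 1.0)                       # time constants of the postsynaptic nodes
w = rng.normal(scale=0.5, size=(n, n))      # w[j, i] = weight from node j to node i
theta = np.zeros(n)                         # biases of the presynaptic nodes
drive = lambda t: np.array([np.sin(t), 0.0, 0.0, 0.0])   # external input I_i(t)

y, dt = np.zeros(n), 0.01
for step in range(1000):
    y = ctrnn_euler_step(y, step * dt, dt, tau, w, theta, drive)
</syntaxhighlight>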
CTRNNs have been applied to [[evolutionary robotics]] where they have been used to address vision,<ref>{{citation |last1=Harvey |first1=Inman |last2=Husbands |first2=Phil |last3=Cliff |first3=Dave |title=3rd international conference on Simulation of adaptive behavior: from animals to animats 3 |year=1994 |pages=392–401 |contribution=Seeing the light: Artificial evolution, real vision |contribution-url=https://www.researchgate.net/publication/229091538_Seeing_the_Light_Artificial_Evolution_Real_Vision }}</ref> co-operation,<ref name="Evolving communication without dedicated communication channels">{{cite book |last=Quinn |first=Matthew |chapter=Evolving communication without dedicated communication channels |journal=Advances in Artificial Life |year=2001 |pages=357–366 |doi=10.1007/3-540-44811-X_38 |series=Lecture Notes in Computer Science |volume=2159 |isbn=978-3-540-42567-0 |citeseerx=10.1.1.28.5890 }}</ref> and minimal cognitive behaviour.<ref name="The dynamics of adaptive behavior: A research program">{{cite journal |first=Randall D. |last=Beer |title=The dynamics of adaptive behavior: A research program |journal=Robotics and Autonomous Systems |year=1997 |pages=257–289 |doi=10.1016/S0921-8890(96)00063-2 |volume=20 |issue=2–4}}</ref>
Note that, by the [[Shannon sampling theorem]], discrete time recurrent neural networks can be viewed as continuous-time recurrent neural networks where the differential equations have transformed into equivalent [[difference equation]]s.<ref name="Sherstinsky-NeurIPS2018-CRACT-3">{{cite conference |last=Sherstinsky |first=Alex |title=Deriving the Recurrent Neural Network Definition and RNN Unrolling Using Signal Processing |url=https://www.researchgate.net/publication/331718291 |conference=Critiquing and Correcting Trends in Machine Learning Workshop at NeurIPS-2018 |conference-url=https://ml-critique-correct.github.io/ |editor-last=Bloem-Reddy |editor-first=Benjamin |editor2-last=Paige |editor2-first=Brooks |editor3-last=Kusner |editor3-first=Matt |editor4-last=Caruana |editor4-first=Rich |editor5-last=Rainforth |editor5-first=Tom |editor6-last=Teh |editor6-first=Yee Whye |date=2018-12-07 }}</ref> This transformation can be thought of as occurring after the post-synaptic node activation functions <math>y_i(t)</math> have been low-pass filtered but prior to sampling.
===Hierarchical ===
{{Expand section|date=August 2019}}
Hierarchical RNNs connect their neurons in various ways to decompose hierarchical behavior into useful subprograms.<ref name="schmidhuber1992" /><ref>{{Cite journal |last1=Paine |first1=Rainer W. |last2=Tani |first2=Jun |s2cid=9932565 |date=2005-09-01 |title=How Hierarchical Control Self-organizes in Artificial Adaptive Systems  |journal=Adaptive Behavior |volume=13 |issue=3 |pages=211–225 |doi=10.1177/105971230501300303}}</ref> Such hierarchical structures of cognition are present in theories of memory presented by philosopher [[Henri Bergson]], whose philosophical views have inspired hierarchical models.<ref name="auto1">{{Cite web| url=https://www.researchgate.net/publication/328474302 |title= Burns, Benureau, Tani (2018) A Bergson-Inspired Adaptive Time Constant for the Multiple Timescales Recurrent Neural Network Model. JNNS}}</ref>
===Recurrent multilayer perceptron network===
Generally, a recurrent multilayer perceptron (RMLP) network consists of cascaded subnetworks, each of which contains multiple layers of nodes. Each of these subnetworks is feed-forward except for the last layer, which can have feedback connections. Each of these subnets is connected only by feed-forward connections.<ref>{{cite book |citeseerx=10.1.1.45.3527 |title=Recurrent Multilayer Perceptrons for Identification and Control: The Road to Applications |year=1995 |first=Kurt |last=Tutschku |publisher=University of Würzburg Am Hubland |series=Institute of Computer Science Research Report |volume=118 |date=June 1995 }}</ref>
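
A minimal sketch of this layout, under assumed sizes and activations: two feed-forward subnetworks are cascaded, and only the final layer mixes in its own previous output on the next time step.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def dense(n_in, n_out):
    """Random weight matrix and zero bias for one layer (illustrative)."""
    return rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out)

W1, b1 = dense(4, 8)    # subnetwork 1, purely feed-forward
W2, b2 = dense(8, 8)    # subnetwork 2, hidden layer, feed-forward
W3, b3 = dense(8, 3)    # subnetwork 2, last layer
Wfb, _ = dense(3, 3)    # feedback connection on the last layer (assumed form)

def rmlp_step(x, prev_out):
    h = np.tanh(W1 @ x + b1)
    h = np.tanh(W2 @ h + b2)
    # Only the last layer receives a recurrent (feedback) contribution.
    return np.tanh(W3 @ h + b3 + Wfb @ prev_out)

out = np.zeros(3)
for x in rng.normal(size=(5, 4)):   # toy input sequence of length 5
    out = rmlp_step(x, out)
print(out)
</syntaxhighlight>
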
===Multiple timescales model===
A multiple timescales recurrent neural network (MTRNN) is a neural-based computational model that can simulate the functional hierarchy of the brain through self-organization that depends on spatial connection between neurons and on distinct types of neuron activities, each with distinct time properties.<ref>{{Cite journal |last1=Yamashita |first1=Yuichi |last2=Tani |first2=Jun |date=2008-11-07 |title=Emergence of Functional Hierarchy in a Multiple Timescale Neural Network Model: A Humanoid Robot Experiment |journal=PLOS Computational Biology |volume=4 |issue=11 |pages=e1000220 |doi=10.1371/journal.pcbi.1000220 |pmc=2570613 |pmid=18989398 |bibcode=2008PLSCB...4E0220Y}}</ref><ref>{{Cite journal |last1=Alnajjar |first1=Fady |last2=Yamashita |first2=Yuichi |last3=Tani |first3=Jun |year=2013 |title=The hierarchical and functional connectivity of higher-order cognitive mechanisms: neurorobotic model to investigate the stability and flexibility of working memory |journal=Frontiers in Neurorobotics |volume=7 |pages=2 |doi=10.3389/fnbot.2013.00002 |pmc=3575058 |pmid=23423881|doi-access=free }}</ref> With such varied neuronal activities, continuous sequences of any set of behaviors are segmented into reusable primitives, which in turn are flexibly integrated into diverse sequential behaviors. The biological approval of such a type of hierarchy was discussed in the [[memory-prediction framework|memory-prediction]] theory of brain function by [[Jeff Hawkins|Hawkins]] in his book ''[[On Intelligence]]''.{{Citation needed |date=June 2017}} Such a hierarchy also agrees with theories of memory posited by philosopher [[Henri Bergson]], which have been incorporated into an MTRNN model.<ref name="auto1"/><ref>{{Cite web| url=http://jnns.org/conference/2018/JNNS2018_Technical_Programs.pdf |title= Proceedings of the 28th Annual Conference of the Japanese Neural Network Society (October, 2018)}}</ref>
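
A minimal sketch of the multiple-timescales idea, with assumed sizes and time constants: fast context units (small time constant) and slow context units (large time constant) share one leaky-integrator update, so the slow group changes little from step to step and integrates information over longer horizons.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n_fast, n_slow = 6, 2
n = n_fast + n_slow
# Per-unit time constants: small for fast units, large for slow units (values assumed).
tau = np.concatenate([np.full(n_fast, 2.0), np.full(n_slow, 20.0)])
W = rng.normal(scale=0.3, size=(n, n))

def mtrnn_step(u, x):
    """Leaky-integrator update; units with a large time constant change slowly."""
    drive = W @ np.tanh(u)
    drive[:n_fast] += x              # external input reaches the fast group only
    return (1.0 - 1.0 / tau) * u + (1.0 / tau) * drive

u = np.zeros(n)
for t in range(100):
    u = mtrnn_step(u, 0.1 * rng.normal(size=n_fast))
print(np.tanh(u[:n_fast]), np.tanh(u[n_fast:]))
</syntaxhighlight>
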
===Neural Turing machines===
{{Main|Neural Turing machine}}
Neural Turing machines (NTMs) are a method of extending recurrent neural networks by coupling them to external [[memory]] resources which they can interact with by [[attentional process]]es. The combined system is analogous to a [[Turing machine]] or [[Von Neumann architecture]] but is [[Differentiable neural computer|differentiable]] end-to-end, allowing it to be efficiently trained with [[gradient descent]].<ref>{{cite arXiv |eprint=1410.5401 |title= Neural Turing Machines |last1=Graves |first1=Alex |last2=Wayne |first2=Greg |last3=Danihelka |first3=Ivo |year=2014 |class=cs.NE }}</ref>
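
As an illustration of the attention-based memory access involved, the sketch below implements content-based addressing over an external memory matrix: a key emitted by the controller is compared with every memory row by cosine similarity, a softmax turns the similarities into attention weights, and the read vector is their weighted sum. The sizes, the key and the sharpening parameter are assumptions of the example, not the interface of any particular implementation.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
N, M = 16, 8                         # memory: N slots of width M (assumed)
memory = rng.normal(size=(N, M))
key = rng.normal(size=M)             # key produced by the controller (illustrative)
beta = 2.0                           # key strength; sharpens the attention

def content_read(memory, key, beta):
    # Cosine similarity between the key and each memory row.
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sim - np.max(beta * sim))
    w /= w.sum()                     # softmax attention weights over slots
    return w @ memory, w             # differentiable read vector and weights

read_vector, weights = content_read(memory, key, beta)
print(read_vector.shape, round(float(weights.sum()), 6))
</syntaxhighlight>
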
===Differentiable neural computer===
{{main|Differentiable neural computer}}
Differentiable neural computers (DNCs) are an extension of Neural Turing machines, allowing for the usage of fuzzy amounts of each memory address and a record of chronology.
===Neural network pushdown automata===
Neural network pushdown automata (NNPDA) are similar to NTMs, but tapes are replaced by analogue stacks that are differentiable and that are trained. In this way, they are similar in complexity to recognizers of [[context free grammar]]s (CFGs).<ref>{{Cite book |title=Adaptive Processing of Sequences and Data Structures |chapter=The Neural Network Pushdown Automaton: Architecture, Dynamics and Training |last1=Sun |first1=Guo-Zheng |last2=Giles |first2=C. Lee |last3=Chen |first3=Hsing-Hen |year=1998 |publisher=Springer |location=Berlin, Heidelberg |isbn=9783540643418 |editor-last=Giles |editor-first=C. Lee |editor-last2=Gori |editor-first2=Marco |series=Lecture Notes in Computer Science |pages=296–345 |doi=10.1007/bfb0054003 |citeseerx=10.1.1.56.8723 }}</ref>
===Memristive networks===
Greg Snider of [[HP Labs]] describes a system of cortical computing with memristive nanodevices.<ref>{{Citation |last=Snider |first=Greg |title=Cortical computing with memristive nanodevices |journal=Sci-DAC Review |volume=10 |pages=58–65 |year=2008 |url=http://www.scidacreview.org/0804/html/hardware.html }}</ref> The [[memristors]] (memory resistors) are implemented by thin film materials in which the resistance is electrically tuned via the transport of ions or oxygen vacancies within the film. [[DARPA]]'s [[SyNAPSE|SyNAPSE project]] has funded IBM Research and HP Labs, in collaboration with the Boston University Department of Cognitive and Neural Systems (CNS), to develop neuromorphic architectures which may be based on memristive systems.

[[Memristive networks]] are a particular type of [[physical neural network]] with properties very similar to those of (Little-)Hopfield networks: they have continuous dynamics, a limited memory capacity, and they relax naturally via the minimization of a function which is asymptotic to the [[Ising model]]. In this sense, the dynamics of a memristive circuit have the advantage, compared to a resistor-capacitor network, of exhibiting more interesting non-linear behavior. From this point of view, engineering analog memristive networks amounts to a peculiar type of [[neuromorphic engineering]] in which the device behavior depends on the circuit wiring, or topology.<ref>{{cite journal |last1=Caravelli |first1=Francesco |last2=Traversa |first2=Fabio Lorenzo |last3=Di Ventra |first3=Massimiliano |arxiv=1608.08651 |title=The complex dynamics of memristive circuits: analytical results and universal slow relaxation |year=2017 |doi=10.1103/PhysRevE.95.022140 |pmid=28297937 |volume=95 |issue=2 |pages=022140 |journal=Physical Review E |bibcode=2017PhRvE..95b2140C |s2cid=6758362 }}</ref><ref>{{Cite journal |last=Caravelli |first=Francesco |date=2019-11-07 |title=Asymptotic Behavior of Memristive Circuits |journal=Entropy |volume=21 |issue=8 |pages=789 |doi=10.3390/e21080789 |pmid=33267502 |pmc=789 |bibcode=2019Entrp..21..789C |doi-access=free }}</ref>
==Training==
===Gradient descent===
{{Main|Gradient descent}}
Gradient descent is a [[:Category:First order methods|first-order]] [[Iterative algorithm|iterative]] [[Mathematical optimization|optimization]] [[algorithm]] for finding the minimum of a function. In neural networks, it can be used to minimize the error term by changing each weight in proportion to the derivative of the error with respect to that weight, provided the non-linear [[activation function]]s are [[Differentiable function|differentiable]]. Various methods for doing so were developed in the 1980s and early 1990s by [[Paul Werbos|Werbos]], [[Ronald J. Williams|Williams]], [[Tony Robinson (speech recognition)|Robinson]], [[Jürgen Schmidhuber|Schmidhuber]], [[Sepp Hochreiter|Hochreiter]], Pearlmutter and others.
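
A minimal numerical sketch of this rule, assuming a single linear unit and a squared-error loss: each weight is moved against the derivative of the error with respect to that weight.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))        # toy inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                       # toy targets (noise-free for clarity)

w = np.zeros(3)
lr = 0.1                             # learning rate (assumed)
for step in range(200):
    err = X @ w - y                  # prediction error
    grad = X.T @ err / len(X)        # gradient of the squared-error loss (up to a constant factor)
    w -= lr * grad                   # move each weight against its error derivative
print(w)                             # approaches true_w
</syntaxhighlight>
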
The standard method is called "[[backpropagation through time]]" or BPTT, and is a generalization of [[back-propagation]] for feed-forward networks.<ref>{{Cite journal|last=Werbos|first=Paul J.|title=Generalization of backpropagation with application to a recurrent gas market model|journal=Neural Networks|volume=1|issue=4|pages=339–356|doi=10.1016/0893-6080(88)90007-x|year=1988|s2cid=205001834 |url=https://www.semanticscholar.org/paper/Learning-representations-by-back-propagating-errors-Rumelhart-Hinton/052b1d8ce63b07fec3de9dbb583772d860b7c769}}</ref><ref>{{cite book |url={{google books |plainurl=y |id=Ff9iHAAACAAJ}} |title=Learning Internal Representations by Error Propagation |last=Rumelhart |first=David E. |publisher=Institute for Cognitive Science, University of California |location=San Diego (CA) |year=1985 }}</ref> Like that method, it is an instance of [[automatic differentiation]] in the reverse accumulation mode of [[Pontryagin's minimum principle]]. A more computationally expensive online variant is called "Real-Time Recurrent Learning" or RTRL,<ref>{{cite book |url={{google books |plainurl=y |id=6JYYMwEACAAJ }} |title=The Utility Driven Dynamic Error Propagation Network |series=Technical Report CUED/F-INFENG/TR.1 |last1=Robinson |first1=Anthony J.<!-- sometimes cited as T. (for "Tony") Robinson --> |first2=Frank |last2=Fallside |publisher=Department of Engineering, University of Cambridge |year=1987 }}</ref><ref>{{cite book |url={{google books |plainurl=y |id=B71nu3LDpREC}} |title=Backpropagation: Theory, Architectures, and Applications |editor-last1=Chauvin |editor-first1=Yves |editor-last2=Rumelhart |editor-first2=David E. |first1=Ronald J. |last1=Williams |first2=D. |last2=Zipser |contribution=Gradient-based learning algorithms for recurrent networks and their computational complexity |date=1 February 2013 |publisher=Psychology Press |isbn=978-1-134-77581-1 }}</ref> which is an instance of [[automatic differentiation]] in the forward accumulation mode with stacked tangent vectors. Unlike BPTT, this algorithm is local in time but not local in space.
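
The sketch below unrolls a tiny vanilla RNN for a fixed number of steps and backpropagates through time; the tanh cell, the squared-error loss and all sizes are assumptions made for the example rather than a reference implementation.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(5)
n_in, n_h, n_out, T = 2, 4, 1, 6          # sizes and unroll length (assumed)
Wx = rng.normal(scale=0.3, size=(n_h, n_in))
Wh = rng.normal(scale=0.3, size=(n_h, n_h))
Wy = rng.normal(scale=0.3, size=(n_out, n_h))
b = np.zeros(n_h)

xs = rng.normal(size=(T, n_in))           # toy input sequence
targets = rng.normal(size=(T, n_out))     # toy target sequence

# Forward pass: store the activations of the unrolled network.
hs = [np.zeros(n_h)]
ys = []
for t in range(T):
    hs.append(np.tanh(Wx @ xs[t] + Wh @ hs[-1] + b))
    ys.append(Wy @ hs[-1])

# Backward pass through time: accumulate gradients over all steps.
dWx, dWh, dWy, db = [np.zeros_like(a) for a in (Wx, Wh, Wy, b)]
dh_next = np.zeros(n_h)
for t in reversed(range(T)):
    dy = ys[t] - targets[t]               # derivative of the squared error at step t
    dWy += np.outer(dy, hs[t + 1])
    dh = Wy.T @ dy + dh_next              # gradient from the output and from the future
    dz = dh * (1.0 - hs[t + 1] ** 2)      # back through the tanh nonlinearity
    dWx += np.outer(dz, xs[t])
    dWh += np.outer(dz, hs[t])
    db += dz
    dh_next = Wh.T @ dz                   # pass the gradient to the previous time step
print(dWh)
</syntaxhighlight>
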
In this context, local in space means that a unit's weight vector can be updated using only information stored in the connected units and the unit itself such that update complexity of a single unit is linear in the dimensionality of the weight vector. Local in time means that the updates take place continually (on-line) and depend only on the most recent time step rather than on multiple time steps within a given time horizon as in BPTT. Biological neural networks appear to be local with respect to both time and space.<ref>{{Cite journal |last=Schmidhuber |first=Jürgen |s2cid=18721007 |date=1989-01-01 |title=A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks |journal=Connection Science |volume=1 |issue=4 |pages=403–412 |doi=10.1080/09540098908915650 }}</ref><ref name="PríncipeEuliano2000">{{cite book |first1=José C. |last1=Príncipe |first2=Neil R. |last2= Euliano |first3=W. Curt |last3=Lefebvre |title=Neural and adaptive systems: fundamentals through simulations |url={{google books |plainurl=y |id=jgMZAQAAIAAJ}} |year=2000 |publisher=Wiley |isbn=978-0-471-35167-2 }}</ref>
 
For recursively computing the partial derivatives, RTRL has a time-complexity of O(number of hidden x number of weights) per time step for computing the [[Jacobian matrix|Jacobian matrices]], while BPTT only takes O(number of weights) per time step, at the cost of storing all forward activations within the given time horizon.<ref name="Ollivier2015">{{Cite arXiv |last1=Yann |first1=Ollivier |first2=Corentin |last2=Tallec |first3=Guillaume |last3=Charpiat |date=2015-07-28 |title=Training recurrent networks online without backtracking |eprint=1507.07680 |class=cs.NE }}</ref> An online hybrid between BPTT and RTRL with intermediate complexity exists,<ref>{{Cite journal |last=Schmidhuber |first=Jürgen |date=1992-03-01 |title=A Fixed Size Storage O(n3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks |journal=Neural Computation |volume=4 |issue=2 |pages=243–248 |doi=10.1162/neco.1992.4.2.243 |s2cid=11761172 }}</ref><ref>{{cite journal |first=Ronald J. |last=Williams |title=Complexity of exact gradient computation algorithms for recurrent neural networks |location=Boston (MA) |series=Technical Report NU-CCS-89-27 |publisher=Northeastern University, College of Computer Science |year=1989 |url=http://citeseerx.ist.psu.edu/showciting?cid=128036 }}</ref> along with variants for continuous time.<ref>{{Cite journal |last=Pearlmutter |first=Barak A. |date=1989-06-01 |title=Learning State Space Trajectories in Recurrent Neural Networks |journal=Neural Computation |volume=1 |issue=2 |pages=263–269 |doi=10.1162/neco.1989.1.2.263 |s2cid=16813485 |url=http://repository.cmu.edu/cgi/viewcontent.cgi?article=2865&context=compsci }}</ref>
 
A major problem with gradient descent for standard RNN architectures is that [[Vanishing gradient problem|error gradients vanish]] exponentially quickly with the size of the time lag between important events.<ref name="hochreiter1991" /><ref name="HOCH2001">{{cite book |chapter-url={{google books |plainurl=y |id=NWOcMVA64aAC }} |title=A Field Guide to Dynamical Recurrent Networks |last=Hochreiter |first=Sepp |display-authors=etal |date=15 January 2001 |publisher=John Wiley & Sons |isbn=978-0-7803-5369-5 |chapter=Gradient flow in recurrent nets: the difficulty of learning long-term dependencies |editor-last2=Kremer |editor-first2=Stefan C. |editor-first1=John F. |editor-last1=Kolen }}</ref> LSTM combined with a BPTT/RTRL hybrid learning method attempts to overcome these problems.<ref name="lstm" /> This problem is also solved in the independently recurrent neural network (IndRNN)<ref name="auto"/> by reducing the context of a neuron to its own past state and the cross-neuron information can then be explored in the following layers. Memories of different range including long-term memory can be learned without the gradient vanishing and exploding problem.
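
A tiny numeric illustration of the effect (all values assumed): the backpropagated error is multiplied at every step by the recurrent weight matrix and the activation derivative, so when that factor has norm below one the gradient shrinks roughly exponentially with the length of the time lag.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(6)
Wh = rng.normal(scale=0.3, size=(4, 4))     # recurrent weights with small norm (assumed)
grad = np.ones(4)                           # error signal arriving at some time step
for lag in range(1, 31):
    h = np.tanh(rng.normal(size=4))         # a representative hidden state
    grad = Wh.T @ (grad * (1.0 - h ** 2))   # one step of backpropagation through time
    if lag % 10 == 0:
        print(lag, np.linalg.norm(grad))    # the norm decays roughly exponentially
</syntaxhighlight>
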
The on-line algorithm called causal recursive backpropagation (CRBP) implements and combines the BPTT and RTRL paradigms for locally recurrent networks.<ref>{{Cite journal |last1=Campolucci |first1=Paolo |last2=Uncini |first2=Aurelio |last3=Piazza |first3=Francesco |last4=Rao |first4=Bhaskar D. |year=1999 |title=On-Line Learning Algorithms for Locally Recurrent Neural Networks |journal=IEEE Transactions on Neural Networks |volume=10 |issue=2 |pages=253–271 |doi=10.1109/72.750549 |pmid=18252525 |citeseerx=10.1.1.33.7550 }}</ref> It works with the most general locally recurrent networks. The CRBP algorithm can minimize the global error term. This fact improves the stability of the algorithm, providing a unifying view of gradient calculation techniques for recurrent networks with local feedback.
 
One approach to the computation of gradient information in RNNs with arbitrary architectures is based on signal-flow graphs diagrammatic derivation.<ref>{{Cite journal |last1=Wan |first1=Eric A. |last2=Beaufays |first2=Françoise |year=1996 |title=Diagrammatic derivation of gradient algorithms for neural networks |journal=Neural Computation |volume=8 |pages=182–201 |doi=10.1162/neco.1996.8.1.182 |s2cid=15512077 }}</ref> It uses the BPTT batch algorithm, based on Lee's theorem for network sensitivity calculations.<ref name="ReferenceA">{{Cite journal |last1=Campolucci |first1=Paolo |last2=Uncini |first2=Aurelio |last3=Piazza |first3=Francesco |year=2000 |title=A Signal-Flow-Graph Approach to On-line Gradient Calculation |journal=Neural Computation |volume=12 |issue=8 |pages=1901–1927 |doi=10.1162/089976600300015196 |pmid=10953244 |citeseerx=10.1.1.212.5406 |s2cid=15090951 }}</ref> It was proposed by Wan and Beaufays, while its fast online version was proposed by Campolucci, Uncini and Piazza.<ref name="ReferenceA"/>
===Global optimization methods===
Training the weights in a neural network can be modeled as a non-linear [[global optimization]] problem. A target function can be formed to evaluate the fitness or error of a particular weight vector as follows: First, the weights in the network are set according to the weight vector. Next, the network is evaluated against the training sequence. Typically, the sum-squared-difference between the predictions and the target values specified in the training sequence is used to represent the error of the current weight vector. Arbitrary global optimization techniques may then be used to minimize this target function.
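
A minimal sketch of such a target function, with an assumed tiny RNN and toy task: the flat weight vector is unpacked into the network's matrices, the network is run over the training sequence, and the sum of squared differences is returned. Any black-box optimizer can then minimize this value.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(7)
n_in, n_h = 1, 3
xs = rng.normal(size=(20, n_in))      # toy training sequence
targets = np.roll(xs[:, 0], 1)        # toy task: reproduce the previous input (assumed)
dim = n_h * n_in + n_h * n_h + n_h    # length of the flattened weight vector

def unpack(theta):
    """Map a flat weight vector onto the network's weight matrices."""
    i = 0
    Wx = theta[i:i + n_h * n_in].reshape(n_h, n_in); i += n_h * n_in
    Wh = theta[i:i + n_h * n_h].reshape(n_h, n_h);   i += n_h * n_h
    Wy = theta[i:].reshape(1, n_h)
    return Wx, Wh, Wy

def target_function(theta):
    """Sum of squared differences between predictions and training targets."""
    Wx, Wh, Wy = unpack(theta)
    h, err = np.zeros(n_h), 0.0
    for x, t in zip(xs, targets):
        h = np.tanh(Wx @ x + Wh @ h)
        err += ((Wy @ h).item() - t) ** 2
    return err

print(target_function(rng.normal(scale=0.1, size=dim)))
</syntaxhighlight>
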
The most common global optimization method for training RNNs is [[genetic algorithm]]s, especially in unstructured networks.<ref>{{citation |title=IJCAI 99 |year=1999 |last1=Gomez |first1=Faustino J. |last2=Miikkulainen |first2=Risto |contribution=Solving non-Markovian control tasks with neuroevolution |contribution-url=http://www.cs.utexas.edu/users/nn/downloads/papers/gomez.ijcai99.pdf |publisher=Morgan Kaufmann |access-date=5 August 2017 }}</ref><ref>{{cite web |url=http://arimaa.com/arimaa/about/Thesis/ |title=Applying Genetic Algorithms to Recurrent Neural Networks for Learning Network Parameters and Architecture |last=Syed |first=Omar |publisher=M.Sc. thesis, Department of Electrical Engineering, Case Western Reserve University, Advisor Yoshiyasu Takefuji |date=May 1995 }}</ref><ref>{{Cite journal |last1=Gomez |first1=Faustino J. |last2=Schmidhuber |first2=Jürgen |last3=Miikkulainen |first3=Risto |date=June 2008 |title=Accelerated Neural Evolution Through Cooperatively Coevolved Synapses |url=http://dl.acm.org/citation.cfm?id=1390681.1390712 |journal=Journal of Machine Learning Research |volume=9 |pages=937–965 }}</ref>
 
Initially, the genetic algorithm is encoded with the neural network weights in a predefined manner where one gene in the [[Chromosome (genetic algorithm)|chromosome]] represents one weight link. The whole network is represented as a single chromosome. The fitness function is evaluated as follows:
* Each weight encoded in the chromosome is assigned to the respective weight link of the network.
* The training set is presented to the network which propagates the input signals forward.
* The mean-squared-error is returned to the fitness function.
* This function drives the genetic selection process.
Many chromosomes make up the population; therefore, many different neural networks are evolved until a stopping criterion is satisfied. A common stopping scheme is:
* When the neural network has learnt a certain percentage of the training data or
* When the minimum value of the mean-squared-error is satisfied or
* When the maximum number of training generations has been reached.
The stopping criterion is evaluated by the fitness function as it gets the reciprocal of the mean-squared-error from each network during training. Therefore, the goal of the genetic algorithm is to maximize the fitness function, reducing the mean-squared-error.
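
A minimal sketch of this scheme, with an assumed population size, mutation noise and a placeholder error function standing in for running each decoded network on the training set (a target function like the one sketched above could be plugged in directly): fitness is the reciprocal of the error, and evolution stops when the error is small enough or the generation budget is spent.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(8)
dim = 15                               # length of the flattened weight vector (assumed)
pop_size, generations, sigma = 30, 200, 0.1

def network_error(theta):
    # Placeholder for decoding the chromosome into an RNN, running it on the
    # training set and returning the mean squared error; a toy quadratic is used here.
    return float(np.mean((theta - 0.5) ** 2))

def fitness(theta):
    return 1.0 / (network_error(theta) + 1e-8)   # reciprocal of the mean squared error

population = rng.normal(scale=0.5, size=(pop_size, dim))   # one chromosome per network
for gen in range(generations):
    scores = np.array([fitness(ind) for ind in population])
    if scores.max() > 1e4:             # stopping criterion: error below 1e-4
        break
    # Selection: keep the fitter half as parents; reproduction: mutated copies.
    parents = population[np.argsort(scores)[-pop_size // 2:]]
    children = parents + rng.normal(scale=sigma, size=parents.shape)
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(ind) for ind in population])]
print(network_error(best))
</syntaxhighlight>
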
Other global (and/or evolutionary) optimization techniques may be used to seek a good set of weights, such as [[simulated annealing]] or [[particle swarm optimization]].
==Related fields and models==
RNNs may behave [[chaos theory|chaotically]]. In such cases, [[dynamical systems theory]] may be used for analysis.
 
They are in fact [[recursive neural network]]s with a particular structure: that of a linear chain. Whereas recursive neural networks operate on any hierarchical structure, combining child representations into parent representations, recurrent neural networks operate on the linear progression of time, combining the previous time step and a hidden representation into the representation for the current time step.
 
In particular, RNNs can appear as nonlinear versions of [[finite impulse response]] and [[infinite impulse response]] filters and also as a [[nonlinear autoregressive exogenous model]] (NARX).<ref>{{cite journal |url={{google books |plainurl=y |id=830-HAAACAAJ |page=208}} |title=Computational Capabilities of Recurrent NARX Neural Networks |last1=Siegelmann |first1=Hava T. |last2=Horne |first2=Bill G. |last3=Giles |first3=C. Lee |journal= IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics)|volume=27 |issue=2 |pages=208–15 |year=1995 |pmid=18255858 |doi=10.1109/3477.558801 |citeseerx=10.1.1.48.7468 }}</ref>
==Libraries==
* [[Apache Singa]]
* [[Caffe (software)|Caffe]]: Created by the Berkeley Vision and Learning Center (BVLC). It supports both CPU and GPU. Developed in [[C++]], and has [[Python (programming language)|Python]] and [[MATLAB]] wrappers.
* [[Chainer]]: The first stable deep learning library that supports dynamic, define-by-run neural networks. Fully in Python, production support for CPU, GPU, distributed training.
* [[Deeplearning4j]]: Deep learning in [[Java (programming language)|Java]] and [[Scala (programming language)|Scala]] on multi-GPU-enabled [[Apache Spark|Spark]]. A general-purpose [http://deeplearning4j.org/ deep learning library] for the [[Java virtual machine|JVM]] production stack running on a [https://github.com/deeplearning4j/libnd4j C++ scientific computing engine]. Allows the creation of custom layers. Integrates with [[Hadoop]] and [[Apache Kafka|Kafka]].
*[[Flux (machine-learning framework)|Flux]]: includes interfaces for RNNs, including GRUs and LSTMs, written in [[Julia (programming language)|Julia]].
* [[Keras]]: High-level, easy to use API, providing a wrapper to many other deep learning libraries.
* [[Microsoft Cognitive Toolkit]]
* [[MXNet]]: a modern open-source deep learning framework used to train and deploy deep neural networks.
* [[PyTorch]]: Tensors and Dynamic neural networks in Python with strong GPU acceleration.
* [[TensorFlow]]: Apache 2.0-licensed Theano-like library with support for CPU, GPU and Google's proprietary [[Tensor processing unit|TPU]],<ref>{{cite news |url=https://www.wired.com/2016/05/google-tpu-custom-chips/ |first=Cade |last=Metz |newspaper=Wired |date=May 18, 2016 |title=Google Built Its Very Own Chips to Power Its AI Bots }}</ref> mobile
* [[Theano (software)|Theano]]: The reference deep-learning library for Python with an API largely compatible with the popular [[NumPy]] library. Allows user to write symbolic mathematical expressions, then automatically generates their derivatives, saving the user from having to code gradients or backpropagation. These symbolic expressions are automatically compiled to CUDA code for a fast, on-the-GPU implementation.
* [[Torch (machine learning)|Torch]] ([http://www.torch.ch/ www.torch.ch]): A scientific computing framework with wide support for machine learning algorithms, written in [[C (programming language)|C]] and [[Lua (programming language)|lua]]. The main author is Ronan Collobert, and it is now used at Facebook AI Research and Twitter.
==Applications==
Applications of recurrent neural networks include:
*[[Machine translation]]<ref name="sutskever2014"/>
*[[Robot control]]<ref>{{Cite book |last1=Mayer |first1=Hermann |last2=Gomez |first2=Faustino J. |last3=Wierstra |first3=Daan |last4=Nagy |first4=Istvan |last5=Knoll |first5=Alois |last6=Schmidhuber |first6=Jürgen |date=October 2006 |title=A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks |journal=2006 IEEE/RSJ International Conference on Intelligent Robots and Systems |pages=543–548 |doi=10.1109/IROS.2006.282190 |isbn=978-1-4244-0258-8 |citeseerx=10.1.1.218.3399 |s2cid=12284900 }}</ref>
*[[Time series prediction]]<ref>{{Cite journal |last1=Wierstra |first1=Daan |last2=Schmidhuber |first2=Jürgen |last3=Gomez |first3=Faustino J. |year=2005 |title=Evolino: Hybrid Neuroevolution/Optimal Linear Search for Sequence Learning |url=https://www.academia.edu/5830256 |journal=Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh |pages=853–858 }}</ref><ref>{{cite arXiv |last=Petneházi |first=Gábor |title=Recurrent neural networks for time series forecasting |date=2019-01-01 |eprint=1901.00069 |class=cs.LG }}</ref><ref>{{cite journal |last1=Hewamalage |first1=Hansika |last2=Bergmeir |first2=Christoph |last3=Bandara |first3=Kasun |title=Recurrent Neural Networks for Time Series Forecasting: Current Status and Future Directions |journal=International Journal of Forecasting |year=2020 |volume=37 |pages=388–427 |doi=10.1016/j.ijforecast.2020.06.008 |arxiv=1909.00590 |s2cid=202540863 }}</ref>
*[[Speech recognition]]<ref>{{cite journal |last1=Graves |first1=Alex |last2=Schmidhuber |first2=Jürgen |year=2005 |title=Framewise phoneme classification with bidirectional LSTM and other neural network architectures |journal=Neural Networks |volume=18 |issue=5–6 |pages=602–610 |doi=10.1016/j.neunet.2005.06.042 |pmid=16112549 |citeseerx=10.1.1.331.5800 }}</ref><ref>{{Cite book |last1=Fernández |first1=Santiago |last2=Graves |first2=Alex |last3=Schmidhuber |first3=Jürgen |year=2007 |title=An Application of Recurrent Neural Networks to Discriminative Keyword Spotting |url=http://dl.acm.org/citation.cfm?id=1778066.1778092 |journal=Proceedings of the 17th International Conference on Artificial Neural Networks |series=ICANN'07 |location=Berlin, Heidelberg |publisher=Springer-Verlag |pages=220–229 |isbn=978-3540746935 }}</ref><ref name="graves2013">{{cite journal |last1=Graves |first1=Alex |last2=Mohamed |first2=Abdel-rahman |last3=Hinton |first3=Geoffrey E. |year=2013 |title=Speech Recognition with Deep Recurrent Neural Networks |journal=Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on |pages=6645–6649 |arxiv=1303.5778 |bibcode=2013arXiv1303.5778G |doi=10.1109/ICASSP.2013.6638947 |isbn=978-1-4799-0356-6 |s2cid=206741496 }}</ref>
*[[Speech synthesis]]<ref>{{Cite journal |last1=Chang |first1=Edward F. |last2=Chartier |first2=Josh |last3=Anumanchipalli |first3=Gopala K. |date=24 April 2019 |title=Speech synthesis from neural decoding of spoken sentences |journal=Nature |language=en |volume=568 |issue=7753 |pages=493–498 |doi=10.1038/s41586-019-1119-1 |pmid=31019317 |issn=1476-4687 |bibcode=2019Natur.568..493A |s2cid=129946122 }}</ref>
*[[Brain–computer interfaces]]<ref>Moses, David A., Sean L. Metzger, Jessie R. Liu, Gopala K. Anumanchipalli, Joseph G. Makin, Pengfei F. Sun, Josh Chartier, et al. "Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria." New England Journal of Medicine 385, no. 3 (July 15, 2021): 217–27. https://doi.org/10.1056/NEJMoa2027540.</ref>
*Time series anomaly detection<ref>{{Cite journal|last1=Malhotra |first1=Pankaj |last2=Vig |first2=Lovekesh |last3=Shroff |first3=Gautam |last4=Agarwal |first4=Puneet |date=April 2015 |title=Long Short Term Memory Networks for Anomaly Detection in Time Series |url=https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2015-56.pdf |journal=European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning — ESANN 2015 }}</ref>
*Rhythm learning<ref name="peephole2002">{{cite journal |last1=Gers |first1=Felix A. |last2=Schraudolph |first2=Nicol N. |last3=Schmidhuber |first3=Jürgen |year=2002 |title=Learning precise timing with LSTM recurrent networks |url=http://www.jmlr.org/papers/volume3/gers02a/gers02a.pdf |journal=Journal of Machine Learning Research |volume=3 |pages=115–143 }}</ref>
*Music composition<ref>{{Cite book |last1=Eck |first1=Douglas |last2=Schmidhuber |first2=Jürgen |date=2002-08-28 |title=Learning the Long-Term Structure of the Blues |journal=Artificial Neural Networks — ICANN 2002 |publisher=Springer |location=Berlin, Heidelberg |pages=284–289 |doi=10.1007/3-540-46084-5_47 |isbn=978-3540460848 |series=Lecture Notes in Computer Science |volume=2415 |citeseerx=10.1.1.116.3620 }}</ref>
*Grammar learning<ref>{{cite journal |last1=Schmidhuber |first1=Jürgen |last2=Gers |first2=Felix A. |last3=Eck |first3=Douglas |year=2002 |title=Learning nonregular languages: A comparison of simple recurrent networks and LSTM |journal=Neural Computation |volume=14 |issue=9 |pages=2039–2041 |doi=10.1162/089976602320263980 |pmid=12184841 |citeseerx=10.1.1.11.7369 |s2cid=30459046 }}</ref><ref name="peepholeLSTM">{{cite journal |last1=Gers |first1=Felix A. |last2=Schmidhuber |first2=Jürgen |year=2001 |title=LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages |url=ftp://ftp.idsia.ch/pub/juergen/L-IEEE.pdf |journal=IEEE Transactions on Neural Networks |volume=12 |issue=6 |pages=1333–1340 |doi=10.1109/72.963769 |pmid=18249962 }}</ref><ref>{{cite journal |last1=Pérez-Ortiz |first1=Juan Antonio |last2=Gers |first2=Felix A. |last3=Eck |first3=Douglas |last4=Schmidhuber |first4=Jürgen |year=2003 |title=Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets |journal=Neural Networks |volume=16 |issue=2 |pages=241–250 |doi=10.1016/s0893-6080(02)00219-8 |pmid=12628609 |citeseerx=10.1.1.381.1992 }}</ref>
*[[Handwriting recognition]]<ref>{{cite journal |first1=Alex |last1=Graves |first2=Jürgen |last2=Schmidhuber |title=Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks |journal=Advances in Neural Information Processing Systems 22, NIPS'22 |pages=545–552 |location=Vancouver (BC) |publisher=MIT Press |year=2009 }}</ref><ref>{{Cite book |last1=Graves |first1=Alex |last2=Fernández |first2=Santiago |last3=Liwicki |first3=Marcus |last4=Bunke |first4=Horst |last5=Schmidhuber |first5=Jürgen |year=2007 |title=Unconstrained Online Handwriting Recognition with Recurrent Neural Networks |url=http://dl.acm.org/citation.cfm?id=2981562.2981635 |journal=Proceedings of the 20th International Conference on Neural Information Processing Systems |series=NIPS'07 |publisher=Curran Associates Inc. |pages=577–584 |isbn=9781605603520 }}</ref>
*Human action recognition<ref>{{cite journal |first1=Moez |last1=Baccouche |first2=Franck |last2=Mamalet |first3=Christian |last3=Wolf |first4=Christophe |last4=Garcia |first5=Atilla |last5=Baskurt |title=Sequential Deep Learning for Human Action Recognition |journal=2nd International Workshop on Human Behavior Understanding (HBU) |editor-first1=Albert Ali |editor-last1=Salah |editor-first2=Bruno |editor-last2=Lepri |location=Amsterdam, Netherlands |pages=29–39 |series=Lecture Notes in Computer Science |volume=7065 |publisher=Springer |year=2011 |doi=10.1007/978-3-642-25446-8_4 |isbn=978-3-642-25445-1 }}</ref>
*Protein homology detection<ref>{{Cite journal |last1=Hochreiter |first1=Sepp |last2=Heusel |first2=Martin |last3=Obermayer |first3=Klaus |doi=10.1093/bioinformatics/btm247 |title=Fast model-based protein homology detection without alignment |journal=Bioinformatics |volume=23 |issue=14 |pages=1728–1736 |year=2007 |pmid=17488755 |doi-access=free }}</ref>
*Predicting subcellular localization of proteins<ref name="ThireoReczko" />
*Several prediction tasks in the area of business process management<ref>{{cite book |last1=Tax |first1=Niek |last2=Verenich |first2=Ilya |last3=La Rosa |first3=Marcello |last4=Dumas |first4=Marlon |year=2017 |title=Predictive Business Process Monitoring with LSTM neural networks |journal=Proceedings of the International Conference on Advanced Information Systems Engineering (CAiSE) |pages=477–492 |doi=10.1007/978-3-319-59536-8_30 |series=Lecture Notes in Computer Science |volume=10253 |isbn=978-3-319-59535-1 |arxiv=1612.02130 |s2cid=2192354 }}</ref>
*Prediction in medical care pathways<ref>{{cite journal |last1=Choi |first1=Edward |last2=Bahadori |first2=Mohammad Taha |last3=Schuetz |first3=Andy |last4=Stewart |first4=Walter F. |last5=Sun |first5=Jimeng |year=2016 |title=Doctor AI: Predicting Clinical Events via Recurrent Neural Networks |url=http://proceedings.mlr.press/v56/Choi16.html |journal=Proceedings of the 1st Machine Learning for Healthcare Conference |volume=56 |pages=301–318 |bibcode=2015arXiv151105942C |arxiv=1511.05942 |pmid=28286600 |pmc=5341604 }}</ref>
==References==
{{reflist|30em}}
==Further reading==
* {{cite book |last1=Mandic |first1=Danilo P. |last2=Chambers |first2=Jonathon A. |name-list-style=amp |title=Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability |publisher=Wiley |year=2001 |isbn=978-0-471-49517-8 }}
==External links==
*[http://www.idsia.ch/~juergen/rnn.html Recurrent Neural Networks] with over 60 RNN papers by [[Jürgen Schmidhuber]]'s group at [[Dalle Molle Institute for Artificial Intelligence Research]]
*[http://jsalatas.ictpro.gr/weka Elman Neural Network implementation] for [[WEKA]]
{{Differentiable computing}}
{{DEFAULTSORT:Recurrent Neural Network}}
[[Category:Artificial intelligence]]
[[Category:Artificial neural networks]]
<noinclude>
 
<small>This page was moved from [[wikipedia:en:Recurrent neural network]]. Its edit history can be viewed at [[循环神经网络/edithistory]]</small></noinclude>
    
[[Category:待整理页面]]
 