A [https://en.wikipedia.org/wiki/Deep_neural_network deep neural network] can be discriminatively trained with the standard backpropagation algorithm. Backpropagation is a method to calculate the [https://en.wikipedia.org/wiki/Gradient gradient] of the [https://en.wikipedia.org/wiki/Loss_function loss function] (which produces the cost associated with a given state) with respect to the weights in an ANN.
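To make the gradient calculation concrete, here is a minimal sketch of backpropagation for a tiny one-hidden-layer network (not the article's notation; the squared-error loss and tanh activation are illustrative choices):

<syntaxhighlight lang="python">
import numpy as np

def forward_backward(x, y, W1, W2):
    """One backpropagation pass for a tiny one-hidden-layer network.

    Loss: L = 0.5 * ||y_hat - y||^2, hidden activation: tanh.
    Returns the loss and the gradients dL/dW1, dL/dW2 via the chain rule.
    """
    # Forward pass
    h_pre = W1 @ x          # hidden pre-activation
    h = np.tanh(h_pre)      # hidden activation
    y_hat = W2 @ h          # linear output layer
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # Backward pass (chain rule, applied layer by layer)
    d_yhat = y_hat - y                 # dL/dy_hat
    dW2 = np.outer(d_yhat, h)          # dL/dW2
    d_h = W2.T @ d_yhat                # propagate the error to the hidden layer
    d_hpre = d_h * (1.0 - h ** 2)      # through tanh: tanh'(z) = 1 - tanh(z)^2
    dW1 = np.outer(d_hpre, x)          # dL/dW1
    return loss, dW1, dW2

# Example usage with random parameters (sizes are illustrative only).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
loss, dW1, dW2 = forward_backward(rng.normal(size=3), rng.normal(size=2), W1, W2)
</syntaxhighlight>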
The basics of continuous backpropagation<ref name="SCHIDHUB2"/><ref name="scholarpedia2">{{cite journal|year=2015|title=Deep Learning|url=http://www.scholarpedia.org/article/Deep_Learning|journal=Scholarpedia|volume=10|issue=11|page=32832|last1=Schmidhuber|first1=Jürgen|authorlink=https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber}}</ref><ref name=":5">{{Cite journal|last=Dreyfus|first=Stuart E.|date=1990-09-01|title=Artificial neural networks, back propagation, and the Kelley-Bryson gradient procedure|url=http://arc.aiaa.org/doi/10.2514/3.25422|journal=Journal of Guidance, Control, and Dynamics|volume=13|issue=5|pages=926–928}}</ref><ref name="mizutani2000">Eiji Mizutani, [https://en.wikipedia.org/wiki/Stuart_Dreyfus Stuart Dreyfus], Kenichi Nishio (2000). On derivation of MLP backpropagation from the Kelley-Bryson optimal-control gradient formula and its application. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2000), Como Italy, July 2000. [http://queue.ieor.berkeley.edu/People/Faculty/dreyfus-pubs/ijcnn2k.pdf Online]</ref> were derived in the context of [https://en.wikipedia.org/wiki/Control_theory control theory] by [https://en.wikipedia.org/wiki/Henry_J._Kelley Kelley]<ref name="kelley1960">{{cite journal|year=1960|title=Gradient theory of optimal flight paths|url=http://arc.aiaa.org/doi/abs/10.2514/8.5282?journalCode=arsj|journal=Ars Journal|volume=30|issue=10|pages=947–954|last1=Kelley|first1=Henry J.|authorlink=https://en.wikipedia.org/wiki/Henry_J._Kelley}}</ref> in 1960 and by [https://en.wikipedia.org/wiki/Arthur_E._Bryson Bryson] in 1961,<ref name="bryson1961">[https://en.wikipedia.org/wiki/Arthur_E._Bryson Arthur E. Bryson] (1961, April). A gradient method for optimizing multi-stage allocation processes. In Proceedings of the Harvard Univ. Symposium on digital computers and their applications.</ref> using principles of [https://en.wikipedia.org/wiki/Dynamic_programming dynamic programming]. In 1962, [https://en.wikipedia.org/wiki/Stuart_Dreyfus Dreyfus] published a simpler derivation based only on the [https://en.wikipedia.org/wiki/Chain_rule chain rule].<ref name="dreyfus1962">{{cite journal|year=1962|title=The numerical solution of variational problems|url=https://www.researchgate.net/publication/256244271_The_numerical_solution_of_variational_problems|journal=Journal of Mathematical Analysis and Applications|volume=5|issue=1|pages=30–45|last1=Dreyfus|first1=Stuart|authorlink=https://en.wikipedia.org/wiki/Stuart_Dreyfus}}</ref> In 1969, Bryson and [https://en.wikipedia.org/wiki/Yu-Chi_Ho Ho] described it as a multi-stage dynamic system optimization method.<ref>{{cite book|url=https://books.google.com/books?id=8jZBksh-bUMC&pg=PA578|title=Artificial Intelligence A Modern Approach|last2=Norvig|first2=Peter|publisher=Prentice Hall|year=2010|page=578|quote=The most popular method for learning in multilayer networks is called Back-propagation.|first1=Stuart J.|last1=Russell}}</ref><ref name="Bryson1969">{{cite book|url=https://books.google.com/books?id=1bChDAEACAAJ&pg=PA481|title=Applied Optimal Control: Optimization, Estimation and Control|last=Bryson|first=Arthur Earl|publisher=Blaisdell Publishing Company or Xerox College Publishing|year=1969|page=481}}</ref> In 1970, [https://en.wikipedia.org/wiki/Seppo_Linnainmaa Linnainmaa] finally published the general method for [https://en.wikipedia.org/wiki/Automatic_differentiation automatic differentiation] (AD) of discrete connected networks of nested [https://en.wikipedia.org/wiki/Differentiable_function differentiable functions].<ref name="lin1970">[https://en.wikipedia.org/wiki/Seppo_Linnainmaa Seppo Linnainmaa] (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6–7.</ref><ref name="lin1976">{{cite journal|year=1976|title=Taylor expansion of the accumulated rounding error|journal=BIT Numerical Mathematics|volume=16|issue=2|pages=146–160|last1=Linnainmaa|first1=Seppo|authorlink=https://en.wikipedia.org/wiki/Seppo_Linnainmaa}}</ref> This corresponds to the modern version of backpropagation, which remains efficient even when the networks are sparse.<ref name="SCHIDHUB2"/><ref name="scholarpedia2"/><ref name="grie2012">{{Cite journal|last=Griewank|first=Andreas|date=2012|title=Who Invented the Reverse Mode of Differentiation?|url=http://www.math.uiuc.edu/documenta/vol-ismp/52_griewank-andreas-b.pdf|journal=Documenta Matematica, Extra Volume ISMP|pages=389–400}}</ref><ref name="grie2008">{{cite book|url=https://books.google.com/books?id=xoiiLaRxcbEC|title=Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition|last2=Walther|first2=Andrea|publisher=SIAM|year=2008|first1=Andreas|last1=Griewank}}</ref> In 1973,<ref name="dreyfus1973">{{cite journal|year=1973|title=The computational solution of optimal control problems with time lag|journal=IEEE Transactions on Automatic Control|volume=18|issue=4|pages=383–385|last1=Dreyfus|first1=Stuart|authorlink=https://en.wikipedia.org/wiki/Stuart_Dreyfus}}</ref> Dreyfus used backpropagation to adapt the [https://en.wikipedia.org/wiki/Parameter parameters] of controllers in proportion to error gradients. In 1974, [https://en.wikipedia.org/wiki/Paul_Werbos Werbos] mentioned the possibility of applying this principle to ANNs,<ref name="werbos1974">[https://en.wikipedia.org/wiki/Paul_Werbos Paul Werbos] (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.</ref> and in 1982 he applied Linnainmaa's AD method to neural networks in the way that is widely used today.<ref name="scholarpedia2"/><ref name="werbos1982">{{Cite book|url=http://werbos.com/Neural/SensitivityIFIPSeptember1981.pdf|title=System modeling and optimization|last=Werbos|first=Paul|authorlink=https://en.wikipedia.org/wiki/Paul_Werbos|publisher=Springer|year=1982|pages=762–770|chapter=Applications of advances in nonlinear sensitivity analysis}}</ref> In 1986, [https://en.wikipedia.org/wiki/David_E._Rumelhart Rumelhart], Hinton and [https://en.wikipedia.org/wiki/Ronald_J._Williams Williams] noted that this method can generate useful internal representations of incoming data in the hidden layers of neural networks.<ref name=":4">{{Cite journal|last=Rumelhart|first=David E.|last2=Hinton|first2=Geoffrey E.|last3=Williams|first3=Ronald J.|title=Learning representations by back-propagating errors|url=http://www.nature.com/articles/Art323533a0|journal=Nature|volume=323|issue=6088|pages=533–536|year=1986}}</ref> In 1993, Wan was the first<ref name="SCHIDHUB2"/> to win an international pattern recognition contest through backpropagation.<ref name="wan1993">Eric A. Wan (1993). "Time series prediction by using a connectionist network with internal delay lines." In ''Proceedings of the Santa Fe Institute Studies in the Sciences of Complexity'', '''15''': p. 195. Addison-Wesley Publishing Co.</ref>
The weight updates of backpropagation can be done via [https://en.wikipedia.org/wiki/Stochastic_gradient_descent stochastic gradient descent], using the following equation:
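As a rough code illustration of such a gradient-descent update, a minimal sketch for a single linear unit with squared-error cost <math>C</math> (the function name and constants are illustrative, not from the article):

<syntaxhighlight lang="python">
import numpy as np

def sgd_update(w, x, y, lr=0.05):
    """One stochastic gradient descent step for a linear unit.

    Cost: C = 0.5 * (w.x - y)^2, so dC/dw = (w.x - y) * x,
    and the update is w <- w - lr * dC/dw.
    """
    error = w @ x - y
    return w - lr * error * x

# Example: repeated SGD steps drive the output toward the target 0.5.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
x, y = np.array([1.0, 2.0, -1.0]), 0.5
for _ in range(100):
    w = sgd_update(w, x, y)
print(w @ x)  # approaches 0.5
</syntaxhighlight>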
=== Long short-term memory ===
Long short-term memory (LSTM) networks are an RNN architecture that avoids the [https://en.wikipedia.org/wiki/Vanishing_gradient_problem vanishing gradient problem].<ref name=":03">{{Cite journal|last=Hochreiter|first=Sepp|last2=Schmidhuber|first2=Jürgen|date=1997-11-01|title=Long Short-Term Memory|url=http://www.mitpressjournals.org/doi/10.1162/neco.1997.9.8.1735|journal=Neural Computation|volume=9|issue=8|pages=1735–1780}}</ref> LSTM is usually augmented by recurrent gates called forget gates.<ref name=":10">{{Cite journal|url=https://www.researchgate.net/publication/220320057_Learning_Precise_Timing_with_LSTM_Recurrent_Networks|title=Learning Precise Timing with LSTM Recurrent Networks|journal=ResearchGate|language=en|access-date=2017-06-13|pp=115–143}}</ref> LSTM networks prevent backpropagated errors from vanishing or exploding.<ref name="HOCH19912"/> Instead, errors can flow backwards through unlimited numbers of virtual layers in an LSTM unfolded in space. That is, LSTM can learn "very deep learning" tasks<ref name="SCHIDHUB2" /> that require memories of events that happened thousands or even millions of discrete time steps earlier. Problem-specific LSTM-like topologies can be evolved,<ref>{{Cite journal|last=Bayer|first=Justin|last2=Wierstra|first2=Daan|last3=Togelius|first3=Julian|last4=Schmidhuber|first4=Jürgen|date=2009-09-14|title=Evolving Memory Cell Structures for Sequence Learning|url=https://link.springer.com/chapter/10.1007/978-3-642-04277-5_76|journal=Artificial Neural Networks – ICANN 2009|volume=5769|language=en|publisher=Springer, Berlin, Heidelberg|pages=755–764|series=Lecture Notes in Computer Science}}</ref> and LSTM can work with long delays and with signals that mix low- and high-frequency components.
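To make the gating concrete, a minimal sketch of one step of a standard LSTM cell with input, forget and output gates (NumPy; the parameter packing and sizes are illustrative assumptions, not from the cited papers):

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W (4n x d), U (4n x n) and b (4n) hold the stacked parameters for the
    input (i), forget (f) and output (o) gates and the candidate value (g).
    The forget gate decides how much of the old cell state c_prev to keep,
    which is what lets errors flow back over very many time steps.
    """
    n = h_prev.size
    z = W @ x + U @ h_prev + b        # all four pre-activations at once
    i = sigmoid(z[0:n])               # input gate
    f = sigmoid(z[n:2*n])             # forget gate
    o = sigmoid(z[2*n:3*n])           # output gate
    g = np.tanh(z[3*n:4*n])           # candidate cell update
    c = f * c_prev + i * g            # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c

# One step with illustrative sizes: 3 inputs, 2 hidden units.
rng = np.random.default_rng(0)
n, d = 2, 3
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n),
                 rng.normal(size=(4*n, d)), rng.normal(size=(4*n, n)),
                 np.zeros(4*n))
</syntaxhighlight>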
Stacks of LSTM RNNs<ref>{{Cite journal|last=Fernández|first=Santiago|last2=Graves|first2=Alex|last3=Schmidhuber|first3=Jürgen|date=2007|title=Sequence labelling in structured domains with hierarchical recurrent neural networks|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.1887|journal=In Proc. 20th Int. Joint Conf. on Artificial Intelligence, IJCAI 2007|pages=774–779}}</ref> can be trained by connectionist temporal classification (CTC),<ref name=":12">{{Cite journal|last=Graves|first=Alex|last2=Fernández|first2=Santiago|last3=Gomez|first3=Faustino|date=2006|title=Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.6306|journal=In Proceedings of the International Conference on Machine Learning, ICML 2006|pages=369–376}}</ref> which finds an RNN weight matrix that maximizes the probability of the label sequences in a training set, given the corresponding input sequences. CTC achieves both alignment and recognition.
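As a hedged, modern-library illustration of the CTC training criterion (PyTorch's <code>torch.nn.CTCLoss</code>, which postdates the cited works; all sizes here are illustrative):

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Illustrative sizes: T time steps, N sequences, C classes (class 0 = blank).
T, N, C = 50, 4, 20
rnn_out = torch.randn(T, N, C, requires_grad=True)  # stand-in for LSTM outputs
log_probs = rnn_out.log_softmax(dim=2)              # CTC expects log-probabilities

targets = torch.randint(1, C, (N, 10))              # label sequences (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

# The CTC loss is the negative log-probability of the label sequences,
# summed over all alignments; minimizing it trains alignment and recognition.
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients flow back into the (here random) RNN outputs
</syntaxhighlight>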
In 2003, LSTM started to become competitive with traditional speech recognizers.<ref name="graves2003">{{Cite journal|url=ftp://ftp.idsia.ch/pub/juergen/bioadit2004.pdf|title=Biologically Plausible Speech Recognition with LSTM Neural Nets|last=Graves|first=Alex|last2=Eck|first2=Douglas|date=2003|journal=1st Intl. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland|pages=175–184|last3=Beringer|first3=Nicole|last4=Schmidhuber|first4=Jürgen|authorlink4=https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber}}</ref> In 2007, the combination with CTC achieved the first good results on speech data.<ref name="fernandez2007keyword">{{Cite journal|last=Fernández|first=Santiago|last2=Graves|first2=Alex|last3=Schmidhuber|first3=Jürgen|date=2007|title=An Application of Recurrent Neural Networks to Discriminative Keyword Spotting|url=http://dl.acm.org/citation.cfm?id=1778066.1778092|journal=Proceedings of the 17th International Conference on Artificial Neural Networks|series=ICANN'07|location=Berlin, Heidelberg|publisher=Springer-Verlag|pages=220–229}}</ref> In 2009, a CTC-trained LSTM became the first RNN to win pattern recognition contests, when it won several competitions in connected [https://en.wikipedia.org/wiki/Handwriting_recognition handwriting recognition].<ref name="SCHIDHUB2" /><ref name="graves20093"/> In 2014, [https://en.wikipedia.org/wiki/Baidu Baidu] used CTC-trained RNNs to break the Switchboard Hub5'00 speech recognition benchmark, without using any traditional speech processing methods.<ref name="hannun2014">{{cite journal|last=Hannun|first=Awni|last2=Case|first2=Carl|last3=Casper|first3=Jared|last4=Catanzaro|first4=Bryan|last5=Diamos|first5=Greg|last6=Elsen|first6=Erich|last7=Prenger|first7=Ryan|last8=Satheesh|first8=Sanjeev|last9=Sengupta|first9=Shubho|date=2014-12-17|title=Deep Speech: Scaling up end-to-end speech recognition|url=https://arxiv.org/abs/1412.5567}}</ref> LSTM also improved large-vocabulary speech recognition<ref name="sak2014">{{Cite journal|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf|title=Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling|last=Sak|first=Hasim|last2=Senior|first2=Andrew|date=2014|last3=Beaufays|first3=Francoise}}</ref><ref name="liwu2015">{{cite journal|last=Li|first=Xiangang|last2=Wu|first2=Xihong|date=2014-10-15|title=Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition|url=https://arxiv.org/abs/1410.4281}}</ref> and text-to-speech synthesis,<ref>{{Cite journal|url=https://www.researchgate.net/publication/287741874_TTS_synthesis_with_bidirectional_LSTM_based_Recurrent_Neural_Networks|title=TTS synthesis with bidirectional LSTM based Recurrent Neural Networks|last=Fan|first=Y.|last2=Qian|first2=Y.|date=2014|journal=ResearchGate|language=en|access-date=2017-06-13|last3=Xie|first3=F.|last4=Soong|first4=F. K.}}</ref> for Google Android,<ref name="scholarpedia2"/><ref name="zen2015">{{Cite journal|url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43266.pdf|title=Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis|last=Zen|first=Heiga|last2=Sak|first2=Hasim|date=2015|journal=Google.com|publisher=ICASSP|pages=4470–4474}}</ref> and photo-real talking heads.<ref name="fan2015">{{Cite journal|last=Fan|first=Bo|last2=Wang|first2=Lijuan|last3=Soong|first3=Frank K.|last4=Xie|first4=Lei|date=2015|title=Photo-Real Talking Head with Deep Bidirectional LSTM|url=https://www.microsoft.com/en-us/research/wp-content/uploads/2015/04/icassp2015_fanbo_1009.pdf|journal=Proceedings of ICASSP}}</ref> In 2015, Google's speech recognition achieved a 49% performance improvement through CTC-trained LSTM.<ref name="sak2015">{{Cite journal|url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html|title=Google voice search: faster and more accurate|last=Sak|first=Haşim|last2=Senior|first2=Andrew|date=September 2015|last3=Rao|first3=Kanishka|last4=Beaufays|first4=Françoise|last5=Schalkwyk|first5=Johan}}</ref>
<gallery widths="260">
File:Single_layer_ann.svg.png|A single-layer feedforward artificial neural network. Arrows originating from <math>\scriptstyle x_2</math> are omitted for clarity. There are p inputs and q outputs. The value of the q-th output <math>\scriptstyle y_q</math> is calculated as <math>\scriptstyle y_q = K\left(\sum_i x_i w_{iq} - b_q\right)</math> (a short code sketch of this formula follows the gallery).
File:Two_layer_ann.svg.png|A two-layer feedforward artificial neural network
File:Artificial_neural_network.svg.png|An artificial neural network
File:Ann_dependency_(graph).svg.png|An ANN dependency graph
File:Single-layer_feedforward_artificial_neural_network.png|A single-layer feedforward artificial neural network with 4 inputs, 6 hidden units and 2 outputs. Given position state and direction, it outputs wheel-based control values.
File:Two-layer_feedforward_artificial_neural_network.png|A two-layer feedforward artificial neural network with 8 inputs, 2×8 hidden units and 2 outputs. Given position state, direction and other environment values, it outputs thruster-based control values.
File:Cmac.jpg|Parallel pipeline structure of a CMAC neural network. This learning algorithm can converge in one step.
</gallery>
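The output formula in the first gallery caption maps directly to code. A minimal sketch (not from the article; the activation K is chosen as tanh purely for illustration):

<syntaxhighlight lang="python">
import numpy as np

def single_layer_output(x, W, b, K=np.tanh):
    """Outputs of a single-layer feedforward network.

    Computes y_q = K(sum_i x_i * w_iq - b_q) for all q at once;
    K is the activation function (tanh is an illustrative choice).
    """
    return K(x @ W - b)

# p = 3 inputs, q = 2 outputs, following the caption's notation.
x = np.array([0.5, -1.0, 2.0])
W = np.ones((3, 2)) * 0.1    # w_iq: weight from input i to output q
b = np.array([0.05, -0.05])  # b_q: per-output bias
print(single_layer_output(x, W, b))
</syntaxhighlight>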