更改
跳到导航
跳到搜索
第282行:
第282行:
− +
− 除了【长短期记忆】(LSTM), 其他方法也在循环函数中加入可微记忆,例如:+
第294行:
第294行:
− +
− 【可微神经计算机】(DNC)是一个NTM的延伸。他们在序列处理任务中表现超过神经图灵机,【长短期记忆】系统和记忆网络。<ref name=":02">{{Cite news|url=https://www.wired.co.uk/article/deepmind-ai-tube-london-underground|title=DeepMind's AI learned to ride the London Underground using human-like reason and memory|last=Burgess|first=Matt|newspaper=WIRED UK|language=en-GB|access-date=2016-10-19}}</ref><ref>{{Cite news|url=https://www.pcmag.com/news/348701/deepmind-ai-learns-to-navigate-london-tube|title=DeepMind AI 'Learns' to Navigate London Tube|newspaper=PCMAG|access-date=2016-10-19}}</ref><ref>{{Cite web|url=https://techcrunch.com/2016/10/13/__trashed-2/|title=DeepMind's differentiable neural computer helps you navigate the subway with its memory|last=Mannes|first=John|website=TechCrunch|access-date=2016-10-19}}</ref><ref>{{Cite journal|last=Graves|first=Alex|last2=Wayne|first2=Greg|last3=Reynolds|first3=Malcolm|last4=Harley|first4=Tim|last5=Danihelka|first5=Ivo|last6=Grabska-Barwińska|first6=Agnieszka|last7=Colmenarejo|first7=Sergio Gómez|last8=Grefenstette|first8=Edward|last9=Ramalho|first9=Tiago|date=2016-10-12|title=Hybrid computing using a neural network with dynamic external memory|url=http://www.nature.com/nature/journal/vaop/ncurrent/full/nature20101.html|journal=Nature|language=en|volume=538|issue=7626|doi=10.1038/nature20101|issn=1476-4687|pages=471–476|pmid=27732574|bibcode=2016Natur.538..471G}}</ref><ref>{{Cite web|url=https://deepmind.com/blog/differentiable-neural-computers/|title=Differentiable neural computers {{!}} DeepMind|website=DeepMind|access-date=2016-10-19}}</ref>+
− 直接代表过去经验,【使用相同经验形成局部模型】的方法通常称为【最近邻】或【k最近邻】方法。<ref>{{cite journal|last2=Schaal|first2=Stefan|year=1995|title=Memory-based neural networks for robot learning|url=|journal=Neurocomputing|volume=9|issue=3|pages=243–269|doi=10.1016/0925-2312(95)00033-6|last1=Atkeson|first1=Christopher G.}}</ref>深度学习在语义哈希<ref>Salakhutdinov, Ruslan, and Geoffrey Hinton. [http://www.utstat.toronto.edu/~rsalakhu/papers/sdarticle.pdf "Semantic hashing."] International Journal of Approximate Reasoning 50.7 (2009): 969–978.</ref>中十分有用,其中一个深度【图模型】建模由一个大的文档集中获取的字数向量。<ref name="Le 2014">{{Cite arXiv|eprint=1405.4053|first=Quoc V.|last=Le|first2=Tomas|last2=Mikolov|title=Distributed representations of sentences and documents|year=2014|class=cs.CL}}</ref> 文档映射到内存地址,这样语义相似的文档位于临近的地址。与查询文档相似的文档可以通过访问所有仅来自查询文档地址的几位不同的地址找到。不像在1000位地址上操作的【稀疏分布记忆】,语义哈希在常见计算机结构的32或64位地址上工作。+
− +
− 深度神经网络可能通过在维持可训练性的同时,加深和减少参数改进。当训练十分深(例如一百万层)神经网络可能不可行,类【CPU】结构如指针网络<ref>Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." {{arxiv|1506.03134}} (2015).</ref>和神经随机访问机器<ref>Kurach, Karol, Andrychowicz, Marcin and Sutskever, Ilya. "Neural Random-Access Machines." {{arxiv|1511.06392}} (2015).</ref>通过使用外部【随机访问内存】和其他属于【计算机组成】的组件,如【寄存器】,【ALU】和【指针】解决了这个限制。这种系统在储存在记忆单元和寄存器中的【概率分布】向量上操作。这样,模型是全可微并且端到端训练的。这些模型的关键特点是它们的深度,它们短期记忆的大小和参数的数量可以独立切换——不像类似LSTM的模型,它们的参数数量随内存大小二次增长。+
− 编码解码框架是基于从高度【结构化】输入到高度结构化输出的映射的神经网络。这种方法在【机器翻译】<ref>{{Cite web|url=http://www.aclweb.org/anthology/D13-1176|title=Recurrent continuous translation models|last=Kalchbrenner|first=N.|last2=Blunsom|first2=P.|date=2013|website=|publisher=EMNLP'2013|archive-url=|archive-date=|dead-url=|access-date=}}</ref><ref>{{Cite web|url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf|title=Sequence to sequence learning with neural networks|last=Sutskever|first=I.|last2=Vinyals|first2=O.|date=2014|website=|publisher=NIPS'2014|archive-url=|archive-date=|dead-url=|access-date=|last3=Le|first3=Q. V.}}</ref><ref>{{Cite journal|last=Cho|first=K.|last2=van Merrienboer|first2=B.|last3=Gulcehre|first3=C.|last4=Bougares|first4=F.|last5=Schwenk|first5=H.|last6=Bengio|first6=Y.|date=October 2014|title=Learning phrase representations using RNN encoder-decoder for statistical machine translation|journal=Proceedings of the Empiricial Methods in Natural Language Processing|volume=1406|pages=arXiv:1406.1078|via=|arxiv=1406.1078|bibcode=2014arXiv1406.1078C}}</ref>的背景下被提出,它的输入和输出是使用两种自然语言写成的句子。在这个工作中,LSTM RNN或CNN被用作编码机,来总结源语句,这个总结被条件RNN【语言模型】解码来产生翻译。<ref>Cho, Kyunghyun, Aaron Courville, and Yoshua Bengio. "Describing Multimedia Content using Attention-based Encoder–Decoder Networks." {{arxiv|1507.01053}} (2015).</ref> 这些系统共享建立的模块:门限RNN,CNN,和训练的注意机制。+
→带单独记忆结构的网络(Networks with separate memory structures)
=== 带单独记忆结构的网络(Networks with separate memory structures) ===
=== 带单独记忆结构的网络(Networks with separate memory structures) ===
使用ANN整合外部记忆可以追溯到关于分布表征<ref name="Hinton, Geoffrey E 19842">{{Cite web|url=http://repository.cmu.edu/cgi/viewcontent.cgi?article=2841&context=compsci|title=Distributed representations|last=Hinton|first=Geoffrey E.|date=1984|website=|archive-url=|archive-date=|dead-url=|access-date=}}</ref>和【Kohonen】的【自组织映射】的早期研究。例如, 在【稀疏分布式记忆】或【层级空间记忆】中,神经网络编码的模式被用于【可寻址内容的记忆】的地址,使用“神经元”本质上作为地址 【编码器】和【解码器】。 然而早期这种记忆的控制器不可微。
使用ANN整合外部记忆可以追溯到关于分布表征<ref name="Hinton, Geoffrey E 19842">{{Cite web|url=http://repository.cmu.edu/cgi/viewcontent.cgi?article=2841&context=compsci|title=Distributed representations|last=Hinton|first=Geoffrey E.|date=1984|website=|archive-url=|archive-date=|dead-url=|access-date=}}</ref>和[https://en.wikipedia.org/wiki/Teuvo_Kohonen Kohonen]的[https://en.wikipedia.org/wiki/Teuvo_Kohonen 自组织映射]的早期研究。例如, 在[https://en.wikipedia.org/wiki/Sparse_distributed_memory 稀疏分布式记忆]或[https://en.wikipedia.org/wiki/Hierarchical_temporal_memory 层级空间记忆]中,神经网络编码的模式被用于[https://en.wikipedia.org/wiki/Content-addressable_memory 可寻址内容的记忆]的地址,使用“神经元”本质上作为地址 [https://en.wikipedia.org/wiki/Encoder 编码器]和[https://en.wikipedia.org/wiki/Binary_decoder 解码器]。 然而早期这种记忆的控制器不可微。
====LSTM相关的可微记忆结构(LSTM-related differentiable memory structures) ====
====LSTM相关的可微记忆结构(LSTM-related differentiable memory structures) ====
除了[https://en.wikipedia.org/wiki/Long_short-term_memory 长短期记忆](LSTM), 其他方法也在循环函数中加入可微记忆,例如:
* 交替记忆网络的可微的推和弹动作,称为神经叠加机器<ref name="S. Das, C.L. Giles p. 79">S. Das, C.L. Giles, G.Z. Sun, "Learning Context Free Grammars: Limitations of a Recurrent Neural Network with an External Stack Memory," Proc. 14th Annual Conf. of the Cog. Sci. Soc., p. 79, 1992.</ref><ref name="Mozer, M. C. 1993 pp. 863-870">{{Cite web|url=https://papers.nips.cc/paper/626-a-connectionist-symbol-manipulator-that-discovers-the-structure-of-context-free-languages|title=A connectionist symbol manipulator that discovers the structure of context-free languages|last=Mozer|first=M. C.|last2=Das|first2=S.|date=1993|website=|publisher=NIPS 5|pages=863–870|archive-url=|archive-date=|dead-url=|access-date=}}</ref>
* 交替记忆网络的可微的推和弹动作,称为神经叠加机器<ref name="S. Das, C.L. Giles p. 79">S. Das, C.L. Giles, G.Z. Sun, "Learning Context Free Grammars: Limitations of a Recurrent Neural Network with an External Stack Memory," Proc. 14th Annual Conf. of the Cog. Sci. Soc., p. 79, 1992.</ref><ref name="Mozer, M. C. 1993 pp. 863-870">{{Cite web|url=https://papers.nips.cc/paper/626-a-connectionist-symbol-manipulator-that-discovers-the-structure-of-context-free-languages|title=A connectionist symbol manipulator that discovers the structure of context-free languages|last=Mozer|first=M. C.|last2=Das|first2=S.|date=1993|website=|publisher=NIPS 5|pages=863–870|archive-url=|archive-date=|dead-url=|access-date=}}</ref>
* 控制网络的外部可微存储在其他网络的快速幂中的记忆网络。<ref name="ReferenceC">{{cite journal|year=1992|title=Learning to control fast-weight memories: An alternative to recurrent nets|url=|journal=Neural Computation|volume=4|issue=1|pages=131–139|doi=10.1162/neco.1992.4.1.131|last1=Schmidhuber|first1=J.}}</ref>
* 控制网络的外部可微存储在其他网络的快速幂中的记忆网络。<ref name="ReferenceC">{{cite journal|year=1992|title=Learning to control fast-weight memories: An alternative to recurrent nets|url=|journal=Neural Computation|volume=4|issue=1|pages=131–139|doi=10.1162/neco.1992.4.1.131|last1=Schmidhuber|first1=J.}}</ref>
===== 神经图灵机(Neural Turing machines) =====
===== 神经图灵机(Neural Turing machines) =====
神经图灵机<ref name="Graves, Alex 14102">Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing Machines." {{arxiv|1410.5401}} (2014).</ref>将LSTM网络与外部记忆资源结合,这样他们可以通过注意过程相互影响。这种组合系统和【图灵机】相似但是端到端可微,允许使用【梯度下降】有效训练 。初步结果表明神经图灵机可以推断简单算法,如复制,排序和从输入输出例子的联想回忆。
神经图灵机<ref name="Graves, Alex 14102">Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing Machines." {{arxiv|1410.5401}} (2014).</ref>将LSTM网络与外部记忆资源结合,这样他们可以通过注意过程相互影响。这种组合系统和[https://en.wikipedia.org/wiki/Turing_machine 图灵机]相似但是端到端可微,允许使用[https://en.wikipedia.org/wiki/Gradient_descent 梯度下降]有效训练 。初步结果表明神经图灵机可以推断简单算法,如复制,排序和从输入输出例子的联想回忆。
[https://en.wikipedia.org/wiki/Differentiable_neural_computer 可微神经计算机](DNC)是一个NTM的延伸。他们在序列处理任务中表现超过神经图灵机,长短期记忆系统和记忆网络。<ref name=":02">{{Cite news|url=https://www.wired.co.uk/article/deepmind-ai-tube-london-underground|title=DeepMind's AI learned to ride the London Underground using human-like reason and memory|last=Burgess|first=Matt|newspaper=WIRED UK|language=en-GB|access-date=2016-10-19}}</ref><ref>{{Cite news|url=https://www.pcmag.com/news/348701/deepmind-ai-learns-to-navigate-london-tube|title=DeepMind AI 'Learns' to Navigate London Tube|newspaper=PCMAG|access-date=2016-10-19}}</ref><ref>{{Cite web|url=https://techcrunch.com/2016/10/13/__trashed-2/|title=DeepMind's differentiable neural computer helps you navigate the subway with its memory|last=Mannes|first=John|website=TechCrunch|access-date=2016-10-19}}</ref><ref>{{Cite journal|last=Graves|first=Alex|last2=Wayne|first2=Greg|last3=Reynolds|first3=Malcolm|last4=Harley|first4=Tim|last5=Danihelka|first5=Ivo|last6=Grabska-Barwińska|first6=Agnieszka|last7=Colmenarejo|first7=Sergio Gómez|last8=Grefenstette|first8=Edward|last9=Ramalho|first9=Tiago|date=2016-10-12|title=Hybrid computing using a neural network with dynamic external memory|url=http://www.nature.com/nature/journal/vaop/ncurrent/full/nature20101.html|journal=Nature|language=en|volume=538|issue=7626|doi=10.1038/nature20101|issn=1476-4687|pages=471–476|pmid=27732574|bibcode=2016Natur.538..471G}}</ref><ref>{{Cite web|url=https://deepmind.com/blog/differentiable-neural-computers/|title=Differentiable neural computers {{!}} DeepMind|website=DeepMind|access-date=2016-10-19}}</ref>
==== 语义哈希(Semantic hashing )====
==== 语义哈希(Semantic hashing )====
直接代表过去经验,[https://en.wikipedia.org/wiki/Instance-based_learning 使用相同经验形成局部模型]的方法通常称为[https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm 最近邻]或[https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm k最近邻]方法。<ref>{{cite journal|last2=Schaal|first2=Stefan|year=1995|title=Memory-based neural networks for robot learning|url=|journal=Neurocomputing|volume=9|issue=3|pages=243–269|doi=10.1016/0925-2312(95)00033-6|last1=Atkeson|first1=Christopher G.}}</ref>深度学习在语义哈希<ref>Salakhutdinov, Ruslan, and Geoffrey Hinton. [http://www.utstat.toronto.edu/~rsalakhu/papers/sdarticle.pdf "Semantic hashing."] International Journal of Approximate Reasoning 50.7 (2009): 969–978.</ref>中十分有用,其中一个深度[https://en.wikipedia.org/wiki/Graphical_model 图模型]建模由一个大的文档集中获取的字数向量。<ref name="Le 2014">{{Cite arXiv|eprint=1405.4053|first=Quoc V.|last=Le|first2=Tomas|last2=Mikolov|title=Distributed representations of sentences and documents|year=2014|class=cs.CL}}</ref> 文档映射到内存地址,这样语义相似的文档位于临近的地址。与查询文档相似的文档可以通过访问所有仅来自查询文档地址的几位不同的地址找到。不像在1000位地址上操作的[https://en.wikipedia.org/wiki/Sparse_distributed_memory 稀疏分布记忆],语义哈希在常见计算机结构的32或64位地址上工作。
==== 记忆网络(Memory networks) ====
==== 记忆网络(Memory networks) ====
记忆网络<ref name="Weston, Jason 14102">Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." {{arxiv|1410.3916}} (2014).</ref><ref>Sukhbaatar, Sainbayar, et al. "End-To-End Memory Networks." {{arxiv|1503.08895}} (2015).</ref>是神经网络结合【长期记忆】的另一个扩展。长期记忆可以可以被读写,目的是用来预测。这些模型用于【问题回答】,其中长期记忆有效地作为(动态)知识基础,输出是文本回应。<ref>Bordes, Antoine, et al. "Large-scale Simple Question Answering with Memory Networks." {{arxiv|1506.02075}} (2015).</ref>一个来自UCLA萨穆埃利工程学院的电子和计算机工程师团队做出了一种物理人工神经网络。它可以在实际光速下分析大量数据并识别物体。<ref>{{Cite news|url=https://www.sciencedaily.com/releases/2018/08/180802130750.htm|title=AI device identifies objects at the speed of light: The 3D-printed artificial neural network can be used in medicine, robotics and security|work=ScienceDaily|access-date=2018-08-08|language=en}}</ref>
记忆网络<ref name="Weston, Jason 14102">Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." {{arxiv|1410.3916}} (2014).</ref><ref>Sukhbaatar, Sainbayar, et al. "End-To-End Memory Networks." {{arxiv|1503.08895}} (2015).</ref>是神经网络结合[https://en.wikipedia.org/wiki/Long-term_memory 长期记忆]的另一个扩展。长期记忆可以可以被读写,目的是用来预测。这些模型用于[https://en.wikipedia.org/wiki/Question_answering 问题回答],其中长期记忆有效地作为(动态)知识基础,输出是文本回应。<ref>Bordes, Antoine, et al. "Large-scale Simple Question Answering with Memory Networks." {{arxiv|1506.02075}} (2015).</ref>一个来自UCLA萨穆埃利工程学院的电子和计算机工程师团队做出了一种物理人工神经网络。它可以在实际光速下分析大量数据并识别物体。<ref>{{Cite news|url=https://www.sciencedaily.com/releases/2018/08/180802130750.htm|title=AI device identifies objects at the speed of light: The 3D-printed artificial neural network can be used in medicine, robotics and security|work=ScienceDaily|access-date=2018-08-08|language=en}}</ref>
==== 指针网络(Pointer networks) ====
==== 指针网络(Pointer networks) ====
深度神经网络可能通过在维持可训练性的同时,加深和减少参数改进。当训练十分深(例如一百万层)神经网络可能不可行,类[https://en.wikipedia.org/wiki/CPU CPU]结构如指针网络<ref>Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." {{arxiv|1506.03134}} (2015).</ref>和神经随机访问机器<ref>Kurach, Karol, Andrychowicz, Marcin and Sutskever, Ilya. "Neural Random-Access Machines." {{arxiv|1511.06392}} (2015).</ref>通过使用外部[https://en.wikipedia.org/wiki/Random-access_memory 随机访问内存]和其他属于[https://en.wikipedia.org/wiki/Computer_architecture 计算机组成]的组件,如[https://en.wikipedia.org/wiki/Processor_register 寄存器],[https://en.wikipedia.org/wiki/Arithmetic_logic_unit ALU]和[https://en.wikipedia.org/wiki/Pointer_(computer_programming) 指针]解决了这个限制。这种系统在储存在记忆单元和寄存器中的[https://en.wikipedia.org/wiki/Probability_distribution 概率分布]向量上操作。这样,模型是全可微并且端到端训练的。这些模型的关键特点是它们的深度,它们短期记忆的大小和参数的数量可以独立切换——不像类似LSTM的模型,它们的参数数量随内存大小二次增长。
==== 编码解码网络(Encoder–decoder networks )====
==== 编码解码网络(Encoder–decoder networks )====
编码解码框架是基于从高度[https://en.wikipedia.org/wiki/Structured_prediction 结构化]输入到高度结构化输出的映射的神经网络。这种方法在[https://en.wikipedia.org/wiki/Machine_translation 机器翻译]<ref>{{Cite web|url=http://www.aclweb.org/anthology/D13-1176|title=Recurrent continuous translation models|last=Kalchbrenner|first=N.|last2=Blunsom|first2=P.|date=2013|website=|publisher=EMNLP'2013|archive-url=|archive-date=|dead-url=|access-date=}}</ref><ref>{{Cite web|url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf|title=Sequence to sequence learning with neural networks|last=Sutskever|first=I.|last2=Vinyals|first2=O.|date=2014|website=|publisher=NIPS'2014|archive-url=|archive-date=|dead-url=|access-date=|last3=Le|first3=Q. V.}}</ref><ref>{{Cite journal|last=Cho|first=K.|last2=van Merrienboer|first2=B.|last3=Gulcehre|first3=C.|last4=Bougares|first4=F.|last5=Schwenk|first5=H.|last6=Bengio|first6=Y.|date=October 2014|title=Learning phrase representations using RNN encoder-decoder for statistical machine translation|journal=Proceedings of the Empiricial Methods in Natural Language Processing|volume=1406|pages=arXiv:1406.1078|via=|arxiv=1406.1078|bibcode=2014arXiv1406.1078C}}</ref>的背景下被提出,它的输入和输出是使用两种自然语言写成的句子。在这个工作中,LSTM RNN或CNN被用作编码机,来总结源语句,这个总结被条件RNN【语言模型】解码来产生翻译。<ref>Cho, Kyunghyun, Aaron Courville, and Yoshua Bengio. "Describing Multimedia Content using Attention-based Encoder–Decoder Networks." {{arxiv|1507.01053}} (2015).</ref> 这些系统共享建立的模块:门限RNN,CNN,和训练的注意机制。
=== 多层核机器(Multilayer kernel machine) ===
=== 多层核机器(Multilayer kernel machine) ===