=== Deep stacking networks ===
A deep stacking network (DSN)<ref name="ref17">{{cite journal|last2=Yu|first2=Dong|last3=Platt|first3=John|date=2012|title=Scalable stacking and learning for building deep architectures|url=http://research-srv.microsoft.com/pubs/157586/DSN-ICASSP2012.pdf|journal=2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)|pages=2133–2136|last1=Deng|first1=Li}}</ref> (deep convex network) is a hierarchy of blocks of simplified neural network modules. It was introduced in 2011 by Deng and Yu.<ref name="ref16">{{cite journal|last2=Yu|first2=Dong|date=2011|title=Deep Convex Net: A Scalable Architecture for Speech Pattern Classification|url=http://www.truebluenegotiations.com/files/deepconvexnetwork-interspeech2011-pub.pdf|journal=Proceedings of the Interspeech|pages=2285–2288|last1=Deng|first1=Li}}</ref> It formulates learning as a [https://en.wikipedia.org/wiki/Convex_optimization_problem convex optimization problem] with a [https://en.wikipedia.org/wiki/Closed-form_expression closed-form solution], emphasizing the mechanism's similarity to [https://en.wikipedia.org/wiki/Ensemble_learning stacked generalization].<ref name="ref18">{{cite journal|date=1992|title=Stacked generalization|journal=Neural Networks|volume=5|issue=2|pages=241–259|doi=10.1016/S0893-6080(05)80023-1|last1=Wolpert|first1=David}}</ref> Each DSN block is a simple module that is easy to train by itself in a [https://en.wikipedia.org/wiki/Supervised_learning supervised] fashion, without backpropagation across the entire stack of blocks.<ref>{{Cite journal|last=Bengio|first=Y.|date=2009-11-15|title=Learning Deep Architectures for AI|url=http://www.nowpublishers.com/article/Details/MAL-006|journal=Foundations and Trends® in Machine Learning|language=English|volume=2|issue=1|pages=1–127|doi=10.1561/2200000006|issn=1935-8237}}</ref>
Each block consists of a simplified [https://en.wikipedia.org/wiki/Multi-layer_perceptron multilayer perceptron] (MLP) with a single hidden layer. The hidden layer '''''h''''' has logistic [https://en.wikipedia.org/wiki/Sigmoid_function sigmoidal] units, and the output layer has linear units. Connections between these layers are represented by weight matrix '''''U'''''; input-to-hidden-layer connections have weight matrix '''''W'''''. Target vectors '''''t''''' form the columns of matrix '''''T''''', and the input data vectors '''''x''''' form the columns of matrix '''''X'''''. The matrix of hidden units is <math>\boldsymbol{H} = \sigma(\boldsymbol{W}^T\boldsymbol{X})</math>. Modules are trained in order, so lower-layer weights '''''W''''' are known at each stage. The function performs the element-wise [https://en.wikipedia.org/wiki/Logistic_function logistic sigmoid] operation. Each block estimates the same final label class ''y'', and its estimate is concatenated with the original input '''''X''''' to form the expanded input for the next block. Thus, the input to the first block contains the original data only, while downstream blocks' input also includes the output of preceding blocks. Learning the upper-layer weight matrix '''''U''''', given the other weights in the network, can then be formulated as a convex optimization problem:
: <math>\min_{U^T} f = ||\boldsymbol{U}^T \boldsymbol{H} - \boldsymbol{T}||^2_F,</math>
which has a closed-form solution.
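To make the closed-form step and the block stacking concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the lower weights '''''W''''' are drawn at random here (the papers initialize them more carefully and may fine-tune them), a small ridge term <code>reg</code> is added for numerical stability, and all function names are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dsn_block(X, T, n_hidden, reg=1e-3, rng=None):
    """Train one DSN block: fix W, then solve for U in closed form.

    Columns of X are input vectors and columns of T are target vectors,
    matching the notation in the text.
    """
    rng = rng or np.random.default_rng(0)
    d = X.shape[0]
    # Lower weights W: random here (an assumption; the papers use a more
    # careful initialization and optional fine-tuning).
    W = 0.1 * rng.standard_normal((d, n_hidden))
    H = sigmoid(W.T @ X)  # hidden activations, one column per example
    # min_U ||U^T H - T||_F^2 + reg ||U||_F^2
    #   =>  U = (H H^T + reg I)^{-1} H T^T   (closed form)
    U = np.linalg.solve(H @ H.T + reg * np.eye(n_hidden), H @ T.T)
    return W, U

def block_predict(X, W, U):
    return U.T @ sigmoid(W.T @ X)  # linear output units

def train_dsn(X, T, n_blocks=3, n_hidden=50):
    """Stack blocks: each block sees the raw input concatenated with the
    previous block's prediction."""
    blocks, Z = [], X
    for _ in range(n_blocks):
        W, U = train_dsn_block(Z, T, n_hidden)
        blocks.append((W, U))
        Z = np.vstack([X, block_predict(Z, W, U)])  # expanded input
    return blocks

def dsn_predict(X, blocks):
    Z, Y = X, None
    for W, U in blocks:
        Y = block_predict(Z, W, U)
        Z = np.vstack([X, Y])
    return Y  # prediction of the top block

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((10, 200))        # 200 examples, dimension 10
    T = np.eye(3)[rng.integers(0, 3, 200)].T  # one-hot targets, 3 classes
    blocks = train_dsn(X, T)
    print(dsn_predict(X, blocks).shape)       # (3, 200)
</syntaxhighlight>

Note how only the solve for '''''U''''' constitutes learning in each block; this is what allows each module to be trained in isolation, without backpropagation through the stack.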
Unlike other deep architectures, such as DBNs, the goal is not to discover a transformed [https://en.wikipedia.org/wiki/Feature_(machine_learning) feature] representation. The structure of the hierarchy makes parallel learning straightforward, as a batch-mode optimization problem. In purely [https://en.wikipedia.org/wiki/Discriminative_model discriminative tasks], DSNs outperform conventional [https://en.wikipedia.org/wiki/Deep_belief_network deep belief networks] (DBNs).<ref name="ref17" />
=== Tensor deep stacking networks ===