Changes
Line 177:
−
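− ==== Convergent recursive learning algorithm ====
− This is a learning method designed specifically for [https://en.wikipedia.org/wiki/Cerebellar_model_articulation_controller cerebellar model articulation controller] (CMAC) neural networks. In 2014, a recursive least squares algorithm was introduced to train CMAC neural networks online. The algorithm can converge in one step, and it then updates all weights in a single step for any new input data. Initially, the algorithm had a [https://en.wikipedia.org/wiki/Computational_complexity_theory computational complexity] of ''O''(''N''<sup>3</sup>). Based on [https://en.wikipedia.org/wiki/QR_decomposition QR decomposition], the recursive learning algorithm was simplified to ''O''(''N'').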
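The recursive least squares idea mentioned in the removed paragraph can be sketched as follows. This is a textbook RLS update for a generic model that is linear in its weights, not the CMAC-specific or QR-based variant the paragraph refers to. The function name `rls_update`, the random binary activation vectors standing in for CMAC cell activations, and the initial value of `P` are all assumptions made for illustration.

```python
import numpy as np

def rls_update(w, P, x, y):
    """One recursive least squares step for a linear model y ~ w @ x.

    w: current weight vector, P: current inverse-correlation estimate
    (symmetric), x: input/activation vector, y: target value.
    """
    Px = P @ x
    gain = Px / (1.0 + x @ Px)        # gain vector for this sample
    w = w + gain * (y - w @ x)        # correct all weights in one step
    P = P - np.outer(gain, Px)        # rank-one update of P
    return w, P

# Hypothetical data: binary activation vectors and noise-free targets.
rng = np.random.default_rng(0)
n_weights = 8
w_true = rng.normal(size=n_weights)
X = (rng.random((200, n_weights)) < 0.4).astype(float)
Y = X @ w_true

w = np.zeros(n_weights)
P = 1e6 * np.eye(n_weights)           # large P: weak prior on the weights
for x, y in zip(X, Y):                # online training, one sample at a time
    w, P = rls_update(w, P, x, y)

max_error = np.max(np.abs(X @ w - Y)) # worst-case fit error on seen samples
```

Each call touches every weight at once, which is the "update all weights in one step" behaviour described above. This dense form stores the full matrix `P`, so it is only meant to show the structure of the update.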
→Convergent recursive learning algorithm
In reinforcement learning, ANNs are commonly used as part of the overall algorithm.<ref>{{cite conference| author = Dominic, S. |author2=Das, R. |author3=Whitley, D. |author4=Anderson, C. |date=July 1991 | title = Genetic reinforcement learning for neural networks | conference = IJCNN-91-Seattle International Joint Conference on Neural Networks | booktitle = IJCNN-91-Seattle International Joint Conference on Neural Networks | publisher = IEEE | location = Seattle, Washington, USA | doi = 10.1109/IJCNN.1991.155315 | accessdate = | isbn = 0-7803-0164-1 }}</ref><ref>{{cite journal |last=Hoskins |first=J.C. |author2=Himmelblau, D.M. |title=Process control via artificial neural networks and reinforcement learning |journal=Computers & Chemical Engineering |year=1992 |volume=16 |pages=241–251 |doi=10.1016/0098-1354(92)80045-B |issue=4}}</ref> [https://en.wikipedia.org/wiki/Dimitri_Bertsekas Bertsekas] and Tsitsiklis<ref>{{cite book|url=https://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images|title=Neuro-dynamic programming|first=D.P.|first2=J.N.|publisher=Athena Scientific|year=1996|isbn=1-886529-10-8|location=|page=512|pages=|author=Bertsekas|author2=Tsitsiklis}}</ref> coupled [https://en.wikipedia.org/wiki/Dynamic_programming dynamic programming] with ANNs (yielding neuro-dynamic programming) and applied it to multidimensional nonlinear problems such as those in [https://en.wikipedia.org/wiki/Vehicle_routing vehicle routing],<ref>{{cite journal |last=Secomandi |first=Nicola |title=Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands |journal=Computers & Operations Research |year=2000 |volume=27 |pages=1201–1225 |doi=10.1016/S0305-0548(99)00146-X |issue=11–12}}</ref> [https://en.wikipedia.org/wiki/Natural_resource_management natural resource management]<ref>{{cite conference| author = de Rigo, D. |author2=Rizzoli, A. E. |author3=Soncini-Sessa, R. |author4=Weber, E. |author5=Zenesi, P. 
| year = 2001 | title = Neuro-dynamic programming for the efficient management of reservoir networks | conference = MODSIM 2001, International Congress on Modelling and Simulation | conferenceurl = http://www.mssanz.org.au/MODSIM01/MODSIM01.htm | booktitle = Proceedings of MODSIM 2001, International Congress on Modelling and Simulation | publisher = Modelling and Simulation Society of Australia and New Zealand | location = Canberra, Australia | doi = 10.5281/zenodo.7481 | url = https://zenodo.org/record/7482/files/de_Rigo_etal_MODSIM2001_activelink_authorcopy.pdf | accessdate = 29 July 2012 | isbn = 0-867405252 }}</ref><ref>{{cite conference| author = Damas, M. |author2=Salmeron, M. |author3=Diaz, A. |author4=Ortega, J. |author5=Prieto, A. |author6=Olivares, G.| year = 2000 | title = Genetic algorithms and neuro-dynamic programming: application to water supply networks | conference = 2000 Congress on Evolutionary Computation | booktitle = Proceedings of 2000 Congress on Evolutionary Computation | publisher = IEEE | location = La Jolla, California, USA | doi = 10.1109/CEC.2000.870269 | accessdate = | isbn = 0-7803-6375-2 }}</ref> or [https://en.wikipedia.org/wiki/Medicine medicine],<ref>{{cite journal |last=Deng |first=Geng |author2=Ferris, M.C. |title=Neuro-dynamic programming for fractionated radiotherapy planning |journal=Springer Optimization and Its Applications |year=2008 |volume=12 |pages=47–70 |doi=10.1007/978-0-387-73299-2_3|citeseerx=10.1.1.137.8288 |series=Springer Optimization and Its Applications |isbn=978-0-387-73298-5 }}</ref> because ANNs can mitigate the loss of accuracy even when the discretization grid density is reduced for numerically approximating the solution of the original control problem.
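Tasks that fall within the reinforcement learning paradigm are control problems, [https://en.wikipedia.org/wiki/Game games] and other sequential decision-making tasks.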
===Learning algorithms===