Given a state s and an action a, r and s' are the reward and the resulting state after performing (s, a), and a' ranges over the set of all actions.<ref>Torrey, L. Crowd Simulation Via Multi-agent Reinforcement Learning. In: ''Proceedings of the Sixth AAAI Conference On Artificial Intelligence and Interactive Digital Entertainment''. AAAI Press, Menlo Park (2010)</ref>