Why is the bandit problem also called a one-step/state MDP in reinforcement learning?
Generalizing the policy for a model-based reinforcement learning algorithm with large state and action spaces
Reinforcement learning: the dilemma of choosing discretization steps and performance metrics for continuous action and continuous state spaces