
Reinforcement Learning in arbitrarily large action/state spaces

I'm interested in using Deep Reinforcement Learning to find a unique optimal path back home among (too many) possibilities, with a few required intermediate stops (for instance, buy a coffee or refuel).

Furthermore, I want to apply this in cases where the agent doesn't know a "model" of the environment and can't possibly try all combinations of states and actions, i.e. it needs to use approximation techniques for the Q-value function (and/or the policy).
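
To make the "approximation instead of enumeration" point concrete, here is a minimal sketch of approximating Q(s, a) with a small neural network rather than a lookup table. STATE_DIM and NUM_ACTIONS are placeholder assumptions, not values from the question.

```python
# Minimal sketch: approximate Q(s, a) with a small network so the agent
# never has to enumerate every state/action combination.
import torch
import torch.nn as nn

STATE_DIM = 16    # assumed size of the encoded state (placeholder)
NUM_ACTIONS = 8   # assumed number of discrete actions (placeholder)

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),  # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_net = QNetwork(STATE_DIM, NUM_ACTIONS)
state = torch.randn(1, STATE_DIM)        # placeholder encoded state
q_values = q_net(state)                  # Q(s, ·) for all actions
greedy_action = q_values.argmax(dim=1)   # act greedily w.r.t. the estimate
```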

I've read about methods for handling cases like this, where rewards (if any) are sparse and binary, such as Monte Carlo Tree Search (which implies some sort of modeling and planning, as far as I understand) or Hindsight Experience Replay (HER), which applies ideas from DDPG.
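
For illustration, here is a minimal sketch of the HER relabelling idea with a sparse binary reward: after an episode, transitions are stored a second time as if the goal actually reached had been the intended one, so the replay buffer also contains "successful" experience. The episode dictionary keys, the goal representation, and the "final" relabelling strategy are assumptions for this sketch.

```python
# Minimal sketch of Hindsight Experience Replay ("final" strategy).
import numpy as np

def sparse_reward(achieved_goal, goal, tol=1e-3):
    # 0 if the goal is reached, -1 otherwise (a typical sparse/binary scheme)
    return 0.0 if np.linalg.norm(achieved_goal - goal) < tol else -1.0

def her_relabel(episode, replay_buffer):
    """episode: list of dicts with keys state, action, next_state,
    achieved_goal (goal reached after the transition) and goal."""
    final_achieved = episode[-1]["achieved_goal"]   # goal we actually reached
    for t in episode:
        # 1) store the original transition with the original goal
        replay_buffer.append((t["state"], t["action"], t["next_state"], t["goal"],
                              sparse_reward(t["achieved_goal"], t["goal"])))
        # 2) store a relabelled copy: pretend the achieved goal was the target
        replay_buffer.append((t["state"], t["action"], t["next_state"], final_achieved,
                              sparse_reward(t["achieved_goal"], final_achieved)))

# usage: buffer = []; her_relabel(collected_episode, buffer)
```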

But there are so many different kinds of algorithms to consider that I'm a bit confused about what's best to begin with. I know it's a difficult problem, and maybe it's too naive to ask, but is there any clear, direct and well-known way to solve the problem I want to tackle?

Thanks a lot!

Matias

If the final destination is fixed, as in this case (home), you can go for a dynamic search, since A* will not work due to the changing environment. And if you want to use a deep learning algorithm, then go for A3C with experience replay because of the large action/state spaces; it is capable of handling complex problems.
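
As a rough illustration of this suggestion, below is a minimal single-worker sketch of the advantage actor-critic update that A3C parallelizes. The network sizes and the rollout format are assumptions; note also that plain A3C is on-policy, so combining it with a replay buffer usually requires off-policy corrections (as in ACER).

```python
# Minimal sketch of one advantage actor-critic update on a rollout.
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 16, 8   # placeholder sizes

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU())
        self.policy_head = nn.Linear(64, NUM_ACTIONS)  # action logits
        self.value_head = nn.Linear(64, 1)             # state value V(s)

    def forward(self, state):
        h = self.body(state)
        return self.policy_head(h), self.value_head(h)

model = ActorCritic()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def update(states, actions, returns):
    """states: (T, STATE_DIM) float, actions: (T,) long,
    returns: (T,) discounted returns computed from the rollout."""
    logits, values = model(states)
    values = values.squeeze(-1)
    advantages = returns - values.detach()              # better/worse than expected
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * advantages).mean()         # policy gradient term
    value_loss = (returns - values).pow(2).mean()       # critic regression term
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()  # exploration bonus
    loss = policy_loss + 0.5 * value_loss - 0.01 * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```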
