
Decreasing action sampling frequency for one agent in a multi-agent environment

I'm using rllib for the first time, trying to train a custom multi-agent RL environment, and would like to train a couple of PPO agents on it. The implementation hiccup I need to figure out is how to alter the training for one special agent so that it only takes an action every X timesteps. Is it best to only call compute_action() every X timesteps? Or, on the other steps, to mask the policy selection so that it has to re-sample until a No-Op is chosen? Or to modify both the action that gets fed into the environment and the previous actions in the training batches to be No-Ops?

What's the easiest way to implement this that still takes advantage of rllib's training features? Do I need to create a custom training loop for this, or is there a way to configure PPOTrainer to do it?
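For context, a minimal sketch of the kind of two-policy PPO setup the question describes, assuming the pre-2.0 RLlib API that exposes PPOTrainer (the env name, agent IDs, policy names, and spaces below are placeholders, not taken from the question):

```python
from gym import spaces
from ray.rllib.agents.ppo import PPOTrainer

obs_space = spaces.Box(0.0, 1.0, (4,))  # placeholder spaces
act_space = spaces.Discrete(3)

config = {
    "env": "my_multi_agent_env",  # hypothetical registered custom env
    "multiagent": {
        # One PPO policy per agent; None falls back to PPO's default policy class.
        "policies": {
            "fast_policy": (None, obs_space, act_space, {}),
            "slow_policy": (None, obs_space, act_space, {}),  # the special agent
        },
        "policy_mapping_fn": lambda agent_id, **kwargs: (
            "slow_policy" if agent_id == "special_agent" else "fast_policy"
        ),
    },
}

trainer = PPOTrainer(config=config)
for _ in range(10):
    result = trainer.train()
    print(result["episode_reward_mean"])
```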

Thanks

Let t := the number of timesteps so far. Give the special agent t (mod X) as an observation feature, and don't process its actions in the environment when t (mod X) ≠ 0. As sketched in the code after this list, this accomplishes:

  1. the agent is, in effect, only taking an action every X timesteps, because you are ignoring all the others
  2. the agent can learn that only the actions taken every X timesteps will affect future rewards
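A minimal sketch of this idea inside a custom RLlib MultiAgentEnv (assuming the pre-2.0 ray.rllib.env.multi_agent_env API; the class name, agent IDs, X value, spaces, and reward/done logic are all hypothetical placeholders):

```python
import numpy as np
from gym import spaces
from ray.rllib.env.multi_agent_env import MultiAgentEnv

X = 4  # assumed period: the special agent's actions only count every X steps

class PeriodicAgentEnv(MultiAgentEnv):
    """Hypothetical two-agent env with agents 'special_agent' and 'other_agent'."""

    def __init__(self, config=None):
        self.observation_space = spaces.Box(0.0, float(X), (1,), np.float32)
        self.action_space = spaces.Discrete(3)
        self.t = 0

    def reset(self):
        self.t = 0
        return self._obs()

    def step(self, action_dict):
        # Drop the special agent's action except when t (mod X) == 0,
        # which is equivalent to forcing a No-Op on the other timesteps.
        if self.t % X != 0:
            action_dict = {aid: a for aid, a in action_dict.items()
                           if aid != "special_agent"}
        # ... apply the remaining actions to the environment state here ...
        self.t += 1
        obs = self._obs()
        rewards = {aid: 0.0 for aid in obs}   # placeholder rewards
        dones = {"__all__": self.t >= 100}    # placeholder episode length
        return obs, rewards, dones, {}

    def _obs(self):
        # Expose t (mod X) so the special agent can learn which of its
        # actions actually take effect.
        phase = np.array([self.t % X], dtype=np.float32)
        return {"special_agent": phase, "other_agent": phase}
```

This keeps the standard trainer loop untouched: from RLlib's point of view the special agent still acts every step, but the environment ignores the off-phase actions, and the t (mod X) feature lets the policy learn that only the on-phase actions influence future rewards.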
