Merging and splitting time and action steps from TF-agents

I am trying to use TF-agents in a simple multi-agent, non-cooperative, parallel game. To simplify, I have two agents, both defined with TF-agents. I defined a custom gym environment that takes the combined actions of the agents as input and returns an observation. Each agent's policy should take only part of the full observation as input. So I need to do two things:

  • Split the time_step instance returned by the TF-agents environment wrapper to feed to the agents' policies independently
  • Merge the action_step instances coming from the agents' policies to feed the environment.

If agent1_policy and agent2_policy are the two TF-agents policies and environment is a TF-agents environment, I would like to be able to collect steps like this:

from tf_agents.trajectories import trajectory

time_step = environment.current_time_step()

# Split the time_step to have partial observability
time_step1, time_step2 = split(time_step)

# Get action from each agent
action_step1 = agent1_policy.action(time_step1)
action_step2 = agent2_policy.action(time_step2)

# Merge the independent actions
action_merged = merge(action_step1, action_step2)

# Use the merged actions to have the next step
next_time_step = environment.step(action_merged)

# Split the next step too
next_time_step1, next_time_step2 = split(next_time_step)

# Build two distinct trajectories
traj1 = trajectory.from_transition(time_step1, action_step1, next_time_step1)
traj2 = trajectory.from_transition(time_step2, action_step2, next_time_step2)

traj1 and traj2 are then added to buffers that are used to train the two agents.

How should I define the functions merge and split in this example?

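This can be done by defining the environment's observation_spec and action_spec as nested structures with one entry per agent. See this documentation for an example of an environment that produces an observation that is a dictionary of tensors. A similar approach can be used for accepting an action that is a dictionary or a tuple. With specs like these, split only has to pick each agent's entry out of time_step.observation, and merge only has to pack the two agents' actions into the structure that action_spec describes.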
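As a rough illustration of the idea, here is a minimal sketch of split and merge, assuming the environment emits a dictionary observation and accepts a dictionary action. The keys "agent1" and "agent2" are hypothetical names, not something from the question. To keep the snippet runnable without TF-agents installed, TimeStep and PolicyStep are local stand-ins that copy the field names of the TF-agents classes; since those classes are also namedtuples, the same _replace call should work on the real objects.

```python
import collections

# Stand-ins that mirror the field names of tf_agents.trajectories.time_step.TimeStep
# and tf_agents.trajectories.policy_step.PolicyStep (both are namedtuples).
TimeStep = collections.namedtuple(
    "TimeStep", ["step_type", "reward", "discount", "observation"])
PolicyStep = collections.namedtuple("PolicyStep", ["action", "state", "info"])

# Hypothetical keys: assumed to match the environment's observation_spec
# and action_spec dictionaries.
AGENT_KEYS = ("agent1", "agent2")


def split(time_step):
    """Return one TimeStep per agent, keeping only that agent's observation.

    step_type, reward and discount are copied unchanged (assumed shared).
    """
    return tuple(
        time_step._replace(observation=time_step.observation[key])
        for key in AGENT_KEYS
    )


def merge(*action_steps):
    """Pack each agent's action into the dict the environment expects."""
    return {key: step.action for key, step in zip(AGENT_KEYS, action_steps)}
```

Note that merge returns the nested action itself rather than a PolicyStep, because environment.step takes an action. If the environment reports a separate reward for each agent, split would need to select the reward per key in the same way as the observation.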
