
Federated reinforcement learning

I am implementing federated deep Q-learning in PyTorch with multiple agents, each running DQN. Each agent has its own replay buffer, and experiences are appended to the buffer of the corresponding agent. My problem is that two elements of the experiences in each agent's replay buffer, i.e. "current_state" and "next_state", become identical across all entries after the first time slot. That is, every experience in a buffer shows the same current-state values and the same next-state values. I have included simplified parts of the code and the results below. Why does appending change the current states and next states that already exist in the buffer? Is there something wrong with defining the buffers as a global variable, or do you have another idea?

<<< time 0 and agent 0:
current_state[0] = [1,2]
next_state[0] = [11,12]
*** experience: (array([ 1., 2.]), 2.0, array([200]), array([ 11., 12.]), 0)
*** buffer: deque([(array([ 1., 2.]), 2.0, array([200]), array([ 11., 12.]), 0)], maxlen=10000)

<<< time 0 and agent 1: 
current_state[1] = [3, 4]
next_state[1] = [13, 14]
*** experience: (array([ 3., 4.]), 4.0, array([400]), array([ 13., 14.]), 0)
*** buffer: deque([(array([ 1., 2.]), 4.0, array([400]), array([ 11., 12.]), 0)], maxlen=10000)

<<< time 1 and agent 0:
current_state = [11,12]
next_state[0] = [110, 120]
*** experience: (array([ 11., 12.]), 6.0, array([600]), array([ 110., 120.]), 0)
*** buffer: deque([(array([ 11., 12.]), 2.0, array([200]), array([ 110., 120.]), 0),(array([ 11., 12.]), 6.0, array([600]), array([ 110., 120.]), 0)], maxlen=10000)

<<< time 1 and agent 1:
current_state = [13, 14]
next_state[1] = [130, 140]
*** experience: (array([ 13., 14.]), 8.0, array([800]), array([ 130., 140.]), 0)
*** buffer: deque([(array([ 13., 14.]), 4.0, array([400]), array([ 130., 140.]), 0),(array([ 13., 14.]), 8.0, array([800]), array([ 130., 140.]), 0)], maxlen=10000)

import numpy as np
from collections import deque

class BasicBuffer:
    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = deque(maxlen=10000)

    def add(self, current_state, action, reward, next_state, done):
        """Add a new experience to buffer."""
        experience = (current_state, action, np.array([reward]), next_state, done)
        self.buffer.append(experience)

def DQNtrain(env, state_size, agent):
    for time in range(time_max):
        for e in range(agents_numbers):
            next_state_edge[e, :]
            # Add a new experience to the buffer of agent e.
            replay_buffer_t[e].add(current_state, action, reward, next_state, done)
            # In-place update of the state array.
            current_state[e, :] = next_state[e, :]

if __name__ == '__main__':
    # Create one replay buffer per agent before training starts.
    replay_buffer_t = [[] for _ in range(edge_max)]
    for e in range(edge_max):
        replay_buffer_t[e] = BasicBuffer(max_size=agent_buffer_size)
    DQNtrain(env, state_size, agent)

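I just found what was causing the problem. The experience tuple stored references to the NumPy state arrays rather than copies, so the in-place update current_state[e, :] = next_state[e, :] also changed the experiences already in the buffer. I should have used copy.deepcopy() for the experiences: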

experience = copy.deepcopy((current_state, action, np.array([reward]), next_state, done))
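
The effect can be reproduced outside the training loop. The following is a minimal sketch, not the original program: the two-agent arrays and the buffer names `aliased` and `copied` are made up for illustration. It stores the same experience once by reference and once through copy.deepcopy(), then updates the state arrays in place the way the loop does.

```python
import copy
from collections import deque

import numpy as np

# Hypothetical setup for illustration: 2 agents with 2-dimensional states.
current_state = np.array([[1.0, 2.0], [3.0, 4.0]])
next_state = np.array([[11.0, 12.0], [13.0, 14.0]])

aliased = deque(maxlen=10000)  # will hold views into the state arrays
copied = deque(maxlen=10000)   # will hold independent copies

e = 0
# Basic slicing returns a view, so this tuple shares memory with the arrays.
aliased.append((current_state[e, :], next_state[e, :]))
# deepcopy duplicates the array data, detaching the stored experience.
copied.append(copy.deepcopy((current_state[e, :], next_state[e, :])))

# In-place updates, as in the training loop.
current_state[e, :] = next_state[e, :]
next_state[e, :] = [110.0, 120.0]

print("aliased:", aliased[0])  # follows the in-place updates
print("copied: ", copied[0])   # keeps the values from the time of the append
```

Calling .copy() on each state array before building the tuple should have the same effect as deepcopy here and avoids copying the scalar fields.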

