
Federated reinforcement learning

I am implementing federated deep Q-learning in PyTorch with multiple agents, each running its own DQN. My problem is that when I use a separate replay buffer per agent, with each agent appending its own experiences, two elements of the experiences in every agent's buffer, i.e. "current_state" and "next_state", become identical after the first time slot: within each buffer, all stored experiences end up with the same current state and the same next state. I have included simplified parts of the code and the results below. Why do the current states and next states that already exist in the buffer change when a new experience is appended? Is there something wrong with defining the buffers as a global variable, or do you have another idea?

<<< time 0 and agent 0:
current_state[0] = [1,2]
next_state[0] = [11,12]
*** experience: (array([ 1., 2.]), 2.0, array([200]), array([ 11., 12.]), 0)
*** buffer: deque([(array([ 1., 2.]), 2.0, array([200]), array([ 11., 12.]), 0)], maxlen=10000)

<<< time 0 and agent 1: 
current_state[1] = [3, 4]
next_state[1] = [13, 14]
*** experience: (array([ 3., 4.]), 4.0, array([400]), array([ 13., 14.]), 0)
*** buffer: deque([(array([ 1., 2.]), 4.0, array([400]), array([ 11., 12.]), 0)], maxlen=10000)

<<< time 1 and agent 0:
current_state = [11,12]
next_state[0] = [110, 120]
*** experience: (array([ 11., 12.]), 6.0, array([600]), array([ 110., 120.]), 0)
*** buffer: deque([(array([ 11., 12.]), 2.0, array([200]), array([ 110., 120.]), 0),(array([ 11., 12.]), 6.0, array([600]), array([ 110., 120.]), 0)], maxlen=10000)

<<< time 1 and agent 1:
current_state = [13, 14]
next_state[1] = [130, 140]
*** experience: (array([ 13., 14.]), 8.0, array([800]), array([ 130., 140.]), 0)
*** buffer: deque([(array([ 13., 14.]), 4.0, array([400]), array([ 130., 140.]), 0),(array([ 13., 14.]), 8.0, array([800]), array([ 130., 140.]), 0)], maxlen=10000)

from collections import deque
import numpy as np


class BasicBuffer:
    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = deque(maxlen=max_size)

    def add(self, current_state, action, reward, next_state, done):
        """Add a new experience to the buffer."""
        experience = (current_state, action, np.array([reward]), next_state, done)
        self.buffer.append(experience)


def DQNtrain(env, state_size, agent):
    # time_max, agents_numbers, current_state, next_state, action, reward and done
    # are defined elsewhere; the environment step is omitted in this simplified code.
    for time in range(time_max):
        for e in range(agents_numbers):
            # Add this agent's experience to its own buffer.
            replay_buffer_t[e].add(current_state[e, :], action, reward,
                                   next_state[e, :], done)
            current_state[e, :] = next_state[e, :]


if __name__ == '__main__':
    replay_buffer_t = [BasicBuffer(max_size=agent_buffer_size) for _ in range(edge_max)]
    DQNtrain(env, state_size, agent)

I just found what is causing the problem. I should have used copy.deepcopy() (with import copy at the top of the file) for the experiences in BasicBuffer.add():

experience = copy.deepcopy((current_state, action, np.array([reward]), next_state, done))
self.buffer.append(experience)
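
For anyone wondering why the buffer contents change at all: the experience tuple stores references to NumPy array views, not copies. current_state[e, :] is a view into the current_state array, so the later in-place assignment current_state[e, :] = next_state[e, :] (and whatever overwrites next_state on the following time slot) also rewrites the arrays already sitting inside the deque. A minimal standalone sketch of the aliasing (the variable names are illustrative, not taken from the code above):

import numpy as np
from collections import deque

buffer = deque(maxlen=10)

current_state = np.array([[1., 2.], [3., 4.]])    # one row per agent
next_state = np.array([[11., 12.], [13., 14.]])

# The tuple stores *views* into the two arrays, not independent copies.
buffer.append((current_state[0, :], next_state[0, :]))
print(buffer[0])    # (array([1., 2.]), array([11., 12.]))

# The in-place update done at the end of each time slot...
current_state[0, :] = next_state[0, :]
# ...also changes what is already stored in the buffer:
print(buffer[0])    # (array([11., 12.]), array([11., 12.]))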

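A lighter alternative to deep-copying the whole tuple (just a sketch, assuming the states are plain NumPy arrays as in the logs above) is to copy only the state arrays when the experience is built, e.g. in BasicBuffer.add():

    def add(self, current_state, action, reward, next_state, done):
        """Store copies of the state arrays so later in-place updates cannot alter them."""
        experience = (np.array(current_state, copy=True),
                      action,
                      np.array([reward]),
                      np.array(next_state, copy=True),
                      done)
        self.buffer.append(experience)

Either way, the key point is that whatever goes into the replay buffer must not share memory with the arrays that the training loop keeps mutating in place.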