
How to run multiple experiments in parallel and select best cases for refinement in deep reinforcement learning?

I am working on a custom environment built with gym, and I am currently trying to parallelize the training of my D3QN model because a single episode takes a long time to finish.

Is there a way to parallelize the training and keep only the best cases for refinement using Keras and TensorFlow? (A rough sketch of what I mean follows my main function below.)

    def run(self):
        reward_list = []
        ave_reward_list = []
        decay_step = 0
        start_time = time.time()
        for e in range(self.EPISODES):
            state = self.env.reset()
            state = np.asarray(state).reshape((1, 24))
            state = (state - state.mean()) / state.std()
            done = False
            i = 0
            first_ps = 0
            total_reward = 0
            #counter = 0
            while not done:
                #self.env.render()
                decay_step += 1
                action, explore_probability = self.act(state, decay_step)
                acting = [action, first_ps]
                next_state, reward, done, _ = self.env.step(acting)
                next_state = np.asarray(next_state).reshape((1, 24))
                next_state = (next_state - next_state.mean()) / next_state.std()
                #print('next_state: {}'.format(next_state))
                first_ps = 1

                self.remember(state, action, reward, next_state, done)
                state = next_state
                i += 1
                total_reward += reward
                #print(total_reward)
                
                #counter +=1
                #if counter==100:
                    #self.update_target_model()
                    #counter = 0
                
                if done:
                    # track the reward list
                    reward_list.append(total_reward)
                    if (e+1) % 100 == 0:
                        ave_reward = np.mean(reward_list)
                        ave_reward_list.append(ave_reward)
                        reward_list = []
                    
                    # at the end of every episode, update the target network
                    self.update_target_model()

                    # every episode, plot the result
                    average = self.PlotModel(i, e)
                    
                    # every episode, plot the total_reward
                    #average_reward = self.PlotModel_reward(total_reward, e)
                    
                    print("episode: {}/{}, iterations: {}, e: {:.2}, average: {}, tot_reward: {}".format(e, self.EPISODES, i, explore_probability, average, total_reward))
                    
                    if e==self.EPISODES-1:
                        hours, rem = divmod((time.time() - start_time), 3600)
                        minutes, seconds = divmod(rem, 60)
                        print("The running time is: {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
                        
                        print("Saving trained model to", self.Model_name)
                        self.save(self.Model_name+'_'+str(int(total_reward))+".h5")
                        
                self.replay(done)

My main function:

if __name__ == "__main__":
    env_name = 'trainSim-v0'
    agent = DQNAgent(env_name)
    agent.run()
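What I would like, roughly, is to launch several independent training runs in parallel and keep only the best-scoring agents for further refinement. Below is a minimal sketch of that idea using Python's multiprocessing; run_single_experiment is a hypothetical wrapper that would require my run() method to return its final average reward, and the sketch ignores the GPU-sharing problem:

    import multiprocessing as mp

    def run_single_experiment(run_id):
        # Hypothetical wrapper: each worker builds its own environment and agent,
        # trains for the configured number of episodes and returns its score.
        # This assumes run() is modified to return the final average reward.
        agent = DQNAgent('trainSim-v0')
        final_average_reward = agent.run()
        return run_id, final_average_reward

    if __name__ == "__main__":
        n_runs = 4      # number of experiments to run in parallel
        keep_best = 2   # number of runs to keep for refinement
        # 'spawn' avoids forking a process that has already initialised TensorFlow
        ctx = mp.get_context("spawn")
        with ctx.Pool(processes=n_runs) as pool:
            results = pool.map(run_single_experiment, range(n_runs))
        # keep only the best-scoring runs for further refinement
        results.sort(key=lambda r: r[1], reverse=True)
        print("Best runs:", results[:keep_best])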

You can only do this if you have multiple GPUs; one GPU can only focus on one task. Since your model is already training slowly, you need to upgrade your hardware: either get more GPUs to speed up training of a single model (the opposite of your question), or get a better GPU to train the model.

https://keras.io/guides/distributed_training/
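That guide is based on tf.distribute. A minimal sketch of the idea, assuming the D3QN network can be rebuilt inside the strategy scope (the layer sizes and number of actions here are placeholders, not your actual architecture):

    import tensorflow as tf

    # Mirror the model across all visible GPUs; gradients are averaged automatically.
    strategy = tf.distribute.MirroredStrategy()
    print("Number of devices:", strategy.num_replicas_in_sync)

    with strategy.scope():
        # Placeholder architecture: the real D3QN network would be built here instead.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(24,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(4),  # placeholder for the number of actions
        ])
        model.compile(optimizer="adam", loss="mse")

    # Calls to model.fit / model.train_on_batch now use all GPUs in the strategy.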
