How to run multiple experiments in parallel and select best cases for refinement in deep reinforcement learning?

I am working with a custom Gym environment, and I am currently trying to parallelize the training of my D3QN model, because completing a single episode takes a long time.

Is there a way to parallelize the training with Keras and TensorFlow and keep only the best cases for refinement?

    def run(self):
        # Assumes a DQNAgent class providing env, act, remember, replay,
        # update_target_model, PlotModel, save, EPISODES and Model_name,
        # plus module-level `import time` and `import numpy as np`.
        reward_list = []
        ave_reward_list = []
        decay_step = 0
        start_time = time.time()
        for e in range(self.EPISODES):
            # Reset and normalize the 24-dimensional observation
            state = self.env.reset()
            state = np.asarray(state).reshape((1, 24))
            state = (state - state.mean()) / state.std()
            done = False
            i = 0
            first_ps = 0  # flag passed to the environment; 0 only on the first step
            total_reward = 0
            while not done:
                decay_step += 1
                # Epsilon-greedy action selection with decaying exploration
                action, explore_probability = self.act(state, decay_step)
                acting = [action, first_ps]
                next_state, reward, done, _ = self.env.step(acting)
                next_state = np.asarray(next_state).reshape((1, 24))
                next_state = (next_state - next_state.mean()) / next_state.std()
                first_ps = 1

                # Store the transition in the replay buffer
                self.remember(state, action, reward, next_state, done)
                state = next_state
                i += 1
                total_reward += reward

                if done:
                    # Track per-episode rewards; average them every 100 episodes
                    reward_list.append(total_reward)
                    if (e + 1) % 100 == 0:
                        ave_reward = np.mean(reward_list)
                        ave_reward_list.append(ave_reward)
                        reward_list = []

                    # At the end of every episode, sync the target network
                    self.update_target_model()

                    # Plot the result for this episode
                    average = self.PlotModel(i, e)

                    print("episode: {}/{}, iterations: {}, e: {:.2}, average: {}, tot_reward: {}".format(
                        e, self.EPISODES, i, explore_probability, average, total_reward))

                    if e == self.EPISODES - 1:
                        # Report total wall-clock time and save the trained model
                        hours, rem = divmod((time.time() - start_time), 3600)
                        minutes, seconds = divmod(rem, 60)
                        print("The running time is: {:0>2}:{:0>2}:{:05.2f}".format(int(hours), int(minutes), seconds))

                        print("Saving trained model to", self.Model_name)
                        self.save(self.Model_name + '_' + str(int(total_reward)) + ".h5")

                # Learn from a minibatch sampled from the replay buffer
                self.replay(done)

My main function:

if __name__ == "__main__":
    env_name = 'trainSim-v0'  # custom Gym environment
    agent = DQNAgent(env_name)
    agent.run()

You can only do this if you have multiple GPUs. A single GPU can focus on only one task, and since your model is already slow, you need to upgrade your hardware: either add more GPUs and use them to train a single model faster (the opposite of what you asked for), or get a better single GPU to train the model on.
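That said, if the goal is the first half of your question, running several independent experiments in parallel and keeping only the best one for refinement, a plain Python `multiprocessing` pool is one way to do it. The sketch below is a hypothetical adaptation of your code: it assumes `DQNAgent` is importable in the worker processes and that `run()` is modified to return its final `total_reward` (as posted, `run()` returns nothing).

    import multiprocessing as mp

    def train_one(run_id):
        # One independent agent per worker process
        agent = DQNAgent('trainSim-v0')
        final_reward = agent.run()  # assumes run() is changed to return a score
        return run_id, final_reward

    if __name__ == "__main__":
        with mp.Pool(processes=4) as pool:
            # Four independent training runs in parallel
            results = pool.map(train_one, range(4))
        best_id, best_reward = max(results, key=lambda r: r[1])
        print("best run: {}, reward: {}".format(best_id, best_reward))

One caveat with this pattern: TensorFlow allocates GPU memory per process, so the workers should either train on CPU or each be restricted to a fraction of the GPU. For speeding up a single model across several GPUs instead, see the Keras distributed training guide: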

https://keras.io/guides/distributed_training/
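The approach in that guide is `tf.distribute.MirroredStrategy`: build and compile the model inside the strategy scope, and Keras replicates it across all visible GPUs while the rest of the training code stays unchanged. A minimal sketch follows; the layer sizes and the 4-unit output are placeholders, not your actual D3QN architecture.

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()  # replicates across all visible GPUs
    print("Number of devices:", strategy.num_replicas_in_sync)

    with strategy.scope():
        # Placeholder network: 24 inputs to match the state shape in the question
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation='relu', input_shape=(24,)),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(4)
        ])
        model.compile(optimizer='adam', loss='mse')

    # model.fit / model.train_on_batch calls stay as they are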
