How to run multiple experiments in parallel and select best cases for refinement in deep reinforcement learning?
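I'm using a custom Gym environment and am currently trying to parallelize the training of my D3QN model, because finishing a single episode takes a long time.
Is there a way to parallelize the training with Keras and TensorFlow and keep only the best cases for refinement?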
# Method of DQNAgent; assumes `import time` and `import numpy as np` at module level.
def run(self):
    reward_list = []
    ave_reward_list = []
    decay_step = 0
    start_time = time.time()
    for e in range(self.EPISODES):
        state = self.env.reset()
        state = np.asarray(state).reshape((1, 24))
        state = (state - state.mean()) / state.std()
        done = False
        i = 0
        first_ps = 0
        total_reward = 0
        #counter = 0
        while not done:
            #self.env.render()
            decay_step += 1
            action, explore_probability = self.act(state, decay_step)
            acting = [action, first_ps]
            next_state, reward, done, _ = self.env.step(acting)
            next_state = np.asarray(next_state).reshape((1, 24))
            next_state = (next_state - next_state.mean()) / next_state.std()
            #print('next_state: {}'.format(next_state))
            first_ps = 1
            self.remember(state, action, reward, next_state, done)
            state = next_state
            i += 1
            total_reward += reward
            #print(total_reward)
            #counter +=1
            #if counter==100:
                #self.update_target_model()
                #counter = 0
            if done:
                # track the reward list
                reward_list.append(total_reward)
                if (e+1) % 100 == 0:
                    ave_reward = np.mean(reward_list)
                    ave_reward_list.append(ave_reward)
                    reward_list = []
                # update the target model at the end of every episode
                self.update_target_model()
                # every episode, plot the result
                average = self.PlotModel(i, e)
                # every episode, plot the total_reward
                #average_reward = self.PlotModel_reward(total_reward, e)
                print("episode: {}/{}, iterations: {}, e: {:.2}, average: {}, tot_reward: {}".format(e, self.EPISODES, i, explore_probability, average, total_reward))
                if e == self.EPISODES-1:
                    hours, rem = divmod((time.time() - start_time), 3600)
                    minutes, seconds = divmod(rem, 60)
                    print("The running time is: {:0>2}:{:0>2}:{:05.2f}".format(int(hours), int(minutes), seconds))
                    print("Saving trained model to", self.Model_name)
                    self.save(self.Model_name+'_'+str(int(total_reward))+".h5")
            self.replay(done)
My main function:
if __name__ == "__main__":
    env_name = 'trainSim-v0'
    agent = DQNAgent(env_name)
    agent.run()
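You can only do this if you have multiple GPUs. A single GPU can only focus on one task at a time, and your model is already slow, so you either need to upgrade your hardware or use more GPUs to train a single model (which is the opposite of what you are asking). Alternatively, you can get a better GPU to train the model.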
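If the hardware does allow several runs at once (for example one GPU per run, or short CPU-only runs), the "run several, keep the best" idea from the question could look roughly like the sketch below. It is only an illustration under assumptions: train_one, run_parallel and select_best are hypothetical names that are not part of the question's code, and train_one returns a seeded random score as a stand-in for building a fresh DQNAgent in the worker and returning its average reward.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def train_one(seed, episodes=50):
    """Hypothetical stand-in for one short training run.

    A real version would create its own environment and agent inside the
    worker, train for a few episodes, and return the average reward.
    Here a seeded pseudo-random score is returned so the sketch runs as is.
    """
    rng = random.Random(seed)
    rewards = [rng.gauss(0.0, 1.0) for _ in range(episodes)]
    return seed, sum(rewards) / len(rewards)


def run_parallel(train_fn, seeds, max_workers=2, executor_cls=ProcessPoolExecutor):
    """Call train_fn(seed) for every seed in separate workers.

    Results come back in the same order as the seeds.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(train_fn, seeds))


def select_best(results, k=1):
    """Return the k (seed, score) pairs with the highest score."""
    return sorted(results, key=lambda r: r[1], reverse=True)[:k]
```

With process-based workers, the call belongs under an if __name__ == "__main__": guard, e.g. best = select_best(run_parallel(train_one, range(4)), k=2). The selected seeds (or their saved weights) would then be the candidates to continue training with the full run() loop.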