
Correct way to create a batch for pytorch.nn.lstm batch training

I am new to LSTMs and have code that looks like this.

import torch
import torch.nn.functional as F

# Actor, Critic and device are assumed to be defined elsewhere.


class TD3(object):

    def __init__(
        self,
        state_dim,
        action_dim,
        max_action,
        ):
        self.actor = Actor(state_dim, action_dim, max_action).to(device)
        self.actor_target = Actor(state_dim, action_dim,
                                  max_action).to(device)
        self.actor_target.load_state_dict(self.actor.state_dict())
        self.actor_optimizer = torch.optim.Adam(self.actor.parameters())
        self.critic = Critic(state_dim, action_dim).to(device)
        self.critic_target = Critic(state_dim, action_dim).to(device)
        self.critic_target.load_state_dict(self.critic.state_dict())
        self.critic_optimizer = \
            torch.optim.Adam(self.critic.parameters())
        self.max_action = max_action
        # Stored so that train() can reshape the sampled states
        self.state_dim = state_dim

    def select_action(self, state, hx1):
        # hx1 is the (h, c) tuple of LSTM hidden and cell states
        return self.actor(state, hx1)

    def train(
        self,
        replay_buffer,
        iterations,
        batch_size=50,
        discount=0.99,
        tau=0.005,
        policy_noise=0.2,
        noise_clip=0.5,
        policy_freq=2,
        ):

        for it in range(iterations):
            print('it: ', it, ' iterations: ', iterations)

            # Step 4: We sample a batch of transitions (s, s’, a, r) from the memory

            (batch_states, batch_next_states, batch_actions,
             batch_rewards, batch_dones) = \
                replay_buffer.sample(batch_size)

            batch_states = batch_states.astype(float)
            batch_next_states = batch_next_states.astype(float)
            batch_actions = batch_actions.astype(float)
            batch_rewards = batch_rewards.astype(float)
            batch_dones = batch_dones.astype(float)

            state = torch.from_numpy(batch_states)
            next_state = torch.from_numpy(batch_next_states)
            action = torch.from_numpy(batch_actions)
            reward = torch.from_numpy(batch_rewards)
            done = torch.from_numpy(batch_dones)

            # The sampled transitions are treated as one sequence with batch size 1
            batch = 1
            seq_len = state.shape[0]
            state_dim = self.state_dim

            # for h and c shape (num_layers * num_directions, batch, hidden_size)

            h0 = torch.zeros(1, 1, 256)
            c0 = torch.zeros(1, 1, 256)
            state = torch.reshape(state, (seq_len, batch, state_dim))
            next_state = torch.reshape(next_state, (seq_len, batch,
                    state_dim))
            done = torch.reshape(done, (seq_len, batch, 1))
            reward = torch.reshape(reward, (seq_len, batch, 1))

            # Step 5: From the next state s’, the Actor target plays the next action a’

            next_action = self.actor_target(next_state, (h0, c0))
            next_action = next_action[0]

            # Step 6: We add Gaussian noise to this next action a’ and we clamp it in a range of values supported by the environment

            noise = torch.randn_like(next_action) * policy_noise
            noise = noise.clamp(-noise_clip, noise_clip)
            next_action = (next_action + noise).clamp(-self.max_action,
                    self.max_action)

            # Step 7: The two Critic targets take each the couple (s’, a’) as input and return two Q-values Qt1(s’,a’) and Qt2(s’,a’) as outputs

            result = self.critic_target(next_state, next_action, (h0,
                    c0))
            target_Q1 = result[0]
            target_Q2 = result[1]

            # Step 8: We keep the minimum of these two Q-values: min(Qt1, Qt2)

            target_Q = torch.min(target_Q1, target_Q2).double()

            # Step 9: We get the final target of the two Critic models, which is: Qt = r + γ * min(Qt1, Qt2), where γ is the discount factor
            # detach() keeps gradients from flowing into the target networks

            target_Q = (reward + (1 - done) * discount * target_Q).detach()

            # Step 10: The two Critic models take each the couple (s, a) as input and return two Q-values Q1(s,a) and Q2(s,a) as outputs

            action = torch.reshape(action, next_action.shape)
            result = self.critic(state, action, (h0, c0))
            current_Q1 = result[0]
            current_Q2 = result[1]

            # Step 11: We compute the loss coming from the two Critic models: Critic Loss = MSE_Loss(Q1(s,a), Qt) + MSE_Loss(Q2(s,a), Qt)

            critic_loss = F.mse_loss(current_Q1, target_Q) \
                + F.mse_loss(current_Q2, target_Q)

            # Step 12: We backpropagate this Critic loss and update the parameters of the two Critic models with the Adam optimizer

            self.critic_optimizer.zero_grad()
            critic_loss.backward()
            self.critic_optimizer.step()

            # Step 13: Once every two iterations, we update our Actor model by performing gradient ascent on the output of the first Critic model

            if it % policy_freq == 0:
                out = self.actor(state, (h0, c0))
                out = out[0]
                (actor_loss, hx, cx) = self.critic.Q1(state, out, (h0,
                        c0))
                actor_loss = -1 * actor_loss.mean()
                self.actor_optimizer.zero_grad()
                actor_loss.backward()
                self.actor_optimizer.step()

                # Step 14: Still once every two iterations, we update the weights of the Actor target by polyak averaging

                for (param, target_param) in zip(self.actor.parameters(),
                        self.actor_target.parameters()):
                    target_param.data.copy_(tau * param.data + (1 - tau)
                            * target_param.data)

                # Step 15: Still once every two iterations, we update the weights of the Critic target by polyak averaging

                for (param, target_param) in zip(self.critic.parameters(),
                        self.critic_target.parameters()):
                    target_param.data.copy_(tau * param.data + (1 - tau)
                            * target_param.data)

Inside the train function I have a for loop that trains on different batches, but I have learned that the LSTM also supports batch processing.

Could you please tell me the correct way in PyTorch to create a batch tensor of 2 batches from 2 tensors that each represent one batch? For example, how do I convert 2 tensors of shape (1, 1, 256) into one tensor of shape (1, 2, 256), and not (2, 1, 256), such that the data of the input tensors does not overlap in the output tensor?

Thanks in advance.

Correct way to create a batch tensor?

I think you want to know the correct way to initialize the h0 and c0 variables?

If I'm correct, then you could do:

def init_hidden(self):
    weight = next(self.parameters())
    return (weight.new_zeros(self.num_layers, self.batch_size, self.hidden_size),
            weight.new_zeros(self.num_layers, self.batch_size, self.hidden_size))

Of course, you need to initialize num_layers, batch_size and hidden_size in the __init__ method.

h0, c0 = self.init_hidden()
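
As a minimal sketch of how init_hidden fits into a module, assuming a hypothetical class named LSTMNet whose input_size, hidden_size, num_layers and batch_size values are chosen purely for illustration (none of them come from the Actor or Critic in the question):

```python
import torch
import torch.nn as nn


class LSTMNet(nn.Module):
    """Hypothetical module, used only to illustrate init_hidden."""

    def __init__(self, input_size=8, hidden_size=256, num_layers=1, batch_size=2):
        super().__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        # batch_first=False (the default): input is (seq_len, batch, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)

    def init_hidden(self):
        # new_zeros takes dtype and device from an existing parameter
        weight = next(self.parameters())
        shape = (self.num_layers, self.batch_size, self.hidden_size)
        return weight.new_zeros(shape), weight.new_zeros(shape)

    def forward(self, x, hidden):
        return self.lstm(x, hidden)


net = LSTMNet()
h0, c0 = net.init_hidden()
x = torch.randn(5, 2, 8)  # (seq_len=5, batch=2, input_size=8)
out, (hn, cn) = net(x, (h0, c0))
out_shape = tuple(out.shape)  # (seq_len, batch, hidden_size)
hn_shape = tuple(hn.shape)    # (num_layers, batch, hidden_size)
```

Because the zeros are created from an existing parameter, they should end up on the same device and with the same dtype as the model, which avoids a separate .to(device) call.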

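How to convert 2 tensors?

You could use the reshape method, assuming h0 already holds both tensors stacked in one tensor of shape (2, 1, 256). A single (1, 1, 256) tensor has too few elements to be reshaped to (1, 2, 256).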

h_new = h0.reshape(1, 2, 256)
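
If the two hidden states are still separate tensors, one option is to join them along the batch dimension with torch.cat. The sketch below assumes two illustrative tensors named h_a and h_b (these names are not from the question) and filler values of zeros and ones so the two samples can be told apart:

```python
import torch

# Two per-sample hidden states, each (num_layers=1, batch=1, hidden_size=256)
h_a = torch.zeros(1, 1, 256)
h_b = torch.ones(1, 1, 256)

# dim=1 is the batch dimension of an LSTM hidden state
h_batch = torch.cat((h_a, h_b), dim=1)  # shape (1, 2, 256)

# Each input keeps its own batch slot, so the data does not overlap
first_is_a = torch.equal(h_batch[:, 0:1, :], h_a)
second_is_b = torch.equal(h_batch[:, 1:2, :], h_b)

# Concatenating along dim=0 would instead give (2, 1, 256),
# which an LSTM reads as 2 layers with a batch of 1
h_layers = torch.cat((h_a, h_b), dim=0)
```

The same torch.cat call with dim=1 also applies to the cell state c0, and to the inputs when they are laid out as (seq_len, batch, input_size).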
