
Confused about using dropout in batch gradient descent with Q-learning

I am using PyTorch and adding dropout layers to my hidden layers.

import torch
import torch.nn as nn

class MLP(nn.Module):
  def __init__(self, console_file, n_inputs, n_action, layers_list, drop=0.25):
    super(MLP, self).__init__()
    print("Layers structure:")
    console_file.write("Layers structure:\n")
    print(f"inputs: {n_inputs}")
    console_file.write(f"inputs: {n_inputs}\n")
    self.layers = []
    for i, layer_size in enumerate(layers_list):
      if i == 0:
        layer = nn.Linear(n_inputs, layer_size)
      else:
        layer = nn.Linear(layers_list[i-1], layer_size)
      self.layers.append(layer)
      print(f"layer {i}: {layer_size}")
      console_file.write(f"layer {i}: {layer_size}\n")
      self.layers.append(nn.LeakyReLU(0.1))
      if drop > 0.01:
        # (an earlier attempt used a per-layer rate: drop**(len(layers_list)-i))
        self.layers.append(nn.Dropout(p=drop))
        print(f"drop {i}: {drop}")
        console_file.write(f"drop {i}: {drop}\n")
    # final layer
    self.layers.append(nn.Linear(layers_list[-1], n_action))
    # nn.Sequential registers the layers as submodules, so .parameters(),
    # .train() and .eval() all see them (a plain Python list would not)
    self.layers = nn.Sequential(*self.layers)
    print(f"outputs: {n_action}")
    console_file.write(f"outputs: {n_action}\n")
    print("========= NN structure =========\n")
    console_file.write("========= NN structure =========\n\n")
  def forward(self, X):
    return self.layers(X)

  def save_weights(self, path):
    torch.save(self.state_dict(), path)

  def load_weights(self, path):
    self.load_state_dict(torch.load(path))

I am making sure to turn training mode on during training and switch to eval mode outside of training (which turns the dropout layers into no-ops):

    self.model = he.MLP(console_file, state_size, self.action_size, DIMENSION, DROPOUT)
    if DROPOUT > 0.01:
      self.model.train()
...
      if DROPOUT > 0.01: # before testing
        agent.model.eval()
...
    if DROPOUT > 0.01: # after testing
      self.model.train()
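
For context, here is a self-contained sketch of that pattern (the model below is just a stand-in for my actual network):

import torch
import torch.nn as nn

# stand-in model with the same kind of structure (Linear + LeakyReLU + Dropout)
model = nn.Sequential(nn.Linear(4, 8), nn.LeakyReLU(0.1),
                      nn.Dropout(0.25), nn.Linear(8, 2))

model.eval()                       # switch dropout off for testing
with torch.no_grad():              # no gradients needed while testing
  q_values = model(torch.randn(5, 4))
model.train()                      # switch dropout back on for training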

My confusion is about HOW, if at all, PyTorch keeps track of which neurons it disabled between the forward propagation and the later time when a random batch of outputs-and-rewards is selected and backpropagation is performed on a specific case that originated in a forward propagation with specific neurons disabled.

import numpy as np
import torch

def train_one_step(model, criterion, optimizer, inputs, targets):
  # convert to tensors
  inputs = torch.from_numpy(inputs.astype(np.float32))
  targets = torch.from_numpy(targets.astype(np.float32))

  # zero the parameter gradients
  optimizer.zero_grad()

  # Forward pass
  outputs = model(inputs)
  loss = criterion(outputs, targets)

  # Backward and optimize
  loss.backward()
  optimizer.step()

inputs and targets come from a random selection (a batch drawn from the buffered history).
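
Roughly, the sampling step looks like this (a simplified sketch; the array names and sizes are made up for illustration):

import numpy as np

# illustrative replay buffer: 1000 stored transitions, 4-dim states,
# 2 Q-value targets per state
buffer_states = np.random.randn(1000, 4).astype(np.float32)
buffer_targets = np.random.randn(1000, 2).astype(np.float32)

idx = np.random.choice(len(buffer_states), size=32, replace=False)
inputs, targets = buffer_states[idx], buffer_targets[idx]
# these are what get passed to train_one_step(...)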

To me it makes sense to have the SAME neurons disabled between the forward and backward propagation, but since I can find nothing on the subject, everything would suggest that the dropout is applied randomly, which seems nonsensical to me. Either the forward and backward propagations need to happen identically (and PyTorch somehow manages to remember via some markers, because I don't seem to pass any markers during batching), or I need to understand why they can be randomly different.

I'm not sure what the problem is exactly, but let me try to explain how things work.

The .train() and .eval() calls only change the .training flag to True or False.
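
You can check this directly:

import torch.nn as nn

drop = nn.Dropout(p=0.25)
print(drop.training)  # True -- modules start in training mode
drop.eval()           # recursively sets .training = False
print(drop.training)  # False: forward() now returns its input unchanged
drop.train()          # recursively sets .training = True
print(drop.training)  # True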

The Dropout layer samples the noise during the forward pass. Here's the core of the forward implementation from the PyTorch source (I removed the ifs for the alpha and feature dropouts for readability):

template<bool feature_dropout, bool alpha_dropout, bool inplace, typename T>
Ctype<inplace> _dropout_impl(T& input, double p, bool train) {
  TORCH_CHECK(p >= 0 && p <= 1, "dropout probability has to be between 0 and 1, but got ", p);
  if (p == 0 || !train || input.numel() == 0) {
    return input;
  }

  if (p == 1) {
    return multiply<inplace>(input, at::zeros({}, input.options()));
  }

  auto noise = at::empty_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
  noise.bernoulli_(1 - p);
  noise.div_(1 - p);
  return multiply<inplace>(input, noise);
}
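
In Python terms, the training-mode branch amounts to something like this (a sketch mirroring the C++ above, not the actual implementation):

import torch

def dropout_sketch(x: torch.Tensor, p: float, train: bool) -> torch.Tensor:
  if p == 0.0 or not train or x.numel() == 0:
    return x  # eval mode: identity, there is nothing to "remember"
  if p == 1.0:
    return x * torch.zeros((), dtype=x.dtype)
  # sample a fresh mask of 0s and 1/(1-p)s on every forward call
  noise = torch.empty_like(x).bernoulli_(1 - p).div_(1 - p)
  return x * noise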

As you can see, if !train (i.e., after .eval()), it simply returns the input as it is. Moreover, you could say that it "remembers" which neurons were disabled the same way it "remembers" the value of each active neuron: the sampled mask is saved by autograd along with the other intermediate tensors. Notice that the dropout layer actually works as a mask of 0s and (scaled) 1s applied to the output of the previous layer. It does not literally disable the neurons, although in practice the effect is equivalent, since the neurons whose outputs were multiplied by 0 get no gradient, and the rest get properly scaled gradients (because of the .div_(1 - p)).
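
A small experiment makes the "remembering" concrete: the mask sampled during a given forward pass is saved in the autograd graph, so the matching backward pass reuses exactly that mask (the next forward call samples a new one):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8, requires_grad=True)

y = drop(x)          # forward: samples the mask and saves it in the graph
y.sum().backward()   # backward: reuses that exact mask

print(y)       # 0.0 where dropped, 2.0 where kept (scaled by 1/(1-p))
print(x.grad)  # 0.0 exactly where y is 0.0, 2.0 where it was kept

So each sampled batch gets whatever masks are active during its own forward pass inside train_one_step; there is no need to carry masks over from the forward pass that originally generated the experience.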
