
How to initialize weights in PyTorch?

How do I initialize the weights and biases of a network (for example, with He or Xavier initialization) in PyTorch?

Single layer

To initialize the weights of a single layer, use a function from torch.nn.init. For instance:

conv1 = torch.nn.Conv2d(...)
torch.nn.init.xavier_uniform(conv1.weight)

Alternatively, you can modify the parameters by writing to conv1.weight.data (which is a torch.Tensor). Example:

conv1.weight.data.fill_(0.01)

The same applies for biases:

conv1.bias.data.fill_(0.01)

nn.Sequential or custom nn.Module

Pass an initialization function to torch.nn.Module.apply. It will initialize the weights in the entire nn.Module recursively.

apply(fn): Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch-nn-init).

Example:

def init_weights(m):
    if type(m) == nn.Linear:
        torch.nn.init.xavier_uniform(m.weight)
        m.bias.data.fill_(0.01)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)

We compare different modes of weight initialization using the same neural-network (NN) architecture.

All Zeros or Ones

If you follow the principle of Occam's razor, you might think setting all the weights to 0 or 1 would be the best solution. This is not the case.

With every weight the same, all the neurons at each layer are producing the same output. This makes it hard to decide which weights to adjust.
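The Net class used in the snippets below is not shown on this page; here is a minimal sketch of what such a class might look like (an assumption on my part: an MNIST-style fully connected classifier whose constant_weight argument, when given, fills every weight and bias with that constant):

    import torch.nn as nn
    import torch.nn.functional as F

    # hypothetical sketch of the Net used below: when constant_weight is given,
    # every Linear weight and bias is filled with that constant value
    class Net(nn.Module):
        def __init__(self, constant_weight=None):
            super().__init__()
            self.fc1 = nn.Linear(28 * 28, 256)
            self.fc2 = nn.Linear(256, 10)
            if constant_weight is not None:
                for m in self.modules():
                    if isinstance(m, nn.Linear):
                        nn.init.constant_(m.weight, constant_weight)
                        nn.init.constant_(m.bias, constant_weight)

        def forward(self, x):
            x = x.view(x.size(0), -1)      # flatten the image batch
            x = F.relu(self.fc1(x))
            return self.fc2(x)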

    # initialize two NN's with 0 and 1 constant weights
    model_0 = Net(constant_weight=0)
    model_1 = Net(constant_weight=1)
  • After 2 epochs:

[Plot: training loss with weights initialized to constants]

Validation Accuracy
9.625% -- All Zeros
10.050% -- All Ones
Training Loss
2.304  -- All Zeros
1552.281  -- All Ones

Uniform Initialization

A uniform distribution has an equal probability of picking any number from a set of numbers.

Let's see how well the neural network trains using a uniform weight initialization, where low=0.0 and high=1.0.

Below, we'll see another way (besides in the Net class code) to initialize the weights of a network. To define weights outside of the model definition, we can:

  1. Define a function that assigns weights by the type of network layer, then
  2. Apply those weights to an initialized model using model.apply(fn), which applies a function to each model layer.
    # takes in a module and applies the specified weight initialization
    def weights_init_uniform(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find('Linear') != -1:
            # apply a uniform distribution to the weights and a bias=0
            m.weight.data.uniform_(0.0, 1.0)
            m.bias.data.fill_(0)

    model_uniform = Net()
    model_uniform.apply(weights_init_uniform)
  • After 2 epochs:

[Plot: training loss with uniform weight initialization]

Validation Accuracy
36.667% -- Uniform Weights
Training Loss
3.208  -- Uniform Weights

General rule for setting weights

The general rule for setting the weights in a neural network is to set them to be close to zero without being too small.

Good practice is to start your weights in the range of [-y, y], where y=1/sqrt(n) (n is the number of inputs to a given neuron).

    import numpy as np

    # takes in a module and applies the specified weight initialization
    def weights_init_uniform_rule(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find('Linear') != -1:
            # get the number of the inputs
            n = m.in_features
            y = 1.0/np.sqrt(n)
            m.weight.data.uniform_(-y, y)
            m.bias.data.fill_(0)

    # create a new model with these weights
    model_rule = Net()
    model_rule.apply(weights_init_uniform_rule)

Below we compare the performance of a NN whose weights are initialized with the uniform distribution [-0.5, 0.5) versus one whose weights are initialized using the general rule.

  • After 2 epochs:

[Plot: uniform weight initialization vs. general-rule initialization]

Validation Accuracy
75.817% -- Centered Weights [-0.5, 0.5)
85.208% -- General Rule [-y, y)
Training Loss
0.705  -- Centered Weights [-0.5, 0.5)
0.469  -- General Rule [-y, y)

Normal distribution to initialize the weights

The normal distribution should have a mean of 0 and a standard deviation of y=1/sqrt(n), where n is the number of inputs to the NN.

    # takes in a module and applies the specified weight initialization
    def weights_init_normal(m):
        '''Takes in a module and initializes all linear layers with weight
           values taken from a normal distribution.'''

        classname = m.__class__.__name__
        # for every Linear layer in a model
        if classname.find('Linear') != -1:
            y = m.in_features
            # m.weight.data should be taken from a normal distribution
            m.weight.data.normal_(0.0, 1/np.sqrt(y))
            # m.bias.data should be 0
            m.bias.data.fill_(0)
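As with the uniform examples above, this function can then be applied to a freshly constructed model via apply (a short usage sketch, assuming the same Net class as before):

    # create a new model and apply the normal-distribution initialization
    model_normal = Net()
    model_normal.apply(weights_init_normal)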

Below we show the performance of two NNs, one initialized using a uniform distribution and the other using a normal distribution.

  • After 2 epochs:

[Plot: weight initialization with a uniform vs. a normal distribution]

Validation Accuracy
85.775% -- Uniform Rule [-y, y)
84.717% -- Normal Distribution
Training Loss
0.329  -- Uniform Rule [-y, y)
0.443  -- Normal Distribution

To initialize layers you typically don't need to do anything.

PyTorch will do it for you. If you think about it, this makes a lot of sense. Why should we initialize layers, when PyTorch can do that following the latest trends?

Check, for instance, the Linear layer.

In the __init__ method it will call the Kaiming He init function.

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

The same applies for other layer types. For conv2d, for instance, check here.

Note: the gain from proper initialization is faster training. If your problem deserves special initialization, you can do it afterwards.

import torch.nn as nn

in_features, h_size = 10, 64  # placeholder sizes; use whatever fits your data

# a simple network
rand_net = nn.Sequential(nn.Linear(in_features, h_size),
                         nn.BatchNorm1d(h_size),
                         nn.ReLU(),
                         nn.Linear(h_size, h_size),
                         nn.BatchNorm1d(h_size),
                         nn.ReLU(),
                         nn.Linear(h_size, 1),
                         nn.ReLU())

# initialization function: first checks the module type,
# then applies the desired changes to the weights
def init_uniform(m):
    if type(m) == nn.Linear:
        nn.init.uniform_(m.weight)

# use the module's apply function to recursively apply the initialization
rand_net.apply(init_uniform)

Sorry for being so late, I hope my answer will help.

To initialise weights with a normal distribution, use:

torch.nn.init.normal_(tensor, mean=0, std=1)

Or to use a constant distribution, write:

torch.nn.init.constant_(tensor, value)

Or to use a uniform distribution:

torch.nn.init.uniform_(tensor, a=0, b=1) # a: lower_bound, b: upper_bound

You can check other methods to initialise tensors here.
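A small usage sketch (the layer sizes here are arbitrary): these functions modify the tensor in place, so you can call them directly on a layer's parameters:

import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
torch.nn.init.normal_(layer.weight, mean=0, std=1)  # weights ~ N(0, 1)
torch.nn.init.constant_(layer.bias, 0.0)            # bias filled with a constant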

If you want some extra flexibility, you can also set the weights manually.

Say you have an input of all ones:

import torch
import torch.nn as nn

input = torch.ones((8, 8))
print(input)
tensor([[1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.]])

And you want to make a dense layer with no bias (so we can visualize):

d = nn.Linear(8, 8, bias=False)

Set all the weights to 0.5 (or anything else):

d.weight.data = torch.full((8, 8), 0.5)
print(d.weight.data)

The weights:

Out[14]: 
tensor([[0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000]])

All your weights are now 0.5. Pass the data through:

d(input)
Out[13]: 
tensor([[4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.]], grad_fn=<MmBackward>)

Remember that each neuron receives 8 inputs, all of which have weight 0.5 and value 1 (and no bias), so the output sums to 4 for each.

Iterate over parameters

If you cannot use apply, for instance because the model does not implement Sequential directly:

Same for all

# see UNet at https://github.com/milesial/Pytorch-UNet/tree/master/unet


def init_all(model, init_func, *params, **kwargs):
    for p in model.parameters():
        init_func(p, *params, **kwargs)

model = UNet(3, 10)
init_all(model, torch.nn.init.normal_, mean=0., std=1) 
# or
init_all(model, torch.nn.init.constant_, 1.) 

Depending on shape

def init_all(model, init_funcs):
    for p in model.parameters():
        init_func = init_funcs.get(len(p.shape), init_funcs["default"])
        init_func(p)

model = UNet(3, 10)
init_funcs = {
    1: lambda x: torch.nn.init.normal_(x, mean=0., std=1.), # can be bias
    2: lambda x: torch.nn.init.xavier_normal_(x, gain=1.), # can be weight
    3: lambda x: torch.nn.init.xavier_uniform_(x, gain=1.), # can be conv1D filter
    4: lambda x: torch.nn.init.xavier_uniform_(x, gain=1.), # can be conv2D filter
    "default": lambda x: torch.nn.init.constant(x, 1.), # everything else
}

init_all(model, init_funcs)

You can try with torch.nn.init.constant_(x, len(x.shape)) to check that they are appropriately initialized:

init_funcs = {
    "default": lambda x: torch.nn.init.constant_(x, len(x.shape))
}
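And a small sketch of one way to inspect the result (assuming the model and init_all defined above):

init_all(model, init_funcs)
for name, p in model.named_parameters():
    # with constant_(x, len(x.shape)), every entry should equal the tensor's number of dimensions
    print(name, p.shape, p.flatten()[0].item())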

Since I don't have enough reputation so far, I can't add a comment under the answer posted by prosti on Jun 26 '19 at 13:16.

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

But I want to point out that some of the assumptions in the paper by Kaiming He, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, are actually not appropriate, even though the deliberately designed initialization method makes a hit in practice.

E.g., within the subsection Backward Propagation Case, they assume that $w_l$ and $\delta y_l$ are independent of each other. But as we all know, take the score map $\delta y^L_i$ as an instance: it often is $y_i - \operatorname{softmax}(y^L_i) = y_i - \operatorname{softmax}(w^L_i x^L_i)$ if we use a typical cross-entropy loss as the objective.
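For context, a rough sketch of where that assumption enters (paraphrasing the paper from memory, so take the notation with a grain of salt): in the backward case $\delta x_l = \hat{W}_l \delta y_l$, and the independence of $w_l$ and $\delta y_l$ is what lets the paper write $\operatorname{Var}[\delta x_l] = \tfrac{1}{2}\,\hat{n}_l\,\operatorname{Var}[w_l]\,\operatorname{Var}[\delta x_{l+1}]$, leading to the sufficient condition $\tfrac{1}{2}\hat{n}_l\operatorname{Var}[w_l] = 1$ for all $l$, i.e. drawing $w_l$ from a zero-mean Gaussian with standard deviation $\sqrt{2/\hat{n}_l}$. If $w_l$ and $\delta y_l$ are in fact correlated, as argued above, this variance product no longer follows directly.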

So I think the true underlying reason why He's initialization works well remains to be unraveled, even though everyone has witnessed its power in boosting deep-learning training.

If you see a deprecation warning (@Fábio Perez)...

def init_weights(m):
    if type(m) == nn.Linear:
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)

Here is a better way: just pass your whole model.

import torch.nn as nn
def initialize_weights(model):
    # Initializes weights according to the DCGAN paper
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.BatchNorm2d)):
            nn.init.normal_(m.weight.data, 0.0, 0.02)
        # if you also want this for linear layers, add one more elif condition (see the sketch below)
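For example, that extra branch might look like this (a sketch; the 0.02 standard deviation simply mirrors the DCGAN convention used for the other layers):

def initialize_weights(model):
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.BatchNorm2d)):
            nn.init.normal_(m.weight.data, 0.0, 0.02)
        elif isinstance(m, nn.Linear):
            nn.init.normal_(m.weight.data, 0.0, 0.02)
            if m.bias is not None:
                nn.init.constant_(m.bias.data, 0.0)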
