
For each layer in a neural network (PyTorch), how many biases should there be?

I have a simple model in pytorch.

model = Network()

Its details are:

Network(
  (hidden): Linear(in_features=784, out_features=256, bias=True)
  (output): Linear(in_features=256, out_features=10, bias=True)
  (sigmoid): Sigmoid()
  (softmax): Softmax(dim=1)
)

There are 3 layers of neurons in total: 1 input (784 neurons), 1 hidden (256 neurons) and 1 output (10 neurons). Therefore there will be two weight layers. So there must be two biases (simply two floating-point numbers), one for each weight layer, right? (Correct me if I am wrong.)

Now, after initializing my network, I was curious about the two bias values. I wanted to check the bias value of my hidden layer, so I wrote:

model.hidden.bias

What I got as the result was not what I expected! I actually expected one value. This is what I got instead:

tensor([-1.6868e-02, -3.5661e-02,  1.2489e-02, -2.7880e-02,  1.4025e-02,
        -2.6085e-02,  1.2625e-02, -3.1748e-02,  5.0335e-03,  3.8031e-03,
        -3.1648e-02, -3.4881e-02, -2.0026e-02,  1.9728e-02,  6.2461e-03,
         9.3936e-04, -5.9270e-03, -2.7183e-02, -1.9850e-02, -3.5693e-02,
        -1.9393e-02,  2.6555e-02,  2.3482e-02,  2.1230e-02, -2.2175e-02,
        -2.4386e-02,  3.4848e-02, -2.6044e-02,  1.3575e-02,  9.4125e-03,
         3.0012e-02, -2.6078e-02,  7.1615e-05, -1.7061e-02,  6.6355e-03,
        -3.4966e-02,  2.9311e-02,  1.4060e-02, -2.5763e-02, -1.4020e-02,
         2.9852e-02, -7.9176e-03, -1.8396e-02,  1.6927e-02, -1.1001e-03,
         1.5595e-02,  1.2169e-02, -1.2275e-02, -2.9270e-03, -6.5685e-04,
        -2.4297e-02,  3.0048e-02,  2.9692e-03, -2.5398e-02,  2.9955e-03,
        -9.3653e-04, -1.2932e-02,  2.4232e-02, -3.5182e-02, -1.6163e-02,
         3.0025e-02,  3.1227e-02, -8.2498e-04,  2.7102e-02, -2.3830e-02,
        -3.4958e-02, -1.1886e-02,  1.6097e-02,  1.4579e-02, -2.6744e-02,
         1.1900e-02, -3.4855e-02, -4.2208e-03, -5.2035e-03,  1.7055e-02,
        -4.8580e-03,  3.4088e-03,  1.6923e-02,  3.5570e-04, -3.0478e-02,
         8.4647e-03,  2.5704e-02, -2.3255e-02,  6.9396e-03, -1.2521e-03,
        -9.4101e-03, -2.5798e-02, -1.4438e-03, -7.2684e-03,  3.5417e-02,
        -3.4388e-02,  1.3706e-02, -5.1430e-03,  1.6174e-02,  1.8135e-03,
        -2.9018e-02, -2.9083e-02,  7.4100e-03, -2.7758e-02,  2.4367e-02,
        -3.8350e-03,  9.4390e-03, -1.0844e-02,  1.6381e-02, -2.5268e-02,
         1.3553e-02, -1.0545e-02, -1.3782e-02,  2.8519e-02,  2.3630e-02,
        -1.9703e-02, -2.0147e-02, -1.0485e-02,  2.4637e-02,  1.9989e-02,
         5.6601e-03,  1.9121e-02, -1.5286e-02,  2.5996e-02, -2.9833e-02,
        -2.9458e-02,  2.3944e-02, -3.0107e-02, -1.2307e-02, -1.8419e-02,
         3.3551e-02,  1.2396e-02,  2.9356e-02,  3.3274e-02,  5.4677e-03,
         3.1715e-02,  1.3361e-02,  3.3042e-02,  2.7843e-03,  2.2837e-02,
        -3.4981e-02,  3.2355e-02, -2.7658e-03,  2.2184e-02, -2.0203e-02,
        -3.3264e-02, -3.4858e-02,  1.0820e-03, -1.4279e-02, -2.8041e-02,
         4.1962e-03,  2.4266e-02, -3.5704e-02, -2.6172e-02,  2.3335e-02,
         2.0657e-02, -3.0387e-03, -5.7096e-03, -1.1062e-02,  1.3450e-02,
        -3.3965e-02,  1.9623e-03, -2.0067e-02, -3.3858e-02, -2.1931e-02,
        -1.5414e-02,  2.4454e-02,  2.5668e-02, -1.1932e-02,  5.7540e-04,
         1.5130e-02,  1.3916e-02, -2.1521e-02, -3.0575e-02,  1.8841e-02,
        -2.3240e-02, -2.7297e-02, -3.2668e-02, -1.5544e-02, -5.9408e-03,
         3.0241e-02,  2.2039e-02, -2.4389e-02,  3.1703e-02,  3.5305e-02,
        -2.7501e-03,  2.0154e-02, -5.3489e-03,  1.4177e-02,  1.6829e-02,
         3.3066e-02, -1.3425e-02, -3.2565e-02,  6.5624e-03, -1.5681e-02,
         2.3047e-02,  6.5880e-03, -3.3803e-02,  2.3790e-02, -5.5061e-03,
         2.9413e-02,  1.2290e-02, -1.0958e-02,  1.2680e-03,  1.3343e-02,
         6.6689e-03, -2.2975e-03, -1.2068e-02,  1.6523e-02, -3.1612e-02,
        -1.7529e-02, -2.2220e-02, -1.4723e-02, -1.3495e-02, -5.1805e-03,
        -2.9620e-02,  3.0571e-02, -3.0999e-02,  3.3681e-03,  1.3579e-02,
         1.4837e-02,  1.5694e-02, -1.1178e-02,  4.6233e-03, -2.2583e-02,
        -3.5281e-03,  3.0918e-02,  2.6407e-02,  1.5822e-04, -3.0181e-03,
         8.6989e-03,  2.8998e-02, -1.5975e-02, -3.1574e-02, -1.5609e-02,
         1.0472e-02,  5.8976e-03,  7.0131e-03, -3.2047e-02,  2.6045e-02,
        -2.8882e-02, -2.2121e-02, -3.2960e-02,  1.8268e-02,  3.0984e-02,
         1.4824e-02,  3.0010e-02, -5.7523e-03, -2.0017e-02,  4.8700e-03,
         1.4997e-02, -1.4898e-02,  6.8572e-03,  9.7713e-03,  1.3410e-02,
         4.9619e-03,  3.1016e-02,  3.1240e-02, -3.0203e-02,  2.1435e-02,
         2.7331e-02], requires_grad=True)

Can someone explain this behaviour to me? Why did I get 256 values instead of one?

Edit 1:

Here is my understanding of the layers: for a whole layer of neurons, the bias is just a single value. Am I right? But what I am seeing in the output above is 256 values. Why? Did PyTorch assume that I have a bias for each neuron? Is that okay?

First, it's important to realize what's going on inside one of these layers. When you write:

Linear(in_features=784, out_features=256, bias=True)

You are modeling a linear relationship between the input and the output. You're probably familiar with this from basic math:

Y = MX + B

However, instead of a "slope" and a "y-intercept", you have a weights matrix and a bias term. This is still a linear relationship, but with matrices as our input and output.

Y is our output, M is our weights matrix, X is our input, and B is our bias. You have defined the input as an (N x 784) matrix and the output as an (N x 256) matrix, where N is the number of samples.

If you're familiar with matrix multiplication, this means that our weights matrix is (784 x 256). Multiplying the input by the weights matrix gives an (N x 256) matrix, so the bias term must supply 256 values, one per output feature; it is broadcast (added to every row) so that MX + B works out.

In general, the number of values in the bias term will be the same as the number of out_features.
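
Here is a minimal sketch you can run to verify this (the layer and variable names are just for illustration, not part of the original model):

import torch
import torch.nn as nn

# A layer like the "hidden" layer from the question
hidden = nn.Linear(in_features=784, out_features=256, bias=True)

x = torch.randn(32, 784)                # N = 32 samples
y = x @ hidden.weight.T + hidden.bias   # what hidden(x) computes internally

print(hidden.weight.shape)             # torch.Size([256, 784]) -- stored as (out_features, in_features)
print(hidden.bias.shape)               # torch.Size([256])      -- one bias value per output neuron
print(y.shape)                         # torch.Size([32, 256])
print(torch.allclose(y, hidden(x)))    # True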

Check this out:

from torchvision.models import resnet18

# Build a resnet18 with randomly initialized weights (no pretrained download)
model = resnet18(pretrained=False)

# Print the name of every trainable parameter tensor in the model
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)

This will give you a huge list like this:

conv1.weight
bn1.weight
bn1.bias
layer1.0.conv1.weight
layer1.0.bn1.weight
layer1.0.bn1.bias
layer1.0.conv2.weight
layer1.0.bn2.weight
layer1.0.bn2.bias
layer1.1.conv1.weight
layer1.1.bn1.weight
layer1.1.bn1.bias
layer1.1.conv2.weight
layer1.1.bn2.weight
layer1.1.bn2.bias
layer2.0.conv1.weight
layer2.0.bn1.weight
layer2.0.bn1.bias
layer2.0.conv2.weight
layer2.0.bn2.weight
layer2.0.bn2.bias
layer2.0.downsample.0.weight
layer2.0.downsample.1.weight
layer2.0.downsample.1.bias
layer2.1.conv1.weight
layer2.1.bn1.weight
layer2.1.bn1.bias
layer2.1.conv2.weight
layer2.1.bn2.weight
layer2.1.bn2.bias
layer3.0.conv1.weight
layer3.0.bn1.weight
layer3.0.bn1.bias
layer3.0.conv2.weight
layer3.0.bn2.weight
layer3.0.bn2.bias
layer3.0.downsample.0.weight
layer3.0.downsample.1.weight
layer3.0.downsample.1.bias
layer3.1.conv1.weight
layer3.1.bn1.weight
layer3.1.bn1.bias
layer3.1.conv2.weight
layer3.1.bn2.weight
layer3.1.bn2.bias
layer4.0.conv1.weight
layer4.0.bn1.weight
layer4.0.bn1.bias
layer4.0.conv2.weight
layer4.0.bn2.weight
layer4.0.bn2.bias
layer4.0.downsample.0.weight
layer4.0.downsample.1.weight
layer4.0.downsample.1.bias
layer4.1.conv1.weight
layer4.1.bn1.weight
layer4.1.bn1.bias
layer4.1.conv2.weight
layer4.1.bn2.weight
layer4.1.bn2.bias
fc.weight
fc.bias

This tells you all the parameter names, including the biases. However, if you print the model you will get:

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)

This printout shows where bias is set to True or False, i.e. whether a bias is actually used in that layer. You can also check the last layer's bias by modifying the first piece of code; hopefully this is helpful.
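
For example, a small sketch (assuming the resnet18 model created above) that prints only the bias parameters and how many values each one holds:

# List only the bias parameters and the number of values in each
for name, param in model.named_parameters():
    if "bias" in name:
        print(name, param.numel())

# For the network in the question, the same idea gives the expected sizes:
# model.hidden.bias.numel() -> 256
# model.output.bias.numel() -> 10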
