
Layer normalization in PyTorch?

Shouldn't the layer normalization of x = torch.tensor([[1.5, 0, 0, 0]]) be [[1.5, -0.5, -0.5, -0.5]], according to this paper and the equation from the PyTorch doc? But torch.nn.LayerNorm gives [[ 1.7320, -0.5773, -0.5773, -0.5773]].

Here is the example code:

import torch

x = torch.tensor([[1.5, .0, .0, .0]])
layerNorm = torch.nn.LayerNorm(4, elementwise_affine = False)

y1 = layerNorm(x)

# manual computation following the formula in the docs
mean = x.mean(-1, keepdim = True)
var = x.var(-1, keepdim = True)  # note: torch.var is unbiased by default
y2 = (x - mean) / torch.sqrt(var + layerNorm.eps)

where:

y1 == tensor([[ 1.7320, -0.5773, -0.5773, -0.5773]])
y2 == tensor([[ 1.5000, -0.5000, -0.5000, -0.5000]])
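The gap between y1 and y2 comes from the variance estimator: torch.var defaults to the unbiased (sample) variance, dividing by N - 1, while nn.LayerNorm normalizes with the biased (population) variance, dividing by N. A quick sketch of the arithmetic for this exact input:

import torch

x = torch.tensor([[1.5, 0.0, 0.0, 0.0]])
mean = x.mean(-1, keepdim=True)           # 0.375
sq_dev = (x - mean) ** 2                  # squared deviations, sum = 1.6875

var_biased = sq_dev.sum(-1, keepdim=True) / 4    # divide by N   -> 0.421875
var_unbiased = sq_dev.sum(-1, keepdim=True) / 3  # divide by N-1 -> 0.5625

print((x - mean) / var_biased.sqrt())    # ~[[ 1.7320, -0.5773, ...]], what LayerNorm returns
print((x - mean) / var_unbiased.sqrt())  # [[ 1.5000, -0.5000, ...]], what x.var() produces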

Yet another simplified implementation of a Layer Norm layer in bare PyTorch:

from typing import Tuple
import torch


def layer_norm(
    x: torch.Tensor, dim: Tuple[int, ...], eps: float = 0.00001
) -> torch.Tensor:
    mean = torch.mean(x, dim=dim, keepdim=True)
    # biased variance, matching what nn.LayerNorm uses internally
    var = torch.square(x - mean).mean(dim=dim, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


def test_that_results_match() -> None:
    dims = (1, 2)
    X = torch.normal(0, 1, size=(3, 3, 3))

    normalized_shape = tuple(X.size(d) for d in dims)
    orig_layer_norm = torch.nn.LayerNorm(normalized_shape)

    y = orig_layer_norm(X)
    y_hat = layer_norm(X, dim=dims)

    assert torch.allclose(y, y_hat)

Note that the original implementation also has trainable parameters γ and β (see the docs).
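For illustration, here is a minimal sketch of what the affine version adds, assuming hypothetical gamma and beta tensors shaped like normalized_shape (this mirrors the formula in the docs, not the actual nn.LayerNorm source):

import torch

def layer_norm_affine(
    x: torch.Tensor,
    gamma: torch.Tensor,  # hypothetical learnable scale, shape = normalized_shape
    beta: torch.Tensor,   # hypothetical learnable shift, shape = normalized_shape
    eps: float = 1e-5,
) -> torch.Tensor:
    # normalize over the trailing dims covered by gamma/beta
    dims = tuple(range(x.dim() - gamma.dim(), x.dim()))
    mean = x.mean(dim=dims, keepdim=True)
    var = ((x - mean) ** 2).mean(dim=dims, keepdim=True)  # biased variance
    return (x - mean) / torch.sqrt(var + eps) * gamma + beta

# a fresh nn.LayerNorm initializes gamma to ones and beta to zeros, so this
# matches it before any training:
X = torch.normal(0, 1, size=(3, 3, 3))
assert torch.allclose(layer_norm_affine(X, torch.ones(3, 3), torch.zeros(3, 3)),
                      torch.nn.LayerNorm((3, 3))(X))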

So apparently, the variance line should be:

...
var = ((x - mean)**2).mean(-1, keepdim = True)
...

Hopefully this is helpful to anyone who stumbles on the same mistake.
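Putting that fix into the question's snippet (a sketch of the corrected computation, same tensor as above):

import torch

x = torch.tensor([[1.5, .0, .0, .0]])
layerNorm = torch.nn.LayerNorm(4, elementwise_affine = False)
y1 = layerNorm(x)

mean = x.mean(-1, keepdim = True)
var = ((x - mean)**2).mean(-1, keepdim = True)  # biased variance: divide by N
y2 = (x - mean) / torch.sqrt(var + layerNorm.eps)
# both y1 and y2: tensor([[ 1.7320, -0.5773, -0.5773, -0.5773]])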

Here's a simple correct example:

import torch
from torch import nn

x = torch.normal(0, 1, [5])
mean = x.sum(axis = 0)/(x.shape[0])
std = (((x - mean)**2).sum()/(x.shape[0])).sqrt()

print((x - mean)/std)
# tensor([ 0.8972, -0.1496, -1.5654,  1.2419, -0.4242])

print(nn.LayerNorm(5, elementwise_affine = False)(x.unsqueeze(0)))
# tensor([[ 0.8972, -0.1496, -1.5654,  1.2419, -0.4242]])

Instead of

var = x.var(-1, keepdim = True)

you should use

var = x.var(-1, keepdim = True, unbiased=False)

This will produce a result identical to PyTorch's. Full code:

import torch

x = torch.tensor([[1.5, .0, .0, .0]])
layerNorm = torch.nn.LayerNorm(4, elementwise_affine = False)
y1 = layerNorm(x)
mean = x.mean(-1, keepdim = True)
var = x.var(-1, keepdim = True, unbiased=False)  # population variance, as LayerNorm uses
y2 = (x-mean)/torch.sqrt(var+layerNorm.eps)
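As a quick sanity check, the two computations now agree:

print(torch.allclose(y1, y2))  # True
print(y1)  # tensor([[ 1.7320, -0.5773, -0.5773, -0.5773]])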
