Shouldn't the layer normalization of x = torch.tensor([[1.5,0,0,0]])
be [[1.5,-0.5,-0.5,-0.5]], according to this paper and the equation from the PyTorch doc?
But torch.nn.LayerNorm gives [[ 1.7320, -0.5773, -0.5773, -0.5773]].
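For reference, this is the formula given in the torch.nn.LayerNorm documentation. The mean and variance are taken over the trailing dimensions given by normalized_shape, and γ and β are the optional elementwise affine parameters. The documentation also notes that the variance is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False), which divides by N rather than N - 1.

```latex
y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta
```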
Here is the example code:
import torch
x = torch.tensor([[1.5,.0,.0,.0]])
layerNorm = torch.nn.LayerNorm(4, elementwise_affine = False)
y1 = layerNorm(x)
mean = x.mean(-1, keepdim = True)
var = x.var(-1, keepdim = True)
y2 = (x-mean)/torch.sqrt(var+layerNorm.eps)
where:
y1 == tensor([[ 1.7320, -0.5773, -0.5773, -0.5773]])
y2 == tensor([[ 1.5000, -0.5000, -0.5000, -0.5000]])
Yet another simplified implementation of a LayerNorm layer in bare PyTorch:
from typing import Tuple

import torch


def layer_norm(
    x: torch.Tensor, dim: Tuple[int, ...], eps: float = 0.00001
) -> torch.Tensor:
    mean = torch.mean(x, dim=dim, keepdim=True)
    # Biased variance: mean of squared deviations (divides by N, not N - 1).
    var = torch.square(x - mean).mean(dim=dim, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


def test_that_results_match() -> None:
    dims = (1, 2)
    X = torch.normal(0, 1, size=(3, 3, 3))
    # nn.LayerNorm normalizes over the trailing dims, so take their sizes from X.
    normalized_shape = tuple(X.shape[d] for d in dims)
    orig_layer_norm = torch.nn.LayerNorm(normalized_shape)
    y = orig_layer_norm(X)
    y_hat = layer_norm(X, dim=dims)
    # Small atol absorbs float32 rounding differences between the two implementations.
    assert torch.allclose(y, y_hat, atol=1e-6)
Note that the original implementation also has trainable parameters γ and β (see the docs).
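Because nn.LayerNorm initializes γ to ones and β to zeros, the test above matches without them. A minimal sketch that also applies the affine step might look like this; the function name layer_norm_affine is hypothetical, and it borrows weight and bias from an existing nn.LayerNorm (set to random values) so the two can be compared.

```python
from typing import Tuple

import torch


def layer_norm_affine(
    x: torch.Tensor,
    weight: torch.Tensor,
    bias: torch.Tensor,
    eps: float = 1e-5,
) -> torch.Tensor:
    # Normalize over the trailing dims covered by weight's shape,
    # mirroring how nn.LayerNorm uses normalized_shape.
    dims: Tuple[int, ...] = tuple(range(-weight.dim(), 0))
    mean = x.mean(dim=dims, keepdim=True)
    # Biased variance: divide by N, not N - 1.
    var = ((x - mean) ** 2).mean(dim=dims, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # Elementwise affine transform gamma * x_hat + beta, broadcast over leading dims.
    return x_hat * weight + bias


torch.manual_seed(0)
x = torch.randn(2, 3, 4)
ln = torch.nn.LayerNorm((3, 4))
# Replace the default gamma = 1, beta = 0 with random values so the affine step is exercised.
with torch.no_grad():
    ln.weight.copy_(torch.randn(3, 4))
    ln.bias.copy_(torch.randn(3, 4))

expected = ln(x)
actual = layer_norm_affine(x, ln.weight, ln.bias, eps=ln.eps)
print(torch.allclose(expected, actual, atol=1e-5))
```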
So apparently, the code should be:
...
var = ((x-mean)**2).mean(-1, keepdim = True)
...
Hopefully this is helpful to anyone who stumbles on the same mistake.
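Put together, the corrected computation for the tensor from the question might look like this (a minimal sketch; the variable names are illustrative). It reproduces the nn.LayerNorm output:

```python
import torch

x = torch.tensor([[1.5, 0.0, 0.0, 0.0]])
layer_norm = torch.nn.LayerNorm(4, elementwise_affine=False)

mean = x.mean(-1, keepdim=True)
# Mean of squared deviations = biased variance (divides by N = 4).
var = ((x - mean) ** 2).mean(-1, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + layer_norm.eps)

print(y_manual)                                # tensor([[ 1.7320, -0.5773, -0.5773, -0.5773]])
print(torch.allclose(y_manual, layer_norm(x)))  # True
```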
Here's a simple correct example:
import torch
from torch import nn
x = torch.normal(0, 1, [5])  # random input, so the printed values differ between runs
mean = x.sum(dim = 0)/(x.shape[0])
std = (((x - mean)**2).sum()/(x.shape[0])).sqrt()  # biased std: divide by N, not N - 1
print((x - mean)/std)
# tensor([ 0.8972, -0.1496, -1.5654, 1.2419, -0.4242])
print(nn.LayerNorm(5, elementwise_affine = False)(x.unsqueeze(0)))
# tensor([[ 0.8972, -0.1496, -1.5654, 1.2419, -0.4242]])
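The snippet above leaves out eps, so it agrees with nn.LayerNorm only up to a tiny eps-dependent difference. A variant that includes nn.LayerNorm's default eps of 1e-5 and calls torch.nn.functional.layer_norm directly on the 1-D tensor (no unsqueeze needed) could look like this; the fixed seed is an addition for repeatability.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed only so runs are repeatable
x = torch.normal(0, 1, [5])
eps = 1e-5  # nn.LayerNorm's default eps

mean = x.mean()
var = ((x - mean) ** 2).mean()  # biased variance, divides by 5
y_manual = (x - mean) / torch.sqrt(var + eps)

# F.layer_norm normalizes over the trailing dims given by normalized_shape,
# so a 1-D tensor can be passed directly with normalized_shape=(5,).
y_torch = F.layer_norm(x, (5,), eps=eps)

print(torch.allclose(y_manual, y_torch, atol=1e-6))
```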
Instead of
var = x.var(-1, keepdim = True)
you should use
var = x.var(-1, keepdim = True, unbiased=False)
This produces the same result as PyTorch, because nn.LayerNorm uses the biased variance estimator. Full code:
import torch
x = torch.tensor([[1.5,.0,.0,.0]])
layerNorm = torch.nn.LayerNorm(4, elementwise_affine = False)
y1 = layerNorm(x)
mean = x.mean(-1, keepdim = True)
var = x.var(-1, keepdim = True, unbiased=False)
y2 = (x-mean)/torch.sqrt(var+layerNorm.eps)
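The size of the discrepancy in the question follows directly from the two estimators: the unbiased variance divides the sum of squared deviations by N - 1 and the biased one by N, so the normalized values differ by a factor of sqrt(N / (N - 1)). For the four-element input that factor is sqrt(4/3), as this small check (a sketch, not part of the original answer) shows:

```python
import math

import torch

x = torch.tensor([1.5, 0.0, 0.0, 0.0])
n = x.numel()

# Sum of squared deviations from the mean 0.375 is 1.125**2 + 3 * 0.375**2 = 1.6875.
var_unbiased = x.var(unbiased=True)   # 1.6875 / (n - 1) = 0.5625
var_biased = x.var(unbiased=False)    # 1.6875 / n       = 0.421875

# Dividing by sqrt(var_biased) instead of sqrt(var_unbiased) scales every output by sqrt(n / (n - 1)).
ratio = math.sqrt(var_unbiased.item() / var_biased.item())
print(ratio, math.sqrt(n / (n - 1)))  # both about 1.1547
# About 1.7321 and -0.5774; nn.LayerNorm prints 1.7320 and -0.5773 because eps shrinks them slightly.
print(1.5 * ratio, -0.5 * ratio)
```

Recent PyTorch releases (2.0 and later) also accept correction=0 as the newer spelling of unbiased=False.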