
Output of nn.Linear is different for the same input

In torch==1.7.1+cu101, I have two tensors:

import torch
a = torch.rand(1,5,10)
b = torch.rand(100,1,10)

and a feed-forward network:

import torch.nn as nn
l = nn.Linear(10,10)

I force one row of a to equal one row of b:

a[0,0] = b[80][0].clone()

Then I feed both tensors to l:

r1 = l(a)
r2 = l(b)

Apparently, since a[0,0] is equal to b[80,0], r1[0,0] must be equal to r2[80,0]. But it turns out that:

(r1[0,0] == r2[80,0]).all()
>>> False

I've fixed the randomness by:

import os
import random

import numpy as np

seed = 42

random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = True

Does anyone know why (r1[0,0] == r2[80,0]).all() is False?

If you were to print r1[0, 0] and r2[80, 0], you'd see that they are extremely similar, even identical within the number of printed digits.

However, if you print r1[0, 0] - r2[80, 0] you'll see that the resulting entries are not perfectly 0.0 (though they are close to it), meaning that r1[0, 0] and r2[80, 0] are close but not perfectly identical.
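For instance, here is a minimal check (the exact magnitude of the gap depends on the random weights and the hardware):

# Largest element-wise gap between the two results: tiny, but not exactly zero.
diff = (r1[0, 0] - r2[80, 0]).abs()
print(diff.max())  # e.g. a value on the order of 1e-7 or smaller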

Now, if we take those individual vectors out first, and then pass them through the linear layer like this:

r1_small = l(a[0, 0])
r2_small = l(b[80, 0])
print((r1_small == r2_small).all())  # tensor(True)

we find that they are perfectly identical, despite being floats.

So, the difference is introduced when the identical vectors are passed through the linear layer as parts of larger tensors.

It's also very much worth noting that the same difference does not arise when the first n-1 dimensions are all powers of 2:

a2 = torch.randn(8, 8, 10)
b2 = torch.randn(4, 16, 10)
a2[0, 0] = b2[1, 0].clone()
r1_2 = l(a2)
r2_2 = l(b2)
print((r1_2[0, 0] == r2_2[1, 0]).all())  # tensor(True)

So, while I don't know the details, I suspect it has to do with byte alignment.
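Whatever the exact mechanism, the root cause is a general property of floating-point arithmetic rather than anything specific to nn.Linear: addition is not associative, so summing the same numbers in a different order (as a batched kernel may do for differently shaped inputs) can round differently. A minimal standalone illustration in float32:

import torch

# Above 2**24, float32 can no longer represent every integer exactly.
x = torch.tensor(2.0 ** 24)   # float32 by default
one = torch.tensor(1.0)

# Mathematically both expressions equal 2**24 + 2, but they round differently.
print((x + one) + one)                      # tensor(16777216.) -- each +1 is rounded away
print(x + (one + one))                      # tensor(16777218.)
print((x + one) + one == x + (one + one))   # tensor(False)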

In general, testing for perfect equality between float values that should be mathematically equal will not always give the expected result. So how do we handle these differences? You can use torch.isclose or torch.allclose to check for approximate equality.
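For example, applied to the tensors from above (using the default tolerances rtol=1e-05 and atol=1e-08):

# Element-wise comparison with tolerance: returns a boolean tensor.
print(torch.isclose(r1[0, 0], r2[80, 0]))   # tensor([True, True, ..., True])

# Single boolean: True only if every element is within tolerance.
print(torch.allclose(r1[0, 0], r2[80, 0]))  # True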
