How do PyTorch's "Fold" and "Unfold" work?
I've gone through the official doc. I'm having a hard time understanding what this function is used for and how it works. Can someone explain this in layman's terms?
unfold and fold are used to facilitate "sliding window" operations (like convolutions).
Suppose you want to apply a function foo to every 5x5 window in a feature map/image:
from torch.nn import functional as f
windows = f.unfold(x, kernel_size=5)
Now windows has size batch-(5*5*x.size(1))-num_windows, and you can apply foo on windows:
processed = foo(windows)
Now you need to "fold" processed back to the original size of x:
out = f.fold(processed, x.shape[-2:], kernel_size=5)
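Putting the three steps together, a minimal runnable sketch might look like the following. The input shape and the body of foo are assumptions made up for illustration (foo here is just a placeholder that keeps the shape of windows unchanged), and the module is imported under the more common alias F:

```python
import torch
from torch.nn import functional as F

def foo(windows):
    # Hypothetical placeholder: any function that preserves the
    # (batch, C * 5 * 5, num_windows) shape would work here.
    return windows * 2.0

x = torch.randn(2, 3, 8, 8)              # (batch, channels, height, width)
windows = F.unfold(x, kernel_size=5)     # one column per 5x5 window
print(windows.shape)                     # torch.Size([2, 75, 16])

processed = foo(windows)
out = F.fold(processed, x.shape[-2:], kernel_size=5)
print(out.shape)                         # torch.Size([2, 3, 8, 8])
```

Each column of windows holds one flattened 5x5 patch across all 3 channels (3*5*5 = 75 values), and an 8x8 input has 4*4 = 16 such patches when no padding is used.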
You need to take care of padding and kernel_size, which may affect your ability to "fold" processed back to the size of x.
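As a rough illustration of why these arguments matter: with stride 1 and no dilation, the number of window positions along an axis is size + 2*padding - kernel_size + 1. The sketch below (input shape chosen arbitrarily) compares unfolding with and without padding, and passes the same kernel_size and padding to fold so that the column count matches the requested output size:

```python
import torch
from torch.nn import functional as F

x = torch.randn(1, 3, 8, 8)

# Without padding: (8 - 5) + 1 = 4 positions per axis -> 16 windows.
no_pad = F.unfold(x, kernel_size=5)
# With padding=2: (8 + 2*2 - 5) + 1 = 8 positions per axis -> 64 windows.
padded = F.unfold(x, kernel_size=5, padding=2)
print(no_pad.shape, padded.shape)

# fold is given the same kernel_size and padding that unfold used, so the
# number of columns is consistent with the requested output size.
out = F.fold(padded, x.shape[-2:], kernel_size=5, padding=2)
print(out.shape)
```

With padding=2 there is exactly one 5x5 window centred on every pixel, which is the usual "same" convolution layout.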
Moreover, fold sums over overlapping elements, so you might want to divide the output of fold by the number of windows covering each element (with stride 1 this equals the patch size only for elements far enough from the border).
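One way to get the per-element divisor, sketched here under the assumption that foo is the identity, is to fold a tensor of ones with the same arguments. The result counts how many windows cover each position, and dividing by it undoes the summation:

```python
import torch
from torch.nn import functional as F

x = torch.randn(1, 3, 8, 8)
windows = F.unfold(x, kernel_size=5)

# Overlapping windows are summed when folding back.
summed = F.fold(windows, x.shape[-2:], kernel_size=5)

# Folding ones gives the number of windows that cover each pixel.
counts = F.fold(torch.ones_like(windows), x.shape[-2:], kernel_size=5)

recovered = summed / counts
print(counts.min().item(), counts.max().item())
print(torch.allclose(recovered, x))
```

In this small 8x8 case the counts range from 1 in the corners to 16 in the centre, so dividing by a constant 25 would not recover x.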
unfold imagines a tensor as a longer tensor with repeated columns/rows of values "folded" on top of each other, which is then "unfolded":
size determines how large the folds are.
step determines how often it is folded.
E.g. for a 2x5 tensor, unfolding it with step=1 and patch size=2 across dim=1:
>>> import torch
>>> x = torch.tensor([[1,2,3,4,5],
...                   [6,7,8,9,10]])
>>> x.unfold(1,2,1)
tensor([[[ 1, 2], [ 2, 3], [ 3, 4], [ 4, 5]],
[[ 6, 7], [ 7, 8], [ 8, 9], [ 9, 10]]])
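To make the role of the dimension argument a bit more concrete, here is a small sketch that unfolds the same tensor along dim=0 as well. In both cases the unfolded dimension shrinks to the number of windows, and the window contents appear in a new last dimension:

```python
import torch

x = torch.tensor([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10]])

cols = x.unfold(1, 2, 1)   # windows of 2 along dim 1
print(cols.shape)          # torch.Size([2, 4, 2])

rows = x.unfold(0, 2, 1)   # windows of 2 along dim 0
print(rows.shape)          # torch.Size([1, 5, 2])
print(rows[0, :, 0])       # first element of each window: row 0 of x
print(rows[0, :, 1])       # second element of each window: row 1 of x
```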
fold is roughly the opposite of this operation, but "overlapping" values are summed in the output.
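The summation can be seen with torch.nn.functional.fold, which (unlike Tensor.unfold) works on image-like patch columns. The sketch below, with sizes picked only for illustration, folds four all-ones 3x3 patches into a 4x4 output, so every output value is simply the number of patches that overlap there:

```python
import torch
from torch.nn import functional as F

# Four 3x3 patches of ones: shape (batch=1, C*3*3=9, num_patches=4).
patches = torch.ones(1, 9, 4)
counts = F.fold(patches, output_size=(4, 4), kernel_size=3)
print(counts[0, 0])
# tensor([[1., 2., 2., 1.],
#         [2., 4., 4., 2.],
#         [2., 4., 4., 2.],
#         [1., 2., 2., 1.]])
```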
import torch
x = torch.arange(1, 9).float()
print(x)
# dimension, size, step
print(x.unfold(0, 2, 1))
print(x.unfold(0, 3, 2))
Out:
tensor([1., 2., 3., 4., 5., 6., 7., 8.])
tensor([[1., 2.],
[2., 3.],
[3., 4.],
[4., 5.],
[5., 6.],
[6., 7.],
[7., 8.]])
tensor([[1., 2., 3.],
[3., 4., 5.],
[5., 6., 7.]])
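The number of rows in each output follows from the length n, the window size and the step: (n - size) // step + 1. A quick check, where num_windows is a hypothetical helper written only for this sketch:

```python
import torch

def num_windows(n, size, step):
    # Hypothetical helper: number of windows Tensor.unfold produces
    # along a dimension of length n.
    return (n - size) // step + 1

x = torch.arange(1, 9).float()
results = []
for size, step in [(2, 1), (3, 2)]:
    got = x.unfold(0, size, step).shape[0]
    results.append((got, num_windows(len(x), size, step)))
    print(size, step, got, num_windows(len(x), size, step))
```

This also explains why the value 8 is missing from the last output above: with size 3 and step 2 a fourth window would need a ninth element, so the trailing value is dropped.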
import torch
patch=(3,3)
x=torch.arange(16).float()
print(x, x.shape)
x2d = x.reshape(1,1,4,4)
print(x2d, x2d.shape)
h,w = patch
c=x2d.size(1)
print(c) # channels
# unfold(dimension, size, step)
r = x2d.unfold(2,h,1).unfold(3,w,1).transpose(1,3).reshape(-1, c, h, w)
print(r.shape)
print(r) # result
tensor([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13.,
14., 15.]) torch.Size([16])
tensor([[[[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[12., 13., 14., 15.]]]]) torch.Size([1, 1, 4, 4])
1
torch.Size([4, 1, 3, 3])
tensor([[[[ 0., 1., 2.],
[ 4., 5., 6.],
[ 8., 9., 10.]]],
[[[ 4., 5., 6.],
[ 8., 9., 10.],
[12., 13., 14.]]],
[[[ 1., 2., 3.],
[ 5., 6., 7.],
[ 9., 10., 11.]]],
[[[ 5., 6., 7.],
[ 9., 10., 11.],
[13., 14., 15.]]]])
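For comparison, the same patches can probably be obtained with torch.nn.functional.unfold. The sketch below reuses the 4x4 example. Note the patch order differs: the transpose(1, 3) in the chain above yields patches column by column, while F.unfold enumerates window positions row by row, hence the index permutation in the final check:

```python
import torch
from torch.nn import functional as F

x2d = torch.arange(16).float().reshape(1, 1, 4, 4)
h, w = 3, 3
c = x2d.size(1)

# Tensor.unfold chain from the answer above (patches in column-major order).
r = x2d.unfold(2, h, 1).unfold(3, w, 1).transpose(1, 3).reshape(-1, c, h, w)

# F.unfold returns (batch, c*h*w, num_patches); move patches to the front.
cols = F.unfold(x2d, kernel_size=(h, w))
p = cols.transpose(1, 2).reshape(-1, c, h, w)

print(cols.shape)                         # torch.Size([1, 9, 4])
print(torch.equal(p, r[[0, 2, 1, 3]]))    # True
```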
Since there are no answers with 4-D tensors and nn.functional.unfold() only accepts a 4-D tensor, I would like to explain this.
Assume the input tensor is of shape (batch_size, channels, height, width). I have taken an example where batch_size = 1, channels = 2, height = 3, width = 3.
kernel_size = 2, which is nothing but a 2x2 kernel.
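A sketch of this setup, with tensor values chosen here purely for illustration (a simple arange, not necessarily the values the answer had in mind), shows the resulting shape: each column holds one 2x2 window from channel 0 followed by the same window from channel 1.

```python
import torch
from torch.nn import functional as F

# batch_size = 1, channels = 2, height = 3, width = 3 (illustrative values)
inp = torch.arange(18).float().reshape(1, 2, 3, 3)
out = F.unfold(inp, kernel_size=2)

print(out.shape)      # (1, channels * 2 * 2, num_windows) = (1, 8, 4)
print(out[0, :, 0])   # top-left window: channel 0 values, then channel 1
# tensor([ 0.,  1.,  3.,  4.,  9., 10., 12., 13.])
```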