Strided convolution of 2D in numpy
I tried to implement strided convolution of a 2D array using a for loop, i.e.
import numpy as np

arr = np.array([[2,3,7,4,6,2,9],
                [6,6,9,8,7,4,3],
                [3,4,8,3,8,9,7],
                [7,8,3,6,6,3,4],
                [4,2,1,8,3,4,6],
                [3,2,4,1,9,8,3],
                [0,1,3,9,2,1,4]])
arr2 = np.array([[3,4,4],
                 [1,0,2],
                 [-1,0,3]])

def stride_conv(arr1,arr2,s,p):
    # s is the stride; p (padding) is accepted but not used
    beg = 0
    end = arr2.shape[0]
    final = []
    for i in range(0,arr1.shape[0]-1,s):
        k = []
        for j in range(0,arr1.shape[0]-1,s):
            k.append(np.sum(arr1[beg+i : end+i, beg+j:end+j] * (arr2)))
        final.append(k)
    return np.array(final)

stride_conv(arr,arr2,2,0)
This results in a 3x3 array:
array([[ 91, 100, 88],
[ 69, 91, 117],
[ 44, 72, 74]])
Is there a numpy or scipy function to do the same? My approach is not that good. How can I vectorize this?
Ignoring the padding argument and trailing windows that are too short to be convolved against the second array, here's one way with np.lib.stride_tricks.as_strided:
def strided4D(arr,arr2,s):
    strided = np.lib.stride_tricks.as_strided
    s0,s1 = arr.strides
    m1,n1 = arr.shape
    m2,n2 = arr2.shape
    # window positions first, then window contents, to match the strides below
    out_shp = (1+(m1-m2)//s, 1+(n1-n2)//s, m2, n2)
    return strided(arr, shape=out_shp, strides=(s*s0,s*s1,s0,s1))

def stride_conv_strided(arr,arr2,s):
    arr4D = strided4D(arr,arr2,s=s)
    # sum each window against the kernel over the last two axes
    return np.tensordot(arr4D, arr2, axes=((2,3),(0,1)))
Alternatively, we can use scikit-image's built-in view_as_windows to get those windows elegantly, like so:
from skimage.util.shape import view_as_windows

def strided4D_v2(arr,arr2,s):
    return view_as_windows(arr, arr2.shape, step=s)
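On NumPy 1.20 or later, a similar window view is available without scikit-image through numpy.lib.stride_tricks.sliding_window_view. The following is only a minimal sketch and not part of the original answer: the function name stride_conv_windows is made up for illustration, and like the code above it ignores padding.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stride_conv_windows(arr, kernel, s):
    # All kernel-shaped windows at stride 1: shape (H-kh+1, W-kw+1, kh, kw).
    windows = sliding_window_view(arr, kernel.shape)
    # Keep every s-th window along both output axes, then multiply each
    # window by the kernel and sum over the two window axes.
    return np.einsum('ijkl,kl->ij', windows[::s, ::s], kernel)

arr = np.array([[2,3,7,4,6,2,9],
                [6,6,9,8,7,4,3],
                [3,4,8,3,8,9,7],
                [7,8,3,6,6,3,4],
                [4,2,1,8,3,4,6],
                [3,2,4,1,9,8,3],
                [0,1,3,9,2,1,4]])
arr2 = np.array([[3,4,4],
                 [1,0,2],
                 [-1,0,3]])

out = stride_conv_windows(arr, arr2, 2)
print(out)
```

For the 7x7 input and 3x3 kernel from the question this should reproduce the 3x3 result of the loop version.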
I think we can do a "valid" FFT convolution and pick out only the results at the strided locations, like this:
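import numpy as np
import scipy.signal

def strideConv(arr,arr2,s):
    # full "valid" result at stride 1, computed with the flipped kernel
    cc=scipy.signal.fftconvolve(arr,arr2[::-1,::-1],mode='valid')
    # indices of every s-th column and every s-th row
    idx=(np.arange(0,cc.shape[1],s), np.arange(0,cc.shape[0],s))
    xidx,yidx=np.meshgrid(*idx)
    return cc[yidx,xidx]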
This gives the same results as the other answers. But I guess this only works if the kernel size is odd.
Also, I've flipped the kernel with arr2[::-1,::-1] just to stay consistent with the others; you may want to omit the flip depending on context.
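To see what the kernel flip does, here is a small 1D sketch using only numpy (np.correlate and np.convolve); the arrays are the first row of the question's input and the first row of its kernel, chosen purely for illustration. Convolution flips the kernel internally, so flipping it beforehand yields a plain sliding dot product (cross-correlation), which is what the loop in the question computes.

```python
import numpy as np

x = np.array([2, 3, 7, 4, 6, 2, 9])
k = np.array([3, 4, 4])

# Cross-correlation: slide k over x without flipping it.
corr = np.correlate(x, k, mode='valid')
# Convolution flips the kernel, so pre-flipping it cancels the flip.
conv_flipped = np.convolve(x, k[::-1], mode='valid')
# Convolving with the unflipped kernel differs for an asymmetric kernel.
conv_plain = np.convolve(x, k, mode='valid')

print(corr)
print(conv_flipped)
print(conv_plain)
```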
UPDATE:
We currently have a few different ways of doing 2D or 3D convolution using numpy and scipy alone, so I thought I'd do some comparisons to give an idea of which one is faster on data of different sizes. I hope this won't be regarded as off-topic.
Method 1: FFT convolution (using scipy.signal.fftconvolve):
import numpy
from scipy.signal import fftconvolve

def padArray(var,pad,method=1):
    # zero-pad the first two (spatial) axes by <pad> on each side
    if method==1:
        var_pad=numpy.zeros(tuple(2*pad+numpy.array(var.shape[:2]))+var.shape[2:])
        var_pad[pad:-pad,pad:-pad]=var
    else:
        var_pad=numpy.pad(var,([pad,pad],[pad,pad])+([0,0],)*(numpy.ndim(var)-2),
                mode='constant',constant_values=0)
    return var_pad

def conv3D(var,kernel,stride=1,pad=0,pad_method=1):
    '''3D convolution using scipy.signal.fftconvolve.
    '''
    var_ndim=numpy.ndim(var)
    kernel_ndim=numpy.ndim(kernel)
    stride=int(stride)
    if var_ndim<2 or var_ndim>3 or kernel_ndim<2 or kernel_ndim>3:
        raise Exception("<var> and <kernel> dimension should be in 2 or 3.")
    if var_ndim==2 and kernel_ndim==3:
        raise Exception("<kernel> dimension > <var>.")
    if var_ndim==3 and kernel_ndim==2:
        # repeat the 2D kernel along the depth axis
        kernel=numpy.repeat(kernel[:,:,None],var.shape[2],axis=2)
    if pad>0:
        var_pad=padArray(var,pad,pad_method)
    else:
        var_pad=var
    conv=fftconvolve(var_pad,kernel,mode='valid')
    if stride>1:
        # keep only the results at strided locations
        conv=conv[::stride,::stride,...]
    return conv
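The output size that results from padding, a "valid" convolution and the [::stride] slice in conv3D can be written in closed form as (n + 2*pad - f)//stride + 1 per spatial axis, where n is the input size and f the kernel size. The helper below is a small sketch to check that arithmetic; the name conv_output_size and the test sizes are made up for illustration.

```python
import numpy as np

def conv_output_size(n, f, stride=1, pad=0):
    """Number of output positions along one axis (complete windows only)."""
    return (n + 2 * pad - f) // stride + 1

# Cross-check against the slicing approach: a "valid" result has
# n + 2*pad - f + 1 entries, and [::stride] keeps every stride-th one.
for n, f, stride, pad in [(7, 3, 2, 0), (7, 3, 1, 0), (10, 4, 3, 1)]:
    valid_len = n + 2 * pad - f + 1
    sliced_len = len(np.arange(valid_len)[::stride])
    assert sliced_len == conv_output_size(n, f, stride, pad)

size_q = conv_output_size(7, 3, stride=2)  # sizes from the question
print(size_q)
```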
Method 2: Special conv (see this answer):
def conv3D2(var,kernel,stride=1,pad=0):
    '''3D convolution by sub-matrix summing.
    '''
    var_ndim=numpy.ndim(var)
    ny,nx=var.shape[:2]
    ky,kx=kernel.shape[:2]
    result=0
    if pad>0:
        var_pad=padArray(var,pad,1)
    else:
        var_pad=var
    # loop over kernel elements; each one scales a shifted slab of the input
    for ii in range(ky*kx):
        yi,xi=divmod(ii,kx)
        slabii=var_pad[yi:2*pad+ny-ky+yi+1:1, xi:2*pad+nx-kx+xi+1:1,...]*kernel[yi,xi]
        if var_ndim==3:
            slabii=slabii.sum(axis=-1)
        result+=slabii
    if stride>1:
        result=result[::stride,::stride,...]
    return result
Method 3: Strided-view conv, as suggested by Divakar:
def asStride(arr,sub_shape,stride):
    '''Get a strided sub-matrices view of an ndarray.
    <arr>: ndarray of rank 2.
    <sub_shape>: tuple of length 2, window size: (ny, nx).
    <stride>: int, stride of windows.
    Return <subs>: strided window view.
    See also skimage.util.shape.view_as_windows()
    '''
    s0,s1=arr.strides[:2]
    m1,n1=arr.shape[:2]
    m2,n2=sub_shape[:2]
    view_shape=(1+(m1-m2)//stride,1+(n1-n2)//stride,m2,n2)+arr.shape[2:]
    strides=(stride*s0,stride*s1,s0,s1)+arr.strides[2:]
    subs=numpy.lib.stride_tricks.as_strided(arr,view_shape,strides=strides)
    return subs

def conv3D3(var,kernel,stride=1,pad=0):
    '''3D convolution by strided view.
    '''
    var_ndim=numpy.ndim(var)
    kernel_ndim=numpy.ndim(kernel)
    if var_ndim<2 or var_ndim>3 or kernel_ndim<2 or kernel_ndim>3:
        raise Exception("<var> and <kernel> dimension should be in 2 or 3.")
    if var_ndim==2 and kernel_ndim==3:
        raise Exception("<kernel> dimension > <var>.")
    if var_ndim==3 and kernel_ndim==2:
        kernel=numpy.repeat(kernel[:,:,None],var.shape[2],axis=2)
    if pad>0:
        var_pad=padArray(var,pad,1)
    else:
        var_pad=var
    view=asStride(var_pad,kernel.shape,stride)
    #return numpy.tensordot(aa,kernel,axes=((2,3),(0,1)))
    if numpy.ndim(kernel)==2:
        conv=numpy.sum(view*kernel,axis=(2,3))
    else:
        conv=numpy.sum(view*kernel,axis=(2,3,4))
    return conv
I did 3 sets of comparisons:
So "FFT conv" is in general the fastest. "Special conv" and "Stride-view conv" get slower as the kernel size increases, but their run time decreases again as the kernel approaches the size of the input data. The last subplot shows the fastest method, so the big purple triangle indicates FFT being the winner, but note there is a thin green column on the left side (probably too small to see, but it's there), suggesting that "Special conv" has an advantage for very small kernels (smaller than about 5x5). And when the kernel size approaches the input size, "Stride-view conv" is fastest (see the diagonal line).
Comparison 2: convolution on 3D data.
Setup: pad=0, stride=2, input dimension = nxnx5, kernel shape = fxfx5.
I skipped the computations of "Special conv" and "Stride-view conv" when the kernel size is in the middle of the input size range. Basically "Special conv" shows no advantage now, and "Stride-view conv" is faster than FFT for both small and large kernels.
One additional note: when sizes go above 350, I notice considerable memory usage peaks for "Stride-view conv".
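One plausible source of those memory peaks is the intermediate array created by view*kernel in conv3D3, which holds one value per (window, kernel element) pair. The sketch below is an assumption on my part rather than something measured in the original answer: it rebuilds the 2D window view, reports how large that intermediate would be, and shows np.einsum as one alternative that, as far as I know, contracts the window axes without allocating it. The helper name window_view and the array sizes are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def window_view(arr, sub_shape, stride):
    # Same construction as asStride above, restricted to 2D input.
    s0, s1 = arr.strides
    m1, n1 = arr.shape
    m2, n2 = sub_shape
    shape = (1 + (m1 - m2) // stride, 1 + (n1 - n2) // stride, m2, n2)
    return as_strided(arr, shape, strides=(stride * s0, stride * s1, s0, s1),
                      writeable=False)

rng = np.random.default_rng(0)
var = rng.random((60, 60))
kernel = rng.random((20, 20))
view = window_view(var, kernel.shape, stride=2)

# view * kernel would allocate one float per (window, kernel element) pair.
temp_bytes = view.size * view.itemsize
print(view.shape, temp_bytes, var.nbytes)

# Both lines compute the same result; einsum should not need the big temporary.
conv_sum = np.sum(view * kernel, axis=(2, 3))
conv_einsum = np.einsum('ijkl,kl->ij', view, kernel)
```

Here the input takes 28800 bytes, while the temporary product would take 21*21*20*20*8 bytes, which is why the footprint grows quickly with kernel size.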
Comparison 3: convolution on 3D data with larger stride.
Setup: pad=0, stride=5, input dimension = nxnx10, kernel shape = fxfx10.
This time I omitted the "Special conv". Over a larger area "Stride-view conv" surpasses FFT, and the last subplot shows that the difference approaches 100%. This is probably because, as the stride goes up, the FFT approach computes more values that are then thrown away, so "Stride-view conv" gains more of an advantage for small and large kernels.
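The "wasted numbers" argument can be made concrete with a little arithmetic: the FFT approach computes every stride-1 output and then keeps only every stride-th one along each spatial axis. The sketch below uses hypothetical sizes (n=100, f=10) that are not taken from the benchmark; the function name fraction_kept is also made up.

```python
def fraction_kept(n, f, stride):
    """Share of stride-1 'valid' outputs that survive [::stride, ::stride]."""
    full = n - f + 1                 # outputs per axis at stride 1
    kept = (n - f) // stride + 1     # outputs per axis after striding
    return (kept / full) ** 2        # two spatial axes

for stride in (1, 2, 5):
    print(stride, round(fraction_kept(100, 10, stride), 3))
```

With stride 5 under these assumed sizes, fewer than 5% of the computed values are kept, roughly 1/stride**2.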
How about using signal.convolve2d from scipy?
My approach is similar to Jason's, but uses slice indexing.
from scipy import signal

def strideConv(arr, arr2, s):
    return signal.convolve2d(arr, arr2[::-1, ::-1], mode='valid')[::s, ::s]
Note that the kernel has to be reversed. For details, please see the discussion here and here. Otherwise use signal.correlate2d.
Examples:
>>> strideConv(arr, arr2, 1)
array([[ 91, 80, 100, 84, 88],
[ 99, 106, 126, 92, 77],
[ 69, 98, 91, 93, 117],
[ 80, 79, 87, 93, 61],
[ 44, 72, 72, 63, 74]])
>>> strideConv(arr, arr2, 2)
array([[ 91, 100, 88],
[ 69, 91, 117],
[ 44, 72, 74]])
Here is an O(N^d (log N)^d) FFT-based approach. The idea is to chop up both operands into stride-spaced grids at all offsets modulo the strides, do a conventional FFT convolution between grids of corresponding offsets, and then sum the results pointwise. It is a bit index-heavy, but I'm afraid that can't be helped:
import numpy as np
from numpy.fft import fftn, ifftn

def strided_conv_2d(x, y, strides):
    s, t = strides
    # consensus dtype
    cdt = (x[0, 0, ...] + y[0, 0, ...]).dtype
    xi, xj = x.shape
    yi, yj = y.shape
    # round up modulo strides
    xk, xl, yk, yl = map(lambda a, b: -a//b * -b, (xi,xj,yi,yj), (s,t,s,t))
    # zero pad to avoid circular convolution
    xp, yp = (np.zeros((xk+yk, xl+yl), dtype=cdt) for i in range(2))
    xp[:xi, :xj] = x
    yp[:yi, :yj] = y
    # fold out strides
    xp = xp.reshape((xk+yk)//s, s, (xl+yl)//t, t)
    yp = yp.reshape((xk+yk)//s, s, (xl+yl)//t, t)
    # do conventional fft convolution
    xf = fftn(xp, axes=(0, 2))
    yf = fftn(yp, axes=(0, 2))
    result = ifftn(xf * yf.conj(), axes=(0, 2)).sum(axis=(1, 3))
    # restore dtype
    if cdt in (int, np.int_, np.int64, np.int32):
        result = result.real.round()
    return result.astype(cdt)

arr = np.array([[2,3,7,4,6,2,9],
[6,6,9,8,7,4,3],
[3,4,8,3,8,9,7],
[7,8,3,6,6,3,4],
[4,2,1,8,3,4,6],
[3,2,4,1,9,8,3],
[0,1,3,9,2,1,4]])
arr2 = np.array([[3,4,4],
[1,0,2],
[-1,0,3]])
print(strided_conv_2d(arr, arr2, (2, 2)))
Result:
[[ 91 100 88 23 0 29]
[ 69 91 117 19 0 38]
[ 44 72 74 17 0 22]
[ 16 53 26 12 0 0]
[ 0 0 0 0 0 0]
[ 19 11 21 -9 0 6]]
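The folding idea behind this answer may be easier to follow in one dimension. The sketch below is my own illustration, not the answer's code: it uses np.correlate instead of an FFT, the function name strided_corr_1d_folded is made up, and the final crop to complete windows is an assumption I added. It splits both operands into sub-grids that share the same offset modulo the stride, correlates matching sub-grids, and sums the results.

```python
import numpy as np

def strided_corr_1d_folded(x, y, s):
    """Strided 1D cross-correlation via per-offset sub-grids (illustrative)."""
    n, f = len(x), len(y)
    # Zero-pad both operands up to a multiple of the stride.
    xk, yk = -(-n // s) * s, -(-f // s) * s
    xp = np.zeros(xk, dtype=x.dtype)
    xp[:n] = x
    yp = np.zeros(yk, dtype=y.dtype)
    yp[:f] = y
    # Correlate the sub-grids that share the same offset modulo s, then sum.
    total = sum(np.correlate(xp[r::s], yp[r::s], mode='valid') for r in range(s))
    # Keep only the windows that fit entirely inside the unpadded input.
    return total[: (n - f) // s + 1]

x = np.array([2, 3, 7, 4, 6, 2, 9])
y = np.array([3, 4, 4])

direct = np.correlate(x, y, mode='valid')[::2]   # stride-1 result, subsampled
folded = strided_corr_1d_folded(x, y, 2)
print(direct, folded)
```

Each sub-grid correlation works on arrays that are shorter by a factor of the stride, so no stride-1 output is computed only to be discarded.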
As far as I know, there is no direct implementation of a convolution filter in numpy or scipy that supports stride and padding, so I think it's better to use a DL package such as torch or tensorflow, then cast the final result to numpy. A torch implementation might be:
import numpy as np
import torch
import torch.nn.functional as F

# conv2d expects (batch, channels, height, width), so add two leading axes
arr = torch.tensor(np.expand_dims(arr, axis=(0,1)))
arr2 = torch.tensor(np.expand_dims(arr2, axis=(0,1)))
output = F.conv2d(arr, arr2, stride=2, padding=0)
output = output.numpy().squeeze()

>>> output
array([[ 91, 100, 88],
[ 69, 91, 117],
[ 44, 72, 74]])
Convolution which supports strides and dilation. numpy.lib.stride_tricks.as_strided is used.
import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv_view(X, F_s, dr, std):
    X_s = np.array(X.shape)
    F_s = np.array(F_s)
    dr = np.array(dr)
    # footprint of the dilated filter along each axis
    Fd_s = (F_s - 1) * dr + 1
    if np.any(Fd_s > X_s):
        raise ValueError('(Dilated) filter size must be smaller than X')
    std = np.array(std)
    X_ss = np.array(X.strides)
    # number of window positions along each axis
    Xn_s = (X_s - Fd_s) // std + 1
    Xv_s = np.append(Xn_s, F_s)
    Xv_ss = np.tile(X_ss, 2) * np.append(std, dr)
    return as_strided(X, Xv_s, Xv_ss, writeable=False)

def convolve_stride(X, F, dr=None, std=None):
    if dr is None:
        dr = np.ones(X.ndim, dtype=int)
    if std is None:
        std = np.ones(X.ndim, dtype=int)
    if not (X.ndim == F.ndim == len(dr) == len(std)):
        raise ValueError('X.ndim, F.ndim, len(dr), len(std) must be the same')
    Xv = conv_view(X, F.shape, dr, std)
    return np.tensordot(Xv, F, axes=X.ndim)
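To make the role of the dilated footprint (F_s - 1) * dr + 1 more concrete, here is a hedged 2D sketch that gets a similar effect with numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20+) and plain slicing, checked against a naive loop. The function names and the test input are my own and only serve as illustration; a single integer is assumed for dilation and stride on both axes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dilated_strided_corr2d(X, F, dilation=1, stride=1):
    # Footprint of the dilated filter: (size - 1) * dilation + 1 per axis.
    fd = tuple((f - 1) * dilation + 1 for f in F.shape)
    windows = sliding_window_view(X, fd)
    # Stride over window positions, dilate by skipping inside each window.
    windows = windows[::stride, ::stride, ::dilation, ::dilation]
    return np.tensordot(windows, F, axes=2)

def naive_corr2d(X, F, dilation=1, stride=1):
    # Reference implementation with explicit loops, for checking only.
    fh, fw = F.shape
    dh, dw = (fh - 1) * dilation + 1, (fw - 1) * dilation + 1
    out_h = (X.shape[0] - dh) // stride + 1
    out_w = (X.shape[1] - dw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=X.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = X[r:r + dh:dilation, c:c + dw:dilation]
            out[i, j] = np.sum(patch * F)
    return out

X = np.arange(49).reshape(7, 7)
F = np.array([[3, 4, 4], [1, 0, 2], [-1, 0, 3]])
fast = dilated_strided_corr2d(X, F, dilation=2, stride=2)
slow = naive_corr2d(X, F, dilation=2, stride=2)
print(fast.shape, np.array_equal(fast, slow))
```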