
Speeding up 1D convolution in PyTorch

For my project I am using PyTorch as a linear algebra backend. For the performance-critical part of my code, I need to do 1D convolutions of 2 small (length between 2 and 9) vectors (1D tensors) a very large number of times. My code allows for batch processing of inputs, so I can stack a couple of input vectors to create matrices that can then all be convolved at the same time. Since torch.conv1d does not allow for convolving along a single dimension for 2D inputs, I had to write my own convolution function, called convolve. This new function, however, consists of a double for-loop and is therefore very, very slow.

Question: how can I make the convolve function perform faster through better code design, while keeping it able to deal with batched inputs (= 2D tensors)?

Partial answer: somehow avoid the double for-loop

Below are three Jupyter notebook cells that recreate a minimal example. Note that you need line_profiler and the %%writefile magic command to make this work!

%%writefile SO_CONVOLVE_QUESTION.py
import torch

def conv1d(a, v):
    padding = v.shape[-1] - 1
    return torch.conv1d(
        input=a.view(1, 1, -1), weight=v.flip(0).view(1, 1, -1), padding=padding, stride=1
    ).squeeze()

def convolve(a, v):
    if a.ndim == 1:
        a = a.view(1, -1)
        v = v.view(1, -1) 

    nrows, vcols = v.shape
    acols = a.shape[1]

    expanded = a.view((nrows, acols, 1)) * v.view((nrows, 1, vcols))
    noutdim = vcols + acols - 1
    out = torch.zeros((nrows, noutdim))
    for i in range(acols):  
        for j in range(vcols):
            out[:, i+j] += expanded[:, i, j]  
    return out.squeeze()
    
x = torch.randn(5)
y = torch.randn(7)

I write the code to SO_CONVOLVE_QUESTION.py because that is necessary for line_profiler and so that the file can be used as the setup for timeit.timeit.

Now we can evaluate the output and performance of the code above on non-batched input (x, y) and batched input (x_batch, y_batch):

import timeit
from SO_CONVOLVE_QUESTION import *

setup = 'from SO_CONVOLVE_QUESTION import *'  # used by the timeit calls below
# Without batch processing
res1 = conv1d(x, y)
res = convolve(x, y)
print(torch.allclose(res1, res)) # True

# With batch processing, NB first dimension!
x_batch = torch.randn(5, 5)
y_batch = torch.randn(5, 7)

results = []
for i in range(5):
    results.append(conv1d(x_batch[i, :], y_batch[i, :]))
res1 = torch.stack(results)
res = convolve(x_batch, y_batch)
print(torch.allclose(res1, res))  # True

print(timeit.timeit('convolve(x, y)', setup=setup, number=10000)) # 4.83391789999996
print(timeit.timeit('conv1d(x, y)', setup=setup, number=10000))   # 0.2799923000000035

In the block above you can see that performing the convolution 5 times using the conv1d function produces the same result as convolve on the batched inputs. We can also see that convolve (= 4.8 s) is much slower than conv1d (= 0.28 s). Below we assess the slow part of the convolve function WITHOUT batch processing, using line_profiler:

%load_ext line_profiler
%lprun -f convolve convolve(x, y)  # evaluated without batch-processing!

Output:

Timer unit: 1e-07 s

Total time: 0.0010383 s
File: C:\python_projects\pysumo\SO_CONVOLVE_QUESTION.py
Function: convolve at line 9

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           def convolve(a, v):
    10         1         68.0     68.0      0.7      if a.ndim == 1:
    11         1        271.0    271.0      2.6          a = a.view(1, -1)
    12         1         44.0     44.0      0.4          v = v.view(1, -1) 
    13                                           
    14         1         28.0     28.0      0.3      nrows, vcols = v.shape
    15         1         12.0     12.0      0.1      acols = a.shape[1]
    16                                           
    17         1       4337.0   4337.0     41.8      expanded = a.view((nrows, acols, 1)) * v.view((nrows, 1, vcols))
    18         1         12.0     12.0      0.1      noutdim = vcols + acols - 1
    19         1        127.0    127.0      1.2      out = torch.zeros((nrows, noutdim))
    20         6         32.0      5.3      0.3      for i in range(acols):  
    21        40        209.0      5.2      2.0          for j in range(vcols):
    22        35       5194.0    148.4     50.0              out[:, i+j] += expanded[:, i, j]  
    23         1         49.0     49.0      0.5      return out.squeeze()

Obviously the double for-loop and the line creating the expanded tensor are the slowest. Are these parts avoidable with better code design?
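As a first step, the loop over input positions can be collapsed: for a fixed kernel tap j, the updates out[:, i+j] += a[:, i] * v[:, j] over all i amount to adding a copy of a, shifted by j and scaled by v[:, j]. A minimal sketch of this partial vectorization (the name convolve_single_loop is illustrative, not part of the original code):

import torch

def convolve_single_loop(a, v):
    # same semantics as convolve() above, but only one loop remains:
    # each kernel tap j adds a shifted, scaled copy of a
    if a.ndim == 1:
        a = a.view(1, -1)
        v = v.view(1, -1)
    nrows, vcols = v.shape
    acols = a.shape[1]
    out = torch.zeros((nrows, acols + vcols - 1))
    for j in range(vcols):
        out[:, j:j + acols] += a * v[:, j:j + 1]  # all i at once
    return out.squeeze()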

PyTorch has a batch-capable functional API, torch.nn.functional, which provides a conv1d function (and obviously a 2D version as well, and much more). We will use conv1d.

Suppose you want to convolve the 100 vectors given in v1 with the single other vector given in v2. v1 has dimensions (minibatch, in_channels, iW) and you need 1 channel by default. In addition, v2 has dimensions (out_channels, in_channels/groups, kW). You are using 1 channel and therefore 1 group, so v1 and v2 will be given by:

import torch
from torch.nn import functional as F

num_vectors = 100
len_vectors = 9
v1 = torch.rand((num_vectors, 1, len_vectors))
v2 = torch.rand(1, 1, 6)

Now we can simply compute the necessary padding via

padding = min(v1.shape[-1], v2.shape[-1]) - 1  # shorter length - 1; here kW - 1, giving the full output

and the convolution can be done using

conv_result = F.conv1d(v1, v2, padding=padding)
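With the shapes above this produces the full-length result for each of the 100 vectors. One caveat: F.conv1d, like torch.conv1d, actually computes cross-correlation, so pass v2.flip(-1) as the weight if you need a true convolution in the sense of the question's conv1d wrapper.

print(conv_result.shape)  # torch.Size([100, 1, 14]); 14 = 9 + 6 - 1, the full convolution length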

I did not time it, but it should be considerably faster than your initial double for-loop.
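A quick way to check that claim on your own machine (a sketch; absolute numbers vary by hardware):

import timeit

# batched padded call on all 100 vectors at once
print(timeit.timeit(lambda: F.conv1d(v1, v2, padding=padding), number=10000))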

Turns out that there is a way to do it without for-loops, by grouping the inputs along a dimension:

out = torch.conv1d(
    x_batch.unsqueeze(0),          # (1, 5, 5): the batch rows become input channels
    y_batch.unsqueeze(1).flip(2),  # (5, 1, 7): one flipped kernel per group
    padding=y_batch.size(1) - 1,   # full output length: 5 + 7 - 1 = 11
    groups=x_batch.size(0),        # pair each channel with its own kernel
)
print(torch.allclose(out, res1))  # True
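Because groups equals the batch size, each input channel is convolved only with its own kernel, which is exactly the row-wise convolution we want. The output has shape (1, 5, 11), which broadcasts against the (5, 11) reference res1 in the allclose check; squeezing the leading dimension recovers the batched layout:

print(out.shape)                            # torch.Size([1, 5, 11])
print(torch.allclose(out.squeeze(0), res))  # True: matches the loop-based convolve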
