Numba-compatible implementation of np.tile?

Question

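I'm working on some code for dehazing images, based on this paper, and I started from an abandoned Py2.7 implementation. Since then, particularly with Numba, I've made some real performance improvements (important, since I'll have to run this on 8K images).

I'm pretty convinced my last significant performance bottleneck is the box filter step (I've already shaved almost a minute off each image, but this last slow step takes ~30 s/image), and I'm close to getting it to run as nopython in Numba: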

import numpy as np
from numba import jit, njit, prange

@njit # Row dependencies means can't be parallel
def yCumSum(a):
    """
    Numba based computation of y-direction
    cumulative sum. Can't be parallel!
    """
    out = np.empty_like(a)
    out[0, :] = a[0, :]
    # Plain range: each row depends on the previous one
    for i in range(1, a.shape[0]):
        out[i, :] = a[i, :] + out[i - 1, :]
    return out

@njit(parallel=True)
def xCumSum(a):
    """
    Numba-based parallel computation
    of X-direction cumulative sum
    """
    out = np.empty_like(a)
    for i in prange(a.shape[0]):
        out[i, :] = np.cumsum(a[i, :])
    return out

@jit
def _boxFilter(m, r, gpu=hasGPU):
    if gpu:
        m = cp.asnumpy(m)
    out = __boxfilter__(m, r)
    if gpu:
        return cp.asarray(out)
    return out

@jit(fastmath=True)
def __boxfilter__(m, r):
    """
    Fast box filtering implementation, O(1) time.
    Parameters
    ----------
    m:  a 2-D matrix data normalized to [0.0, 1.0]
    r:  radius of the window considered
    Return
    -----------
    The filtered matrix m'.
    """
    #H: height, W: width
    H, W = m.shape
    #the output matrix m'
    mp = np.empty(m.shape)

    #cumulative sum over y axis
    ySum = yCumSum(m) #np.cumsum(m, axis=0)
    #copy the accumulated values of the windows in y
    mp[0:r+1,: ] = ySum[r:(2*r)+1,: ]
    #differences in y axis
    mp[r+1:H-r,: ] = ySum[(2*r)+1:,: ] - ySum[ :H-(2*r)-1,: ]
    mp[(-r):,: ] = np.tile(ySum[-1,: ], (r, 1)) - ySum[H-(2*r)-1:H-r-1,: ]

    #cumulative sum over x axis
    xSum = xCumSum(mp) #np.cumsum(mp, axis=1)
    #copy the accumulated values of the windows in x
    mp[:, 0:r+1] = xSum[:, r:(2*r)+1]
    #difference over x axis
    mp[:, r+1:W-r] = xSum[:, (2*r)+1: ] - xSum[:, :W-(2*r)-1]
    mp[:, -r: ] = np.tile(xSum[:, -1][:, None], (1, r)) - xSum[:, W-(2*r)-1:W-r-1]
    return mp

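There's a lot going on at the edges, but if I can get the tile operation to work as a nopython call, I can run the whole box filter step in nopython mode and get a big performance boost. I'm not inclined to write something extremely specific, since I'd love to reuse this code elsewhere, but I wouldn't particularly object to limiting it to the 2D case. For whatever reason, I'm just staring at this and not sure where to start.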

Answer 1

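np.tile is a bit too complicated to reimplement in full, but unless I'm misreading, it looks like you only need to take a vector and repeat it r times along a different axis.

A Numba-compatible way to do this is to write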

y = x.repeat(r).reshape((-1, r))

Then x is repeated r times along the second dimension, so that y[i, j] == x[i].

Example:

In [2]: x = np.arange(5)

In [3]: x.repeat(3).reshape((-1, 3))
Out[3]: 
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])

If you want x to be repeated along the first dimension instead, just take the transpose, y.T.
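As a minimal sketch of how the repeat-and-reshape idea might slot into the two edge lines of the question's box filter: the helper names tile_cols, tile_rows, last_rows and last_cols below are hypothetical and not part of the original code, and the try/except fallback is only an assumption so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, fall back to plain Python so the sketch still runs.
    def njit(func):
        return func


@njit
def tile_cols(x, r):
    """Hypothetical helper: (n,) -> (n, r), like np.tile(x[:, None], (1, r))."""
    return x.repeat(r).reshape((-1, r))


@njit
def tile_rows(x, r):
    """Hypothetical helper: (n,) -> (r, n), like np.tile(x, (r, 1))."""
    return x.repeat(r).reshape((-1, r)).T


@njit
def last_rows(ySum, r):
    """Bottom-edge term of the question's box filter, written without np.tile."""
    H = ySum.shape[0]
    return tile_rows(ySum[-1, :], r) - ySum[H - 2 * r - 1:H - r - 1, :]


@njit
def last_cols(xSum, r):
    """Right-edge term of the question's box filter, written without np.tile."""
    W = xSum.shape[1]
    return tile_cols(xSum[:, -1], r) - xSum[:, W - 2 * r - 1:W - r - 1]
```

In the question's code these would replace the right-hand sides of the mp[(-r):, :] and mp[:, -r:] assignments. Since both terms subtract a single row or column from a block, plain broadcasting (for example ySum[-1, :] - ySum[H-(2*r)-1:H-r-1, :]) may make the tiling unnecessary altogether, though that is worth checking under Numba before relying on it.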

Solution 1
2, accepted, 2020-05-09 20:46:21
