
Numba-compatible implementation of np.tile?

I'm working on some code for dehazing images, based on this paper, and I started with an abandoned Py2.7 implementation. Since then, particularly with Numba, I've made some real performance improvements (important since I'll have to run this on 8K images).

I'm pretty convinced my last significant performance bottleneck is the box filter step (I've already shaved off almost a minute per image, but this last slow step is ~30s/image), and I'm close to getting it to run as nopython in Numba:

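import numpy as np
from numba import jit, njit, prange
# Assumed to be defined elsewhere in the project: cp (CuPy) and hasGPU.

@njit # Row dependencies mean this can't be parallel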
def yCumSum(a):
    """
    Numba based computation of y-direction
    cumulative sum. Can't be parallel!
    """
    out = np.empty_like(a)
    out[0, :] = a[0, :]
    for i in range(1, a.shape[0]):  # plain range: each row depends on the previous one
        out[i, :] = a[i, :] + out[i - 1, :]
    return out

@njit(parallel= True)
def xCumSum(a):
    """
    Numba-based parallel computation
    of X-direction cumulative sum
    """
    out = np.empty_like(a)
    for i in prange(a.shape[0]):
        out[i, :] = np.cumsum(a[i, :])
    return out

@jit
def _boxFilter(m, r, gpu= hasGPU):
    if gpu:
        m = cp.asnumpy(m)
    out = __boxfilter__(m, r)
    if gpu:
        return cp.asarray(out)
    return out

@jit(fastmath= True)
def __boxfilter__(m, r):
    """
    Fast box filtering implementation, O(1) time per pixel (independent of r).
    Parameters
    ----------
    m:  a 2-D matrix data normalized to [0.0, 1.0]
    r:  radius of the window considered
    Return
    -----------
    The filtered matrix m'.
    """
    #H: height, W: width
    H, W = m.shape
    #the output matrix m'
    mp = np.empty(m.shape)

    #cumulative sum over y axis
    ySum = yCumSum(m) #np.cumsum(m, axis=0)
    #copy the accumulated values of the windows in y
    mp[0:r+1,: ] = ySum[r:(2*r)+1,: ]
    #differences in y axis
    mp[r+1:H-r,: ] = ySum[(2*r)+1:,: ] - ySum[ :H-(2*r)-1,: ]
    mp[(-r):,: ] = np.tile(ySum[-1,: ], (r, 1)) - ySum[H-(2*r)-1:H-r-1,: ]

    #cumulative sum over x axis
    xSum = xCumSum(mp) #np.cumsum(mp, axis=1)
    #copy the accumulated values of the windows in x
    mp[:, 0:r+1] = xSum[:, r:(2*r)+1]
    #difference over x axis
    mp[:, r+1:W-r] = xSum[:, (2*r)+1: ] - xSum[:, :W-(2*r)-1]
    mp[:, -r: ] = np.tile(xSum[:, -1][:, None], (1, r)) - xSum[:, W-(2*r)-1:W-r-1]
    return mp
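
The cumulative-sum trick used in `__boxfilter__` above may be easier to follow in one dimension. The following is only a minimal sketch: the names `box_sum_1d` and `box_sum_naive` are hypothetical and not part of the original code, and it assumes the array is at least `2*r + 1` elements long. In 1-D the last cumulative value is a scalar, so no tiling is needed; the 2-D version needs `np.tile` (or an equivalent) only to repeat that last row or column.

```python
import numpy as np


def box_sum_1d(a, r):
    """Windowed sum of radius r using one cumulative sum.

    Windows are truncated at the array edges, as in the 2-D code above.
    """
    n = a.shape[0]
    assert n >= 2 * r + 1, "sketch assumes at least one full window"
    c = np.cumsum(a)
    out = np.empty_like(c)
    # Left edge: the window starts at index 0, so the sum is c[i + r].
    out[: r + 1] = c[r : 2 * r + 1]
    # Interior: difference of two cumulative sums, c[i + r] - c[i - r - 1].
    out[r + 1 : n - r] = c[2 * r + 1 :] - c[: n - 2 * r - 1]
    # Right edge: the window ends at the last element, so use c[-1].
    out[n - r :] = c[-1] - c[n - 2 * r - 1 : n - r - 1]
    return out


def box_sum_naive(a, r):
    """Reference implementation: sum each truncated window directly."""
    n = a.shape[0]
    return np.array([a[max(0, i - r) : i + r + 1].sum() for i in range(n)])


a = np.arange(10)
print(np.array_equal(box_sum_1d(a, 2), box_sum_naive(a, 2)))
```

The cumulative-sum version costs the same regardless of `r`, while the naive version does work proportional to the window size for every element.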

There's plenty to do around the edges, but if I can get the tile operation as a nopython call, I can nopython the whole boxfilter step and get a big performance boost. I'm not super inclined to do something really really specific as I'd love to reuse this code elsewhere, but I wouldn't particularly object to it being limited to a 2D scope. For whatever reason I'm just staring at this and not really sure where to start.

np.tile is a bit too complicated to reimplement in full, but unless I'm misreading, it looks like you only need to take a vector and then repeat it along a different axis r times.

A Numba-compatible way to do this is to write

y = x.repeat(r).reshape((-1, r))

Then x will be repeated r times along the second dimension, so that y[i, j] == x[i].

Example:

In [2]: x = np.arange(5)

In [3]: x.repeat(3).reshape((-1, 3))
Out[3]: 
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])

If you want x to be repeated along the first dimension instead, just take the transpose, y.T.
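
As a rough sketch of how the repeat-and-reshape idea could be wrapped up for reuse, the two helpers below cover the two `np.tile` call shapes from the question. The names `tile_cols` and `tile_rows` are hypothetical, and the `try`/`except` fallback is an assumption added only so the snippet still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, fall back to plain Python so the sketch runs.
    def njit(func):
        return func


@njit
def tile_cols(x, r):
    """Repeat 1-D x as r columns: result[i, j] == x[i], shape (len(x), r)."""
    return x.repeat(r).reshape((-1, r))


@njit
def tile_rows(x, r):
    """Repeat 1-D x as r rows: result[j, i] == x[i], shape (r, len(x))."""
    return x.repeat(r).reshape((-1, r)).T


x = np.arange(5.0)
# Same result as np.tile(x[:, None], (1, r)) in the question's x-axis step.
print(np.array_equal(tile_cols(x, 3), np.tile(x[:, None], (1, 3))))
# Same result as np.tile(x, (r, 1)) in the question's y-axis step.
print(np.array_equal(tile_rows(x, 3), np.tile(x, (3, 1))))
```

Note that `tile_rows` returns a transposed view, so the result is not C-contiguous; if a later step needs a contiguous array, a copy would be required.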
