
Alternative to `numpy.tile` for periodic mask

I have an image, stored in a numpy array of `uint8`s, of shape `(planes, rows, cols)`. I need to compare it to the values stored in a mask, also of `uint8`s, of shape `(mask_rows, mask_cols)`. While the image may be very large, the mask is usually smallish, normally `(256, 256)`, and is to be tiled over `image`. To simplify the code, let's pretend that `rows = 100 * mask_rows` and `cols = 100 * mask_cols`.

The way I am currently handling this thresholding is something like this:

out = image >= np.tile(mask, (image.shape[0], 100, 100))

The largest array I can process this way before getting slapped in the face with a `MemoryError` is a little larger than `(3, 11100, 11100)`. The way I figure it, doing things this way I have up to three ginormous arrays coexisting in memory: `image`, the tiled `mask`, and my return `out`. But the tiled mask is the same little array copied 10,000 times over. So if I could spare that memory, I would use only 2/3 of the memory and should be able to process images with 3/2 as many pixels, so of size around `(3, 13600, 13600)`. This is, by the way, consistent with what I get if I do the thresholding in place with

np.greater_equal(image, np.tile(mask, (image.shape[0], 100, 100)), out=image)
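The memory estimate above can be checked with a little arithmetic. This is only a sketch: it assumes one byte per element for all three arrays (the `uint8` image, the tiled mask, and the boolean result, which NumPy also stores as one byte per element), and the helper name `nbytes_for` is made up for illustration. Note that a 3/2 increase in pixel count means each side grows by a factor of sqrt(3/2).

```python
import math
import numpy as np

def nbytes_for(shape, dtype):
    """Bytes needed for a dense array of the given shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize

shape = (3, 11100, 11100)
image_bytes = nbytes_for(shape, np.uint8)   # the image itself
tiled_bytes = nbytes_for(shape, np.uint8)   # the tiled mask
out_bytes = nbytes_for(shape, np.bool_)     # the comparison result

total_bytes = image_bytes + tiled_bytes + out_bytes
print("per array: %.2f GiB, total: %.2f GiB"
      % (image_bytes / 2**30, total_bytes / 2**30))

# Dropping the tiled mask leaves two arrays instead of three, so the same
# budget fits 3/2 as many pixels, i.e. sides longer by sqrt(3/2).
side = int(11100 * math.sqrt(3 / 2))
print("estimated side without the tiled mask:", side)
```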

My (failed) attempt at exploiting the periodic nature of `mask` to process larger arrays has been to index `mask` with periodic linear arrays:

mask = mask[None, ...]
# periodic row and column indices, shaped so that they broadcast to (rows, cols)
rows = np.tile(np.arange(mask.shape[1]), (100,)).reshape(-1, 1)
cols = np.tile(np.arange(mask.shape[2]), (100,)).reshape(1, -1)
out = image >= mask[:, rows, cols]

For small arrays it does produce the same result as the other one, although with something like a 20x slowdown(!!!), but it fails terribly for the larger sizes. Instead of a `MemoryError` it eventually crashes Python, even for sizes that the other method handles with no problems.

What I think is happening is that numpy is actually constructing the `(planes, rows, cols)` array to index `mask`, so not only is there no memory saving, but since it is an array of `int32`s, it is actually taking four times more space to store...

Any ideas on how to go about this? To spare you the trouble, find below some sandbox code to play around with:
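A small experiment can at least confirm part of this suspicion. The sketch below uses toy sizes and a mask with a leading singleton axis (both are assumptions made for illustration). It shows that indexing with integer arrays returns a fresh, full-size copy of the tiled mask, so no memory is saved compared to `np.tile`. Whether NumPy additionally builds full-size index arrays internally is an implementation detail that this sketch does not settle.

```python
import numpy as np

# Toy mask of shape (1, mask_rows, mask_cols) and a small repeat count.
mask = np.arange(6, dtype=np.uint8).reshape(1, 2, 3)
reps = 4

rows = np.tile(np.arange(mask.shape[1]), reps).reshape(-1, 1)  # shape (8, 1)
cols = np.tile(np.arange(mask.shape[2]), reps).reshape(1, -1)  # shape (1, 12)

# The index arrays are broadcast against each other to shape (8, 12).
idx_rows, idx_cols = np.broadcast_arrays(rows, cols)
print("index shape:", idx_rows.shape, "index itemsize:", rows.dtype.itemsize)

# Integer-array indexing always returns a new array, never a view.
tiled = mask[:, rows, cols]
print("result shape:", tiled.shape)
print("shares memory with mask:", np.shares_memory(tiled, mask))
print("same as np.tile:", np.array_equal(tiled, np.tile(mask, (1, reps, reps))))
```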

import timeit

import numpy as np

def halftone_1(image, mask):
    # Baseline: materialize the full tiled mask, then compare.
    return np.greater_equal(image, np.tile(mask, (image.shape[0], 100, 100)))

def halftone_2(image, mask):
    # Accept a 2-D mask or one with a leading singleton axis.
    mask = mask.reshape((1,) + mask.shape[-2:])
    # Periodic row and column indices, shaped to broadcast to (rows, cols).
    rows = np.tile(np.arange(mask.shape[1]), (100,)).reshape(-1, 1)
    cols = np.tile(np.arange(mask.shape[2]), (100,)).reshape(1, -1)
    return np.greater_equal(image, mask[:, rows, cols])

rows, cols, planes = 6000, 6000, 3
# Random uint8 test data; the mask keeps a leading singleton axis.
image = np.random.randint(256, size=(planes, rows, cols), dtype=np.uint8)
mask = np.random.randint(256, size=(1, rows // 100, cols // 100),
                         dtype=np.uint8)

#np.all(halftone_1(image, mask) == halftone_2(image, mask))
#halftone_1(image, mask)
#halftone_2(image, mask)

print(timeit.timeit('halftone_1(image, mask)',
                    'from __main__ import halftone_1, image, mask',
                    number=1))
print(timeit.timeit('halftone_2(image, mask)',
                    'from __main__ import halftone_2, image, mask',
                    number=1))

I would almost have pointed you to a rolling-window type of trick, but for this simple non-overlapping case a normal reshape does just as well. (The reshapes here are safe: for a C-contiguous image, numpy will not make a copy for them.)

def halftone_reshape(image, mask):
    # Expects a mask with a leading singleton axis: (1, mask_rows, mask_cols).
    # You can make up a nicer reshape code maybe, it is a bit ugly. The
    # rolling window code can do this too (but is much more general than reshape).
    new_shape = np.array(list(zip(image.shape, mask.shape)))
    new_shape[:, 0] //= new_shape[:, 1]  # number of mask repeats per axis
    reshaped_image = image.reshape(tuple(new_shape.ravel()))

    reshaped_mask = mask[None, :, None, :, None, :]

    # and now they just broadcast:
    result_funny_shaped = reshaped_image >= reshaped_mask

    # And you can just reshape it back:
    return result_funny_shaped.reshape(image.shape)
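As a sanity check, the reshape-and-broadcast idea can be compared against the `np.tile` baseline on a small example. The sketch below is a compact variant for a plain 2-D mask, not the exact function above; the name `halftone_broadcast` is made up for illustration, and it assumes that `rows` and `cols` are exact multiples of the mask dimensions.

```python
import numpy as np

def halftone_broadcast(image, mask):
    """Threshold a (planes, rows, cols) image against a periodically
    repeated 2-D mask without building the tiled mask."""
    planes, rows, cols = image.shape
    mrows, mcols = mask.shape
    # Split each spatial axis into (number of repeats, position within mask).
    blocks = image.reshape(planes, rows // mrows, mrows, cols // mcols, mcols)
    # mask[:, None, :] has shape (mrows, 1, mcols) and broadcasts over the
    # plane axis and both repeat axes.
    return (blocks >= mask[:, None, :]).reshape(image.shape)

rng = np.random.default_rng(0)
mask = rng.integers(0, 256, size=(4, 5), dtype=np.uint8)
image = rng.integers(0, 256, size=(3, 4 * 6, 5 * 7), dtype=np.uint8)

expected = image >= np.tile(mask, (image.shape[0], 6, 7))
result = halftone_broadcast(image, mask)
print("matches np.tile version:", np.array_equal(result, expected))

# For a C-contiguous image the reshape is a view, not a copy.
view = image.reshape(3, 6, 4, 7, 5)
print("reshape shares memory:", np.shares_memory(view, image))
```

Only the small mask and the boolean result are allocated here; the tiled mask never exists in memory.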

And since timings are everything (not really, but...):

In [172]: %timeit halftone_reshape(image, mask)
1 loops, best of 3: 280 ms per loop

In [173]: %timeit halftone_1(image, mask)
1 loops, best of 3: 354 ms per loop

In [174]: %timeit halftone_2(image, mask)
1 loops, best of 3: 3.1 s per loop
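The in-place thresholding mentioned in the question can in principle be combined with the broadcasting trick, so that neither the tiled mask nor a separate result array should be needed. The following is a sketch under stated assumptions, not a tested recipe for very large arrays: the name `halftone_inplace` is hypothetical, the image is assumed to be a C-contiguous `uint8` array whose spatial sizes are exact multiples of the 2-D mask's, and the image is overwritten with 0/1 values.

```python
import numpy as np

def halftone_inplace(image, mask):
    """Overwrite a uint8 image with 0/1 results of image >= periodic mask."""
    planes, rows, cols = image.shape
    mrows, mcols = mask.shape
    blocks = image.reshape(planes, rows // mrows, mrows, cols // mcols, mcols)
    if not np.shares_memory(blocks, image):
        # reshape had to copy, so writing into blocks would not touch image
        raise ValueError("image cannot be reshaped without a copy")
    # The boolean result is cast to uint8 and written back into the image.
    np.greater_equal(blocks, mask[:, None, :], out=blocks)
    return image

rng = np.random.default_rng(1)
mask = rng.integers(0, 256, size=(4, 5), dtype=np.uint8)
image = rng.integers(0, 256, size=(3, 8, 15), dtype=np.uint8)

expected = (image >= np.tile(mask, (3, 2, 3))).astype(np.uint8)
halftone_inplace(image, mask)
print("in-place result matches:", np.array_equal(image, expected))
```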
