
Tricks to run sliding window approach on images fast in Python

The Haar cascade classifier uses a sliding window approach with an image pyramid to detect objects. For me it takes about 0.01 s to detect objects in an image. My question is: how can it be so fast while using a sliding window? (I implemented a CNN object detector that slides a window over the image without a pyramid, and it took 2 seconds to detect objects.) What are the tricks for running a sliding window approach faster? I used two loops to slide over the whole image with some stride and also parallelized it, but it is still much slower than the OpenCV implementation.

The quickest way (in my experience) is to use the numpy.lib.stride_tricks.as_strided function. Effectively, we first use this NumPy function to expose all of the patches (sliding window positions) as one big array. Then we can just map that array to our function.

First, define the shape of the patch array: (number of window rows, number of window columns, kernel height, kernel width). With a step of one pixel, the first two entries are image height - kernel height + 1 and image width - kernel width + 1. Then define the strides, which are measured in bytes (for an 8-bit image, moving one pixel along a row is a stride of 1 byte). In this case the strides of the patch array are the strides of the image repeated twice: once to move between windows and once to move within a window. You can check the strides with img.strides.
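To make the shape and strides concrete, here is a minimal sketch on a tiny array. The 4x5 image and its values are made up for illustration; the writeable=False argument is an addition that keeps the overlapping view read-only.

```python
import numpy as np

# A tiny 4x5 uint8 image so every value can be inspected by hand.
img = np.arange(20, dtype=np.uint8).reshape(4, 5)

# Strides are in bytes: 5 bytes to the next row, 1 byte to the next column.
print(img.strides)        # (5, 1)

# Multiplying a tuple by 2 repeats it; it does not double the values.
strides = 2 * img.strides
print(strides)            # (5, 1, 5, 1)

size = 3
shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)
patches = np.lib.stride_tricks.as_strided(
    img, shape=shape, strides=strides, writeable=False
)

# patches[i, j] is the window whose top-left corner is img[i, j].
print(patches.shape)      # (2, 3, 3, 3)
print(np.array_equal(patches[1, 2], img[1:4, 2:5]))  # True
```

Because as_strided does not check bounds, a wrong shape or stride can read memory outside the image, so it is worth verifying a few windows against plain slicing like this.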

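import numpy as np

def some_func(roi):
    '''
    simple function to return the mean of the region
    of interest
    '''
    return np.mean(roi)

# kept small on purpose: the reshape below copies every patch, so memory use
# grows to size*size times the image, and the Python loop visits each patch
img = np.zeros((300, 300), dtype=np.uint8)

size = 3 # window size i.e. here is 3x3 window

# one window per valid top-left position (step of one pixel)
shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)
strides = 2 * img.strides # tuple repetition: (row, col, row, col) byte strides
patches = np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides,
                                          writeable=False)
patches = patches.reshape(-1, size, size)

output_img = np.array([some_func(roi) for roi in patches])
# the output has one value per window, so it is smaller than the input image
output_img = output_img.reshape(shape[:2])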
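The question mentions sliding with a stride larger than one pixel. The same trick can probably be extended to that case by scaling the two outer strides. The following is only a sketch for 2-D images; the helper name strided_patches and its step parameter are hypothetical, not part of NumPy.

```python
import numpy as np

def strided_patches(img, size, step):
    """Return a read-only view of size x size windows taken every `step` pixels."""
    rows = (img.shape[0] - size) // step + 1
    cols = (img.shape[1] - size) // step + 1
    row_stride, col_stride = img.strides
    return np.lib.stride_tricks.as_strided(
        img,
        shape=(rows, cols, size, size),
        # Outer axes jump `step` pixels; inner axes walk within a window.
        strides=(step * row_stride, step * col_stride, row_stride, col_stride),
        writeable=False,
    )

img = np.arange(64, dtype=np.uint8).reshape(8, 8)
patches = strided_patches(img, size=4, step=2)
print(patches.shape)  # (3, 3, 4, 4)
```

Here patches[i, j] corresponds to img[i*step : i*step + size, j*step : j*step + size], and no pixel data is copied until the view is reshaped or reduced.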

There are other speed-ups you could try, such as wrapping your function with np.vectorize() in certain cases, although np.vectorize is mainly a convenience and is implemented essentially as a loop. If you wanted to calculate the mean, you could also just apply output_img = patches.mean(axis=(-1, -2)) to the 4-D patch array, which avoids the need to map to a function or to reshape. There are also potentially quicker ways to map an array to a function; see this post. I've given this solution because any procedure can be put into the function and the question seemed pretty general.
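As a rough check of the mean shortcut, the sketch below compares the single reduction call against the explicit loop on a small random image. It swaps in numpy.lib.stride_tricks.sliding_window_view (available from NumPy 1.20) in place of as_strided, because it builds the same window view with bounds checking. The image size and random seed are arbitrary choices for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(50, 60), dtype=np.uint8)
size = 3

# Shape (48, 58, 3, 3): a read-only view, no data is copied.
patches = sliding_window_view(img, (size, size))

# Reduce over the two window axes in one call.
fast = patches.mean(axis=(-1, -2))

# Reference result from the explicit Python loop over every patch.
slow = np.array([np.mean(roi) for roi in patches.reshape(-1, size, size)])
slow = slow.reshape(patches.shape[:2])

print(fast.shape)                 # (48, 58)
print(np.allclose(fast, slow))    # True
```

The reduction runs in compiled code over the whole array at once, which is usually where most of the speed-up over a per-patch Python call comes from.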
