Finding matching submatrices inside a matrix

I have a 100x200 2D array, stored as a numpy array, consisting of black (0) and white (255) cells. It is a bitmap file. I then have 2D shapes (it's easiest to think of them as letters) that are also made of 2D black and white cells.

I know I can naively iterate through the matrix, but this is going to be a 'hot' portion of my code, so speed is a concern. Is there a fast way to perform this in numpy/scipy?

I looked briefly at Scipy's correlate function. I am not interested in 'fuzzy matches', only exact matches. I also looked at some academic papers, but they are over my head.

You can use correlate. You'll need to set your black values to -1 and your white values to 1 (or vice versa) so that you know the value of the correlation peak, and so that the peak only occurs with the correct letter.

The following code does what I think you want.

import numpy
from scipy import signal

# Set up the inputs
a = numpy.random.randn(100, 200)
a[a<0] = 0
a[a>0] = 255

b = numpy.random.randn(20, 20)
b[b<0] = 0
b[b>0] = 255

# put b somewhere in a
a[37:37+b.shape[0], 84:84+b.shape[1]] = b

# Now the actual solution...

# Set the black values to -1
a[a==0] = -1
b[b==0] = -1

# and the white values to 1
a[a==255] = 1
b[b==255] = 1

max_peak = numpy.prod(b.shape)

# c will contain max_peak where the overlap is perfect
c = signal.correlate(a, b, 'valid')

# Compare with a tolerance: an FFT-based correlation does not return exact integers
overlaps = numpy.where(numpy.isclose(c, max_peak))

print(overlaps)

This outputs (array([37]), array([84])), the locations of the offsets set in the code.
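To see why the -1/+1 encoding matters, here is a minimal sketch on a hypothetical 2x2 template (the helper `to_signed` and the toy arrays are illustrative, not part of the answer above). With signed values, the sum of products reaches the number of cells only when every cell agrees; with the raw 0/255 values, an all-white patch scores the same as the true match.

```python
import numpy as np

def to_signed(img):
    """Map black (0) to -1 and every other value (white, 255) to +1."""
    return np.where(np.asarray(img) == 0, -1, 1)

template = np.array([[0, 255],
                     [255, 0]])
same = template.copy()                  # identical patch
one_off = np.array([[0, 255],
                    [255, 255]])        # differs from the template in one cell
all_white = np.full((2, 2), 255)

t = to_signed(template)

# Signed encoding: each agreeing cell adds +1, each disagreeing cell adds -1.
score_same = int(np.sum(to_signed(same) * t))        # equals template.size
score_one_off = int(np.sum(to_signed(one_off) * t))  # strictly below template.size

# Raw 0/255 encoding: black cells contribute nothing, so an all-white
# patch cannot be told apart from the exact match.
raw_same = int(np.sum(same * template))
raw_white = int(np.sum(all_white * template))

print(score_same, score_one_off, raw_same == raw_white)
```

Each mismatching cell lowers the signed score by 2, which is why comparing against the template's cell count isolates exact matches.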

If your letter size multiplied by your big array size is bigger than roughly N log(N), where N is the corresponding size of the big array in which you're searching (for each dimension), then you will probably get a speed-up from an FFT-based algorithm like scipy.signal.fftconvolve. Bear in mind that a convolution is not a correlation: when using a convolution you need to flip each axis of one of the inputs (flipud and fliplr). FFT results also carry floating-point rounding error, so compare c against max_peak with a tolerance (for example numpy.isclose) rather than ==. The only modification is to the assignment of c:

c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')

Comparing the timings on the sizes above:

In [5]: timeit c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
100 loops, best of 3: 6.78 ms per loop

In [6]: timeit c = signal.correlate(a, b, 'valid')
10 loops, best of 3: 151 ms per loop
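The FFT variant can be wrapped up as follows. This is a sketch under stated assumptions rather than a definitive implementation: the function name `find_exact_matches` is hypothetical, both arrays are assumed to contain only 0 and 255, and the final `array_equal` check is an added safeguard that confirms each candidate exactly.

```python
import numpy as np
from scipy import signal

def find_exact_matches(image, template):
    """Return (row, col) offsets where template occurs exactly in image.

    Hypothetical helper; assumes both arrays hold only 0 and 255.
    """
    a = np.where(image == 0, -1.0, 1.0)
    b = np.where(template == 0, -1.0, 1.0)
    # Correlation is convolution with the kernel flipped along both axes.
    c = signal.fftconvolve(a, b[::-1, ::-1], mode='valid')
    # FFT output carries rounding error, so round before comparing to the peak.
    rows, cols = np.nonzero(np.rint(c) == b.size)
    h, w = template.shape
    # Confirm each candidate with an exact element-wise comparison.
    return [(int(r), int(k)) for r, k in zip(rows, cols)
            if np.array_equal(image[r:r + h, k:k + w], template)]

# Same sizes as above: a 100x200 image with a 20x20 template planted at (37, 84).
rng = np.random.default_rng(0)
image = rng.choice([0, 255], size=(100, 200))
template = rng.choice([0, 255], size=(20, 20))
image[37:57, 84:104] = template

matches = find_exact_matches(image, template)
print(matches)
```

With the template planted at row 37, column 84, the call should report that single offset.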

Here is a method you may be able to use, or adapt, depending upon the details of your requirements. It uses ndimage.label and ndimage.find_objects:

  1. Label the image using ndimage.label. This finds all blobs in the array and labels them with integers.
  2. Get the slices of these blobs using ndimage.find_objects.
  3. Then use set intersection to see if the found blobs correspond with your wanted blobs.

Code for steps 1 and 2:

import scipy
from scipy import ndimage
import matplotlib.pyplot as plt

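# flatten to ensure greyscale.
# Note: scipy.misc.imread was removed in SciPy 1.2; on newer versions load
# the image with another reader (for example imageio or Pillow) instead.
im = scipy.misc.imread('letters.png',flatten=1)
objects, number_of_objects = ndimage.label(im)
letters = ndimage.find_objects(objects)

# to save the images for illustrative purposes only:
for i,j in enumerate(letters):
    # the output file name here is illustrative
    plt.imsave('ob' + str(i) + '.png', objects[j])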

Example input: (image not preserved)

Labelled: (image not preserved)

Isolated blobs to test against: (image not preserved)
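Step 3 is not shown in the code above. One possible way to do it, sketched here on a toy array, is to turn each blob's cropped mask into a hashable key and intersect the set of found keys with the set of wanted keys. The helpers `mask_key` and `blob_keys` are hypothetical names. Note that ndimage.label treats nonzero cells as foreground, so a black-on-white bitmap would need inverting first, and that this only matches a letter forming a single connected blob that touches nothing else.

```python
import numpy as np
from scipy import ndimage

def mask_key(mask):
    """Hashable key for a boolean mask: its shape plus its raw bytes."""
    m = np.ascontiguousarray(mask, dtype=bool)
    return (m.shape, m.tobytes())

def blob_keys(binary):
    """Label blobs (nonzero = foreground) and map each key to its slices."""
    labels, _ = ndimage.label(binary)
    found = {}
    for idx, sl in enumerate(ndimage.find_objects(labels), start=1):
        # Crop to the bounding box and keep only this blob's own cells.
        found.setdefault(mask_key(labels[sl] == idx), []).append(sl)
    return found

# Toy image containing two "L" shapes and one ring.
img = np.zeros((8, 12), dtype=np.uint8)
img[1:4, 1] = 1
img[3, 1:4] = 1          # first L, top-left corner at (1, 1)
img[1:4, 6:9] = 1
img[2, 7] = 0            # ring, top-left corner at (1, 6)
img[5:8, 8] = 1
img[7, 8:11] = 1         # second L, top-left corner at (5, 8)

wanted_L = np.array([[1, 0, 0],
                     [1, 0, 0],
                     [1, 1, 1]], dtype=bool)

found = blob_keys(img)
common = set(found) & {mask_key(wanted_L)}   # the set intersection of step 3
offsets = sorted((sl[0].start, sl[1].start)
                 for key in common for sl in found[key])
print(offsets)
```

Here the intersection keeps only the L-shaped key, and `offsets` lists the top-left corner of each matching blob.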

