Intersection of 2D numpy ndarrays

I have a question.

I have two numpy arrays that are OpenCV convex hulls, and I want to check whether they intersect without writing for loops or rendering them to images and applying numpy.bitwise_and, both of which are quite slow in Python. The arrays look like this:

[[[x1 y1]]
 [[x2 y2]]
 [[x3 y3]]
...
 [[xn yn]]]

Treating [[x1 y1]] as a single element, I want to compute the intersection of two numpy ndarrays. How can I do that? I have found a few similar questions, but I could not work out a solution from them.

Thanks in advance!

You can pass a view of each array, in which every row is a single element, to the intersect1d function like this:

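import numpy

def multidim_intersect(arr1, arr2):
    # View each row as one structured element so intersect1d compares whole rows.
    arr1_view = arr1.view([('', arr1.dtype)] * arr1.shape[1])
    arr2_view = arr2.view([('', arr2.dtype)] * arr2.shape[1])
    intersected = numpy.intersect1d(arr1_view, arr2_view)
    # Convert the structured result back to a plain array with the original column count.
    return intersected.view(arr1.dtype).reshape(-1, arr1.shape[1])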

This creates a view of each array in which each row becomes a tuple of values. It then performs the intersection and converts the result back to the original format. Here's an example of using it:

test_arr1 = numpy.array([[0, 2],
                         [1, 3],
                         [4, 5],
                         [0, 2]])

test_arr2 = numpy.array([[1, 2],
                         [0, 2],
                         [3, 1],
                         [1, 3]])

print(multidim_intersect(test_arr1, test_arr2))

This prints:

[[0 2]
 [1 3]]
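
For arrays in the OpenCV layout shown in the question, shape (N, 1, 2), the points presumably need to be reshaped to (N, 2) first so that each row is one point. The following is a minimal sketch of a loop-free alternative that uses broadcasting instead of a structured view. The function name `point_set_intersection` and the sample arrays are illustrative and not part of the answer above. Note that it finds vertices shared by both arrays, not the geometric overlap of the two polygons, and that it builds an N-by-M comparison table, so it is only suitable for small hulls.

```python
import numpy as np

def point_set_intersection(hull1, hull2):
    """Return the points (rows) that occur in both OpenCV-style hulls."""
    # Drop the singleton axis: (N, 1, 2) -> (N, 2).
    p1 = hull1.reshape(-1, 2)
    p2 = hull2.reshape(-1, 2)
    # Compare every point of p1 with every point of p2: result shape (N, M).
    equal = (p1[:, None, :] == p2[None, :, :]).all(axis=2)
    # Keep the points of p1 that match at least one point of p2.
    common = p1[equal.any(axis=1)]
    # Remove duplicates; rows come back sorted lexicographically.
    return np.unique(common, axis=0)

hull_a = np.array([[[0, 2]], [[1, 3]], [[4, 5]], [[0, 2]]])
hull_b = np.array([[[1, 2]], [[0, 2]], [[3, 1]], [[1, 3]]])
print(point_set_intersection(hull_a, hull_b))
# [[0 2]
#  [1 3]]
```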

You can use http://pypi.python.org/pypi/Polygon/2.0.4. Here is an example:

>>> import Polygon
>>> a = Polygon.Polygon([(0,0),(1,0),(0,1)])
>>> b = Polygon.Polygon([(0.3,0.3), (0.3, 0.6), (0.6, 0.3)])
>>> a & b
Polygon:
  <0:Contour: [0:0.60, 0.30] [1:0.30, 0.30] [2:0.30, 0.60]>

To convert the result of cv2.findContours to Polygon point format, you can:

points1 = contours[0].reshape(-1,2)

This will convert the shape from (N, 1, 2) to (N, 2).

Following is a full example:
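
As a quick check of that reshape, here is a small sketch with a hand-made array standing in for one entry of the cv2.findContours result. The coordinate values are made up for illustration.

```python
import numpy as np

# Stand-in for contours[0]: three points in OpenCV's (N, 1, 2) layout.
contour = np.array([[[10, 20]], [[30, 40]], [[50, 60]]], dtype=np.int32)

# -1 lets numpy infer the number of points; each row becomes one (x, y) pair.
points = contour.reshape(-1, 2)

print(contour.shape)  # (3, 1, 2)
print(points.shape)   # (3, 2)
```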

import Polygon
import cv2
import numpy as np
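def bytescale(data, low=0, high=255):
    # Stand-in for scipy.misc.bytescale, which was removed from recent SciPy releases:
    # rescale data linearly from [min, max] to [low, high] and return uint8.
    scaled = (data - data.min()) / (data.max() - data.min()) * (high - low) + low
    return (scaled + 0.5).astype(np.uint8)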

y, x = np.ogrid[-2:2:100j, -2:2:100j]

f1 = bytescale(np.exp(-x**2 - y**2), low=0, high=255)
f2 = bytescale(np.exp(-(x+1)**2 - y**2), low=0, high=255)


# [-2:] keeps (contours, hierarchy) whether findContours returns two or three values.
c1, hierarchy = cv2.findContours((f1>120).astype(np.uint8),
                                 cv2.RETR_EXTERNAL,
                                 cv2.CHAIN_APPROX_SIMPLE)[-2:]

c2, hierarchy = cv2.findContours((f2>120).astype(np.uint8),
                                 cv2.RETR_EXTERNAL,
                                 cv2.CHAIN_APPROX_SIMPLE)[-2:]


points1 = c1[0].reshape(-1,2) # convert shape (n, 1, 2) to (n, 2)
points2 = c2[0].reshape(-1,2)

import pylab as pl
poly1 = pl.Polygon(points1, color="blue", alpha=0.5)
poly2 = pl.Polygon(points2, color="red", alpha=0.5)
pl.figure(figsize=(8,3))
ax = pl.subplot(121)
ax.add_artist(poly1)
ax.add_artist(poly2)
pl.xlim(0, 100)
pl.ylim(0, 100)

a = Polygon.Polygon(points1)
b = Polygon.Polygon(points2)
intersect = a&b # calculate the intersect polygon

poly3 = pl.Polygon(intersect[0], color="green") # intersect[0] are the points of the polygon
ax = pl.subplot(122)
ax.add_artist(poly3)
pl.xlim(0, 100)
pl.ylim(0, 100)
pl.show()

Output:

[image]

So this is what I did to get the job done:

import Polygon, numpy

# Here I extracted and combined some contours and created a convex hull from them.
# Now I want to check whether a contour acquired differently intersects this hull.

for contour in contours:  # The result of cv2.findContours is a list of contours
    contour1 = contour.flatten()
    contour1 = numpy.reshape(contour1, (int(contour1.shape[0]/2),-1))
    poly1 = Polygon.Polygon(contour1)

    hull = hull.flatten()  # This is the previously constructed hull
    hull = numpy.reshape(hull, (int(hull.shape[0]/2),-1))
    poly2 = Polygon.Polygon(hull)

    if (poly1 & poly2).area()<= some_max_val:
        some_operations

I had to use a for loop, and altogether this looks a bit tedious, although it gives me the expected results. Any better methods would be greatly appreciated!
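
If only a yes/no overlap answer is needed (rather than the overlap area used above), one option that avoids both the Python loop over points and the extra library is a separating axis test written with numpy alone. This is a different technique from the one in this post, and the sketch below is offered under assumptions: both inputs are convex polygons (for example cv2.convexHull results) with at least three vertices, polygons that merely touch count as intersecting, and the function name `convex_polygons_intersect` is made up for illustration.

```python
import numpy as np

def convex_polygons_intersect(hull1, hull2):
    """Separating axis test for two convex polygons in (N, 1, 2) or (N, 2) layout."""
    p1 = np.asarray(hull1, dtype=float).reshape(-1, 2)
    p2 = np.asarray(hull2, dtype=float).reshape(-1, 2)
    # Edge vectors of both polygons (each vertex to the next, wrapping around).
    edges = np.vstack([np.roll(p1, -1, axis=0) - p1,
                       np.roll(p2, -1, axis=0) - p2])
    # The normal of each edge is a candidate separating axis.
    axes = np.column_stack([-edges[:, 1], edges[:, 0]])
    # Project every vertex onto every axis: shape (n_axes, n_points).
    proj1 = axes @ p1.T
    proj2 = axes @ p2.T
    # The polygons are disjoint if their projections do not overlap on some axis.
    separated = ((proj1.max(axis=1) < proj2.min(axis=1)) |
                 (proj2.max(axis=1) < proj1.min(axis=1)))
    return not separated.any()

square = np.array([[[0, 0]], [[2, 0]], [[2, 2]], [[0, 2]]])
shifted = square + 1   # overlaps the first square
far_away = square + 5  # lies completely outside it

print(convex_polygons_intersect(square, shifted))   # True
print(convex_polygons_intersect(square, far_away))  # False
```

To test one hull against many contours, the loop over contours would still remain; the sketch only removes the per-point work inside each comparison.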

Inspired by jiterrace's answer.

I came across this post while working through the Udacity deep learning class (trying to find the overlap between training and test data).

I am not familiar with "view" and found the syntax a bit hard to understand, and it would probably be just as hard to explain to my friends who think in "tables". My approach is basically to flatten/reshape the ndarray of shape (N, X, Y) into shape (N, X*Y).

print(train_dataset.shape)
print(test_dataset.shape)
#(200000L, 28L, 28L)
#(10000L, 28L, 28L)

1) INNER JOIN (easier to understand, slow)

import pandas as pd

%%timeit -n 1 -r 1
def multidim_intersect_df(arr1, arr2):
    p1 = pd.DataFrame([r.flatten() for r in arr1]).drop_duplicates()
    p2 = pd.DataFrame([r.flatten() for r in arr2]).drop_duplicates()
    res = p1.merge(p2)
    return res
inters_df = multidim_intersect_df(train_dataset, test_dataset)
print(inters_df.shape)
#(1153, 784)
#1 loop, best of 1: 2min 56s per loop

2) SET INTERSECTION (fast)

%%timeit -n 1 -r 1
def multidim_intersect(arr1, arr2):
    arr1_new = arr1.reshape((-1, arr1.shape[1]*arr1.shape[2])) # -1 means row counts are inferred from other dimensions
    arr2_new = arr2.reshape((-1, arr2.shape[1]*arr2.shape[2]))
    intersected = set(map(tuple, arr1_new)).intersection(set(map(tuple, arr2_new)))  # list is not hashable, go tuple
    return list(intersected)  # in shape of (N, 28*28)

inters = multidim_intersect(train_dataset, test_dataset)
print(len(inters))
# 1153
#1 loop, best of 1: 34.6 s per loop
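
To make the set-based approach easier to follow, here is a small sketch on toy data. The 2x2 "images" and the function name `multidim_intersect_sets` are made up for illustration. Converting with `tolist()` first is an added choice so that the tuples hold plain Python ints.

```python
import numpy as np

def multidim_intersect_sets(arr1, arr2):
    """Return the flattened samples that occur in both (N, X, Y) arrays."""
    # Flatten every sample to one row: (N, X, Y) -> (N, X*Y).
    flat1 = arr1.reshape(arr1.shape[0], -1)
    flat2 = arr2.reshape(arr2.shape[0], -1)
    # Rows become hashable tuples, so a set intersection finds the common samples.
    common = set(map(tuple, flat1.tolist())) & set(map(tuple, flat2.tolist()))
    return sorted(common)

train = np.array([[[0, 1], [2, 3]],
                  [[4, 5], [6, 7]],
                  [[0, 1], [2, 3]]])   # contains a duplicate sample
test = np.array([[[4, 5], [6, 7]],
                 [[9, 9], [9, 9]]])

common = multidim_intersect_sets(train, test)
print(common)  # [(4, 5, 6, 7)]
```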
