python: check if a numpy array contains any element of another array

What is the best way to check if a numpy array contains any element of another array?

Example:

array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]

I want to get True if array1 contains any value of array2, otherwise False.

Using Pandas, you can use isin:

import numpy as np
import pandas as pd

a1 = np.array([10,5,4,13,10,1,1,22,7,3,15,9])
a2 = np.array([3,4,9,10,13,15,16,18,19,20,21,22,23])

>>> pd.Series(a1).isin(a2).any()
True

And using the numpy function in1d (per the comment from @Norman):

>>> np.any(np.in1d(a1, a2))
True
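As a rough sketch, the NumPy check can be wrapped in a small helper. The name contains_any is hypothetical (it is not from the comment above), and the sketch uses np.isin, which newer NumPy releases recommend in place of np.in1d.

```python
import numpy as np

def contains_any(haystack, needles):
    """Return True if any element of `needles` occurs in `haystack`."""
    # np.isin returns a boolean mask with the same shape as `haystack`
    mask = np.isin(haystack, needles)
    return bool(mask.any())

a1 = np.array([10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9])
a2 = np.array([3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23])

print(contains_any(a1, a2))                    # True
print(contains_any(a1, np.array([100, 200])))  # False
```

Wrapping the mask in bool() returns a plain Python bool rather than a numpy.bool_ value.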

For small arrays such as those in this example, the solution using set is the fastest, with the any generator close behind. For larger, dissimilar arrays (i.e. no overlap), the Pandas and NumPy solutions are faster than set. Of these, np.intersect1d appears to be the fastest for larger arrays.

Small arrays (12-13 elements):

%timeit set(array1) & set(array2)
The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.69 µs per loop

%timeit any(i in a1 for i in a2)
The slowest run took 12.29 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 1.88 µs per loop

%timeit np.intersect1d(a1, a2)
The slowest run took 10.29 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 15.6 µs per loop

%timeit np.any(np.in1d(a1, a2))
10000 loops, best of 3: 27.1 µs per loop

%timeit pd.Series(a1).isin(a2).any()
10000 loops, best of 3: 135 µs per loop

Using arrays with 100k elements (no overlap):

a3 = np.random.randint(0, 100000, 100000)
a4 = a3 + 100000

%timeit np.intersect1d(a3, a4)
100 loops, best of 3: 13.8 ms per loop    

%timeit pd.Series(a3).isin(a4).any()
100 loops, best of 3: 18.3 ms per loop

%timeit np.any(np.in1d(a3, a4))
100 loops, best of 3: 18.4 ms per loop

%timeit set(a3) & set(a4)
10 loops, best of 3: 23.6 ms per loop

%timeit any(i in a3 for i in a4)
1 loops, best of 3: 34.5 s per loop
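The %timeit lines above only work in IPython. A minimal sketch of a comparable measurement with the standard timeit module might look like the following. The candidates dictionary and the fixed seed are assumptions added for illustration, np.isin stands in for np.in1d, and the numbers will differ by machine and library version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, chosen arbitrarily
a3 = rng.integers(0, 100_000, 100_000)
a4 = a3 + 100_000                        # shifted so the arrays share no values

# Each candidate returns True if a3 and a4 have at least one common value.
candidates = {
    "np.intersect1d": lambda: np.intersect1d(a3, a4).size > 0,
    "np.isin": lambda: bool(np.isin(a3, a4).any()),
    # .tolist() converts to Python ints first; the answer used set(a3) directly
    "set": lambda: bool(set(a3.tolist()) & set(a4.tolist())),
}

for name, func in candidates.items():
    # best of 3 repeats, 5 calls per repeat
    seconds = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{name:>15}: {seconds * 1e3:.2f} ms per call, result={func()}")
```

The any(i in a3 for i in a4) variant is left out because it took about 34 seconds per run in the measurements above.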

You can try this:

>>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
>>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
>>> set(array1) & set(array2)
set([3, 4, 9, 10, 13, 15, 22])

If the result is non-empty, the two arrays have elements in common.

If the result is empty, they have no common elements.
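Since the question asks for True or False rather than the common elements themselves, here is a small sketch of turning the set result into a boolean. The variable names common and has_common are only illustrative.

```python
array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]

common = set(array1) & set(array2)
has_common = bool(common)   # an empty set is falsy, a non-empty set is truthy
print(has_common)           # True

# isdisjoint answers the same question without building the intersection
print(not set(array1).isdisjoint(array2))   # True
```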

You can use the built-in any function with a generator expression:

>>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
>>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
>>> any(i in array2 for i in array1)
True
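With plain lists, each `i in array2` test scans the whole list, so the check slows down as the inputs grow. One possible variation, assuming the elements are hashable, is to build a set once and test membership against it. The name lookup is hypothetical.

```python
array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]

lookup = set(array2)   # membership tests on a set take constant time on average
result = any(i in lookup for i in array1)   # stops at the first match
print(result)          # True
```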
