Python: check if a NumPy array contains any element of another array
What is the best way to check if a NumPy array contains any element of another array?
Example:
array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
I want to get True if array1 contains any value of array2, otherwise False.
Using Pandas, you can use isin:
import numpy as np
import pandas as pd
a1 = np.array([10,5,4,13,10,1,1,22,7,3,15,9])
a2 = np.array([3,4,9,10,13,15,16,18,19,20,21,22,23])
>>> pd.Series(a1).isin(a2).any()
True
And using the NumPy in1d function (per the comment from @Norman):
>>> np.any(np.in1d(a1, a2))
True
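On recent NumPy releases, np.isin is the documented replacement for np.in1d, which is deprecated as of NumPy 2.0. A minimal sketch of the same check with np.isin, reusing the example arrays from above (the names mask and has_common are only illustrative):

```python
import numpy as np

a1 = np.array([10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9])
a2 = np.array([3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23])

# np.isin returns a boolean mask with the same shape as a1:
# mask[k] is True when a1[k] occurs somewhere in a2.
mask = np.isin(a1, a2)

# .any() collapses the mask to a single value; bool() turns the
# NumPy boolean into a plain Python bool.
has_common = bool(mask.any())
print(has_common)  # True
```

Unlike np.in1d, np.isin keeps the shape of its first argument, so it should also work on multi-dimensional arrays.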
For small arrays such as those in this example, the set-based solution is the clear winner. For larger, dissimilar arrays (i.e. no overlap), the Pandas and NumPy solutions are faster than set, and np.intersect1d is the fastest of the approaches timed here.
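np.intersect1d returns the common values rather than a True/False answer, so one more step is needed to get a boolean. A small sketch, assuming the example arrays from above (common and has_common are illustrative names):

```python
import numpy as np

a1 = np.array([10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9])
a2 = np.array([3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23])

# intersect1d returns the sorted, unique values present in both arrays.
common = np.intersect1d(a1, a2)

# A non-empty intersection means at least one shared element.
has_common = common.size > 0

print(common)      # [ 3  4  9 10 13 15 22]
print(has_common)  # True
```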
Small arrays (12-13 elements)
%timeit set(array1) & set(array2)
The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.69 µs per loop
%timeit any(i in a1 for i in a2)
The slowest run took 12.29 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 1.88 µs per loop
%timeit np.intersect1d(a1, a2)
The slowest run took 10.29 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 15.6 µs per loop
%timeit np.any(np.in1d(a1, a2))
10000 loops, best of 3: 27.1 µs per loop
%timeit pd.Series(a1).isin(a2).any()
10000 loops, best of 3: 135 µs per loop
Using arrays with 100k elements (no overlap):
a3 = np.random.randint(0, 100000, 100000)
a4 = a3 + 100000
%timeit np.intersect1d(a3, a4)
100 loops, best of 3: 13.8 ms per loop
%timeit pd.Series(a3).isin(a4).any()
100 loops, best of 3: 18.3 ms per loop
%timeit np.any(np.in1d(a3, a4))
100 loops, best of 3: 18.4 ms per loop
%timeit set(a3) & set(a4)
10 loops, best of 3: 23.6 ms per loop
%timeit any(i in a3 for i in a4)
1 loops, best of 3: 34.5 s per loop
You can try this:
>>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
>>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
>>> set(array1) & set(array2)
set([3, 4, 9, 10, 13, 15, 22])
If the result is non-empty, the two arrays have elements in common.
If the result is empty, they have no common elements.
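Since the question asks for True or False rather than the shared values, the set result can be reduced to a boolean. The sketch below shows two ways to do that; set.isdisjoint is a standard set method that was not part of the original answer, and it avoids building the intersection at all:

```python
array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]

# An empty set is falsy, so bool() gives the True/False answer directly.
common = set(array1) & set(array2)
has_common = bool(common)

# isdisjoint() returns True when the two collections share nothing,
# so its negation answers "is there any common element?".
has_common_alt = not set(array1).isdisjoint(array2)

print(has_common, has_common_alt)  # True True
```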
You can use the built-in any function with a generator expression:
>>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
>>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
>>> any(i in array2 for i in array1)
True
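Note that `i in array2` on a list (or a NumPy array) is a linear scan, so the total work grows with the product of the two lengths. That is a plausible reason the any-based variant is so slow in the 100k-element timing above. A possible refinement, offered as a sketch rather than as part of the original answer, is to build a set once and test membership against it (lookup is an illustrative name):

```python
array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]

# Build the set once; membership tests on a set are O(1) on average.
lookup = set(array2)

# any() still short-circuits on the first element found in the set.
result = any(i in lookup for i in array1)
print(result)  # True
```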