What is the most efficient way to check if a value exists in a NumPy array?

Question

I have a very large NumPy array.

I want to check whether a value exists in the first column of the array. I've got a bunch of homegrown ways (e.g. iterating through each row and checking), but given the size of the array I'd like to find the most efficient method.

Thanks!

Answer 1

How about

if value in my_array[:, col_num]:
    do_whatever

Edit: I think __contains__ is implemented in such a way that this is the same as @detly's version.
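
As a quick check of that claim, the sketch below compares the two spellings on a small made-up array (`my_array`, `col_num`, the helper names, and the sample values are illustrative only). For an ndarray, `value in arr` behaves like `(arr == value).any()`, so both forms scan the whole column.

```python
import numpy as np

# Hypothetical sample data: 4 rows, 3 columns.
my_array = np.array([[1, 10, 100],
                     [2, 20, 200],
                     [3, 30, 300],
                     [4, 40, 400]])
col_num = 0


def in_column(arr, col, value):
    """Membership test using the `in` operator on one column."""
    return value in arr[:, col]


def any_equal(arr, col, value):
    """Membership test using an explicit element-wise comparison."""
    return bool(np.any(arr[:, col] == value))


# 3 is in the first column; 30 is only in the second column.
print(in_column(my_array, col_num, 3), any_equal(my_array, col_num, 3))
print(in_column(my_array, col_num, 30), any_equal(my_array, col_num, 30))
```

Slicing with `arr[:, col]` returns a view rather than a copy, so neither form duplicates the column before searching it.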

Answer 2

The most obvious to me would be:

np.any(my_array[:, 0] == value)

Answer 3

To check multiple values, you can use numpy.in1d(), which is an element-wise function version of the Python keyword in. If your data is sorted, you can use numpy.searchsorted():

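import numpy as np
data = np.array([1,4,5,5,6,8,8,9])
values = [2,3,4,6,7]
# Element-wise test: is each entry of values present in data?
print(np.in1d(values, data))

# data is sorted, so a binary search finds the candidate position of each value
index = np.searchsorted(data, values)
print(data[index] == values)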
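
One caveat with the `searchsorted` variant: a value larger than every element of `data` yields an index equal to `len(data)`, and indexing `data` with it raises an `IndexError`. A minimal sketch of a guarded version follows. The helper name `in_sorted` is made up for illustration, and `np.isin` (available since NumPy 1.13) is the newer spelling of the same element-wise test.

```python
import numpy as np


def in_sorted(data, values):
    """Element-wise membership test against a sorted, non-empty 1-D array."""
    data = np.asarray(data)
    values = np.asarray(values)
    index = np.searchsorted(data, values)
    # Values past the end get index == len(data); clip so indexing is safe.
    # A clipped index points at the last element, which cannot equal such a value.
    index = np.clip(index, 0, len(data) - 1)
    return data[index] == values


data = np.array([1, 4, 5, 5, 6, 8, 8, 9])
values = [2, 3, 4, 6, 7, 10]  # 10 is larger than every element of data
print(in_sorted(data, values))
print(np.isin(values, data))
```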

Answer 4

Fascinating. I needed to improve the speed of a series of loops that must perform matching index determination in this same way. So I decided to time all the solutions here, along with some riffs.

Here are my speed tests for Python 2.7.10:

import timeit
timeit.timeit('N.any(N.in1d(sids, val))', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

18.86137104034424

timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = [20010401010101+x for x in range(1000)]')

15.061666011810303

timeit.timeit('N.in1d(sids, val)', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

11.613027095794678

timeit.timeit('N.any(val == sids)', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

7.670552015304565

timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

5.610057830810547

timeit.timeit('val == sids', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

1.6632978916168213

timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = set([20010401010101+x for x in range(1000)])')

0.0548710823059082

timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = dict(zip([20010401010101+x for x in range(1000)],[True,]*1000))')

0.054754018783569336

Very surprising! Orders of magnitude difference!

To summarize, if you just want to know whether something's in a 1D list or not:

19s N.any(N.in1d(numpy array, x))
15s x in (list)
8s N.any(x == numpy array)
6s x in (numpy array)
0.05s x in (set or a dictionary)

If you want to know where something is in the list as well (order is important):

12s N.in1d(numpy array, x)
2s x == (numpy array)
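
The timings above come from Python 2.7.10, so the absolute numbers should not be expected to carry over to a current interpreter. A rough sketch for re-running part of the comparison is below. The repeat count of `number=1000` and the `candidates` dictionary are my own choices, not part of the original test.

```python
import timeit

import numpy as np

val = 20010401020091
sid_list = [20010401010101 + x for x in range(1000)]
sid_array = np.array(sid_list)
sid_set = set(sid_list)

# Each candidate answers "is val present?" for the same data.
candidates = {
    "x in list": lambda: val in sid_list,
    "x in numpy array": lambda: val in sid_array,
    "np.any(x == numpy array)": lambda: bool(np.any(val == sid_array)),
    "x in set": lambda: val in sid_set,
}

# Run each candidate once for its answer, then time repeated calls.
results = {name: fn() for name, fn in candidates.items()}
timings = {name: timeit.timeit(fn, number=1000) for name, fn in candidates.items()}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{seconds:.6f}s  {name}  -> {results[name]}")
```

Note that `val` is larger than every entry in `sids`, so each linear scan runs to the end of the data. The benchmark therefore measures the worst case for the list and array lookups.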

Answer 5

Adding to @HYRY's answer: in1d seems to be fastest for numpy. This is using numpy 1.8 and python 2.7.6.

In this test in1d was fastest, although 10 in a looks cleaner:

a = arange(0,99999,3)
%timeit 10 in a
%timeit in1d(a, 10)

10000 loops, best of 3: 150 µs per loop
10000 loops, best of 3: 61.9 µs per loop

Constructing a set is slower than calling in1d, but once it is built, checking whether the value exists is much faster:

s = set(range(0, 99999, 3))
%timeit 10 in s

10000000 loops, best of 3: 47 ns per loop
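
Whether the set pays off depends on how many lookups follow the one-time construction. A minimal sketch, assuming the first column holds hashable scalars and that many values will be checked against it (`my_array`, `first_col`, and `queries` are made-up names):

```python
import numpy as np

# Hypothetical array: the first column holds the multiples of 3 below 99999.
my_array = np.column_stack([np.arange(0, 99999, 3),
                            np.zeros(33333, dtype=int)])

# One-time cost: convert the column to Python scalars and build the set.
first_col = set(my_array[:, 0].tolist())

# Every later lookup is a hash probe instead of a scan over the column.
queries = [9, 10, 99996, 99999]
found = [q in first_col for q in queries]
print(found)
```

For a single lookup, scanning the array directly avoids the construction cost entirely.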

Answer 6

The most convenient way, in my opinion, is:

(Val in X[:, col_num])

where Val is the value that you want to check for and X is the array. In your example, suppose you want to check if the value 8 exists in the third column. Simply write

(8 in X[:, 2])

This will return True if 8 is there in the third column, else False.

Answer 7

If you are looking for a list of integers, you can use indexing to do the work. This also works with n-dimensional arrays, but seems to be slower. It may pay off when you do this more than once.

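import numpy as np

def valuesInArray(values, array):
    values = np.asanyarray(values)
    array = np.asanyarray(array)
    # np.int was removed in NumPy 1.24; accept any integer dtype instead.
    assert np.issubdtype(array.dtype, np.integer) and np.issubdtype(values.dtype, np.integer)

    # Lookup table indexed by value; assumes non-negative integers.
    # Sized to cover both inputs so a large search value cannot index out of range.
    matches = np.zeros(max(array.max(), values.max()) + 1, dtype=np.bool_)
    matches[values] = True

    # Boolean mask with the same shape as array.
    res = matches[array]

    return np.any(res), res


array = np.random.randint(0, 1000, (10000, 3))
values = np.array((1, 6, 23, 543, 222))

matched, matches = valuesInArray(values, array)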

By using numba and njit, I could get a speedup of about 10x.
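
For comparison, `np.isin` produces the same kind of boolean mask without the integer-only restriction or a lookup table sized by the largest value. The sketch below is an alternative, not this answer's method, and the fixed seed is only there to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
array = rng.integers(0, 1000, size=(10000, 3))
values = np.array((1, 6, 23, 543, 222))

# Boolean mask with the same shape as `array`.
mask = np.isin(array, values)
matched = bool(mask.any())

# Cross-check against a lookup table like the one used above.
table = np.zeros(1000, dtype=bool)
table[values] = True
print(matched, np.array_equal(mask, table[array]))
```

No timing claim is made here; which approach is faster depends on the array size and the range of the values.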

Answer 8

If you want to check whether list a is in NumPy array b (as one of its rows), use the following syntax:

np.any(np.equal(a, b).all(axis=1))

axis=1 is used here on the assumption that the NumPy array has shape n*2.
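
To make the shapes concrete, here is a small sketch with invented data: `b` has shape (n, 2) and `a` is one candidate row. Broadcasting compares `a` with every row, `.all(axis=1)` keeps the rows where every column matches, and `np.any` reports whether at least one such row exists. The helper name `row_in_array` is hypothetical.

```python
import numpy as np

b = np.array([[1, 2],
              [3, 4],
              [5, 6]])


def row_in_array(a, b):
    """Return True if the 1-D sequence `a` equals some row of the 2-D array `b`."""
    return bool(np.any(np.equal(a, b).all(axis=1)))


print(row_in_array([3, 4], b))  # matches the second row
print(row_in_array([4, 3], b))  # same numbers, wrong order: no row matches
```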

Answer 1: 84, accepted, 2011-08-17 06:19:55
Answer 2: 51, 2011-08-17 06:18:51
Answer 3: 43, 2011-08-17 07:55:59
Answer 4: 23, 2016-08-10 22:41:50
Answer 5: 3, 2014-09-28 20:41:57
Answer 6: 0, 2018-11-20 11:14:10
Answer 7: 0, 2021-03-01 10:49:11
Answer 8: -1, 2018-08-08 08:06:29