What is the most efficient way to check if a value exists in a NumPy array?
I have a very large NumPy array:
1 40 3
4 50 4
5 60 7
5 49 6
6 70 8
8 80 9
8 72 1
9 90 7
....
I want to check whether a value exists in the first column of the array. I've got a bunch of homegrown ways (e.g. iterating through each row and checking), but given the size of the array I'd like to find the most efficient method.
Thanks!
How about:
if value in my_array[:, col_num]:
    do_whatever
Edit: I think __contains__ is implemented in such a way that this is the same as @detly's version.
The most obvious to me would be:
np.any(my_array[:, 0] == value)
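A minimal sketch of this check, run on the eight rows shown in the question. The helper name in_first_column and the tested values are only illustrative, not part of the original answer.

```python
import numpy as np

# The eight rows shown in the question.
my_array = np.array([
    [1, 40, 3],
    [4, 50, 4],
    [5, 60, 7],
    [5, 49, 6],
    [6, 70, 8],
    [8, 80, 9],
    [8, 72, 1],
    [9, 90, 7],
])


def in_first_column(arr, value):
    """Return True if value occurs anywhere in column 0 of arr."""
    # The comparison builds a boolean array; np.any collapses it to one flag.
    return bool(np.any(arr[:, 0] == value))


print(in_first_column(my_array, 8))  # True
print(in_first_column(my_array, 7))  # False: 7 only appears in the third column
```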
To check multiple values, you can use numpy.in1d(), which is an element-wise function version of the Python keyword in (the NumPy documentation recommends numpy.isin() for new code). If your data is sorted, you can use numpy.searchsorted():
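import numpy as np
data = np.array([1,4,5,5,6,8,8,9])
values = [2,3,4,6,7]
# Element-wise membership: one boolean per entry of values
print(np.in1d(values, data))
# data is sorted, so find each value's insertion point and compare
index = np.searchsorted(data, values)
print(data[index] == values)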
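One caveat with the searchsorted approach: a value larger than everything in data gets an insertion index equal to len(data), so data[index] would raise an IndexError. A hedged sketch of one way to guard against that follows; the helper name sorted_contains is made up for illustration.

```python
import numpy as np


def sorted_contains(sorted_data, values):
    """Element-wise membership test against a sorted 1-D array."""
    sorted_data = np.asarray(sorted_data)
    values = np.asarray(values)
    if sorted_data.size == 0:
        return np.zeros(values.shape, dtype=bool)
    # Insertion points; a value above the maximum gets index len(sorted_data).
    index = np.searchsorted(sorted_data, values)
    # Clip so the lookup below never indexes past the end of the array.
    index = np.clip(index, 0, sorted_data.size - 1)
    return sorted_data[index] == values


data = np.array([1, 4, 5, 5, 6, 8, 8, 9])
print(sorted_contains(data, [2, 3, 4, 6, 7, 10]))
```

Clipping is safe here because a clipped index points at the maximum element, which cannot equal a value that is strictly larger than it.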
Fascinating. I needed to improve the speed of a series of loops that must determine matching indices in this same way. So I decided to time all the solutions here, along with some riffs.
Here are my speed tests for Python 2.7.10:
import timeit
timeit.timeit('N.any(N.in1d(sids, val))', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
18.86137104034424
timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = [20010401010101+x for x in range(1000)]')
15.061666011810303
timeit.timeit('N.in1d(sids, val)', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
11.613027095794678
timeit.timeit('N.any(val == sids)', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
7.670552015304565
timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
5.610057830810547
timeit.timeit('val == sids', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
1.6632978916168213
timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = set([20010401010101+x for x in range(1000)])')
0.0548710823059082
timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = dict(zip([20010401010101+x for x in range(1000)],[True,]*1000))')
0.054754018783569336
Very surprising! Orders of magnitude difference!
To summarize, if you just want to know whether something's in a 1D list or not:
If you want to know where something is in the list as well (order is important):
Adding to @HYRY's answer: in1d seems to be fastest for NumPy. This is using NumPy 1.8 and Python 2.7.6.
In this test in1d was fastest; however, 10 in a looks cleaner:
a = arange(0,99999,3)
%timeit 10 in a
%timeit in1d(a, 10)
10000 loops, best of 3: 150 µs per loop
10000 loops, best of 3: 61.9 µs per loop
Constructing a set is slower than calling in1d, but checking if the value exists is a bit faster:
s = set(range(0, 99999, 3))
%timeit 10 in s
10000000 loops, best of 3: 47 ns per loop
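Since the cost of the set lies in building it, the approach mainly pays off when the same data is queried many times. A small sketch of that pattern, reusing the array from the test above; the query values are arbitrary and no timings are claimed.

```python
import numpy as np

a = np.arange(0, 99999, 3)

# Build the set once; tolist() converts NumPy scalars to plain Python ints.
lookup = set(a.tolist())

# Each subsequent membership test is a hash lookup rather than a scan of a.
queries = [10, 12, 99996, 99999]
results = [q in lookup for q in queries]
print(results)  # [False, True, True, False]
```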
The most convenient way, in my opinion, is:
(Val in X[:, col_num])
where Val is the value that you want to check for and X is the array. In your example, suppose you want to check if the value 8 exists in the third column. Simply write
(8 in X[:, 2])
This will return True if 8 is in the third column, else False.
If you are looking for a list of integers, you can use indexing to do the work. This also works with nd-arrays, but seems to be slower. It may pay off when you do this more than once.
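def valuesInArray(values, array):
    values = np.asanyarray(values)
    array = np.asanyarray(array)
    # np.int was removed in NumPy 1.24; accept any integer dtype instead
    assert np.issubdtype(array.dtype, np.integer) and np.issubdtype(values.dtype, np.integer)
    # Lookup table: matches[v] is True when v is one of the values sought.
    # Sized to cover both inputs; assumes all integers are non-negative.
    matches = np.zeros(max(array.max(), values.max())+1, dtype=np.bool_)
    matches[values] = True
    # Fancy indexing maps every element of array to its table entry
    res = matches[array]
    return np.any(res), res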
array = np.random.randint(0, 1000, (10000,3))
values = np.array((1,6,23,543,222))
matched, matches = valuesInArray(values, array)
By using numba and njit, I could speed this up by roughly 10x.
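For comparison with the lookup-table function above, NumPy's built-in np.isin performs the same kind of element-wise test on nd-arrays and does not require non-negative integers. This is a different technique from the one in the answer, shown only as a sketch; the seeded generator is an assumption made so the example is reproducible.

```python
import numpy as np

# Seeded generator (an illustrative choice) so the example is reproducible.
rng = np.random.default_rng(0)
array = rng.integers(0, 1000, size=(10000, 3))
values = np.array((1, 6, 23, 543, 222))

# Boolean mask with the same shape as array: True where the element is in values.
matches = np.isin(array, values)
# Single flag: does any element of array match one of the values?
matched = bool(matches.any())

print(matched, matches.shape)
```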
If you want to check whether list a is in NumPy array b, then use the following syntax:
np.any(np.equal(a, b).all(axis=1))
Use axis=1 here, assuming the NumPy array has shape n*2, so that all() checks every column within each row.
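A small sketch of this row test on a made-up n*2 array; the helper name row_in_array and the sample data are assumptions for illustration.

```python
import numpy as np

# Hypothetical n*2 array for illustration.
b = np.array([[1, 2], [3, 4], [5, 6]])


def row_in_array(a, b):
    """Return True if the list a appears as a complete row of b."""
    # Broadcasting compares a against every row of b; all(axis=1) requires
    # every column of a given row to match, and any() asks for one such row.
    return bool(np.any(np.equal(a, b).all(axis=1)))


print(row_in_array([3, 4], b))  # True
print(row_in_array([4, 3], b))  # False: order within the row matters
print(row_in_array([1, 4], b))  # False: 1 and 4 occur, but not in the same row
```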