What is the most efficient way to check if a value exists in a NumPy array?

I have a very large NumPy array:

1 40 3
4 50 4
5 60 7
5 49 6
6 70 8
8 80 9
8 72 1
9 90 7
.... 

I want to check whether a value exists in the first column of the array. I've got a bunch of homegrown ways (e.g. iterating through each row and checking), but given the size of the array I'd like to find the most efficient method.

Thanks!

How about

if value in my_array[:, col_num]:
    do_whatever

Edit: I think __contains__ is implemented in such a way that this is the same as @detly's version.

The most obvious to me would be:

np.any(my_array[:, 0] == value)
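As a minimal sketch of this comparison approach, using the eight rows shown in the question (the helper name `has_value` is made up for illustration):

```python
import numpy as np

# The eight rows shown in the question.
my_array = np.array([
    [1, 40, 3],
    [4, 50, 4],
    [5, 60, 7],
    [5, 49, 6],
    [6, 70, 8],
    [8, 80, 9],
    [8, 72, 1],
    [9, 90, 7],
])

def has_value(arr, value, col=0):
    """Return True if `value` occurs anywhere in column `col` of `arr`."""
    # The comparison builds a boolean array; np.any reduces it to one bool.
    return bool(np.any(arr[:, col] == value))

print(has_value(my_array, 8))  # True
print(has_value(my_array, 7))  # False
```

The column slice `arr[:, col]` is a view, so no copy of the data is made before the comparison.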

To check multiple values, you can use numpy.in1d() (superseded by numpy.isin() in newer NumPy versions), which is an element-wise function version of the Python keyword in. If your data is sorted, you can use numpy.searchsorted():

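import numpy as np
data = np.array([1,4,5,5,6,8,8,9])
values = [2,3,4,6,7]
# np.isin is the current name for the older np.in1d
print(np.isin(values, data))

# data must be sorted for searchsorted to give meaningful positions
index = np.searchsorted(data, values)
print(data[index] == values)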
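One caveat with the searchsorted approach: for a value larger than every element of data, the returned insertion index equals len(data), so data[index] would raise an IndexError. A hedged sketch of one way to guard against that (the helper name `in_sorted` is hypothetical, not part of NumPy):

```python
import numpy as np

def in_sorted(data, values):
    """Membership test for `values` in a sorted 1-D array `data`."""
    data = np.asarray(data)
    values = np.asarray(values)
    if data.size == 0:
        return np.zeros(values.shape, dtype=bool)
    # Insertion points; may equal len(data) for values above data.max().
    index = np.searchsorted(data, values)
    # Clamp so the lookup below never indexes past the end.
    clamped = np.minimum(index, data.size - 1)
    return data[clamped] == values

data = np.array([1, 4, 5, 5, 6, 8, 8, 9])
print(in_sorted(data, [2, 3, 4, 6, 7, 10]).tolist())
# [False, False, True, True, False, False]
```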

Fascinating. I needed to improve the speed of a series of loops that must perform matching index determination in this same way. So I decided to time all the solutions here, along with some riffs.

Here are my speed tests for Python 2.7.10:

import timeit
timeit.timeit('N.any(N.in1d(sids, val))', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

18.86137104034424

timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = [20010401010101+x for x in range(1000)]')

15.061666011810303

timeit.timeit('N.in1d(sids, val)', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

11.613027095794678

timeit.timeit('N.any(val == sids)', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

7.670552015304565

timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

5.610057830810547

timeit.timeit('val == sids', setup = 'import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')

1.6632978916168213

timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = set([20010401010101+x for x in range(1000)])')

0.0548710823059082

timeit.timeit('val in sids', setup = 'import numpy as N; val = 20010401020091; sids = dict(zip([20010401010101+x for x in range(1000)],[True,]*1000))')

0.054754018783569336

Very surprising! Orders of magnitude difference!

To summarize, if you just want to know whether something's in a 1D list or not:

  • 19s N.any(N.in1d(numpy array))
  • 15s x in (list)
  • 8s N.any(x == numpy array)
  • 6s x in (numpy array)
  • .1s x in (set or a dictionary)

If you want to know where something is in the list as well (order is important):

  • 12s N.in1d(x, numpy array)
  • 2s x == (numpy array)
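To turn the boolean result of x == (numpy array) into actual positions, np.flatnonzero can be applied to the mask. A small sketch with data shaped like the benchmark's (the chosen `val` is an assumption, picked so that it does occur in the array):

```python
import numpy as np

sids = np.array([20010401010101 + x for x in range(1000)])
val = 20010401010101 + 42

mask = sids == val                # boolean array, True where the value matches
positions = np.flatnonzero(mask)  # indices of the matches, in array order

print(positions.tolist())  # [42]
```

An empty result from np.flatnonzero means the value is absent, so the same call answers both "is it there?" and "where?".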

Adding to @HYRY's answer: in1d seems to be fastest for numpy. This is using numpy 1.8 and python 2.7.6.

In this test in1d was fastest; however, 10 in a looks cleaner:

a = arange(0,99999,3)
%timeit 10 in a
%timeit in1d(a, 10)

10000 loops, best of 3: 150 µs per loop
10000 loops, best of 3: 61.9 µs per loop

Constructing a set is slower than calling in1d, but checking if the value exists is a bit faster:

s = set(range(0, 99999, 3))
%timeit 10 in s

10000000 loops, best of 3: 47 ns per loop
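The trade-off can be sketched as follows: converting to a set is a one-time cost proportional to the array size, after which each lookup is a hash probe. This is an illustration rather than a benchmark, and the `queries` list is made up:

```python
import numpy as np

a = np.arange(0, 99999, 3)

# One-time conversion; tolist() yields plain Python ints, which hash
# and compare equal to ordinary int literals.
s = set(a.tolist())

queries = [10, 12, 99996, 99999]
print([q in s for q in queries])  # [False, True, True, False]
```

This pays off when the same array is queried many times; for a single lookup the conversion cost dominates.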

The most convenient way, in my opinion, is:

(Val in X[:, col_num])

where Val is the value that you want to check for and X is the array. In your example, suppose you want to check if the value 8 exists in the third column. Simply write

(8 in X[:, 2])

This will return True if 8 is there in the third column, else False.

If you are looking for a list of integers, you may use indexing for doing the work. This also works with nd-arrays, but seems to be slower. It may be better when doing this more than once.

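import numpy as np

def valuesInArray(values, array):
    values = np.asanyarray(values)
    array = np.asanyarray(array)
    # np.int no longer exists in NumPy; accept any integer dtype instead
    assert np.issubdtype(array.dtype, np.integer) and np.issubdtype(values.dtype, np.integer)

    # lookup table (assumes non-negative integers): matches[v] is True
    # when v is one of the sought values; sized to cover both inputs
    matches = np.zeros(max(array.max(), values.max()) + 1, dtype=np.bool_)
    matches[values] = True

    # indexing the table with the array gives a boolean mask of the same shape
    res = matches[array]

    return np.any(res), res


array = np.random.randint(0, 1000, (10000, 3))
values = np.array((1, 6, 23, 543, 222))

matched, matches = valuesInArray(values, array)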

By using numba and njit, I could get a speedup of this by roughly 10x.

If you want to check whether list a is in numpy array b then use the following syntax:

np.any(np.equal(a, b).all(axis=1))

Setting axis=1 assumes the numpy array is of shape n*2, so each row is compared against a as a whole.
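A short sketch of this row-membership check, with illustrative data (the array contents and the helper name `row_in_array` are assumptions, not from the original answer):

```python
import numpy as np

b = np.array([[1, 40], [4, 50], [5, 60]])  # shape (n, 2)

def row_in_array(a, b):
    """Return True if the 1-D sequence `a` equals some full row of `b`."""
    # np.equal broadcasts `a` against every row; all(axis=1) demands that
    # every column matches within a row; any() asks whether some row did.
    return bool(np.any(np.equal(a, b).all(axis=1)))

print(row_in_array([4, 50], b))  # True
print(row_in_array([4, 60], b))  # False
```

Note that the plain `a in b` test is not equivalent here, since it reports True as soon as any single element matches.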
