numpy.searchsorted with more than one source
Let's say that I have two arrays of the form:
a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
As you can see, the arrays above are sorted when a and b are considered as columns of a super array.
Now, I want to do a searchsorted on this array. For instance, if I search for (3, 7) (a = 3 and b = 7), I should get 6.
Whenever there are duplicate values in a, the search should continue with the values in b.
Is there a built-in numpy method to do this? If not, what would be an efficient way to do it, assuming that I have millions of entries in my array?
I tried numpy.recarray: I created one recarray from a and b and tried searching in it, but I am getting the following error:
TypeError: expected a readable buffer object
Any help is much appreciated.
You could use a repeated searchsorted from left and right:
left, right = np.searchsorted(a, 3, side='left'), np.searchsorted(a, 3, side='right')
index = left + np.searchsorted(b[left:right], 7)
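For reference, the two-step search above can be wrapped in a small function. This is only a sketch: the name searchsorted_pair is made up for illustration, and it assumes the rows are already sorted by a and then by b.

```python
import numpy as np

def searchsorted_pair(a, b, a_val, b_val):
    """Left insertion index of (a_val, b_val) in rows sorted by a, then b.

    Hypothetical helper, not part of numpy.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Rows left:right are exactly those where a == a_val (the block may be empty).
    left = np.searchsorted(a, a_val, side='left')
    right = np.searchsorted(a, a_val, side='right')
    # Inside that block b is sorted, so a second binary search finishes the job.
    return int(left + np.searchsorted(b[left:right], b_val))

a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
print(searchsorted_pair(a, b, 3, 7))  # 6
```

Each lookup is three binary searches, so it stays O(log n) even with millions of rows, provided a and b are converted to numpy arrays once up front rather than on every call.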
You're almost there. It's just that numpy.record (which is what I assume you used, given the error message you received) isn't really what you want; just create a one-item record array for the value you are searching for:
>>> a_b = numpy.rec.fromarrays((a, b))
>>> a_b
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
           (4, 4), (4, 8), (5, 1), (6, 1)],
          dtype=[('f0', '<i8'), ('f1', '<i8')])
>>> numpy.searchsorted(a_b, numpy.array((3, 7), dtype=a_b.dtype))
6
It might also be useful to know that sort and argsort sort record arrays lexicographically, and that there is also lexsort. An example using lexsort:
>>> random_idx = numpy.random.permutation(range(12))
>>> a = numpy.array(a)[random_idx]
>>> b = numpy.array(b)[random_idx]
>>> sorted_idx = numpy.lexsort((b, a))
>>> a[sorted_idx]
array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
>>> b[sorted_idx]
array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])
Sorting record arrays:
>>> a_b = numpy.rec.fromarrays((a, b))
>>> a_b[a_b.argsort()]
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
           (4, 4), (4, 8), (5, 1), (6, 1)],
          dtype=[('f0', '<i8'), ('f1', '<i8')])
>>> a_b.sort()
>>> a_b
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
           (4, 4), (4, 8), (5, 1), (6, 1)],
          dtype=[('f0', '<i8'), ('f1', '<i8')])
This works for me:
>>> a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
>>> b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
>>> Z = numpy.array(list(zip(a, b)), dtype=[('a','int'), ('b','int')])
>>> Z.searchsorted(numpy.asarray((3,7), dtype=Z.dtype))
6
I think the trick might be to make sure the argument to searchsorted has the same dtype as the array. When I try Z.searchsorted((3, 7)) I get a segfault.
Extension to n arrays:
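import numpy as np

def searchsorted_multi(*args):
    # The last argument holds the values to look up; the rest are the key columns.
    *arrays, v = args
    if len(v) != len(arrays):
        raise ValueError("need exactly one search value per array")
    l, r = 0, len(arrays[0])
    for vi, ai in zip(v, arrays):
        # Search only the rows that tie on all previous columns, then
        # convert the slice-relative hits back to absolute indices.
        lo, hi = (np.searchsorted(ai[l:r], vi, side) for side in ('left', 'right'))
        l, r = l + lo, l + hi
    return l

if __name__ == "__main__":
    a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
    b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
    c = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 2]
    assert searchsorted_multi(a, b, (3, 7)) == 6
    assert searchsorted_multi(a, b, (3, 0)) == 5
    # (6, 1, 2) is the last row, so its left insertion point is 11.
    assert searchsorted_multi(a, b, c, (6, 1, 2)) == 11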
Here's an interesting way to do it (though it's not the most efficient way, as I believe it's O(n) rather than O(log(n)) as ecatmur's answer would be; it is, however, more compact):
np.searchsorted(a + 1j*b, a_val + 1j*b_val)
Example:
>>> a = np.array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
>>> b = np.array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])
>>> np.searchsorted(a + 1j*b, 4 + 1j*8)
9
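The O(n) part of this trick is building a + 1j*b, so if many lookups are needed it may be worth building that key once and reusing it. A rough sketch follows; the names key and find are mine, not numpy's. It relies on numpy ordering complex numbers by real part first and imaginary part second, and it assumes the values fit exactly in float64 (integers up to 2**53), since the complex key stores both parts as floats.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
b = np.array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])

# Build the complex key once (O(n)); every later lookup is a plain binary search.
key = a + 1j * b

def find(a_val, b_val):
    # Hypothetical helper: left insertion index of (a_val, b_val) in the key.
    return int(np.searchsorted(key, a_val + 1j * b_val))

print(find(3, 7), find(4, 8))  # 6 9
```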
Or without numpy:
>>> import bisect
>>> a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
>>> b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
>>> bisect.bisect_left(list(zip(a, b)), (3, 7))
6
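One thing to keep in mind with the pure-Python version: on Python 3, zip returns an iterator, so it has to be turned into a list before bisect can index it, and building that list is O(n). A small sketch that builds the list of pairs once and reuses it (the names pairs and find_pair are just for illustration):

```python
import bisect

a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]

# Materialize the (a, b) rows once; tuples compare lexicographically,
# which matches "sorted by a, then by b".
pairs = list(zip(a, b))

def find_pair(a_val, b_val):
    # Each call is an O(log n) binary search over the prebuilt list.
    return bisect.bisect_left(pairs, (a_val, b_val))

print(find_pair(3, 7))  # 6
```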