numpy.searchsorted with more than one source

Question

Let's say that I have two arrays in the form

a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]

As you can see, the arrays above are sorted when a and b are treated as the columns of a single combined array.

Now, I want to do a searchsorted on this array. For instance, if I search for (3, 7) (a = 3 and b = 7), I should get 6.

Whenever there are duplicate values in a, the search should continue with the values in b.

Is there a built-in numpy method for this? If not, what would be an efficient way to do it, assuming I have a million entries in my array?

I tried numpy.recarray, creating one recarray from a and b and searching in it, but I get the following error:

TypeError: expected a readable buffer object

Any help is much appreciated.

Answer 1

You could apply searchsorted twice: first find the left and right bounds of the run of matching values in a, then search b within that run:

left, right = np.searchsorted(a, 3, side='left'), np.searchsorted(a, 3, side='right')
index = left + np.searchsorted(b[left:right], 7)
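
The two-step search above can be wrapped in a small helper. This is only a minimal sketch: the name searchsorted_pair is hypothetical (it is not a NumPy function), and it assumes a and b are NumPy arrays that are already sorted lexicographically by (a, b).

```python
import numpy as np

def searchsorted_pair(a, b, a_val, b_val):
    """Left insertion index of (a_val, b_val) in the columns (a, b)."""
    # Bounds of the run of entries in `a` that equal a_val.
    left = np.searchsorted(a, a_val, side='left')
    right = np.searchsorted(a, a_val, side='right')
    # Inside that run `b` is sorted, so a second binary search finishes the job.
    return int(left + np.searchsorted(b[left:right], b_val, side='left'))

a = np.array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
b = np.array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])
print(searchsorted_pair(a, b, 3, 7))  # 6
```

Each call performs three binary searches, so the cost stays logarithmic in the array length. If a_val is not present in a, the run is empty and the result is simply the insertion point in a.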

Answer 2

You're almost there. It's just that numpy.record (which is what I assume you used, given the error message you received) isn't really what you want; just create a one-item record array for the value you are searching for, with the same dtype as the array:

>>> a_b = numpy.rec.fromarrays((a, b))
>>> a_b
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
       (4, 4), (4, 8), (5, 1), (6, 1)], 
      dtype=[('f0', '<i8'), ('f1', '<i8')])
>>> numpy.searchsorted(a_b, numpy.array((3, 7), dtype=a_b.dtype))
6

It might also be useful to know that sort and argsort sort record arrays lexicographically, and that there is also lexsort (note that lexsort treats the last key it is given as the primary sort key). An example using lexsort:

>>> random_idx = numpy.random.permutation(range(12))
>>> a = numpy.array(a)[random_idx]
>>> b = numpy.array(b)[random_idx]
>>> sorted_idx = numpy.lexsort((b, a))
>>> a[sorted_idx]
array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
>>> b[sorted_idx]
array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])

Sorting record arrays:

>>> a_b = numpy.rec.fromarrays((a, b))
>>> a_b[a_b.argsort()]
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
       (4, 4), (4, 8), (5, 1), (6, 1)], 
      dtype=[('f0', '<i8'), ('f1', '<i8')])
>>> a_b.sort()
>>> a_b
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
       (4, 4), (4, 8), (5, 1), (6, 1)], 
      dtype=[('f0', '<i8'), ('f1', '<i8')])

Answer 3

This works for me:

>>> a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
>>> b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
>>> Z = numpy.array(list(zip(a, b)), dtype=[('a','int'), ('b','int')])  # list() is needed on Python 3, where zip is lazy
>>> Z.searchsorted(numpy.asarray((3,7), dtype=Z.dtype))
6

I think the trick might be to make sure the argument to searchsorted has the same dtype as the array. When I try Z.searchsorted((3, 7)) I get a segfault.
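
The same idea extends to looking up several pairs in one call, as long as the keys share the array's dtype. The following is a sketch rather than a definitive recipe: the field names 'a' and 'b' follow the answer above, while the names dt and keys are illustrative.

```python
import numpy as np

a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]

# One shared dtype for both the sorted array and the search keys.
dt = np.dtype([('a', np.int64), ('b', np.int64)])

# zip() is lazy on Python 3, so materialize it before handing it to NumPy.
Z = np.array(list(zip(a, b)), dtype=dt)

# Build the keys with the same dtype so the records compare field by field.
keys = np.array([(3, 7), (3, 0), (4, 8)], dtype=dt)

idx = Z.searchsorted(keys)
print(idx.tolist())  # [6, 5, 9]
```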

Answer 4

An extension to n arrays (the last argument is the tuple of values to search for):

import numpy as np

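def searchsorted_multi(*args):
    """Left insertion index of the tuple args[-1] in the lexicographically
    sorted columns args[:-1]."""
    *arrays, v = args
    if len(v) != len(arrays):
        raise ValueError("need exactly one search value per array")
    # [lo, hi) is the absolute range of rows matching the values seen so far.
    lo, hi = 0, len(arrays[0])
    for vi, ai in zip(v, arrays):
        segment = ai[lo:hi]
        left = np.searchsorted(segment, vi, side='left')
        right = np.searchsorted(segment, vi, side='right')
        # searchsorted returns offsets relative to the segment; make them absolute.
        lo, hi = lo + left, lo + right
    return lo

if __name__ == "__main__":
    a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
    b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
    c = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 2]

    assert searchsorted_multi(a, b, (3, 7)) == 6
    assert searchsorted_multi(a, b, (3, 0)) == 5
    # (6, 1, 2) is the last row, so its left insertion index is 11.
    assert searchsorted_multi(a, b, c, (6, 1, 2)) == 11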

Answer 5

Here's an interesting, more compact way to do it. It is not the most efficient, though: building the complex array makes each call O(n), rather than O(log(n)) as in ecatmur's answer. It relies on NumPy ordering complex numbers by real part first and imaginary part second:

np.searchsorted(a + 1j*b, a_val + 1j*b_val)

Example:

>>> a = np.array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
>>> b = np.array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])
>>> np.searchsorted(a + 1j*b, 4 + 1j*8)
9
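
If many lookups are needed, the complex key array can be built once and reused, so that only the first step is O(n). The sketch below assumes the values are integers small enough to be represented exactly as floats (below 2**53); the names keys and queries are illustrative.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
b = np.array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])

# Build the combined key once: a becomes the real part, b the imaginary part.
keys = a + 1j * b

# NumPy compares complex numbers by real part, then by imaginary part,
# which matches the (a, b) ordering of the two columns.
queries = np.array([3 + 7j, 4 + 8j, 3 + 0j])
idx = np.searchsorted(keys, queries)
print(idx.tolist())  # [6, 9, 5]
```

The trick only covers two real-valued columns, since a complex number has just two components.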

Answer 6

Or without numpy:

>>> import bisect
>>> a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
>>> b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
>>> bisect.bisect_left(list(zip(a, b)), (3, 7))  # list() is needed on Python 3, where zip is lazy
6
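
A slightly fuller sketch of the pure-Python route follows. It works because Python compares tuples lexicographically. The variable names are illustrative, and building the list of pairs is itself O(n), so it pays off mainly when the list is built once and searched many times.

```python
import bisect

a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]

# Materialize the pairs once; bisect needs a sequence, not an iterator.
pairs = list(zip(a, b))

lo = bisect.bisect_left(pairs, (3, 7))   # first index where (3, 7) could go
hi = bisect.bisect_right(pairs, (3, 7))  # one past the last equal entry
print(lo, hi)  # 6 7

# A shorter tuple sorts before any longer tuple sharing its prefix,
# so (3,) locates the first row whose a-value is 3.
first_three = bisect.bisect_left(pairs, (3,))
print(first_three)  # 5
```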

6 answers

solution1
4 2012-08-08 16:06:25

solution2
3 ACCEPTED 2012-08-08 16:28:42

solution3
1 2012-08-08 17:29:52

solution4
0 2012-08-08 16:10:31

solution5
0 2012-08-08 16:27:56

solution6
0 2012-08-08 16:42:52
