What is the fastest way to get the result of matrix < matrix in numpy?

Suppose I have a matrix M_1 of dimension (M, A) and a matrix M_2 of dimension (M, B). The result of M_1 < M_2 should be a matrix of dimension (M, B, A), where each row of M_1 is compared with each element of the corresponding row of M_2, giving a boolean vector (or a 1/0 vector) for each comparison.

For example, if I have a matrix of

M1 = [[1,2,3],
      [3,4,5]]

M2 = [[1,2],
      [3,4]]

result should be [[[False, False, False],
                   [True, False, False]],
                  [[False, False, False], 
                   [True, False, False]]]

Currently, I am using for loops, which are tremendously slow when I have to repeat this operation many times (it would take months). Hopefully there is a vectorized way to do this. If not, what else can I do?

I am looking at M_1 of shape (500, 3000000) and M_2 of shape (500, 500), with the comparison repeated about 10000 times.

For NumPy arrays, extend the dimensions with None/np.newaxis so that the first axes stay aligned while the second axes are spread onto separate axes, which lets the two arrays be compared elementwise. Then do the comparison, leveraging broadcasting for a vectorized solution -

M1[:,None,:] < M2[:,:,None]

Sample run -

In [19]: M1
Out[19]: 
array([[1, 2, 3],
       [3, 4, 5]])

In [20]: M2
Out[20]: 
array([[1, 2],
       [3, 4]])

In [21]: M1[:,None,:] < M2[:,:,None]
Out[21]: 
array([[[False, False, False],
        [ True, False, False]],

       [[False, False, False],
        [ True, False, False]]])
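
To sanity-check the broadcasting expression, here is a minimal sketch that compares it against an explicit loop version on the sample data. The function names compare_loops and compare_broadcast are made up for this illustration and are not part of NumPy.

```python
import numpy as np

def compare_loops(M1, M2):
    """Loop reference: out[m, b, a] = M1[m, a] < M2[m, b]."""
    M, A = M1.shape
    B = M2.shape[1]
    out = np.empty((M, B, A), dtype=bool)
    for m in range(M):
        for b in range(B):
            # Compare the whole row M1[m] against the scalar M2[m, b].
            out[m, b] = M1[m] < M2[m, b]
    return out

def compare_broadcast(M1, M2):
    """Vectorized: shapes (M, 1, A) and (M, B, 1) broadcast to (M, B, A)."""
    return M1[:, None, :] < M2[:, :, None]

M1 = np.array([[1, 2, 3], [3, 4, 5]])
M2 = np.array([[1, 2], [3, 4]])

result = compare_broadcast(M1, M2)
print(result.shape)                                   # (2, 2, 3)
print(np.array_equal(result, compare_loops(M1, M2)))  # True
```

The inserted axes have length 1, so NumPy stretches them virtually during the comparison instead of copying M1 or M2.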

For lists as inputs, use numpy.expand_dims and then compare -

In [42]: M1 = [[1,2,3],
    ...:       [3,4,5]]
    ...: 
    ...: M2 = [[1,2],
    ...:       [3,4]]

In [43]: np.expand_dims(M1, axis=1) < np.expand_dims(M2, axis=2)
Out[43]: 
array([[[False, False, False],
        [ True, False, False]],

       [[False, False, False],
        [ True, False, False]]])
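
As an alternative sketch for list inputs, the lists can be converted once with np.asarray and then indexed with None exactly as above. The names a1 and a2 are only illustrative.

```python
import numpy as np

M1 = [[1, 2, 3],
      [3, 4, 5]]
M2 = [[1, 2],
      [3, 4]]

# Convert the nested lists to arrays once, then index with None as for arrays.
a1 = np.asarray(M1)
a2 = np.asarray(M2)

via_indexing = a1[:, None, :] < a2[:, :, None]
via_expand_dims = np.expand_dims(M1, axis=1) < np.expand_dims(M2, axis=2)

print(a1[:, None, :].shape, a2[:, :, None].shape)     # (2, 1, 3) (2, 2, 1)
print(np.array_equal(via_indexing, via_expand_dims))  # True
```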

Further boost

For large data, the numexpr module can give a further boost by leveraging multiple cores -

In [44]: import numexpr as ne

In [52]: M1 = np.random.randint(0,9,(500, 30000))

In [53]: M2 = np.random.randint(0,9,(500, 500))

In [55]: %timeit M1[:,None,:] < M2[:,:,None]
1 loop, best of 3: 3.32 s per loop

In [56]: %timeit ne.evaluate('M1e<M2e',{'M1e':M1[:,None,:],'M2e':M2[:,:,None]})
1 loop, best of 3: 1.53 s per loop
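
One caveat for the sizes mentioned in the question: a full (500, 500, 3000000) boolean result holds 7.5e11 elements, which at one byte per boolean is roughly 750 GB, so it may not fit in memory in one piece. Below is a rough sketch of processing the comparison in column chunks instead. The helper name iter_less_than_chunks, the chunk size, and the per-chunk reduction (counting True values) are all assumptions made for illustration, since the question does not say what is done with the result afterwards.

```python
import numpy as np

def iter_less_than_chunks(M1, M2, chunk=1024):
    """Yield (start, block) with block = M1[:, None, start:stop] < M2[:, :, None]."""
    A = M1.shape[1]
    for start in range(0, A, chunk):
        stop = min(start + chunk, A)
        # Each block has shape (M, B, stop - start), so memory stays bounded.
        yield start, M1[:, None, start:stop] < M2[:, :, None]

# Small random inputs so the sketch runs quickly.
rng = np.random.default_rng(0)
M1 = rng.integers(0, 9, size=(5, 1000))
M2 = rng.integers(0, 9, size=(5, 7))

# Hypothetical reduction: for each (m, b), count how many M1[m, a] < M2[m, b].
counts = np.zeros(M2.shape, dtype=np.int64)
for _, block in iter_less_than_chunks(M1, M2, chunk=128):
    counts += block.sum(axis=2)

print(counts.shape)  # (5, 7)
```

Whether chunking helps depends on what the result is used for. It only pays off if each block can be consumed or reduced before the next one is computed.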
