简体   繁体   中英

Broadcasting a python function on to numpy arrays

Let's say we have a particularly simple function like

import scipy as sp
def func(x, y):
   return x + y

This function evidently works for several builtin python datatypes of x and y like string, list, int, float, array, etc. Since we are particularly interested in arrays, we consider two arrays:

x = sp.array([-2, -1, 0, 1, 2])
y = sp.array([-2, -1, 0, 1, 2])

xx = x[:, sp.newaxis]
yy = y[sp.newaxis, :]

>>> func(xx, yy)

this returns

array([[-4, -3, -2, -1,  0],
  [-3, -2, -1,  0,  1],
  [-2, -1,  0,  1,  2],
  [-1,  0,  1,  2,  3],
  [ 0,  1,  2,  3,  4]])

just as we would expect.

Now what if one wants to throw in arrays as the inputs for the following function?

def func2(x, y):
  if x > y:
     return x + y
  else:
     return x - y

doing >>>func(xx, yy) would raise an error.

The first obvious method that one would come up with is the sp.vectorize function in scipy/numpy. This method, nevertheless has been proved to be not very efficient. Can anyone think of a more robust way of broadcasting any function in general on to numpy arrays?

If re-writing the code in an array-friendly fashion is the only way, it would help if you could mention it here too.

np.vectorize is a general way to convert Python functions that operate on numbers into numpy functions that operate on ndarrays.

However, as you point out, it isn't very fast, since it is using a Python loop "under the hood".

To achieve better speed, you have to hand-craft a function that expects numpy arrays as input and takes advantage of that numpy-ness:

import numpy as np

def func2(x, y):
    return np.where(x>y,x+y,x-y)      

x = np.array([-2, -1, 0, 1, 2])
y = np.array([-2, -1, 0, 1, 2])

xx = x[:, np.newaxis]
yy = y[np.newaxis, :]

print(func2(xx, yy))
# [[ 0 -1 -2 -3 -4]
#  [-3  0 -1 -2 -3]
#  [-2 -1  0 -1 -2]
#  [-1  0  1  0 -1]
#  [ 0  1  2  3  0]]

Regarding performance:

test.py :

import numpy as np

def func2a(x, y):
    return np.where(x>y,x+y,x-y)      

def func2b(x, y):
    ind=x>y
    z=np.empty(ind.shape,dtype=x.dtype)
    z[ind]=(x+y)[ind]
    z[~ind]=(x-y)[~ind]
    return z

def func2c(x, y):
    # x, y= x[:, None], y[None, :]
    A, L= x+ y, x<= y
    A[L]= (x- y)[L]
    return A

N=40
x = np.random.random(N)
y = np.random.random(N)

xx = x[:, np.newaxis]
yy = y[np.newaxis, :]

Running:

With N=30:

% python -mtimeit -s'import test' 'test.func2a(test.xx,test.yy)'
1000 loops, best of 3: 219 usec per loop

% python -mtimeit -s'import test' 'test.func2b(test.xx,test.yy)'
1000 loops, best of 3: 488 usec per loop

% python -mtimeit -s'import test' 'test.func2c(test.xx,test.yy)'
1000 loops, best of 3: 248 usec per loop

With N=1000:

% python -mtimeit -s'import test' 'test.func2a(test.xx,test.yy)'
10 loops, best of 3: 93.7 msec per loop

% python -mtimeit -s'import test' 'test.func2b(test.xx,test.yy)'
10 loops, best of 3: 367 msec per loop

% python -mtimeit -s'import test' 'test.func2c(test.xx,test.yy)'
10 loops, best of 3: 186 msec per loop

This seems to suggest that func2a is slightly faster than func2c (and func2b is horribly slow).

For this special case, you could also write a function that operates on both, NumPy arrays and plain Python floats:

def func2d(x, y):
    z = 2.0 * (x > y) - 1.0
    z *= y
    return x + z

This version is also more than four times as fast as unutbu's func2a() (tested with N = 100 ).

Just to get the basic idea, you may modify your function, for example this kind of way:

def func2(x, y):
    x, y= x[:, None], y[None, :]
    A= x+ y
    A[x<= y]= (x- y)[x<= y]
    return A

Thus with your case, something like this should be a very reasonable starting point:

In []: def func(x, y):
   ..:     x, y= x[:, None], y[None, :]
   ..:     return x+ y
   ..:
In []: def func2(x, y):
   ..:     x, y= x[:, None], y[None, :]
   ..:     A, L= x+ y, x<= y
   ..:     A[L]= (x- y)[L]
   ..:     return A
   ..:
In []: x, y= arange(-2, 3), arange(-2, 3)
In []: func(x, y)
Out[]:
array([[-4, -3, -2, -1,  0],
       [-3, -2, -1,  0,  1],
       [-2, -1,  0,  1,  2],
       [-1,  0,  1,  2,  3],
       [ 0,  1,  2,  3,  4]])
In []: func2(x, y)
Out[]:
array([[ 0, -1, -2, -3, -4],
       [-3,  0, -1, -2, -3],
       [-2, -1,  0, -1, -2],
       [-1,  0,  1,  0, -1],
       [ 0,  1,  2,  3,  0]])

Although this kind of processing may seem to waste resources, it's not necessarily the case. Always measure your programs actual performance and make then (not earlier) necessary changes.

IMHO for an additional advantage: this kind of 'vectorization' makes your code really consistent and readable eventually.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM