Applying a function to an array using Numpy when the function contains a condition

I am having difficulty applying a function to an array when the function contains a condition. I have an inefficient workaround and am looking for an efficient (fast) approach. In a simple example:

import numpy as np
pts = np.linspace(0,1,11)
def fun(x, y):
    if x > y:
        return 0
    else:
        return 1

Now, if I run:

result = fun(pts, pts)

then I get the error

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

raised at the if x > y line. My inefficient workaround, which gives the correct result but is too slow, is:

result = np.full([len(pts)]*2, np.nan)
for i in range(len(pts)):
    for j in range(len(pts)):
        result[i,j] = fun(pts[i], pts[j])

What is a nicer (and, more importantly, faster) way to obtain this?

EDIT: using

def fun(x, y):
    if x > y:
        return 0
    else:
        return 1
x = np.array(range(10))
y = np.array(range(10))
xv,yv = np.meshgrid(x,y)
result = fun(xv, yv)  

still raises the same ValueError.

The error is quite explicit - suppose you have

x = np.array([1,2])
y = np.array([2,1])

such that

(x>y) == np.array([0,1])

what should be the result of your if np.array([0,1]) statement? Is it true or false? numpy is telling you this is ambiguous. Using

(x>y).all()

or

(x>y).any()

is explicit, and thus numpy is offering you solutions: either any cell pair fulfills the condition, or all of them do, and both give an unambiguous truth value. You have to define for yourself exactly what you mean by vector x being larger than vector y.
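As a small illustration of the two reductions, here is a sketch using the same x and y as above. The names mask, any_greater, all_greater and raised are only for this example.

```python
import numpy as np

x = np.array([1, 2])
y = np.array([2, 1])

mask = x > y                      # elementwise comparison: array([False,  True])

any_greater = bool(mask.any())    # True: at least one x[i] > y[i]
all_greater = bool(mask.all())    # False: not every x[i] > y[i]

# Using the array itself as a condition raises the ValueError from the question,
# because a two-element boolean array has no single truth value.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True
```

Neither reduction produces the elementwise 0/1 result the question asks for; they only show what an unambiguous condition looks like.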

The numpy solution to operate on all pairs of x and y such that x[i]>y[j] is to use meshgrid to generate all pairs:

>>> import numpy as np
>>> x=np.array(range(10))
>>> y=np.array(range(10))
>>> xv,yv=np.meshgrid(x,y)
>>> xv[xv>yv]
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8,
       9, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 6, 7, 8, 9, 7, 8, 9, 8, 9, 9])
>>> yv[xv>yv]
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
       2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8])

Either send xv and yv to fun, or create the mesh in the function, depending on what makes more sense. This generates all pairs xi,yj such that xi>yj. If you want the actual indices, just return xv>yv. Note that with the default indexing='xy' used above, cell [i][j] compares x[j] with y[i]; pass indexing='ij' so that each cell ij corresponds to x[i] and y[j]. In your case:

def fun(x, y):
    # indexing='ij' makes xv[i][j] == x[i] and yv[i][j] == y[j]
    xv,yv=np.meshgrid(x,y,indexing='ij')
    return xv>yv

will return a matrix where fun(x,y)[i][j] is True if x[i]>y[j], or False otherwise. Alternatively

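return np.where(xv>yv)

will return a tuple of two index arrays (row indices and column indices), such that

for i,j in zip(*fun(x,y)):

will guarantee x[i]>y[j] as well, provided the mesh was built with indexing='ij' (with the default indexing='xy' the roles of i and j are swapped).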
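A small check of the index convention, as a sketch: np.meshgrid defaults to indexing='xy', which lays the grid out as (len(y), len(x)), while indexing='ij' gives (len(x), len(y)). The arrays and the names mask, pairs and mask_xy below are made up for illustration.

```python
import numpy as np

x = np.array([0, 5, 9])
y = np.array([1, 6])

# indexing='ij' makes xv[i, j] == x[i] and yv[i, j] == y[j]
xv, yv = np.meshgrid(x, y, indexing="ij")
mask = xv > yv                       # shape (len(x), len(y))

# np.where(mask) returns (row_indices, col_indices); zip pairs them up
pairs = [(int(i), int(j)) for i, j in zip(*np.where(mask))]

# The default indexing='xy' produces the transposed layout
xv_xy, yv_xy = np.meshgrid(x, y)
mask_xy = xv_xy > yv_xy              # shape (len(y), len(x))
```

Every (i, j) in pairs satisfies x[i] > y[j], and mask_xy is the transpose of mask.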

In [253]: x = np.random.randint(0,10,5)
In [254]: y = np.random.randint(0,10,5)
In [255]: x
Out[255]: array([3, 2, 2, 2, 5])
In [256]: y
Out[256]: array([2, 6, 7, 6, 5])
In [257]: x>y
Out[257]: array([ True, False, False, False, False])
In [258]: np.where(x>y,0,1)
Out[258]: array([0, 1, 1, 1, 1])

For a cartesian comparison of these two 1d arrays, reshape one so it can use broadcasting:

In [259]: x[:,None]>y
Out[259]: 
array([[ True, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [ True, False, False, False, False]])
In [260]: np.where(x[:,None]>y,0,1)
Out[260]: 
array([[0, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1]])

Your function, with the if, only works for scalar inputs. If given arrays, x>y produces a boolean array, which cannot be used in an if statement. Your iteration works because it passes scalar values. For some complex functions that's the best you can do (np.vectorize can make the iteration simpler, but not faster).
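For reference, a minimal sketch of the np.vectorize route mentioned above, using the fun and pts from the question. The names vfun, result_vec and result_loop are only for this example. It still calls fun once per element at Python speed, so it tidies the code rather than speeding it up.

```python
import numpy as np

def fun(x, y):
    # scalar-only function from the question
    if x > y:
        return 0
    else:
        return 1

pts = np.linspace(0, 1, 11)

# np.vectorize broadcasts its inputs, so pts[:, None] against pts
# evaluates fun on every (i, j) pair.
vfun = np.vectorize(fun)
result_vec = vfun(pts[:, None], pts)

# Reference: the explicit double loop from the question
result_loop = np.full([len(pts)] * 2, np.nan)
for i in range(len(pts)):
    for j in range(len(pts)):
        result_loop[i, j] = fun(pts[i], pts[j])
```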

My answer is to look at the array comparison, and derive the answer from that. In this case, the three-argument np.where does a nice job of mapping the boolean array onto the desired 1/0. There are other ways of doing this mapping as well.
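Two such alternatives, sketched with the same x and y values as in the session above. The variable names are only for this example, and the equivalence with x <= y assumes the inputs contain no NaN.

```python
import numpy as np

x = np.array([3, 2, 2, 2, 5])
y = np.array([2, 6, 7, 6, 5])

mask = x[:, None] > y                    # True where x[i] > y[j]

via_where = np.where(mask, 0, 1)         # three-argument where
via_astype = (~mask).astype(int)         # invert, then cast True/False to 1/0
via_le = (x[:, None] <= y).astype(int)   # fun returns 1 exactly when x <= y (no NaNs)
```

All three produce the same 5x5 integer array for these inputs.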

Reproducing your double loop requires one added layer of coding: broadcasting with None, as in x[:,None].

For a more complex example, or if the arrays you are dealing with are a bit larger, or if you can write to an already preallocated array, you could consider Numba.

Example

import numba as nb
import numpy as np

@nb.njit()
def fun(x, y):
  if x > y:
    return 0
  else:
    return 1

@nb.njit(parallel=False)
#@nb.njit(parallel=True)
def loop(x,y):
  result=np.empty((x.shape[0],y.shape[0]),dtype=np.int32)
  for i in nb.prange(x.shape[0]):
    for j in range(y.shape[0]):
      result[i,j] = fun(x[i], y[j])
  return result

@nb.njit(parallel=False)
def loop_preallocated(x,y,result):
  for i in nb.prange(x.shape[0]):
    for j in range(y.shape[0]):
      result[i,j] = fun(x[i], y[j])
  return result

Timings

x = np.array(range(1000))
y = np.array(range(1000))

#Compilation overhead of the first call is neglected

res=np.where(x[:,None]>y,0,1) -> 2.46ms
loop(single_threaded)         -> 1.23ms
loop(parallel)                -> 1.0ms
loop(single_threaded)*        -> 0.27ms
loop(parallel)*               -> 0.058ms

*Maybe influenced by cache. Test on your own examples.
