
How can I apply a function element-wise to two arrays?

I have two 1D numpy arrays of strings and a function that takes two strings and generates a score based on some relations between the two input strings.

def get_score(string1, string2):
    # compute score ...
    return score

Is there an efficient way (perhaps using numpy) to apply that function to all combinations of the two arrays, producing an array of scores from which I could select the max score?

With its large set of operators and ufuncs, numpy can easily do this kind of element-wise computation, using the fundamental concept of broadcasting:

In [155]: A = np.array(['one','two','three']); B = np.array(['four','two'])

In [156]: A[:,None] == B      # compare a (3,1) array with a (2,)
Out[156]: 
array([[False, False],
       [False,  True],
       [False, False]])

But this works much better with numeric arrays. There aren't many operations that work with string arrays.
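
For comparison, here is a minimal sketch of the same broadcasting pattern with numeric arrays. The arrays x and y and the choice of multiplication as the pairwise operation are assumptions made only for illustration:

```python
import numpy as np

# Illustrative numeric inputs (not from the question).
x = np.array([1.0, 2.0, 3.0])   # shape (3,)
y = np.array([10.0, 20.0])      # shape (2,)

# A (3,1) array broadcast against a (2,) array gives a (3,2) table
# holding the result for every combination of elements.
table = x[:, None] * y

# Largest value and the (row, column) position where it occurs.
best = table.max()
i, j = np.unravel_index(table.argmax(), table.shape)
```

Here i indexes into x and j into y, so x[i] and y[j] are the pair that produced the maximum.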

A few of the np.char functions work with 2 arrays:

In [159]: np.char.join(B,A[:,None])
Out[159]: 
array([['ofournfoure', 'otwontwoe'],
       ['tfourwfouro', 'ttwowtwoo'],
       ['tfourhfourrfourefoure', 'ttwohtwortwoetwoe']], dtype='<U21')

Expanding the arrays into 2d arrays (functionally the same as A[:,None]):

In [160]: np.meshgrid(A,B,indexing='ij')
Out[160]: 
[array([['one', 'one'],
        ['two', 'two'],
        ['three', 'three']], dtype='<U5'),
 array([['four', 'two'],
        ['four', 'two'],
        ['four', 'two']], dtype='<U4')]

np.vectorize can be used to apply broadcasting to a function that takes scalar inputs (single strings). For small arrays it tends to be slower than a list comprehension, but for large arrays it scales somewhat better.
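
A minimal sketch of the np.vectorize approach for the question's case. The body of get_score below is a placeholder assumption (it counts distinct shared characters), since the real scoring logic isn't shown; the name score_all is also just illustrative. Note that np.vectorize still calls the Python function once per pair, so it is a convenience rather than a speed-up in compiled code:

```python
import numpy as np

def get_score(string1, string2):
    # Placeholder score (an assumption): number of distinct characters
    # the two strings have in common. Substitute the real scoring logic.
    return len(set(string1) & set(string2))

A = np.array(['one', 'two', 'three'])
B = np.array(['four', 'two'])

# Wrap the scalar function so it broadcasts over array arguments.
# otypes fixes the output dtype instead of inferring it from the first call.
score_all = np.vectorize(get_score, otypes=[int])

# (3,1) against (2,) gives a (3,2) array with one score per combination.
scores = score_all(A[:, None], B)

# Best score and the pair of strings that produced it.
i, j = np.unravel_index(scores.argmax(), scores.shape)
best_score, best_pair = scores[i, j], (A[i], B[j])
```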

In short, there's a lot of power in numpy for doing numeric element-wise operations, less so for strings.

You would have to iterate. For instance, if pairscore computes the score of 2 elements, the max over all combinations of the two arrays is:

def get_score(string1, string2):
    # string1 and string2 are the two arrays; pairscore scores one pair of elements
    return max(pairscore(x1, x2) for x1 in string1 for x2 in string2)
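
If the best-scoring pair is wanted along with the score, the same iteration can carry the strings through. This is only a sketch: the pairscore body is a placeholder assumption (it counts positions where the characters match), and best_pair is a hypothetical helper name:

```python
from itertools import product

def pairscore(x1, x2):
    # Placeholder (an assumption): number of positions where the
    # two strings have the same character.
    return sum(a == b for a, b in zip(x1, x2))

def best_pair(strings1, strings2):
    # Score every combination and keep the highest-scoring one,
    # returned as (score, x1, x2).
    candidates = ((pairscore(x1, x2), x1, x2)
                  for x1, x2 in product(strings1, strings2))
    return max(candidates, key=lambda t: t[0])

score, s1, s2 = best_pair(['one', 'two', 'three'], ['four', 'two'])
```

Because the candidates are produced by a generator, no intermediate list of all scores is built.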
