Row-wise broadcasting of an arbitrary function in numpy

Question

I have a matrix of vectors where each row is a vector. I want to take the mean of all the vectors, then calculate the cosine distance between each vector and this mean, returning an array of distances.

>>> from numpy import arange
>>> x = arange(1, 10).reshape(3, 3)
>>> x
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> m = x.mean(0)
>>> m
array([4., 5., 6.])

The cosine distances are as follows:

>>> from scipy.spatial.distance import cosine
>>> cosine([1,2,3], [4,5,6])
0.0253681538029239
>>> cosine([4,5,6], [4,5,6])
0.0
>>> cosine([7,8,9], [4,5,6])
0.001809107314273195
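For reference, scipy's cosine returns the cosine distance, i.e. 1 minus the cosine similarity. A worked check of the first value (u = [1, 2, 3], v = m = [4, 5, 6]):

$$
d(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}
$$

$$
d([1,2,3], [4,5,6]) = 1 - \frac{1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6}{\sqrt{14}\,\sqrt{77}} = 1 - \frac{32}{\sqrt{1078}} \approx 0.02537
$$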

Therefore I want to write a function f such that

>>> f(x, m)
array([0.0253681538029239, 0.0, 0.001809107314273195])

(Or the transpose of such an array. It doesn't matter.)

What is the most efficient, most numpythonic way to write f? It seems like the trick is to broadcast the cosine function over the rows rather than over individual elements, but I haven't figured out how to do this. The following doesn't work:

>>> from numpy import frompyfunc
>>> f = frompyfunc(cosine, 2, 1)
>>> f(x, m)
array([[0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]], dtype=object)

(It looks like here numpy is applying cosine element-wise instead of row-wise.)
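frompyfunc has no notion of "core" dimensions, so it hands cosine one pair of scalars at a time, and two nonzero scalars of the same sign always have cosine distance 0, which is why every entry is 0.0. A sketch of an alternative not mentioned in this thread, assuming NumPy 1.12 or later (where the signature argument was added): np.vectorize with a generalized-ufunc signature treats the last axis of each input as one vector. The name row_cosine is hypothetical. NumPy documents vectorize as a convenience wrapper that is essentially a for loop, so this fixes the broadcasting but does not make the computation faster.

```python
import numpy as np
from scipy.spatial.distance import cosine

# '(n),(n)->()' declares that each call consumes two length-n vectors and
# returns a scalar, so broadcasting happens over the remaining (row) axis
# instead of element by element.
row_cosine = np.vectorize(cosine, signature='(n),(n)->()')

x = np.arange(1, 10).reshape(3, 3)
m = x.mean(axis=0)  # shape (3,), paired with every row of x

print(row_cosine(x, m))
```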

Is there a way to do this without writing a for-loop?

It looks like this is possible with apply_along_axis:

>>> from numpy import apply_along_axis
>>> from functools import partial
>>> g = partial(cosine, m)
>>> apply_along_axis(g, 1, x)
array([0.02536815, 0.        , 0.00180911])

Is this the most efficient way?
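apply_along_axis still calls cosine once per row from Python. For comparison, a pure-NumPy sketch (a swapped-in technique, not taken from either answer) evaluates 1 - x_i·m / (‖x_i‖ ‖m‖) for all rows at once with a matrix-vector product. The helper name row_cosine_distances is hypothetical, and it assumes neither m nor any row of x is the zero vector, since that would divide by zero.

```python
import numpy as np

def row_cosine_distances(x, m):
    """Hypothetical helper: cosine distance from every row of x to the vector m.

    Assumes neither m nor any row of x is all zeros.
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    dots = x @ m                                            # x_i . m for every row
    norms = np.linalg.norm(x, axis=1) * np.linalg.norm(m)   # ||x_i|| * ||m|| per row
    return 1.0 - dots / norms

x = np.arange(1, 10).reshape(3, 3)
m = x.mean(axis=0)
print(row_cosine_distances(x, m))
```

The middle entry may come out as a tiny rounding residue (around 1e-16) rather than exactly 0.0, just like the cdist results in the answers.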

Answer 1

Use cdist, which computes distances between the rows of two 2D arrays. You need to reshape your mean array to be 2D first:

>>> from scipy.spatial.distance import cdist
>>> cdist(x, m.reshape(1, -1), metric='cosine')
array([[2.53681538e-02],
       [2.22044605e-16],
       [1.80910731e-03]])
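cdist returns a (rows, 1) column here, while the question asks for a flat array. A minimal wrapper sketch that flattens the result with ravel(); the name f mirrors the question, and the wrapper itself is an illustration rather than part of the answer.

```python
import numpy as np
from scipy.spatial.distance import cdist

def f(x, m):
    # cdist needs two 2-D inputs; reshaping m to (1, n) yields a (rows, 1)
    # result, which ravel() turns into a 1-D array of distances.
    return cdist(x, np.reshape(m, (1, -1)), metric='cosine').ravel()

x = np.arange(1, 10).reshape(3, 3)
print(f(x, x.mean(axis=0)))
```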

Answer 2

I guess the trick would be to use cdist, which works on 2D arrays in a vectorized manner, to get those cosine distances. So, one way would be:

In [59]: from scipy.spatial.distance import cdist

In [61]: cdist(x,x.mean(0,keepdims=True),'cosine')
Out[61]: 
array([[2.53681538e-02],
       [2.22044605e-16],
       [1.80910731e-03]])

The keepdims=True argument keeps the mean 2D (shape (1, 3) rather than (3,)), which makes it compatible with the input requirements of cdist.
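A quick shape check on the same 3×3 matrix shows what keepdims changes:

```python
import numpy as np

x = np.arange(1, 10).reshape(3, 3)
m1 = x.mean(axis=0)                 # reduced axis is dropped
m2 = x.mean(axis=0, keepdims=True)  # reduced axis is kept with length 1
print(m1.shape, m2.shape)           # (3,) (1, 3)
```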

2 answers

Answer 1: 2 votes, accepted, 2019-10-08 19:46:56

Answer 2: 1 vote, 2019-10-08 19:45:57
