
Compute matrix of pairwise angles between two arrays of points

I have two arrays of points, x and y, with shapes (n, p) and (m, p) respectively. For example:

import numpy as np

x = np.array([[ 0.     , -0.16341,  0.98656],
              [-0.05937, -0.25205,  0.96589],
              [ 0.05937, -0.25205,  0.96589],
              [-0.11608, -0.33488,  0.93508],
              [ 0.     , -0.33416,  0.94252]])
y = np.array([[ 0.     , -0.36836,  0.92968],
              [-0.12103, -0.54558,  0.82928],
              [ 0.12103, -0.54558,  0.82928]])

I want to compute an (n, m)-sized matrix containing the angle between every pair of points (one from x, one from y), as in this question. In other words, a vectorized version of:

import numpy.linalg as la

n, m = len(x), len(y)
theta = np.array(
    [np.arccos(np.dot(i, j) / (la.norm(i) * la.norm(j)))
     for i in x for j in y]
).reshape((n, m))
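Written out, the loop computes the matrix below. The notation is introduced here for illustration and does not come from the original question: X and Y are the arrays x and y, x_i and y_j are their rows, D_x and D_y are diagonal matrices of the row norms, and arccos is applied elementwise. The matrix form is what the vectorized answers below exploit: one matrix product, followed by a row scaling and a column scaling.

```latex
\theta_{ij} = \arccos\!\left(\frac{x_i \cdot y_j}{\lVert x_i \rVert \, \lVert y_j \rVert}\right),
\qquad
\Theta = \arccos\!\left(D_x^{-1} \, X Y^{\mathsf{T}} \, D_y^{-1}\right),
\qquad
D_x = \operatorname{diag}\big(\lVert x_1 \rVert, \dots, \lVert x_n \rVert\big),\;
D_y = \operatorname{diag}\big(\lVert y_1 \rVert, \dots, \lVert y_m \rVert\big)
```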

Note: n and m can each be on the order of 10,000.
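At that size the output matrix alone is large. Here is a back-of-the-envelope check, assuming NumPy's default float64 dtype (8 bytes per element):

```python
import numpy as np

# Sizes from the note above; float64 is NumPy's default floating-point dtype.
n, m = 10_000, 10_000
bytes_per_matrix = n * m * np.dtype(np.float64).itemsize

print(f"{bytes_per_matrix / 1e6:.0f} MB per (n, m) float64 matrix")  # 800 MB
```

The vectorized approaches also hold intermediate (n, m) arrays, such as the dot-product matrix and costheta, alongside the final result. Their peak memory is therefore a small multiple of this figure.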

There are multiple ways to do this:

import numpy as np
import numpy.linalg as la
from scipy.spatial import distance as dist

# Manually
def method0(x, y):
    dotprod_mat = np.dot(x,  y.T)
    costheta = dotprod_mat / la.norm(x, axis=1)[:, np.newaxis]
    costheta /= la.norm(y, axis=1)
    return np.arccos(costheta)

# Using einsum
def method1(x, y):
    dotprod_mat = np.einsum('ij,kj->ik', x, y)
    costheta = dotprod_mat / la.norm(x, axis=1)[:, np.newaxis]
    costheta /= la.norm(y, axis=1)
    return np.arccos(costheta)

# Using scipy.spatial.cdist (one-liner)
def method2(x, y):
    costheta = 1 - dist.cdist(x, y, 'cosine')
    return np.arccos(costheta)

# Your arrays `x` and `y` are already normalized (unit-length rows), so method1
# can be optimized further by dropping the norm computation entirely
def method3(x, y):
    costheta = np.einsum('ij,kj->ik', x, y) # Directly gives costheta, since
                                            # ||x|| = ||y|| = 1
    return np.arccos(costheta)
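Below is a quick sanity check, using the question's example arrays, that the vectorized formulas reproduce the explicit loop. method0 and method1 form the dot products differently but apply the same normalization, so a single einsum version stands in for both. This is a sketch and assumes SciPy is installed.

```python
import numpy as np
import numpy.linalg as la
from scipy.spatial import distance as dist

# Example data from the question.
x = np.array([[ 0.     , -0.16341,  0.98656],
              [-0.05937, -0.25205,  0.96589],
              [ 0.05937, -0.25205,  0.96589],
              [-0.11608, -0.33488,  0.93508],
              [ 0.     , -0.33416,  0.94252]])
y = np.array([[ 0.     , -0.36836,  0.92968],
              [-0.12103, -0.54558,  0.82928],
              [ 0.12103, -0.54558,  0.82928]])

# Reference result: the question's explicit double loop.
ref = np.array([np.arccos(np.dot(i, j) / (la.norm(i) * la.norm(j)))
                for i in x for j in y]).reshape(len(x), len(y))

# Same formula as method1: einsum for the dot products, then broadcasting.
cos_einsum = np.einsum('ij,kj->ik', x, y)
cos_einsum /= la.norm(x, axis=1)[:, np.newaxis]  # divide row i by ||x_i||
cos_einsum /= la.norm(y, axis=1)                 # divide column j by ||y_j||
theta_einsum = np.arccos(cos_einsum)

# Same formula as method2: cdist's 'cosine' metric is 1 - cos(angle).
theta_cdist = np.arccos(1 - dist.cdist(x, y, 'cosine'))

print(la.norm(x, axis=1))  # row norms of x
print(np.allclose(theta_einsum, ref), np.allclose(theta_cdist, ref))
```

The printed row norms are close to 1 but not exactly 1, because the coordinates were rounded to five decimals. This matters for method3, which skips the norms.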

Timing results for (n, m) = (1212, 252):

>>> %timeit theta = method0(x, y)
100 loops, best of 3: 11.1 ms per loop
>>> %timeit theta = method1(x, y)
100 loops, best of 3: 10.8 ms per loop
>>> %timeit theta = method2(x, y)
100 loops, best of 3: 12.3 ms per loop
>>> %timeit theta = method3(x, y)
100 loops, best of 3: 9.42 ms per loop

The relative differences in timing shrink as the number of elements grows. For (n, m) = (6252, 1212):

>>> %timeit -n10 theta = method0(x, y)
10 loops, best of 3: 365 ms per loop
>>> %timeit -n10 theta = method1(x, y)
10 loops, best of 3: 358 ms per loop
>>> %timeit -n10 theta = method2(x, y)
10 loops, best of 3: 384 ms per loop
>>> %timeit -n10 theta = method3(x, y)
10 loops, best of 3: 314 ms per loop
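%timeit is an IPython magic command. In a plain Python script, the standard-library timeit module can produce comparable numbers. The sketch below uses synthetic random unit vectors with the same shapes as the (6252, 1212) benchmark. This is not the asker's data, so absolute timings will differ by machine. It also times the einsum step on its own, to estimate how much of the total np.arccos accounts for.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)


def random_unit_rows(k, p=3):
    """Synthetic data: k random points in R^p, each scaled to unit length."""
    pts = rng.standard_normal((k, p))
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)


x = random_unit_rows(6252)
y = random_unit_rows(1212)


def method3(x, y):
    # Same formula as method3 above (rows assumed to be unit length).
    return np.arccos(np.einsum('ij,kj->ik', x, y))


def best_ms(stmt, number=5, repeat=3):
    """Best-of-`repeat` time per call in ms, like the 'best of 3' figures above."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat)) / number * 1e3


t_full = best_ms(lambda: method3(x, y))
t_cos = best_ms(lambda: np.einsum('ij,kj->ik', x, y))
print(f"method3: {t_full:.1f} ms, einsum only: {t_cos:.1f} ms, "
      f"arccos share ~ {1 - t_cos / t_full:.0%}")
```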

However, if you leave out the np.arccos step (i.e., if you could manage with just costheta and didn't need theta itself), then:

>>> %timeit costheta = np.einsum('ij,kj->ik', x, y)
10 loops, best of 3: 61.3 ms per loop
>>> %timeit costheta = 1 - dist.cdist(x, y, 'cosine')
10 loops, best of 3: 124 ms per loop
>>> %timeit costheta = dist.cdist(x, y, 'cosine')
10 loops, best of 3: 112 ms per loop

This is for the (6252, 1212) case. method3 takes 314 ms in total and the einsum step alone takes 61.3 ms, so np.arccos is actually taking up about 80% of the time ((314 - 61.3) / 314 ≈ 0.80). Here np.einsum is also about twice as fast as dist.cdist (61.3 ms vs 124 ms), so you definitely want to be using einsum.

Summary: the timings for theta are broadly similar across methods, but np.einsum is the fastest for me, especially when you skip the unnecessary norm computations. If you can, avoid computing theta altogether and work with costheta directly.
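One common case where costheta is enough on its own is selecting the pairs whose angle is below some threshold. np.arccos is strictly decreasing on [-1, 1], so for a threshold t in [0, π], theta < t is equivalent to costheta > cos(t). The comparison can therefore skip np.arccos on the whole matrix. Below is a minimal sketch. It assumes unit-length rows as in method3, and close_pairs and max_angle are hypothetical names used only for illustration.

```python
import numpy as np


def close_pairs(x, y, max_angle):
    """Boolean (n, m) mask of pairs whose angle is below max_angle (radians).

    Assumes the rows of x and y are unit length, as in method3.
    """
    costheta = np.einsum('ij,kj->ik', x, y)
    # arccos is decreasing on [-1, 1]: theta < max_angle <=> costheta > cos(max_angle)
    return costheta > np.cos(max_angle)


# Usage: compare against the arccos-based mask on random unit vectors.
rng = np.random.default_rng(1)
x = rng.standard_normal((500, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)
y = rng.standard_normal((400, 3))
y /= np.linalg.norm(y, axis=1, keepdims=True)

mask_cos = close_pairs(x, y, max_angle=0.3)
mask_theta = np.arccos(np.clip(np.einsum('ij,kj->ik', x, y), -1.0, 1.0)) < 0.3
print(np.array_equal(mask_cos, mask_theta))
```

Only one scalar np.cos call is needed, instead of an elementwise np.arccos over all n * m entries.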

Note: an important point I didn't mention is that the finite precision of floating-point arithmetic can cause np.arccos to return nan values, because rounding can push costheta slightly outside [-1, 1]. method[0:3] (i.e., method0 through method2), which compute the norms, naturally worked for values of x and y that hadn't been properly normalized. But method3 gave a few nans. I fixed this by pre-normalizing the inputs. That naturally cancels any gain from using method3, unless you need to run this computation many, many times on a small set of pre-normalized matrices (for whatever reason).
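The nan problem can be reproduced with the question's own data. The first row of x has a squared norm of about 1.0000035, not 1, because its coordinates were rounded to five decimals. method3 applied to x against itself therefore hits arccos of a value above 1. The sketch below also shows a hypothetical method3_safe that pre-normalizes, as described above, and additionally clamps costheta with np.clip. Clipping is a common extra safeguard and not part of the fix described in the note. Pre-normalization shrinks the overshoot to rounding level, but it does not strictly guarantee |costheta| <= 1 for (near-)parallel vectors.

```python
import numpy as np
import numpy.linalg as la

# First two rows of the question's example x. Row 0 has squared norm ~1.0000035 (> 1).
x = np.array([[ 0.     , -0.16341,  0.98656],
              [-0.05937, -0.25205,  0.96589]])


def method3(x, y):
    # As above: assumes unit-length rows and skips the norms.
    return np.arccos(np.einsum('ij,kj->ik', x, y))


def method3_safe(x, y):
    """Hypothetical variant: normalize rows, then clip costheta into [-1, 1]."""
    xn = x / la.norm(x, axis=1, keepdims=True)
    yn = y / la.norm(y, axis=1, keepdims=True)
    costheta = np.clip(np.einsum('ij,kj->ik', xn, yn), -1.0, 1.0)
    return np.arccos(costheta)


with np.errstate(invalid='ignore'):  # silence the "invalid value" RuntimeWarning
    raw = method3(x, x)

safe = method3_safe(x, x)
print(np.isnan(raw).any(), np.isnan(safe).any())  # True False
```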
