Fastest way to compute rolling distance between high-dimensional vectors in numpy?

Question

I have a time series of vectors: Y = [v1, v2, ..., vn]. At each time t, I want to compute the distance between vector t and the average of the vectors before t. So for example, at t=3 I want to compute the cosine distance between v3 and (v1+v2)/2.

I have a script to do it, but I'm wondering if there's any way to do this faster via numpy's convolve feature or something like that?

import numpy as np
from scipy.spatial.distance import cosine
np.random.seed(10)

# Generate `T` vectors of dimension `vector_dim`
# NOTE: In practice, the vector is a very large column vector! 
T = 3
vector_dim = 2
y = [np.random.rand(1, vector_dim)[0] for t in range(T)]


def moving_distance(v):
  moving_dists = []
  for t in range(len(v)):
    if t == 0: 
      pass
    else:
      # Create moving average of values up until time t
      prior_vals = v[:t]
      m_avg = np.add.reduce(prior_vals) / len(prior_vals) 
      # Now compute distance between this moving average and vector t
      moving_dists.append(cosine(m_avg, v[t]))
  return moving_dists

d = moving_distance(y)

For this dataset, it should return: [0.3337342770170698, 0.0029993196890111262]
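To make the definition concrete, here is a minimal sketch that computes the t=3 value by hand. It assumes the same three 2-D vectors that the script above generates with seed 10, averages v1 and v2, and then applies the cosine-distance formula 1 - (a . b) / (|a| |b|).

```python
import numpy as np

# Same data as the question's script: three 2-D vectors drawn with seed 10
np.random.seed(10)
v1, v2, v3 = (np.random.rand(1, 2)[0] for _ in range(3))

# Average of the vectors before t=3
m_avg = (v1 + v2) / 2

# Cosine distance: 1 - (a . b) / (|a| |b|)
d3 = 1 - np.dot(m_avg, v3) / (np.linalg.norm(m_avg) * np.linalg.norm(v3))
print(d3)  # close to 0.0029993196890111262, the second expected value
```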

Answer 1

TL;DR

This is a much faster approach using NumPy (speedups above ~100x even for modest input sizes like 64x16):

import numpy as np


def cos_dist(a, b, axis=None):
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))


def moving_dist_cumsum_np(arr, dist=cos_dist):
    return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)

It uses a custom definition of the cosine distance and is much more efficient than OP's approach because it is fully vectorized.
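As a quick usage check, here is a sketch that runs the vectorized function on the question's data. It assumes the seed-10 vectors are stacked into a 2-D array with np.array (one vector per row), and it repeats the definitions so the snippet runs on its own.

```python
import numpy as np

def cos_dist(a, b, axis=None):
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))

def moving_dist_cumsum_np(arr, dist=cos_dist):
    # cumsum row i holds arr[0] + ... + arr[i]; it is compared with arr[i + 1]
    return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)

np.random.seed(10)
y = np.array([np.random.rand(1, 2)[0] for _ in range(3)])  # shape (T, vector_dim)
print(moving_dist_cumsum_np(y))  # approximately [0.33373428 0.00299932]
```

Note that the sums are never divided by t; the next part of the answer explains why that is safe for the cosine distance.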

A slightly faster and more memory-efficient (O(1) instead of O(n)) approach uses Numba-accelerated explicit looping:

import numba as nb


@nb.njit
def cos_dist_nb(a, b):
    a = a.ravel()
    b = b.ravel()
    ab = aa = bb = 0
    n = len(a)
    for i in range(n):
        ab += a[i] * b[i]
        aa += a[i] * a[i]
        bb += b[i] * b[i]
    return 1 - (ab / (aa * bb) ** 0.5)


@nb.njit
def moving_dist_nb(arr, dist=cos_dist_nb):
    n, m = arr.shape
    result = np.empty(n - 1)
    moving = np.zeros(m)
    for i in range(n - 1):
        moving += arr[i, :]
        result[i] = dist(moving, arr[i + 1, :])
    return result

Long Answer

The computation described in the OP can be sped up with various optimizations.

OP's code is significantly more complex than needed.

Let us start with an adaptation that essentially just:

- renames the main input
- exposes the dist function
- returns a NumPy array
- replaces len(prior_vals) with t, as it is the same value by construction

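import numpy as np
import scipy as sp
import scipy.spatial.distance  # makes sp.spatial.distance available on older SciPy versions


def moving_dist_OP(arr, dist=sp.spatial.distance.cosine):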
    moving_dists = []
    for t in range(len(arr)):
        if t == 0:
            pass
        else:
            # Create moving average of values up until time t
            prior_vals = arr[:t]
            m_avg = np.add.reduce(prior_vals) / t 
            # Now compute distance between this moving average and vector t
            moving_dists.append(dist(m_avg, arr[t]))
    return np.array(moving_dists)

Now, this can be further simplified to:

def moving_dist_simpler(arr, dist=sp.spatial.distance.cosine):
    return np.array([dist(np.add.reduce(arr[:t]), arr[t]) for t in range(1, len(arr))])

This relies on the following observations:

- the loop that appends to a list can be rewritten as a list comprehension
- the range can start from 1 rather than skipping t == 0
- the division by the length t (a positive number) can be factored out of the cosine distance

This last observation stems from the definition of the cosine distance for two vectors a and b of identical size, where a . b is the dot product of a and b, and |a| = √(a . a) is the norm induced by that dot product:

cos_dist(a, b) = 1 - (a . b) / (|a| |b|)

If a is replaced with k * a for some k > 0 (where |k| is the absolute value of k), this becomes:

     1 - ((k * a) . b) / (|k * a| |b|)
 ->  1 - (k * (a . b)) / (|k| |a| |b|)
 ->  1 - sign(k) * (a . b) / (|a| |b|)
 ->  1 - (a . b) / (|a| |b|)
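The derivation can be spot-checked numerically. In this sketch (random vectors and an arbitrary scale factor of 7.5, both chosen only for illustration), a positive factor leaves the cosine distance unchanged, while a negative factor flips the sign of the similarity term. This is why the division by t, which is always positive here, can be dropped.

```python
import numpy as np

def cos_dist(a, b):
    # 1 - (a . b) / (|a| |b|)
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
a, b = rng.random(5), rng.random(5)

base = cos_dist(a, b)
scaled = cos_dist(7.5 * a, b)    # k > 0: sign(k) = 1, so the distance is unchanged
flipped = cos_dist(-7.5 * a, b)  # k < 0: 1 + (a . b) / (|a| |b|) = 2 - base

print(np.isclose(base, scaled))       # True
print(np.isclose(flipped, 2 - base))  # True
```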

The np.add.reduce() computation is not very efficient: at each iteration it sums an ever-growing number of vectors from scratch, even though the sum at the next iteration could be obtained from the result of the previous one by adding a single vector. Rewritten with partial sums, this becomes:

def moving_dist_part(arr, dist=sp.spatial.distance.cosine):
    n, m = arr.shape
    moving_dists = []
    moving = np.zeros(m)
    for i in range(n - 1):
        moving += arr[i, :]
        moving_dists.append(dist(moving, arr[i + 1]))
    return np.array(moving_dists)
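A small sketch (on an arbitrary random array) checking that the running partial sum reproduces np.add.reduce() over each prefix, while adding only one new vector per step:

```python
import numpy as np

rng = np.random.default_rng(1)
arr = rng.random((6, 3))
n, m = arr.shape

# Recomputed from scratch: step t sums t vectors, O(n**2 * m) work in total
reduced = [np.add.reduce(arr[:t]) for t in range(1, n)]

# Running partial sum: one vector addition per step, O(n * m) work in total
moving = np.zeros(m)
running = []
for i in range(n - 1):
    moving += arr[i]
    running.append(moving.copy())  # copy only so the snapshots can be compared

print(np.allclose(reduced, running))  # True
```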

It has already been noted (in @MechanicPig's answer) that the np.add.reduce() computation can also be rewritten with np.cumsum(), which is likewise more efficient than np.add.reduce() and of similar efficiency to the partial sum, but it uses more temporary memory (O(n) for np.cumsum() versus O(1) for partial sums):

def moving_dist_cumsum(arr, dist=sp.spatial.distance.cosine):
    movings = np.cumsum(arr, axis=0)[:-1]
    return np.array([dist(moving, arr[i]) for i, moving in enumerate(movings, 1)])
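The enumerate(movings, 1) idiom is easy to misread. This sketch uses a toy integer array (chosen only for illustration) to show which prefix sum gets paired with which vector.

```python
import numpy as np

arr = np.arange(8).reshape(4, 2)
movings = np.cumsum(arr, axis=0)[:-1]  # sums of rows 0..0, 0..1, 0..2

# enumerate(movings, 1) pairs the sum of rows 0..i-1 with row i
pairs = [(moving.tolist(), arr[i].tolist()) for i, moving in enumerate(movings, 1)]
print(pairs)
# [([0, 1], [2, 3]), ([2, 4], [4, 5]), ([6, 9], [6, 7])]

# The same pairing written with zip, matching the arr[1:] slice of the vectorized version
zipped = [(mv.tolist(), v.tolist()) for mv, v in zip(movings, arr[1:])]
print(pairs == zipped)  # True
```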

It is beneficial to rewrite this either in fully vectorized form or with simpler loops that can be accelerated with Numba.

For the fully vectorized version, np.cumsum() is very helpful, as it provides part of the computation in vector form.

Unfortunately, scipy.spatial.distance.cosine() does not accept higher-dimensional input; it only works on 1-D vectors.
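For illustration, here is a sketch of what happens when 2-D arrays are passed, assuming a reasonably recent SciPy whose input validation raises ValueError for anything that is not 1-D, next to the per-row loop that does work:

```python
import numpy as np
from scipy.spatial.distance import cosine

rng = np.random.default_rng(2)
a, b = rng.random((4, 3)), rng.random((4, 3))

rejected = False
try:
    cosine(a, b)  # 2-D input is rejected by SciPy's vector validation
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)

# Works, but it is a Python-level loop with one SciPy call per row
row_wise = np.array([cosine(u, v) for u, v in zip(a, b)])
print(row_wise.shape)  # (4,)
```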

However, based on its definition, it is relatively simple to write a vectorized version of the cosine distance:

def cos_dist(a, b, axis=None):
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))

With this, one can define a fully vectorized approach:

def moving_dist_cumsum_np(arr, dist=cos_dist):
    return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)

Note that the new definition of the cosine distance can be used just about anywhere scipy.spatial.distance.cosine() was used, e.g.:

def moving_dist_cumsum2(arr, dist=cos_dist):
    movings = np.cumsum(arr, axis=0)[:-1]
    return np.array([dist(moving, arr[i]) for i, moving in enumerate(movings, 1)])
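An alternative formulation (not the one benchmarked in this answer) computes the row-wise dot products with np.einsum, which can avoid allocating the full a * b intermediate array. This sketch assumes 2-D inputs of equal shape with one vector per row; cos_dist_rows_einsum is a hypothetical helper name, and the check compares it against cos_dist.

```python
import numpy as np

def cos_dist(a, b, axis=None):
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))

def cos_dist_rows_einsum(a, b):
    # Hypothetical helper: row-wise cosine distance for 2-D arrays of equal shape
    ab = np.einsum("ij,ij->i", a, b)  # dot product of each pair of rows
    aa = np.einsum("ij,ij->i", a, a)  # squared norm of each row of a
    bb = np.einsum("ij,ij->i", b, b)  # squared norm of each row of b
    return 1 - ab / np.sqrt(aa * bb)

rng = np.random.default_rng(3)
a, b = rng.random((5, 4)), rng.random((5, 4))
print(np.allclose(cos_dist(a, b, axis=1), cos_dist_rows_einsum(a, b)))  # True
```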

However, the vectorized version still has the shortcoming of requiring a potentially large (O(n)) temporary object to store the result of np.cumsum().
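To put the O(n) versus O(1) difference in numbers, this sketch uses an arbitrary float64 input of 2**14 vectors of dimension 16 and compares the size of the np.cumsum() temporary with the single running-sum buffer of the partial-sum approach.

```python
import numpy as np

n, m = 2 ** 14, 16
arr = np.random.default_rng(4).random((n, m))  # float64

movings = np.cumsum(arr, axis=0)[:-1]  # a view into the full (n, m) cumsum result
moving = np.zeros(m)                   # the partial-sum buffer

print(movings.base.nbytes)  # 2097152 bytes (n * m * 8), grows with n
print(moving.nbytes)        # 128 bytes (m * 8), independent of n
```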

Fortunately, with a little more adaptation it is possible to write a Numba-accelerated version (similar to moving_dist_part()) that requires only O(1) temporary memory:

import numba as nb


@nb.njit
def cos_dist_nb(a, b):
    a = a.ravel()
    b = b.ravel()
    ab = aa = bb = 0
    n = len(a)
    for i in range(n):
        ab += a[i] * b[i]
        aa += a[i] * a[i]
        bb += b[i] * b[i]
    return 1 - (ab / (aa * bb) ** 0.5)


@nb.njit
def moving_dist_nb(arr, dist=cos_dist_nb):
    n, m = arr.shape
    result = np.empty(n - 1)
    moving = np.zeros(m)
    for i in range(n - 1):
        moving += arr[i, :]
        result[i] = dist(moving, arr[i + 1, :])
    return result

The above approaches can be benchmarked and plotted with the following code (smaller inputs are tested multiple times for more stable results):

import timeit

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def benchmark(
    funcs,
    args=None,
    kws=None,
    ii=range(4, 15),
    m=16,
    kk=1024,
    is_equal=np.allclose,
    seed=0,
    unit="ms",
    verbose=True
):
    labels = [func.__name__ for func in funcs]
    units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
    args = tuple(args) if args else ()
    kws = dict(kws) if kws else {}
    assert unit in units
    np.random.seed(seed)
    timings = {}
    for i in ii:
        n = 2 ** i
        k = 1 + i * kk // n
        if verbose:
            print(f"i={i}, n={n}, m={m}, k={k}")
        arrs = np.random.random((k, n, m))
        base = np.array([funcs[0](arr, *args, **kws) for arr in arrs])
        timings[n] = []
        for func in funcs:
            res = np.array([func(arr, *args, **kws) for arr in arrs])
            is_good = is_equal(base, res)
            # Plain-Python equivalent of IPython's `%timeit -n 1 -r 1 -q -o`: a single run
            timing = min(timeit.repeat(lambda: [func(arr, *args, **kws) for arr in arrs], number=1, repeat=1)) / k
            timings[n].append(timing if is_good else None)
            if verbose:
                print(
                    f"{func.__name__:>24}"
                    f"  {is_good!s:5}"
                    f"  {timing * (10 ** units[unit]):10.3f} {unit}"
                    f"  {timings[n][0] / timing:5.1f}x")
    return timings, labels


def plot(timings, labels, xlabel="Input Size / #", unit="ms"):
    n_rows = 1
    n_cols = 3
    fig, axs = plt.subplots(n_rows, n_cols, figsize=(8 * n_cols, 6 * n_rows), squeeze=False)
    units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
    df = pd.DataFrame(data=timings, index=labels).transpose()
    
    base = df[[labels[0]]].to_numpy()
    (df * 10 ** units[unit]).plot(marker="o", xlabel=xlabel, ylabel=f"Best timing / {unit}", ax=axs[0, 0])
    (df / base * 100).plot(marker='o', xlabel=xlabel, ylabel='Relative speed / %', logx=True, ax=axs[0, 1])
    (base / df).plot(marker='o', xlabel=xlabel, ylabel='Speed Gain / x', ax=axs[0, 2])

    fig.patch.set_facecolor('white')

to be used as:

funcs = moving_dist_OP, moving_dist_simpler, moving_dist_part, moving_dist_cumsum, moving_dist_cumsum2, moving_dist_cumsum_np, moving_dist_nb

timings, labels = benchmark(funcs, unit="ms", verbose=True)

plot(timings, labels, "Benchmarks", unit="ms")

to obtain the benchmark plots (images not included in this copy).

These results indicate that the Numba approach is by far the fastest, but the vectorized approach is also reasonably fast. For explicit non-accelerated looping, it is still beneficial to use the custom-defined cos_dist() in place of scipy.spatial.distance.cosine() (see moving_dist_cumsum() vs moving_dist_cumsum2()), while np.cumsum() is considerably faster than np.add.reduce() but only marginally faster than computing the partial sum. Finally, moving_dist_OP() and moving_dist_simpler() are effectively equivalent (as expected).

Answer 2

ndarray.cumsum or np.add.accumulate can be used to calculate the cumulative sum:

>>> y
array([[0.77132064, 0.02075195],
       [0.63364823, 0.74880388],
       [0.49850701, 0.22479665]])
>>> y.cumsum(0)
array([[0.77132064, 0.02075195],
       [1.40496888, 0.76955583],
       [1.90347589, 0.99435248]])
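The two spellings compute the same running sums. A minimal check on the question's data, assuming y was built by stacking the seed-10 vectors with np.array:

```python
import numpy as np

np.random.seed(10)
y = np.array([np.random.rand(1, 2)[0] for _ in range(3)])

# Both produce the running sums v1, v1+v2, v1+v2+v3 along axis 0
print(np.allclose(y.cumsum(0), np.add.accumulate(y, axis=0)))  # True
```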

Therefore, code equivalent to the function you provided is as follows:

>>> means = y.cumsum(0)[:-1] / np.arange(1, len(y))[:, None]
>>> [cosine(avg, vec) for avg, vec in zip(means, y[1:])]
[0.3337342770170698, 0.0029993196890111262]

Following the implementation of cosine, a more vectorized version is as follows:

>>> y_ = y[1:]
>>> uv = (means * y_).mean(1)
>>> uu = (means ** 2).mean(1)
>>> vv = (y_ ** 2).mean(1)
>>> np.clip(np.abs(1 - uv / np.sqrt(uu * vv)), 0, 2)
array([0.33373428, 0.00299932])
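The .mean(1) calls here could equally be .sum(1): uv, uu and vv each pick up the same 1/m factor, so the ratio uv / sqrt(uu * vv) is unchanged. A sketch checking this on the question's data (y regenerated with seed 10 and stacked with np.array); dist_from is a hypothetical helper introduced only for the comparison.

```python
import numpy as np

np.random.seed(10)
y = np.array([np.random.rand(1, 2)[0] for _ in range(3)])

means = y.cumsum(0)[:-1] / np.arange(1, len(y))[:, None]
y_ = y[1:]

def dist_from(reduce):
    # Hypothetical helper: reduce is np.mean or np.sum, applied along axis 1
    uv = reduce(means * y_, axis=1)
    uu = reduce(means ** 2, axis=1)
    vv = reduce(y_ ** 2, axis=1)
    return np.clip(np.abs(1 - uv / np.sqrt(uu * vv)), 0, 2)

with_mean = dist_from(np.mean)
with_sum = dist_from(np.sum)
print(np.allclose(with_mean, with_sum))  # True
print(with_sum)  # approximately [0.33373428 0.00299932]
```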

Answer 1: score 3, 2022-08-22 12:51:43
Answer 2: score 2, 2022-08-20 15:56:55
