Fastest way to compute a rolling distance between high-dimensional vectors in numpy?
I have a time series of vectors: `Y = [v1, v2, ..., vn]`. At each time `t`, I want to compute the distance between vector `t` and the average of the vectors before `t`. So, for example, at `t=3` I want to compute the cosine distance between `v3` and `(v1+v2)/2`.
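Written out as a formula (a restatement of the question using its 1-based indices, not notation from the original post), the distance wanted at each time t ≥ 2 is:

```latex
\bar{v}_t = \frac{1}{t-1} \sum_{i=1}^{t-1} v_i ,
\qquad
d_t = 1 - \frac{v_t \cdot \bar{v}_t}{\lVert v_t \rVert \, \lVert \bar{v}_t \rVert}
```

For `t = 3` the average is `(v1 + v2)/2`, matching the example.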
I have a script to do it, but I'm wondering whether there is any way to do this faster via NumPy's `convolve` feature or something like that?
```python
import numpy as np
from scipy.spatial.distance import cosine

np.random.seed(10)

# Generate `T` vectors of dimension `vector_dim`
# NOTE: In practice, the vector is a very large column vector!
T = 3
vector_dim = 2
y = [np.random.rand(1, vector_dim)[0] for t in range(T)]


def moving_distance(v):
    moving_dists = []
    for t in range(len(v)):
        if t == 0:
            pass
        else:
            # Create moving average of values up until time t
            prior_vals = v[:t]
            m_avg = np.add.reduce(prior_vals) / len(prior_vals)
            # Now compute distance between this moving average and vector t
            moving_dists.append(cosine(m_avg, v[t]))
    return moving_dists


d = moving_distance(y)
```
For this dataset, it should return: `[0.3337342770170698, 0.0029993196890111262]`
This is a much faster approach using NumPy (speedups above ~100x even for modest input sizes like 64x16):
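```python
import numpy as np


def cos_dist(a, b, axis=None):
    # Cosine distance along `axis` (over all elements if `axis=None`)
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))


def moving_dist_cumsum_np(arr, dist=cos_dist):
    # Row i of the cumulative sum is arr[0] + ... + arr[i]; no division by the
    # count is needed because the cosine distance is scale invariant
    return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)
```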
which uses a custom definition of the cosine distance and is much more efficient than the OP's approach because it is fully vectorized.
A slightly faster and more memory-efficient approach (O(1) instead of O(n) temporary memory) uses Numba-accelerated explicit looping:
```python
import numba as nb
import numpy as np


@nb.njit
def cos_dist_nb(a, b):
    a = a.ravel()
    b = b.ravel()
    ab = aa = bb = 0.0  # float accumulators
    n = len(a)
    for i in range(n):
        ab += a[i] * b[i]
        aa += a[i] * a[i]
        bb += b[i] * b[i]
    return 1 - (ab / (aa * bb) ** 0.5)


@nb.njit
def moving_dist_nb(arr, dist=cos_dist_nb):
    n, m = arr.shape
    result = np.empty(n - 1)
    moving = np.zeros(m)  # running sum of the rows seen so far: O(1) extra memory
    for i in range(n - 1):
        moving += arr[i, :]
        result[i] = dist(moving, arr[i + 1, :])
    return result
```
The computation delineated in the OP can be sped up further with various optimizations.
The OP's code is significantly more complex than needed.
Let us start with an adaptation that essentially just:

- takes the distance function as a `dist` parameter (defaulting to `scipy.spatial.distance.cosine`);
- replaces `len(prior_vals)` with `t`, as it is the same value by construction.

```python
import numpy as np
import scipy as sp
import scipy.spatial.distance  # makes `sp.spatial.distance` available


def moving_dist_OP(arr, dist=sp.spatial.distance.cosine):
    moving_dists = []
    for t in range(len(arr)):
        if t == 0:
            pass
        else:
            # Create moving average of values up until time t
            prior_vals = arr[:t]
            m_avg = np.add.reduce(prior_vals) / t
            # Now compute distance between this moving average and vector t
            moving_dists.append(dist(m_avg, arr[t]))
    return np.array(moving_dists)
```
Now, this can be further simplified to:
```python
def moving_dist_simpler(arr, dist=sp.spatial.distance.cosine):
    # Start at t = 1 and skip the division by t
    return np.array([dist(np.add.reduce(arr[:t]), arr[t]) for t in range(1, len(arr))])
```
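This skips the `t == 0` case by starting the range at 1 and drops the division by `t`, on the provision that the cosine distance is unchanged when one of its arguments is multiplied by a positive constant.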
This last observation stems from the definition of the cosine distance for two vectors `a` and `b` of identical size, where `a . b` is the dot product of `a` and `b`, and `|a| = √(a . a)` is the norm induced by said dot product:
cos_dist(a, b) = 1 - (a . b) / (|a| |b|)
If `a` is replaced with `k * a` with `k > 0` (and `|k|` is the absolute value of `k`), this becomes:
1 - ((k * a) . b) / (|k * a| |b|)
-> 1 - (k * (a . b)) / (|k| |a| |b|)
-> 1 - sign(k) * (a . b) / (|a| |b|)
-> 1 - (a . b) / (|a| |b|)
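The scale-invariance argument can be checked numerically. This is a minimal sketch with arbitrary random vectors (the seed and the factor 3.7 are illustrative choices, not taken from the answer); it also shows the `k < 0` case, where the `sign(k)` term turns the distance `d` into `2 - d`:

```python
import numpy as np
from scipy.spatial.distance import cosine

# Arbitrary random vectors, only for illustration
rng = np.random.default_rng(0)
a = rng.random(5)
b = rng.random(5)

d = cosine(a, b)
d_scaled = cosine(3.7 * a, b)    # k > 0: the distance is unchanged
d_negated = cosine(-3.7 * a, b)  # k < 0: sign(k) flips the similarity term

print(np.isclose(d, d_scaled))       # True
print(np.isclose(d_negated, 2 - d))  # True
```

This is why summing the prior vectors, without dividing by their count, gives the same distances as using their average.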
The `np.add.reduce()` computation is not very efficient: its value at the next iteration could be computed from the result of the previous iteration, but instead, at each iteration, an increasing number of vectors is summed from scratch (about n²/2 vector additions in total instead of about n). Rewritten with partial sums, this becomes:
```python
def moving_dist_part(arr, dist=sp.spatial.distance.cosine):
    n, m = arr.shape
    moving_dists = []
    moving = np.zeros(m)  # running (partial) sum of the rows seen so far
    for i in range(n - 1):
        moving += arr[i, :]
        moving_dists.append(dist(moving, arr[i + 1]))
    return np.array(moving_dists)
```
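To make the cost difference concrete, this sketch (hypothetical sizes `n = 6`, `m = 4`) builds the same prefix sums both ways and counts the vector additions each needs: recomputing every prefix with `np.add.reduce()` costs `0 + 1 + ... + (n - 2)` additions, which is quadratic in `n`, whereas the running sum costs one addition per step.

```python
import numpy as np

# Hypothetical sizes, only for illustration: n vectors of dimension m
rng = np.random.default_rng(1)
n, m = 6, 4
arr = rng.random((n, m))

# Recomputing each prefix from scratch: reducing t vectors takes t - 1 additions
naive = [np.add.reduce(arr[:t]) for t in range(1, n)]
naive_additions = sum(t - 1 for t in range(1, n))

# Running (partial) sum: exactly one vector addition per step
running = []
moving = np.zeros(m)
for i in range(n - 1):
    moving += arr[i]
    running.append(moving.copy())  # copy, because `moving` is updated in place
running_additions = n - 1

print(np.allclose(naive, running))         # True
print(naive_additions, running_additions)  # 10 5
```

The `.copy()` is only needed here because the prefix sums are stored; `moving_dist_part()` uses each one immediately, so it can update `moving` in place.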
It has already been noted (in @MechanicPig's answer) that the `np.add.reduce()` computation can also be rewritten with `np.cumsum()`, which is likewise more efficient than `np.add.reduce()` and of similar efficiency to the partial sum, but it uses more temporary memory (O(n) for `np.cumsum()` versus O(1) for partial sums):
```python
def moving_dist_cumsum(arr, dist=sp.spatial.distance.cosine):
    movings = np.cumsum(arr, axis=0)[:-1]  # all partial sums at once: O(n) memory
    return np.array([dist(moving, arr[i]) for i, moving in enumerate(movings, 1)])
```
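The memory trade-off can be seen from the array shapes alone. In this sketch (arbitrary sizes `n = 1000`, `m = 16`), the cumulative sum materializes every partial sum at once, whereas the partial-sum loop only ever holds a single length-`m` accumulator:

```python
import numpy as np

# Hypothetical sizes: n vectors of dimension m
rng = np.random.default_rng(2)
n, m = 1000, 16
arr = rng.random((n, m))

# np.cumsum() materializes every partial sum at once
movings = np.cumsum(arr, axis=0)[:-1]  # row i is arr[0] + ... + arr[i]
print(movings.shape)                   # (999, 16): grows with n
print(np.allclose(movings[2], arr[:3].sum(axis=0)))  # True

# The partial-sum loop only ever keeps one accumulator
moving = np.zeros(m)
print(moving.shape)                    # (16,): independent of n
```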
It is beneficial to rewrite this either in fully vectorized form or with simpler loops that can be accelerated with Numba.
For the fully vectorized version, `np.cumsum()` is very helpful, as it provides the partial sums in vector form.
Unfortunately, `scipy.spatial.distance.cosine()` does not accept higher-dimensional input.
However, based on its definition, it is relatively simple to write a vectorized version of the cosine distance:
```python
def cos_dist(a, b, axis=None):
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))
```
With this, one can define a fully vectorized approach:
```python
def moving_dist_cumsum_np(arr, dist=cos_dist):
    return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)
```
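As a sanity check, the vectorized version can be compared against a direct transcription of the question's definition. This sketch uses a hypothetical helper, `moving_dist_reference()`, that computes explicit means with SciPy, and runs it on the question's seed-10 data and on a larger random array:

```python
import numpy as np
from scipy.spatial.distance import cosine


def cos_dist(a, b, axis=None):
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))


def moving_dist_cumsum_np(arr, dist=cos_dist):
    return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)


def moving_dist_reference(arr):
    # Hypothetical helper: the question's definition, with explicit means
    return np.array([cosine(arr[:t].mean(axis=0), arr[t]) for t in range(1, len(arr))])


# The question's data, stacked into an (n, m) array
np.random.seed(10)
y = np.array([np.random.rand(1, 2)[0] for t in range(3)])
print(moving_dist_cumsum_np(y))
# Expected from the question: [0.3337342770170698, 0.0029993196890111262]
print(np.allclose(moving_dist_cumsum_np(y), moving_dist_reference(y)))  # True

# A larger arbitrary input
big = np.random.default_rng(4).random((500, 32))
print(np.allclose(moving_dist_cumsum_np(big), moving_dist_reference(big)))  # True
```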
Note that the new definition of the cosine distance can be used just about anywhere else `scipy.spatial.distance.cosine()` was used, e.g.:
```python
def moving_dist_cumsum2(arr, dist=cos_dist):
    movings = np.cumsum(arr, axis=0)[:-1]
    return np.array([dist(moving, arr[i]) for i, moving in enumerate(movings, 1)])
```
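As a quick consistency check (a sketch with arbitrary random inputs), `cos_dist()` agrees with `scipy.spatial.distance.cosine()` both on a single pair of 1D vectors (`axis=None`) and row by row on 2D input (`axis=1`), where SciPy needs an explicit loop:

```python
import numpy as np
from scipy.spatial.distance import cosine


def cos_dist(a, b, axis=None):
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))


# Arbitrary random inputs, only for illustration
rng = np.random.default_rng(3)
a = rng.random((4, 3))
b = rng.random((4, 3))

# 1D inputs with axis=None: a drop-in replacement for scipy's cosine()
print(np.isclose(cos_dist(a[0], b[0]), cosine(a[0], b[0])))  # True

# 2D inputs with axis=1: one distance per row, no Python loop needed
row_wise = cos_dist(a, b, axis=1)
reference = np.array([cosine(x, y) for x, y in zip(a, b)])
print(row_wise.shape, np.allclose(row_wise, reference))  # (4,) True
```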
However, the vectorized version still has the shortcoming of requiring a potentially large (O(n)) temporary array to store the result of `np.cumsum()`.
Fortunately, with a little more adaptation, it is possible to write a Numba-accelerated version of this (similar to `moving_dist_part()`) that requires only O(1) temporary memory:
```python
import numba as nb
import numpy as np


@nb.njit
def cos_dist_nb(a, b):
    a = a.ravel()
    b = b.ravel()
    ab = aa = bb = 0.0  # float accumulators
    n = len(a)
    for i in range(n):
        ab += a[i] * b[i]
        aa += a[i] * a[i]
        bb += b[i] * b[i]
    return 1 - (ab / (aa * bb) ** 0.5)


@nb.njit
def moving_dist_nb(arr, dist=cos_dist_nb):
    n, m = arr.shape
    result = np.empty(n - 1)
    moving = np.zeros(m)  # running sum of the rows seen so far: O(1) extra memory
    for i in range(n - 1):
        moving += arr[i, :]
        result[i] = dist(moving, arr[i + 1, :])
    return result
```
The above approaches can be benchmarked and plotted with the following code (where smaller inputs are tested multiple times for more stable results):
```python
import timeit

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def benchmark(
    funcs,
    args=None,
    kws=None,
    ii=range(4, 15),
    m=16,
    kk=1024,
    is_equal=np.allclose,
    seed=0,
    unit="ms",
    verbose=True,
):
    labels = [func.__name__ for func in funcs]
    units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
    args = tuple(args) if args else ()
    kws = dict(kws) if kws else {}
    assert unit in units
    np.random.seed(seed)
    timings = {}
    for i in ii:
        n = 2 ** i
        k = 1 + i * kk // n  # number of inputs: more for smaller sizes
        if verbose:
            print(f"i={i}, n={n}, m={m}, k={k}")
        arrs = np.random.random((k, n, m))
        base = np.array([funcs[0](arr, *args, **kws) for arr in arrs])
        timings[n] = []
        for func in funcs:
            res = np.array([func(arr, *args, **kws) for arr in arrs])
            is_good = is_equal(base, res)
            # Plain-Python equivalent of IPython's `%timeit -n 1 -r 1 -q -o`
            timed = timeit.repeat(
                lambda: [func(arr, *args, **kws) for arr in arrs], number=1, repeat=1)
            timing = min(timed) / k
            timings[n].append(timing if is_good else None)
            if verbose:
                print(
                    f"{func.__name__:>24}"
                    f" {is_good!s:5}"
                    f" {timing * (10 ** units[unit]):10.3f} {unit}"
                    f" {timings[n][0] / timing:5.1f}x")
    return timings, labels


def plot(timings, labels, xlabel="Input Size / #", unit="ms"):
    n_rows = 1
    n_cols = 3
    fig, axs = plt.subplots(n_rows, n_cols, figsize=(8 * n_cols, 6 * n_rows), squeeze=False)
    units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
    df = pd.DataFrame(data=timings, index=labels).transpose()
    base = df[[labels[0]]].to_numpy()
    (df * 10 ** units[unit]).plot(marker="o", xlabel=xlabel, ylabel=f"Best timing / {unit}", ax=axs[0, 0])
    (df / base * 100).plot(marker="o", xlabel=xlabel, ylabel="Relative speed / %", logx=True, ax=axs[0, 1])
    (base / df).plot(marker="o", xlabel=xlabel, ylabel="Speed Gain / x", ax=axs[0, 2])
    fig.patch.set_facecolor("white")
```
to be used as:
```python
funcs = (
    moving_dist_OP, moving_dist_simpler, moving_dist_part, moving_dist_cumsum,
    moving_dist_cumsum2, moving_dist_cumsum_np, moving_dist_nb)
timings, labels = benchmark(funcs, unit="ms", verbose=True)
plot(timings, labels, "Benchmarks", unit="ms")
```
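to obtain:

[Figure: best timing (ms), relative speed (%) and speed gain (x) versus input size for each function; image not included.]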
These results indicate that the Numba approach is by far the fastest, but the vectorized approach is reasonably fast. When it comes to explicit non-accelerated looping, it is still beneficial to use the custom-defined `cos_dist()` in place of `scipy.spatial.distance.cosine()` (see `moving_dist_cumsum()` vs. `moving_dist_cumsum2()`), while `np.cumsum()` is appreciably faster than `np.add.reduce()` but only marginally faster than computing the partial sum. Finally, `moving_dist_OP()` and `moving_dist_simpler()` are effectively equivalent (as expected).
`ndarray.cumsum` or `np.add.accumulate` can be used to calculate the cumulative sum (here `y` is the question's data as a 2D array, i.e. `np.array(y)`):
```python
>>> y
array([[0.77132064, 0.02075195],
       [0.63364823, 0.74880388],
       [0.49850701, 0.22479665]])
>>> y.cumsum(0)
array([[0.77132064, 0.02075195],
       [1.40496888, 0.76955583],
       [1.90347589, 0.99435248]])
```
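A quick check, assuming `y` is the question's data rebuilt from the same seed, that `np.add.accumulate` along axis 0 gives the same running sums as `ndarray.cumsum`, and that the last row is the total sum:

```python
import numpy as np

# The question's data as an (n, m) array
np.random.seed(10)
y = np.array([np.random.rand(1, 2)[0] for t in range(3)])

# Both compute the running (cumulative) sum down the rows
print(np.allclose(y.cumsum(0), np.add.accumulate(y, axis=0)))  # True

# The last running sum is the sum of all rows
print(np.allclose(y.cumsum(0)[-1], y.sum(0)))                  # True
```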
Therefore, code equivalent to the function you provided is as follows:
```python
>>> means = y.cumsum(0)[:-1] / np.arange(1, len(y))[:, None]
>>> [cosine(avg, vec) for avg, vec in zip(means, y[1:])]
[0.3337342770170698, 0.0029993196890111262]
```
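The division by `np.arange(1, len(y))[:, None]` relies on broadcasting: the `(n - 1, 1)` column of counts divides row `i` of the cumulative sum by `i + 1`, turning running sums into running means. A sketch comparing this with explicit per-prefix means (assuming the question's seed-10 data):

```python
import numpy as np

np.random.seed(10)
y = np.array([np.random.rand(1, 2)[0] for t in range(3)])

counts = np.arange(1, len(y))[:, None]  # column of counts 1, 2, ..., n-1
means = y.cumsum(0)[:-1] / counts       # broadcast: row i is divided by i + 1

# Explicit mean of each prefix y[:t], for comparison
explicit = np.array([y[:t].mean(axis=0) for t in range(1, len(y))])

print(counts.ravel())                # [1 2]
print(np.allclose(means, explicit))  # True
```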
Following the implementation of `cosine`, a more vectorized version is as follows:
```python
>>> y_ = y[1:]
>>> uv = (means * y_).mean(1)
>>> uu = (means ** 2).mean(1)
>>> vv = (y_ ** 2).mean(1)
>>> np.clip(np.abs(1 - uv / np.sqrt(uu * vv)), 0, 2)
array([0.33373428, 0.00299932])
```
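The `.mean(1)` calls could equally be `.sum(1)`: the `1/m` factor appears once in the numerator and, as `sqrt(1/m * 1/m)`, once in the denominator, so it cancels. A sketch checking this on the question's data, with a hypothetical helper `cos_from` parameterized by the row reduction:

```python
import numpy as np

np.random.seed(10)
y = np.array([np.random.rand(1, 2)[0] for t in range(3)])
means = y.cumsum(0)[:-1] / np.arange(1, len(y))[:, None]
y_ = y[1:]


def cos_from(reduce):
    # Hypothetical helper: `reduce` collapses each row (np.mean or np.sum on axis 1)
    uv = reduce(means * y_, axis=1)
    uu = reduce(means ** 2, axis=1)
    vv = reduce(y_ ** 2, axis=1)
    return 1 - uv / np.sqrt(uu * vv)


# The 1/m factor introduced by np.mean cancels between numerator and denominator
print(np.allclose(cos_from(np.mean), cos_from(np.sum)))  # True
```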