How to vectorize a matrix sum in a for loop using NumPy?
Basically, I have a matrix with 3600 rows and 5 columns and want to downsample it into parcels of 60 rows, summing the rows within each parcel:
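```python
import numpy as np

X = np.random.rand(3600, 5)
down_sample = 60

# Start index of each 60-row parcel: 0, 60, 120, ...
ds_rng = range(0, X.shape[0], down_sample)
X_ds = np.zeros((len(ds_rng), X.shape[1]))

# Sum each parcel column-wise, one parcel per iteration
for i, j in enumerate(ds_rng):
    X_ds[i, :] = np.sum(X[j:j + down_sample, :], axis=0)
```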
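The loop sums contiguous row blocks that start at fixed offsets, which is exactly what NumPy's `np.add.reduceat` computes in a single call. This is a swapped-in technique, not one used in the question or the answers; `reduceat_sum` is a hypothetical helper name. A minimal sketch, assuming the same `X` and `down_sample` as above. A trailing partial parcel needs no special case, because `reduceat` sums the last slice all the way to the end of the array.

```python
import numpy as np

def reduceat_sum(X, down_sample=60):
    """Sum each block of `down_sample` consecutive rows (hypothetical helper).

    np.add.reduceat sums X[idx[k]:idx[k+1]] for each k and X[idx[-1]:] for
    the last index, so an uneven final block is summed over the leftover rows.
    """
    starts = np.arange(0, X.shape[0], down_sample)
    return np.add.reduceat(X, starts, axis=0)

X = np.random.rand(3600, 5)
X_ds = reduceat_sum(X, 60)
print(X_ds.shape)  # (60, 5)
```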
Another way to do this might be:
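```python
import numpy as np

def blockwise_sum(X, down_sample=60):
    n, m = X.shape
    ds_n = n // down_sample  # complete parcels; Python 3's `/` would give a float
    N = ds_n * down_sample   # rows covered by complete parcels
    if N == n:
        # Rows divide evenly: view as (parcels, down_sample, m) and sum each parcel
        return np.sum(X.reshape(-1, down_sample, m), axis=1)
    X_ds = np.zeros((ds_n + 1, m))
    X_ds[:ds_n] = np.sum(X[:N].reshape(-1, down_sample, m), axis=1)
    X_ds[-1] = np.sum(X[N:], axis=0)  # leftover rows form a final partial parcel
    return X_ds
```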
I don't know whether it's any faster, though.
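One way to find out is to time both versions on the question's 3600 x 5 input with the standard-library `timeit` module. A minimal sketch: `loop_sum` and `reshape_sum` are hypothetical names for the question's loop and a compact variant of `blockwise_sum`. Absolute numbers will vary with the machine and NumPy version, so only the relative comparison is meaningful.

```python
import timeit
import numpy as np

def loop_sum(X, down_sample=60):
    # The question's approach: one sum call per parcel
    starts = range(0, X.shape[0], down_sample)
    out = np.zeros((len(starts), X.shape[1]))
    for i, j in enumerate(starts):
        out[i] = X[j:j + down_sample].sum(axis=0)
    return out

def reshape_sum(X, down_sample=60):
    # Reshape the complete parcels; leftover rows, if any, form a last partial parcel
    n, m = X.shape
    full = n // down_sample
    N = full * down_sample
    parts = [X[:N].reshape(full, down_sample, m).sum(axis=1)]
    if N < n:
        parts.append(X[N:].sum(axis=0, keepdims=True))
    return np.vstack(parts)

X = np.random.rand(3600, 5)
assert np.allclose(loop_sum(X), reshape_sum(X))  # same result before timing

for name, fn in [("loop", loop_sum), ("reshape", reshape_sum)]:
    # Best of 5 repeats of 100 calls each, reported per call
    best = min(timeit.repeat(lambda: fn(X), number=100, repeat=5)) / 100
    print(f"{name}: {best * 1e6:.1f} us per call")
```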
At least in this case, `einsum` is faster than `sum`:

```python
np.einsum('ijk->ik', X.reshape(-1, down_sample, X.shape[1]))
```

is 2x faster than `blockwise_sum`.
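The `einsum` one-liner assumes the row count is an exact multiple of `down_sample`, since `reshape` fails otherwise (1000 rows, for example, cannot be reshaped into parcels of 60). A minimal sketch that keeps `einsum` for the complete parcels and sums any leftover rows separately; `einsum_blockwise_sum` is a hypothetical name, and the speed claim above was only measured for the evenly divisible case.

```python
import numpy as np

def einsum_blockwise_sum(X, down_sample=60):
    """Sum consecutive parcels of `down_sample` rows (hypothetical helper).

    einsum handles the complete parcels; any leftover rows are summed
    separately and appended as a final, partial parcel.
    """
    n, m = X.shape
    full = n // down_sample
    N = full * down_sample
    # 'ijk->ik' sums over j, the row-within-parcel axis
    out = np.einsum('ijk->ik', X[:N].reshape(full, down_sample, m))
    if N < n:
        out = np.vstack([out, X[N:].sum(axis=0)])
    return out

X = np.random.rand(1000, 5)  # 1000 = 16 * 60 + 40, so the last parcel has 40 rows
X_ds = einsum_blockwise_sum(X)
print(X_ds.shape)  # (17, 5)
```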
My timings:
OP iterative - 1.59 ms
with strided - 198 us
blockwise_sum - 179 us
einsum - 76 us
It looks like you can use some stride tricks to get the job done.

Here's the setup code we'll need:
```python
import numpy as np

X = np.random.rand(1000, 5)
down_sample = 60
```
Now we trick NumPy into thinking `X` is split into parcels, using a 3-D view of shape (parcels, rows per parcel, columns) that shares `X`'s memory instead of copying it:
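```python
num_full = X.shape[0] // down_sample  # complete parcels only
row_stride, col_stride = X.strides
X_view = np.lib.stride_tricks.as_strided(
    X, shape=(num_full, down_sample, X.shape[1]),
    strides=(down_sample * row_stride, row_stride, col_stride))
X_ds = X_view.sum(axis=1)  # sum over the down_sample axis
```

Finally, if your downsampling interval doesn't divide the number of rows evenly, the leftover rows form one more, partial parcel that has to be summed separately. Don't try to cover it by asking `as_strided` for `ceil(n / down_sample)` parcels: the view does not wrap around, it reads memory past the end of `X`, which yields garbage values or can crash. That is why the view above stops at the last complete parcel:

```python
rem = X.shape[0] % down_sample
if rem != 0:
    # Append the sum of the trailing partial parcel as the last row
    X_ds = np.vstack([X_ds, X[-rem:].sum(axis=0)])
```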
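With NumPy 1.20 or newer, `np.lib.stride_tricks.sliding_window_view` builds a similar strided view but checks its bounds, so it cannot read past the end of `X`. This swaps in a different function from the `as_strided` call above; `window_sum` is a hypothetical helper name. A minimal sketch: take every `down_sample`-th window along axis 0 and sum over the window axis, which `sliding_window_view` appends as the last dimension.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_sum(X, down_sample=60):
    """Sum consecutive parcels of `down_sample` rows (hypothetical helper)."""
    if X.shape[0] < down_sample:
        # No complete window exists; the whole array is one partial parcel
        return X.sum(axis=0, keepdims=True)
    # View of shape (n - down_sample + 1, m, down_sample); no data is copied
    windows = sliding_window_view(X, down_sample, axis=0)
    # Keep windows starting at rows 0, down_sample, 2*down_sample, ...
    X_ds = windows[::down_sample].sum(axis=-1)
    rem = X.shape[0] % down_sample
    if rem != 0:
        X_ds = np.vstack([X_ds, X[-rem:].sum(axis=0)])
    return X_ds

X = np.random.rand(1000, 5)
X_ds = window_sum(X)
print(X_ds.shape)  # (17, 5)
```

Slicing `[::down_sample]` keeps exactly `n // down_sample` windows, so the remainder handling matches the `as_strided` version.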