Vectorized solution to filling a 1-D numpy array given a list of start and end indices for slicing?
Given a 1-D array of zeros, called a:
In [38]: a
Out[38]: array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
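I want to fill certain index ranges with certain values. I have a list of start indices, end indices, and the associated values that should be filled in at those positions. It is stored in a list, fill_oneDim_array: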
[[1, 3, 500], [5, 7, 1000], [9, 15, 200]]
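For example, [1, 3, 500] fills array a like this: a[1:3] = 500. Likewise, [5, 7, 1000] gives a[5:7] = 1000.
Is there a vectorized solution? I would like to avoid for loops as much as possible.
My research so far: as far as I can tell, there does not seem to be an obvious solution.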
Here's a vectorized approach, inspired by the trick mentioned in this post -
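import numpy as np

def fillval(a, fill):
    info = np.asarray(fill)
    start, stop, val = info.T
    # Mark where each range opens (+1) and closes (-1)
    id_arr = np.zeros(len(a), dtype=int)
    id_arr[start] = 1
    id_arr[stop] = -1
    # The running sum is nonzero exactly inside the ranges
    a[id_arr.cumsum().astype(bool)] = np.repeat(val, stop - start)
    return a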
Sample run -
In [676]: a = np.zeros(20, dtype=int)
...: fill = [[1, 3, 500], [5, 7, 1000], [9, 15, 200]]
In [677]: fillval(a, fill)
Out[677]:
array([ 0, 500, 500, 0, 0, 1000, 1000, 0, 0, 200, 200,
200, 200, 200, 200, 0, 0, 0, 0, 0])
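To make the trick easier to follow, here is a small sketch that prints the intermediate arrays for the same input. It is only an illustration of the steps inside fillval; the variable names inside and values are made up here. It assumes the ranges are sorted, do not overlap or touch, and that every stop index is smaller than the array length, since the fancy-indexed assignments would otherwise overwrite each other or raise an IndexError.

```python
import numpy as np

fill = np.asarray([[1, 3, 500], [5, 7, 1000], [9, 15, 200]])
start, stop, val = fill.T

# +1 marks where a range opens, -1 marks where it closes.
id_arr = np.zeros(20, dtype=int)
id_arr[start] = 1
id_arr[stop] = -1

# The running sum is 1 inside a range and 0 outside it.
inside = id_arr.cumsum().astype(bool)

# One value per covered position, in range order.
values = np.repeat(val, stop - start)

print(np.flatnonzero(inside))  # covered positions: 1, 2, 5, 6 and 9 through 14
print(values)                  # 500 twice, 1000 twice, 200 six times
```

Because the mask and the repeated values are both laid out in ascending position order, assigning values through the mask puts each value in its own range.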
Modified/optimized version
It could be modified/optimized further to do everything on the input itself with a minimal memory footprint, like so -
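def fillval(a, fill):
    fill = np.asarray(fill)
    start, stop, val = fill[:,0], fill[:,1], fill[:,2]
    # Write +val at each start and -val at each stop (a must be all zeros)
    a[start] = val
    a[stop] = -val
    # The running sum holds val inside each range and 0 elsewhere
    return a.cumsum()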
Sample run -
In [830]: a = np.zeros(20, dtype=int)
...: fill = [[1, 3, 500], [5, 7, 1000], [9, 15, 200]]
In [831]: fillval(a, fill)
Out[831]:
array([ 0, 500, 500, 0, 0, 1000, 1000, 0, 0, 200, 200,
200, 200, 200, 200, 0, 0, 0, 0, 0])
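The plain assignments a[start] = val and a[stop] = -val lose information when one range ends exactly where the next begins, and they raise an IndexError when a stop index equals or exceeds the array length. A possible variant, sketched below, accumulates into a buffer that is one element longer, using np.add.at. The function name fillval_diff and its signature are hypothetical, and the sketch assumes every start index is at most n. Note that with this variant overlapping ranges add their values together rather than overwrite each other.

```python
import numpy as np

def fillval_diff(n, fill):
    # Hypothetical variant: returns a new length-n array instead of modifying an input.
    fill = np.asarray(fill)
    start, stop, val = fill[:, 0], fill[:, 1], fill[:, 2]
    stop = np.minimum(stop, n)            # clip the stop index, as a slice would
    diff = np.zeros(n + 1, dtype=val.dtype)
    np.add.at(diff, start, val)           # accumulate, so repeated indices are not lost
    np.add.at(diff, stop, -val)
    return diff.cumsum()[:-1]             # drop the extra trailing slot

# The second range starts where the first one stops; the last one runs past the end.
out = fillval_diff(10, [[1, 3, 500], [3, 5, 1000], [9, 15, 200]])
print(out)
```

For this input the result matches what the slice assignments a[1:3] = 500, a[3:5] = 1000 and a[9:15] = 200 would produce on a length-10 array of zeros.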
Other approaches -
# Loopy one
def loopy(a, fill):
    for start,stop,val in fill:
        a[start:stop] = val
    return a

# @Paul Panzer's soln
def multifill(target, spec):
    spec = np.asarray(spec)
    inds = np.zeros((2*len(spec) + 2,), dtype=int)
    inds[-1] = len(target)
    inds[1:-1] = spec[:, :2].astype(int).ravel()
    lens = np.diff(inds)
    mask = np.repeat((np.arange(len(lens), dtype=np.uint8)&1).view(bool), lens)
    target[mask] = np.repeat(spec[:, 2], lens[1::2])
    return target
Timings -
Case #1: Tightly packed short groups
In [912]: # Setup inputs with group lengths at maximum extent of 10
...: L = 10000 # decides number of groups
...: np.random.seed(0)
...: s0 = np.random.randint(0,9,(L)) + 20*np.arange(L)
...: s1 = s0 + np.random.randint(2,10,(len(s0)))
...: fill = np.c_[s0,s1, np.random.randint(0,9,(len(s0)))].tolist()
...: len_a = fill[-1][1]+1
...: a0 = np.zeros(len_a, dtype=int)
...: a1 = a0.copy()
...: a2 = a0.copy()
In [913]: %timeit loopy(a0, fill)
...: %timeit multifill(a1, fill)
...: %timeit fillval(a2, fill)
100 loops, best of 3: 4.26 ms per loop
100 loops, best of 3: 4.49 ms per loop
100 loops, best of 3: 3.34 ms per loop
In [914]: # Setup inputs with group lengths at maximum extent of 10
...: L = 100000 # decides number of groups
In [915]: %timeit loopy(a0, fill)
...: %timeit multifill(a1, fill)
...: %timeit fillval(a2, fill)
10 loops, best of 3: 43.2 ms per loop
10 loops, best of 3: 49.4 ms per loop
10 loops, best of 3: 38.2 ms per loop
Case #2: Long groups with larger gaps
In [916]: # Setup inputs with group lengths between 10 and 49
...: L = 10000 # decides number of groups
...: np.random.seed(0)
...: s0 = np.random.randint(0,9,(L)) + 100*np.arange(L)
...: s1 = s0 + np.random.randint(10,50,(len(s0)))
...: fill = np.c_[s0,s1, np.random.randint(0,9,(len(s0)))].tolist()
...: len_a = fill[-1][1]+1
...: a0 = np.zeros(len_a, dtype=int)
...: a1 = a0.copy()
...: a2 = a0.copy()
In [917]: %timeit loopy(a0, fill)
...: %timeit multifill(a1, fill)
...: %timeit fillval(a2, fill)
100 loops, best of 3: 4.51 ms per loop
100 loops, best of 3: 9.18 ms per loop
100 loops, best of 3: 5.16 ms per loop
In [921]: # Setup inputs with group lengths between 10 and 49
...: L = 100000 # decides number of groups
In [922]: %timeit loopy(a0, fill)
...: %timeit multifill(a1, fill)
...: %timeit fillval(a2, fill)
10 loops, best of 3: 44.9 ms per loop
10 loops, best of 3: 89 ms per loop
10 loops, best of 3: 58.3 ms per loop
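Hence, choosing the fastest one depends on the use case, especially on the typical group lengths and how the groups are spread across the input array.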
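Since the ranking depends on the data, it may be worth timing the candidates on inputs shaped like your own. The following is a rough sketch of such a check, not a rigorous benchmark: make_fill and its parameters are hypothetical, the two functions are copied from above, and each timed call also includes allocating a fresh array of zeros.

```python
import numpy as np
from timeit import timeit

def loopy(a, fill):
    for start, stop, val in fill:
        a[start:stop] = val
    return a

def fillval(a, fill):
    fill = np.asarray(fill)
    start, stop, val = fill[:, 0], fill[:, 1], fill[:, 2]
    a[start] = val
    a[stop] = -val
    return a.cumsum()

def make_fill(num_groups, spacing, min_len, max_len, seed=0):
    # Hypothetical generator: one range per `spacing`-wide slot, so ranges never touch
    # as long as max_len is smaller than spacing.
    rng = np.random.default_rng(seed)
    s0 = spacing * np.arange(num_groups)
    s1 = s0 + rng.integers(min_len, max_len, num_groups)
    vals = rng.integers(1, 9, num_groups)
    return np.c_[s0, s1, vals].tolist()

fill = make_fill(num_groups=1000, spacing=100, min_len=10, max_len=50)
n = fill[-1][1] + 1

# Check that both approaches agree before comparing their speed.
same = np.array_equal(loopy(np.zeros(n, dtype=int), fill),
                      fillval(np.zeros(n, dtype=int), fill))
print("results agree:", same)

for func in (loopy, fillval):
    t = timeit(lambda: func(np.zeros(n, dtype=int), fill), number=20)
    print(f"{func.__name__}: {t / 20 * 1e3:.3f} ms per call")
```

Varying spacing, min_len and max_len gives a feel for how group length and density shift the balance between the loop and the vectorized versions.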
You can use np.repeat to build a mask and the fill values:
import numpy as np

def multifill(target, spec):
    inds = np.zeros((2*len(spec) + 2,), dtype=int)
    inds[-1] = len(target)
    inds[1:-1] = spec[:, :2].astype(int).ravel()
    lens = np.diff(inds)
    mask = np.repeat((np.arange(len(lens), dtype=np.uint8)&1).view(bool), lens)
    target[mask] = np.repeat(spec[:, 2], lens[1::2])

target = np.zeros((16,))
spec = np.array([[1, 3, 500], [5, 7, 1000], [9, 15, 200]])
multifill(target, spec)
print(target)
# [ 0. 500. 500. 0. 0. 1000. 1000. 0. 0. 200.
# 200. 200. 200. 200. 200. 0.]
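As an illustration of how the mask comes together, the sketch below rebuilds the intermediate arrays for the same spec. It is a simplified restatement rather than the function itself: it uses np.concatenate to assemble the boundaries and a modulo test in place of the uint8 bit trick, and the name flags is made up here. It assumes the ranges are sorted and do not overlap.

```python
import numpy as np

target_len = 16
spec = np.array([[1, 3, 500], [5, 7, 1000], [9, 15, 200]])

# Boundaries: 0, then every start and stop in order, then the array length.
inds = np.concatenate(([0], spec[:, :2].ravel(), [target_len]))

# Run lengths between consecutive boundaries: gap, range, gap, range, ...
lens = np.diff(inds)

# Odd-numbered runs are the ranges to fill; even-numbered runs are the gaps.
flags = np.arange(len(lens)) % 2 == 1
mask = np.repeat(flags, lens)

print(inds)                  # boundaries 0, 1, 3, 5, 7, 9, 15, 16
print(lens)                  # run lengths 1, 2, 2, 2, 2, 6, 1
print(np.flatnonzero(mask))  # positions that will be filled
```

The fill values are then spec[:, 2] repeated by the lengths of the odd-numbered runs, lens[1::2], which lines them up with the True entries of the mask.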
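Benchmarks. Divakar2 is the fastest of the vectorized versions, but it requires the template to be all zeros. PP and Divakar1 are more flexible. UPDATE: these are all blown out of the water by a simple loop, "thanks" to @hpaulj.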
# n=100
# hpaulj 0.00256890 ms
# pp 0.01587310 ms
# D1 0.01193481 ms
# D2 0.00533720 ms
# n=100000
# hpaulj 0.03514440 ms
# pp 0.57968440 ms
# D1 0.87605349 ms
# D2 0.34365610 ms
# n=1000000
# hpaulj 0.50301510 ms
# pp 6.91325230 ms
# D1 8.96669030 ms
# D2 3.97435970 ms
Code:
import numpy as np
import types
from timeit import timeit

def f_hpaulj(target, spec):
    for s, e, v in spec:
        target[int(s):int(e)] = v

def f_pp(target, spec):
    inds = np.zeros((2*len(spec) + 2,), dtype=int)
    inds[-1] = len(target)
    inds[1:-1:2] = spec[:, 0].astype(int)
    inds[2:-1:2] = spec[:, 1].astype(int)
    lens = np.diff(inds)
    mask = np.repeat((np.arange(len(lens), dtype=np.uint8)&1).view(bool), lens)
    target[mask] = np.repeat(spec[:, 2], lens[1::2])

def f_D1(a, info):
    start, stop, val = info[:,0].astype(int), info[:,1].astype(int), info[:,2]
    id_arr = np.zeros(len(a), dtype=int)
    id_arr[start] = 1
    id_arr[stop] = -1
    a[id_arr.cumsum().astype(bool)] = np.repeat(val, stop - start)

def f_D2(a, info):
    start, stop, val = info[:,0].astype(int), info[:,1].astype(int), info[:,2]
    id_arr = np.zeros(len(a), dtype=val.dtype)
    id_arr[start] = val
    id_arr[stop] = -val
    return id_arr.cumsum()

def setup_data(n, k):
    inds = np.sort(np.random.randint(0, n-2*k, (2*k,)) + np.arange(2*k))
    return np.c_[inds.reshape(-1, 2), np.random.randint(1, 10, (k,))].astype(float)

for n in (100, 100000, 1000000):
    k = 3**(n.bit_length()>>3)
    spec = setup_data(n, k)
    ref = np.zeros((n,))
    f_pp(ref, spec)
    print(f"n={n}")
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            res = np.zeros((n,))
            ret = func(res, spec)
            if ret is not None:
                res = ret
            assert np.allclose(ref, res)
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(a, spec)', 'a=np.zeros((n,))',
                globals={'f':func, 'spec':spec, 'np':np, 'n':n}, number=10)*100))
        except Exception:
            print("{:16s} apparently failed".format(name[2:]))