NumPy: Selecting n points every m points
If I have a numpy.ndarray that's, say, 300 points in size (1 x 300 for now), and I wanted to select 10 points every 30 points, how would I do that?
In other words: I want the first 10 points, then skip 20, then grab 10 more, and then skip 20... until the end of the array.
To select 10 elements off each block of 30 elements, we can simply reshape into 2D and slice out the first 10 columns from each row -
a.reshape(-1,30)[:,:10]
The benefit is that the output is a view into the input, and as such it is virtually free, with no extra memory overhead.
Here is a sample run to demonstrate that -
In [43]: np.random.seed(0)
In [44]: a = np.random.randint(0,9,(1,300))
In [48]: np.shares_memory(a,a.reshape(10,30)[:,:10])
Out[48]: True
If you need a flattened version, use .ravel() -
a.reshape(-1,30)[:,:10].ravel()
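To make the view-versus-copy distinction concrete, here is a small self-contained sketch. The array contents are arbitrary and chosen only for illustration; the names view2d and flat are not from the answer. It uses np.shares_memory to check that the 2D slice still points at the input's memory, while the raveled result does not.

```python
import numpy as np

a = np.arange(300)                   # illustrative 1D input with 300 elements

view2d = a.reshape(-1, 30)[:, :10]   # first 10 of every 30, shape (10, 10)
flat = view2d.ravel()                # flattened result, shape (100,)

print(view2d.shape, flat.shape)
print(np.shares_memory(a, view2d))   # True: still a view into a
print(np.shares_memory(a, flat))     # False: ravel had to copy the strided view
```

The copy happens because the sliced rows are not contiguous in memory, so a flat 1D result cannot be expressed as a view.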
Timings -
In [38]: a = np.random.randint(0,9,(300))
# @sacul's soln
In [39]: %%timeit
...: msk = [True] * 10 + [False] * 20
...: out = a[np.tile(msk, len(a)//len(msk))]
100000 loops, best of 3: 7.6 µs per loop
# From this post
In [40]: %timeit a.reshape(-1,30)[:,:10].ravel()
1000000 loops, best of 3: 1.07 µs per loop
In [41]: a = np.random.randint(0,9,(3000000))
# @sacul's soln
In [42]: %%timeit
...: msk = [True] * 10 + [False] * 20
...: out = a[np.tile(msk, len(a)//len(msk))]
100 loops, best of 3: 3.66 ms per loop
# From this post
In [43]: %timeit a.reshape(-1,30)[:,:10].ravel()
100 loops, best of 3: 2.32 ms per loop
# If you are okay with `2D` output, it is virtually free
In [44]: %timeit a.reshape(-1,30)[:,:10]
1000000 loops, best of 3: 519 ns per loop
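For readers without IPython, the comparison above could be reproduced roughly as follows with the standard timeit module. This is only a sketch: the array size, the repeat count, and the function names with_mask and with_reshape are assumptions made for illustration, and absolute timings will differ between machines and NumPy versions.

```python
import timeit
import numpy as np

a = np.random.randint(0, 9, 300_000)   # size chosen arbitrarily for this sketch

def with_mask(a):
    # Boolean-mask approach: tile a 30-element pattern over the array
    msk = [True] * 10 + [False] * 20
    return a[np.tile(msk, len(a) // len(msk))]

def with_reshape(a):
    # Reshape approach: first 10 columns of each 30-wide row, flattened
    return a.reshape(-1, 30)[:, :10].ravel()

# Both approaches should agree before we bother timing them
assert np.array_equal(with_mask(a), with_reshape(a))

for f in (with_mask, with_reshape):
    t = timeit.timeit(lambda: f(a), number=100)
    print(f"{f.__name__}: {t / 100 * 1e6:.1f} µs per call")
```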
Generic case for 1D array
A. No. of elements being a multiple of block length
For a 1D array a whose number of elements is a multiple of n, to select m elements off each block of n elements and get a 1D array output, we would have :
a.reshape(-1,n)[:,:m].ravel()
Note that the ravel() flattening step makes a copy there. So, if possible, keep the unflattened 2D version for memory efficiency.
Sample run -
In [59]: m,n = 2,5
In [60]: N = 25
In [61]: a = np.random.randint(0,9,(N))
In [62]: a
Out[62]:
array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1, 5, 8, 4,
3, 0, 3])
# Select 2 elements off each block of 5 elements
In [63]: a.reshape(-1,n)[:,:m].ravel()
Out[63]: array([5, 0, 3, 5, 6, 8, 7, 7, 8, 4])
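If this one-liner is going to be reused, it may be worth wrapping it in a function that fails loudly when the length is not a multiple of n, since reshape would otherwise raise a less specific error. The helper name take_m_of_every_n and its checks are hypothetical, a minimal sketch rather than part of the answer.

```python
import numpy as np

def take_m_of_every_n(a, m, n):
    """Hypothetical helper: first m elements of every block of n (1D input)."""
    a = np.asarray(a)
    if a.ndim != 1:
        raise ValueError("expected a 1D array")
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("need n > 0 and 0 <= m <= n")
    if a.size % n != 0:
        raise ValueError("array length must be a multiple of n")
    # Same expression as in the answer: reshape, slice columns, flatten
    return a.reshape(-1, n)[:, :m].ravel()

a = np.array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7])
print(take_m_of_every_n(a, m=2, n=5))   # [5 0 3 5]
```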
B. Generic no. of elements
We would leverage np.lib.stride_tricks.as_strided, inspired by this post, to select m elements off each block of n elements -
def skipped_view(a, m, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    shp = ((a.size+n-1)//n,n)
    return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]

def slice_m_everyn(a, m, n):
    a_slice2D = skipped_view(a,m,n)
    extra = min(m,len(a)-n*(len(a)//n))
    L = m*(len(a)//n) + extra
    return a_slice2D.ravel()[:L]
Note that skipped_view gets us a view into the input array, and possibly into a memory region not assigned to the input array, but after that we flatten and slice to restrict it to our desired output, and that result is a copy.
Sample run -
In [170]: np.random.seed(0)
...: a = np.random.randint(0,9,(16))
In [171]: a
Out[171]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7])
# Select 2 elements off each block of 5 elements
In [172]: slice_m_everyn(a, m=2, n=5)
Out[172]: array([5, 0, 3, 5, 6, 8, 7])
In [173]: np.random.seed(0)
...: a = np.random.randint(0,9,(19))
In [174]: a
Out[174]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1])
# Select 2 elements off each block of 5 elements
In [175]: slice_m_everyn(a, m=2, n=5)
Out[175]: array([5, 0, 3, 5, 6, 8, 7, 7])
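If reading past the end of the input buffer is a concern, one alternative is to avoid as_strided entirely: handle the complete blocks with the reshape trick and append the partial last block separately. This is a different technique from skipped_view, sketched here under the assumption of a 1D input; the name slice_m_everyn_safe is hypothetical.

```python
import numpy as np

def slice_m_everyn_safe(a, m, n):
    """Hypothetical variant: first m of every n, for any 1D length."""
    a = np.asarray(a)
    n_full = a.size // n                        # number of complete blocks
    # Complete blocks: reshape to (n_full, n) and keep the first m columns
    head = a[:n_full * n].reshape(-1, n)[:, :m].ravel()
    # Leftover partial block: keep at most m of its elements
    tail = a[n_full * n:][:m]
    return np.concatenate((head, tail))

a = np.array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7])
print(slice_m_everyn_safe(a, m=2, n=5))         # [5 0 3 5 6 8 7]
```

The trade-off is one extra concatenation, in exchange for never touching memory outside the array.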
You could create a mask, repeat it until it reaches the length of your array, and index by it:
msk = [True] * 10 + [False] * 20
arr[np.tile(msk, len(arr)//len(msk))]
Minimal example:
In an array of 30 values, select 1 element, then skip 2 elements:
>>> arr
array([6, 7, 2, 7, 1, 9, 1, 4, 4, 8, 6, 5, 2, 6, 3, 6, 8, 5, 6, 7, 2, 1, 9,
6, 7, 2, 1, 8, 2, 2])
>>> msk = [True] * 1 + [False] * 2
>>> arr[np.tile(msk, len(arr)//len(msk))]
array([6, 7, 1, 8, 2, 6, 6, 1, 7, 8])
Explanation:
msk is a boolean mask
>>> msk
[True, False, False]
You can then repeat that mask with np.tile until it is the same length as your original array (i.e. the number of repetitions is the length of your array divided by the length of your mask):
>>> np.tile(msk, len(arr)//len(msk))
array([ True, False, False, True, False, False, True, False, False,
True, False, False, True, False, False, True, False, False,
True, False, False, True, False, False, True, False, False,
True, False, False], dtype=bool)
Then it's a simple matter of indexing by a boolean mask, which numpy excels at.
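The tiled mask only lines up when the array length is an exact multiple of the mask length; otherwise the integer division leaves the mask shorter than the array. One way around that, sketched here as an alternative rather than as this answer's method, is to build the mask from position modulo block length. The variables m and n are illustrative names for "keep" and "block length".

```python
import numpy as np

arr = np.arange(32)                    # length is not a multiple of 3
m, n = 1, 3                            # keep 1 element out of every 3

# True where the position within its block is among the first m
msk = (np.arange(arr.size) % n) < m

print(arr[msk].tolist())               # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
```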
IIUC
get = 10
skip = 20
k = [item for z in [np.arange(get) + idx for idx in np.arange(0, x.size, skip+get)] for item in z]
Then just index with k
x[k]
Example:
x = np.arange(100)
x[k]
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99])
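The same index list can be built without a Python-level loop by broadcasting the block starts against the within-block offsets. The sketch below also drops indices that would run past the end, which matters when the last block is incomplete; the input size of 95 and the name starts are assumptions chosen to show that case.

```python
import numpy as np

get, skip = 10, 20
x = np.arange(95)                                # illustrative input, last block is cut short

starts = np.arange(0, x.size, get + skip)        # block starts: 0, 30, 60, 90
# Broadcast: each start (column vector) plus offsets 0..get-1 (row vector)
k = (starts[:, None] + np.arange(get)).ravel()
k = k[k < x.size]                                # discard indices past the end

print(x[k][-5:].tolist())                        # [90, 91, 92, 93, 94]
```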