NumPy: Selecting n points every m points

If I have a numpy.ndarray with, say, 300 points (1 x 300 for now), and I want to select 10 points out of every 30 points, how would I do that?

In other words: I want the first 10 points, then skip 20, then grab 10 more, then skip 20 again, and so on until the end of the array.

To select 10 elements from each block of 30 elements, we can simply reshape into 2D and slice out the first 10 columns of each row -

a.reshape(-1,30)[:,:10]

The benefit is that the output is a view into the input, so it is virtually free and has no extra memory overhead. Here is a sample run to demonstrate that -

In [43]: np.random.seed(0)

In [44]: a = np.random.randint(0,9,(1,300))

In [48]: np.shares_memory(a,a.reshape(10,30)[:,:10])
Out[48]: True

If you need a flattened version, use .ravel() -

a.reshape(-1,30)[:,:10].ravel()
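As a minimal sketch of the reshape-and-slice idea above, the following uses `np.arange(300)` as stand-in data (an assumption, chosen so the selected positions are easy to read off) and checks which results share memory with the input:

```python
import numpy as np

a = np.arange(300)                     # stand-in data: 0, 1, ..., 299

blocks = a.reshape(-1, 30)             # 10 rows, one per block of 30
picked_2d = blocks[:, :10]             # first 10 columns of each row (a view)
picked_1d = picked_2d.ravel()          # flattened result (a copy)

print(picked_2d.shape)                 # (10, 10)
print(picked_1d[:12])                  # [ 0  1  2  3  4  5  6  7  8  9 30 31]
print(np.shares_memory(a, picked_2d))  # True
print(np.shares_memory(a, picked_1d))  # False
```

The 2D slice is not contiguous in memory, which is why flattening it has to copy the data.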

Timings -

In [38]: a = np.random.randint(0,9,(300))

# @sacul's soln
In [39]: %%timeit
    ...: msk = [True] * 10 + [False] * 20
    ...: out = a[np.tile(msk, len(a)//len(msk))]
100000 loops, best of 3: 7.6 µs per loop

# From this post
In [40]: %timeit a.reshape(-1,30)[:,:10].ravel()
1000000 loops, best of 3: 1.07 µs per loop

In [41]: a = np.random.randint(0,9,(3000000))

# @sacul's soln
In [42]: %%timeit
    ...: msk = [True] * 10 + [False] * 20
    ...: out = a[np.tile(msk, len(a)//len(msk))]
100 loops, best of 3: 3.66 ms per loop

# From this post
In [43]: %timeit a.reshape(-1,30)[:,:10].ravel()
100 loops, best of 3: 2.32 ms per loop

# If you are okay with `2D` output, it is virtually free
In [44]: %timeit a.reshape(-1,30)[:,:10]
1000000 loops, best of 3: 519 ns per loop

Generic case with 1D array

A. No. of elements being a multiple of block length

For a 1D array a whose number of elements is a multiple of n, to select m elements from each block of n elements and get a 1D array output, we would have:

a.reshape(-1,n)[:,:m].ravel()

Note that the ravel() flattening step makes a copy. So, if possible, keep the unflattened 2D version for memory efficiency.
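One way to package case A is a small helper that validates its inputs and returns the 2D view, leaving the decision to flatten to the caller. This is only a sketch: the name `select_m_of_n` and the argument checks are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def select_m_of_n(a, m, n):
    """Return the first m elements of each block of n as a 2D array.

    Hypothetical helper: requires a 1D input whose length is a multiple of n.
    """
    a = np.asarray(a)
    if a.ndim != 1:
        raise ValueError("expected a 1D array")
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("need n > 0 and 0 <= m <= n")
    if a.size % n != 0:
        raise ValueError("array length must be a multiple of n")
    return a.reshape(-1, n)[:, :m]      # a view into a, no data copied

a = np.arange(25)
view = select_m_of_n(a, m=2, n=5)       # shape (5, 2), shares memory with a
flat = view.ravel()                     # 1D copy

print(flat)                             # [ 0  1  5  6 10 11 15 16 20 21]
```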

Sample run -

In [59]: m,n = 2,5

In [60]: N = 25

In [61]: a = np.random.randint(0,9,(N))

In [62]: a
Out[62]: 
array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1, 5, 8, 4,
       3, 0, 3])

# Select 2 elements off each block of 5 elements
In [63]: a.reshape(-1,n)[:,:m].ravel()
Out[63]: array([5, 0, 3, 5, 6, 8, 7, 7, 8, 4])

B. Generic no. of elements

We would leverage np.lib.stride_tricks.as_strided, inspired by this post, to select m elements from each block of n elements -

def skipped_view(a, m, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    shp = ((a.size+n-1)//n,n)
    return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]

def slice_m_everyn(a, m, n):
    a_slice2D = skipped_view(a,m,n)
    extra = min(m,len(a)-n*(len(a)//n))
    L = m*(len(a)//n) + extra
    return a_slice2D.ravel()[:L]

Note that skipped_view gets us a view into the input array, and possibly into a memory region not assigned to the input array; after that we flatten and slice to restrict it to the desired output, and that result is a copy.

Sample run -

In [170]: np.random.seed(0)
     ...: a = np.random.randint(0,9,(16))

In [171]: a
Out[171]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7])

# Select 2 elements off each block of 5 elements
In [172]: slice_m_everyn(a, m=2, n=5)
Out[172]: array([5, 0, 3, 5, 6, 8, 7])

In [173]: np.random.seed(0)
     ...: a = np.random.randint(0,9,(19))

In [174]: a
Out[174]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1])

# Select 2 elements off each block of 5 elements
In [175]: slice_m_everyn(a, m=2, n=5)
Out[175]: array([5, 0, 3, 5, 6, 8, 7, 7])
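If reading past the end of the buffer with as_strided is a concern, a different technique gives the same result for arbitrary lengths: build a boolean mask from each position's offset within its block. This is a swapped-in alternative, not the strided approach above, and `slice_m_everyn_mask` is a hypothetical name. It always produces a copy.

```python
import numpy as np

def slice_m_everyn_mask(a, m, n):
    """Keep the first m elements of every block of n (hypothetical helper)."""
    a = np.asarray(a)
    # Offset of each position inside its block of n; keep offsets 0..m-1.
    keep = (np.arange(a.size) % n) < m
    return a[keep]                       # boolean indexing returns a copy

a16 = np.array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7])
a19 = np.array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1])

print(slice_m_everyn_mask(a16, m=2, n=5))   # [5 0 3 5 6 8 7]
print(slice_m_everyn_mask(a19, m=2, n=5))   # [5 0 3 5 6 8 7 7]
```

The two inputs are the 16- and 19-element arrays from the sample runs above, and the outputs match those of slice_m_everyn.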

You could create a mask and index by the mask, repeated until it reaches the length of your array:

msk = [True] * 10 + [False] * 20

arr[np.tile(msk, len(arr)//len(msk))]

Minimal example:

In an array of 30 values, select 1 element, then skip 2 elements:

>>> arr
array([6, 7, 2, 7, 1, 9, 1, 4, 4, 8, 6, 5, 2, 6, 3, 6, 8, 5, 6, 7, 2, 1, 9,
       6, 7, 2, 1, 8, 2, 2])

>>> msk = [True] * 1 + [False] * 2

>>> arr[np.tile(msk, len(arr)//len(msk))]
array([6, 7, 1, 8, 2, 6, 6, 1, 7, 8])

Explanation:

msk is a boolean mask:

>>> msk
[True, False, False]

You can then repeat that mask with np.tile until it is the same length as your original array (i.e. the number of repetitions is the length of your array divided by the length of your mask):

>>> np.tile(msk, len(arr)//len(msk))
array([ True, False, False,  True, False, False,  True, False, False,
        True, False, False,  True, False, False,  True, False, False,
        True, False, False,  True, False, False,  True, False, False,
        True, False, False], dtype=bool)

Then it's a simple matter of indexing by a boolean array, which numpy excels at.
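The tiling above assumes the array length is an exact multiple of the mask length. Otherwise the tiled mask comes out shorter than the array, and recent NumPy versions raise an IndexError for a boolean index of the wrong length. A possible variation, sketched here with a made-up 10-element array, rounds the repeat count up and trims the surplus:

```python
import numpy as np

arr = np.arange(10)                         # length 10 is not a multiple of 3
msk = [True] * 1 + [False] * 2              # take 1, skip 2

reps = -(-len(arr) // len(msk))             # ceiling division: 4 repeats
full_mask = np.tile(msk, reps)[:len(arr)]   # trim so the mask matches arr

print(full_mask.shape)                      # (10,)
print(arr[full_mask])                       # [0 3 6 9]
```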

IIUC

get = 10
skip = 20
k = [item for z in [np.arange(get) + idx for idx in np.arange(0, x.size, skip+get)] for item in z]

Then just index with it

x[k]

Example:

x = np.arange(100)
x[k]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 30, 31, 32, 33, 34, 35, 36,
       37, 38, 39, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 90, 91, 92, 93,
       94, 95, 96, 97, 98, 99])
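The same index list can also be built without a Python-level loop by broadcasting the block starts against the in-block offsets. This is a sketch of that variation rather than the answer's own code; the final filter that drops indices past the end of the array is an added assumption, there to cope with a truncated last block.

```python
import numpy as np

get, skip = 10, 20
x = np.arange(100)

starts = np.arange(0, x.size, get + skip)         # block starts: 0, 30, 60, 90
k = (starts[:, None] + np.arange(get)).ravel()    # each start plus offsets 0..9
k = k[k < x.size]                                 # drop indices past the end

print(x[k][:12])                                  # [ 0  1  2  3  4  5  6  7  8  9 30 31]
```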
