
How to slice numpy rows using start and end index

import numpy as np

index = np.array([[1,2],[2,4],[1,5],[5,6]])
z = np.zeros(shape = [4,10], dtype = np.float32)

What is an efficient way to set z[np.arange(4), index[:,0]], z[np.arange(4), index[:,1]], and everything between them to 1?

Expected output:

array([[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]])

We can leverage NumPy broadcasting for a vectorized solution: simply compare the start and end indices against a range array covering the column positions. That gives a boolean mask marking every place in the output array that needs to be set to 1.

So, the solution would be something like this:

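# Column positions 0..ncols-1
ncols = z.shape[1]
r = np.arange(ncols)
# Broadcast the (nrows, 1) start/end columns against the (ncols,) positions
# to get an (nrows, ncols) mask that is True where start <= col <= end
mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
z[mask] = 1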
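To make the broadcasting step explicit, here is the same idea wrapped in a small self-contained function (the name `fill_ranges` is hypothetical, not from the answer). `index[:, 0, None]` has shape (4, 1) and `r` has shape (10,), so each comparison broadcasts to a (4, 10) boolean array; both ends are inclusive, which matches the expected output.

```python
import numpy as np

def fill_ranges(z, index):
    """Set z[i, index[i,0]:index[i,1]+1] = 1 for every row i (inclusive end).

    Hypothetical helper name; the logic is the broadcasting mask shown above.
    """
    r = np.arange(z.shape[1])            # column positions, shape (ncols,)
    starts = index[:, 0, None]           # shape (nrows, 1)
    ends = index[:, 1, None]             # shape (nrows, 1)
    mask = (starts <= r) & (r <= ends)   # broadcasts to (nrows, ncols)
    z[mask] = 1
    return z

index = np.array([[1, 2], [2, 4], [1, 5], [5, 6]])
z = np.zeros((4, 10), dtype=np.float32)
print(fill_ranges(z, index).astype(int))
```

Because the assignment goes through a boolean mask, `z` keeps its original float32 dtype.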

Sample run:

In [39]: index = np.array([[1,2],[2,4],[1,5],[5,6]])
    ...: z = np.zeros(shape = [4,10], dtype = np.float32)

In [40]: ncols = z.shape[1]
    ...: r = np.arange(z.shape[1])
    ...: mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
    ...: z[mask] = 1

In [41]: z
Out[41]: 
array([[0., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 1., 1., 0., 0., 0., 0., 0.],
       [0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 1., 0., 0., 0.]], dtype=float32)

If z is always a zeros-initialized array, we can get the output directly from mask:

z = mask.astype(int)
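Note that `astype(int)` produces an integer array, while the question's `z` is float32. If you want to keep float32, a minimal sketch is to cast the mask to that dtype directly (assuming the same `index` and 10 columns as in the question):

```python
import numpy as np

index = np.array([[1, 2], [2, 4], [1, 5], [5, 6]])
ncols = 10

r = np.arange(ncols)
mask = (index[:, 0, None] <= r) & (index[:, 1, None] >= r)

# Casting the boolean mask gives 1.0 where True and 0.0 where False,
# so no separate zeros array or assignment is needed.
z = mask.astype(np.float32)
print(z.dtype)  # float32
```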

Sample run:

In [37]: mask.astype(int)
Out[37]: 
array([[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]])

Benchmarking

Here is a comparison of @hpaulj's foo0 and my foo4 (as listed in @hpaulj's post) on inputs with many rows and a varying number of columns. We start with 10 columns, as in the question's sample input, but use far more rows (10000 in the first two runs, 1000 in the last two), and then increase the number of columns up to 1000.

Here are the timings:

In [14]: ncols = 10
    ...: index = np.random.randint(0,ncols,(10000,2))
    ...: z = np.zeros(shape = [len(index),ncols], dtype = np.float32)

In [15]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
100 loops, best of 3: 6.27 ms per loop
1000 loops, best of 3: 594 µs per loop

In [16]: ncols = 100
    ...: index = np.random.randint(0,ncols,(10000,2))
    ...: z = np.zeros(shape = [len(index),ncols], dtype = np.float32)

In [17]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
100 loops, best of 3: 6.49 ms per loop
100 loops, best of 3: 2.74 ms per loop

In [38]: ncols = 300
    ...: index = np.random.randint(0,ncols,(1000,2))
    ...: z = np.zeros(shape = [len(index),ncols], dtype = np.float32)

In [39]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
1000 loops, best of 3: 657 µs per loop
1000 loops, best of 3: 600 µs per loop

In [40]: ncols = 1000
    ...: index = np.random.randint(0,ncols,(1000,2))
    ...: z = np.zeros(shape = [len(index),ncols], dtype = np.float32)

In [41]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
1000 loops, best of 3: 673 µs per loop
1000 loops, best of 3: 1.78 ms per loop

Thus, the choice between the loop-based approach and the broadcasting-based vectorized one depends on the number of columns: in these runs broadcasting is clearly faster at 10 and 100 columns, the two are about even at 300, and the loop wins at 1000.
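If you want to rerun a comparison like this yourself, here is a minimal sketch using the standard-library `timeit` module instead of IPython's `%timeit`. The names `loop_fill` and `mask_fill` are hypothetical stand-ins for foo0 and foo4; the loop version here uses `index[i,1]+1` as the slice stop so that both functions fill the same inclusive ranges before being timed. Timings depend on your machine and NumPy version, so no numbers are claimed here.

```python
import timeit
import numpy as np

def loop_fill(z, index):
    # One slice assignment per row; +1 makes the end index inclusive
    for i in range(z.shape[0]):
        z[i, index[i, 0]:index[i, 1] + 1] = 1
    return z

def mask_fill(z, index):
    # Broadcasted comparison against all column positions at once
    r = np.arange(z.shape[1])
    mask = (index[:, 0, None] <= r) & (index[:, 1, None] >= r)
    z[mask] = 1
    return z

rng = np.random.default_rng(0)
nrows = 1000
for ncols in (10, 100, 300, 1000):
    index = rng.integers(0, ncols, size=(nrows, 2))
    # Sanity check: both approaches must agree before timing them
    a = loop_fill(np.zeros((nrows, ncols), np.float32), index)
    b = mask_fill(np.zeros((nrows, ncols), np.float32), index)
    assert np.array_equal(a, b)
    z = np.zeros((nrows, ncols), np.float32)
    t_loop = min(timeit.repeat(lambda: loop_fill(z, index), number=20, repeat=3)) / 20
    t_mask = min(timeit.repeat(lambda: mask_fill(z, index), number=20, repeat=3)) / 20
    print(f"ncols={ncols:5d}  loop: {t_loop*1e6:8.1f} us  mask: {t_mask*1e6:8.1f} us")
```

Rows where the random start exceeds the end produce no 1s in either function, so the sanity check still holds for them.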

I think this is what you want to do, but with a loop:

In [35]: z=np.zeros((4,10),int)
In [36]: index = np.array([[1,2],[2,4],[1,5],[5,6]])
In [37]: for i in range(4):
    ...:     z[i,index[i,0]:index[i,1]] = 1
    ...:     
In [38]: z
Out[38]: 
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]])
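Note that Python slices exclude their stop index, so this loop sets columns `index[i,0]` through `index[i,1]-1`, which is why each row above has one fewer 1 than the expected output in the question. A minimal sketch of the same loop with an inclusive end (adding 1 to the stop):

```python
import numpy as np

index = np.array([[1, 2], [2, 4], [1, 5], [5, 6]])
z = np.zeros((4, 10), int)

for i, (start, end) in enumerate(index):
    # end + 1 because slice stops are exclusive; this matches the
    # inclusive ranges in the question's expected output
    z[i, start:end + 1] = 1

print(z)
```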

Since the slices have differing lengths, this will be tricky to do with a single array expression. Maybe not impossible, but tricky enough that it might not be worth trying.

Look at the indices of the 1s in this z:

In [40]: np.where(z)
Out[40]: 
(array([0, 1, 1, 2, 2, 2, 2, 3], dtype=int32),
 array([1, 2, 3, 1, 2, 3, 4, 5], dtype=int32))

Is there a regular pattern that could generate these from [0,1,2,3] and index?

I can generate the second array (the column indices) with a concatenation of slices:

In [39]: np.r_[1:2, 2:4, 1:5, 5:6]
Out[39]: array([1, 2, 3, 1, 2, 3, 4, 5])

But notice that r_ involves several iterations: to generate the input, to expand the slices, and to concatenate them.

I can generate the first array of the where result (the row indices) with:

In [41]: index[:,1]-index[:,0]
Out[41]: array([1, 2, 4, 1])
In [42]: np.arange(4).repeat(_)
Out[42]: array([0, 1, 1, 2, 2, 2, 2, 3])

And, as expected, those two index arrays give us all the 1s:

In [43]: z[Out[42],Out[39]]
Out[43]: array([1, 1, 1, 1, 1, 1, 1, 1])

Or, to generate Out[39] directly from index:

In [50]: np.concatenate([np.arange(i,j) for i,j in index])
Out[50]: array([1, 2, 3, 1, 2, 3, 4, 5])
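Following the pattern above, the column indices can also be built without a Python-level loop by combining `repeat` with a cumulative sum of the slice lengths. This is a sketch of one way to do it, not something taken from either answer; it keeps the exclusive stop used in the loop above and assumes every `index[i,1] >= index[i,0]` (a negative length would make `repeat` raise an error). The helper name `fill_by_fancy_index` is hypothetical.

```python
import numpy as np

def fill_by_fancy_index(z, index):
    starts, stops = index[:, 0], index[:, 1]
    lengths = stops - starts                      # number of 1s per row
    rows = np.arange(len(index)).repeat(lengths)  # row index of each 1
    # Offset of each row's first entry within the flat list of 1s
    offsets = np.cumsum(lengths) - lengths
    # Position within its own row's run: 0, 1, ..., length-1
    within = np.arange(lengths.sum()) - offsets.repeat(lengths)
    cols = starts.repeat(lengths) + within        # column index of each 1
    z[rows, cols] = 1
    return z

index = np.array([[1, 2], [2, 4], [1, 5], [5, 6]])
z = fill_by_fancy_index(np.zeros((4, 10), int), index)
print(np.where(z))
```

To get the question's inclusive ranges instead, you could pass `index + [0, 1]`, which adds 1 to every stop.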

Comparing my solution with @Divakar's:

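import numpy as np

def foo0(z,index):
    # Row loop: Python slices exclude the stop index, so this sets
    # columns index[i,0] .. index[i,1]-1 in each row
    for i in range(z.shape[0]):
        z[i,index[i,0]:index[i,1]] = 1
    return z

def foo4(z,index):
    # Broadcasting: compare each row's start/end against all column
    # positions; this sets columns index[i,0] .. index[i,1] inclusive
    r = np.arange(z.shape[1])
    mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
    z[mask] = 1
    return z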

For this small example, row iteration is faster:

In [155]: timeit foo0(z,index)
7.12 µs ± 224 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [156]: timeit foo4(z,index)
19.8 µs ± 890 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Even for larger arrays (here 1000 rows by 1000 columns), the row iteration approach is faster:

In [157]: Z.shape
Out[157]: (1000, 1000)
In [158]: Index.shape
Out[158]: (1000, 2)
In [159]: timeit foo0(Z,Index)
1.72 ms ± 16.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [160]: timeit foo4(Z,Index)
7.47 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

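Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM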