
2d array as index of a 3d array

I have a 2D array (C) with 8000x64 elements, a 1D array (s) with 8000x1 elements, and another 1D array (d) with 1x64 elements. For every row index i where s[i] is True, the vector d shall be added to that row of C. This works quite well:

C[s == True] += d
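
As a small sanity check of the 2D case, here is a sketch with toy shapes (4x3 instead of 8000x64; the values are made up for illustration). It assumes s is a flat boolean vector with one entry per row of C, in which case `s == True` is equivalent to plain `s`. The mask selects whole rows, and d broadcasts across the selected rows.

```python
import numpy as np

# Toy stand-ins for the arrays in the question (shapes are illustrative).
C = np.zeros((4, 3), dtype=int)           # 4 rows, 3 columns
s = np.array([False, True, False, True])  # one flag per row
d = np.array([1, 2, 3])                   # one value per column

# Boolean indexing picks rows 1 and 3; d broadcasts over them.
C[s] += d
print(C)
```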

Now I have added one dimension to C, s, and d, and the logic above shall be applied to every element of the additional dimension.

The following code does what I want, but it's very slow.

for i in range(I):
    C_this = C[:,:,i]
    s_this = s[:,i]
    d_this = d[:,i]

    C_this[s_this == True] += d_this
    C[:,:,i] = C_this

Is there a numpy way to do this without a for loop?

It's easier with the extra dimension at the beginning:

In [376]: C = np.zeros((4,2,3),int)                                                            
In [377]: s = np.array([[0,0],[0,1],[1,0],[1,1]],bool)                                         
In [378]: d = np.arange(1,13).reshape(4,3)                                                     
In [379]: C.shape, s.shape, d.shape                                                            
Out[379]: ((4, 2, 3), (4, 2), (4, 3))
In [380]: I,J = np.nonzero(s)                                                                  
In [381]: I,J                                                                                  
Out[381]: (array([1, 2, 3, 3]), array([1, 0, 0, 1]))

In [383]: C[I,J] += d[I]
In [384]: C                                                                                    
Out[384]: 
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 0,  0,  0],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [ 0,  0,  0]],

       [[10, 11, 12],
        [10, 11, 12]]])

Your way:

In [385]: C = np.zeros((4,2,3),int)                                                            
In [386]: for i in range(4): 
     ...:     C[i,:,:][s[i,:]] += d[i,:] 
     ...:                                                                                      
In [387]: C                                                                                    
Out[387]: 
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 0,  0,  0],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [ 0,  0,  0]],

       [[10, 11, 12],
        [10, 11, 12]]])
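
To connect this back to the layout in the question, where the extra dimension comes last, the following sketch moves that axis to the front with `np.moveaxis` (which returns a view), applies the `np.nonzero` indexing, and checks the result against the loop from the question. The small shapes, the random seed, and the names `C_lead`, `s_lead`, `d_lead`, and `expected` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 6, 4, 3                        # stand-ins for 8000, 64, and the new dimension

C = rng.integers(0, 50, size=(N, M, K))
s = rng.integers(0, 2, size=(N, K)).astype(bool)
d = rng.integers(0, 10, size=(M, K))

# Reference: the loop from the question, run on a copy.
expected = C.copy()
for i in range(K):
    expected[:, :, i][s[:, i]] += d[:, i]

# Vectorized: view the arrays with the extra dimension first.
C_lead = np.moveaxis(C, -1, 0)           # shape (K, N, M), a view of C
s_lead = s.T                             # shape (K, N)
d_lead = d.T                             # shape (K, M)

I, J = np.nonzero(s_lead)                # I: index along K, J: row index
C_lead[I, J] += d_lead[I]                # writes through the view into C

print(np.array_equal(C, expected))
```

Because every (I, J) pair produced by `np.nonzero` on a boolean mask is unique, the in-place `+=` with index arrays does not run into the repeated-index pitfall.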

Due to how numpy indexing works, s selects the relevant rows of C in the first example. To do the same thing in the 3D case (taking the new dimension to have size 3), you would have to reshape C into something that is (8000*3, 64) and s into a flat mask of length 8000*3. The only problem now is getting d to account for the different number of selected rows in each slice of the third dimension, which can be done with np.repeat.

The first part is

C2 = np.moveaxis(C, -1, 0).reshape(-1, 64)

This is extremely inefficient because it copies your entire array. A better arrangement would be if C had shape (3, 8000, 64) to begin with. Then you would only need to ravel the first two axes to get the proper shape and memory layout, without copying data.

# Assumes C.shape == (3, 8000, 64), s.shape == (3, 8000), d.shape == (3, 64)
repeats = np.count_nonzero(s, axis=1)
C.reshape(-1, 64)[s.ravel()] += np.repeat(d, repeats, axis=0)

Since the reshape operation returns a view in this case, the indexing should work properly to increment in place. I don't think this approach is necessarily very good though, since it copies each row of d as many times as s is non-zero in the corresponding element of the new dimension.
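
A minimal sketch of the reshape-and-repeat idea, assuming the extra dimension is already the leading axis, so that C has shape (K, N, M), s has shape (K, N), and d has shape (K, M). Under that assumption the per-slice counts are taken along axis 1 of s. The toy sizes, the seed, and the names `flat` and `expected` are made up for illustration; the result is checked against a plain loop.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, M = 3, 5, 4                        # toy sizes; the question uses N=8000, M=64

C = rng.integers(0, 50, size=(K, N, M))
s = rng.integers(0, 2, size=(K, N)).astype(bool)
d = rng.integers(0, 10, size=(K, M))

# Reference result from an explicit loop over the leading axis.
expected = C.copy()
for i in range(K):
    expected[i][s[i]] += d[i]

# Number of selected rows in each slice along the leading axis.
repeats = np.count_nonzero(s, axis=1)    # shape (K,)

# C is C-contiguous, so this reshape is a view and the update lands in C.
flat = C.reshape(-1, M)                  # shape (K*N, M)
flat[s.ravel()] += np.repeat(d, repeats, axis=0)

print(np.shares_memory(flat, C), np.array_equal(C, expected))
```

If C were not contiguous in this layout, `reshape` would return a copy and the update would silently miss C, which is why the `np.shares_memory` check is worth keeping while experimenting.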

Here is my implementation of the method @hpaulj proposed. Note that I don't want to take credit from him, so please mark his answer, not mine, as correct. I just wanted to share what I did.

import numpy as np
import numpy.random as npr

C = np.zeros((100, 8000, 64), dtype=int)
s = np.zeros((100, 8000), dtype=bool)
d = np.zeros((100, 64), dtype=int)

C[:,:,:] = npr.randint(50, size=C.shape)
s[:,:] = npr.randint(3, size=s.shape)
d[:,:] = npr.randint(10, size=d.shape)

I, J = np.nonzero(s)
C[I, J] += d[I]

I then profiled the program, and it runs on my machine in less than 450 milliseconds (the last two lines take less than 300 ms). Note that the calls to randint were just to set up the array values, so those lines wouldn't apply to your use case.
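
As a possible alternative that avoids building index arrays, here is a sketch using the `where` argument of `np.add` together with broadcasting. This is a different technique from the one in the answers above, it has not been benchmarked here, and the toy shapes and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
K, N, M = 3, 5, 4                        # toy sizes for illustration

C = rng.integers(0, 50, size=(K, N, M))
s = rng.integers(0, 3, size=(K, N)).astype(bool)
d = rng.integers(0, 10, size=(K, M))

# Result from the np.nonzero approach, for comparison.
expected = C.copy()
I, J = np.nonzero(s)
expected[I, J] += d[I]

# Add d[i] to row C[i, j] only where s[i, j] is True.
# d[:, None, :] broadcasts to (K, N, M); s[:, :, None] masks whole rows.
np.add(C, d[:, None, :], out=C, where=s[:, :, None])

print(np.array_equal(C, expected))
```

Whether this is faster than the index-array version likely depends on how many entries of s are True, so it is worth timing both on the real data.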
