How can I more efficiently assign values from one array to another based on an index?

Question

I am trying to replace the values of one array with values from another, according to how many ones are in each row of the source array. The count determines which index of the replacement array is used: if a row contains two ones, the output gets l1[1], and if it contains a single one, the output gets l1[0].

A specific example shows it better:

import numpy as np

l1 = np.array([4, 5])
x112 = np.array([[0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]])

array([[0, 0],
       [0, 1],
       [1, 1],
       [0, 0],
       [1, 0],
       [1, 1]])

Required output:

[[0]
 [4]
 [5]
 [0]
 [4]
 [5]]

I did this by counting the ones in each row and assigning accordingly using np.where:

x1x2 = np.array([0, 1, 2, 0, 1, 2])  # number of ones in each row
x1x2 = np.where(x1x2 != 1, x1x2, l1[0])
x1x2 = np.where(x1x2 != 2, x1x2, l1[1])
print(x1x2)

Output:

[0 4 5 0 4 5]

Could this be done more efficiently?
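
For reference, the snippet above hard-codes the per-row counts. Below is a minimal sketch, assuming x112 only ever holds 0s and 1s with at most two ones per row, that derives the counts from x112 itself. It swaps the chained np.where for np.select, because the chain would misassign rows if l1[0] happened to equal 2 (the second np.where would overwrite the values just written by the first).

```python
import numpy as np

l1 = np.array([4, 5])
x112 = np.array([[0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]])

# Count the ones in each row instead of typing the counts by hand.
counts = (x112 == 1).sum(axis=1)

# Map count 1 -> l1[0], count 2 -> l1[1], anything else -> 0.
# Both conditions are evaluated against the original counts,
# so a replacement value can never be replaced a second time.
result = np.select([counts == 1, counts == 2], [l1[0], l1[1]], default=0)
print(result)  # [0 4 5 0 4 5]
```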

Answer 1

Okay, I actually gave devectorizing your code a shot. First, the vectorized NumPy version you have:

def op(x112, l1):
    # bit of cheating, adding instead of counting 1s
    x1x2 = x112[:,0] + x112[:,1]

    x1x2=np.where(x1x2 != 1, x1x2, l1[0])
    x1x2=np.where(x1x2 != 2, x1x2, l1[1])
    return x1x2
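
The row sum equals the number of ones only while every entry is 0 or 1. Here is a small sketch of how the two diverge, and of how np.count_nonzero counts the ones explicitly. It uses a made-up array, mixed, that is not from the question.

```python
import numpy as np

# Hypothetical input containing values other than 0 and 1.
mixed = np.array([[2, 1], [1, 1], [0, 3]])

row_sums = mixed[:, 0] + mixed[:, 1]         # the shortcut used in op()
ones = np.count_nonzero(mixed == 1, axis=1)  # explicit count of entries equal to 1

print(row_sums)  # [3 2 3]
print(ones)      # [1 2 0]
```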

The most efficient alternative is to loop through x112 only once, so let's write a Numba loop.

import numba as nb

@nb.njit
def loop(x112, l1):
    d0, d1 = x112.shape
    x1x2 = np.zeros(d0, dtype = x112.dtype)
    for i in range(d0):
        # actually count the 1s
        num1s = 0
        for j in range(d1):
            if x112[i,j] == 1:
                num1s += 1
        
        if num1s == 1:
            x1x2[i] = l1[0]
        elif num1s == 2:
            x1x2[i] = l1[1]
    return x1x2

The Numba loop is roughly 9-10x faster on my laptop.

%timeit op(x112, l1)
8.05 µs ± 34.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit loop(x112, l1)
873 ns ± 5.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

As @Mad_Physicist requested, here are timings with a bigger array. I'm including his advanced-indexing method too.

x112 = np.random.randint(0, 2, size = (100000, 2))
l1_v2 = np.array([0,4,5])

%timeit op(x112, l1)
1.35 ms ± 27.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit loop(x112, l1)
956 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit l1_v2[x112.sum(1)]
1.2 ms ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

EDIT: Okay, maybe take these timings with a grain of salt: when I restarted the IPython kernel and reran everything, op(x112, l1) improved to 390 µs ± 22.1 µs per loop, while the other methods kept the same performance (971 µs and 1.23 ms).
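
Run-to-run swings like this are common in microbenchmarks. One way to make comparisons a little more repeatable outside IPython is the standard-library timeit module, taking the best of several repeats. The sketch below is an assumption about how one might set this up, not the measurement behind the numbers above. The helper names where_chain and lookup are made up here, and the Numba function is left out so the snippet runs without Numba installed.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x112 = rng.integers(0, 2, size=(100_000, 2))  # random 0/1 entries
l1 = np.array([4, 5])
l1_v2 = np.array([0, 4, 5])

def where_chain(x, l):
    # Same logic as op() above.
    s = x[:, 0] + x[:, 1]
    s = np.where(s != 1, s, l[0])
    return np.where(s != 2, s, l[1])

def lookup(x, table):
    # Advanced-indexing method: row sums index into the table.
    return table[x.sum(1)]

# Check that both approaches agree before timing them.
assert np.array_equal(where_chain(x112, l1), lookup(x112, l1_v2))

candidates = [
    ("where_chain", lambda: where_chain(x112, l1)),
    ("lookup", lambda: lookup(x112, l1_v2)),
]
for name, fn in candidates:
    # Best of 5 repeats, 20 calls each, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=5)) / 20
    print(f"{name}: {best * 1e6:.1f} us per call")
```

Taking the minimum rather than the mean tends to filter out interference from other processes, though it will not remove effects such as a freshly restarted kernel.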

Answer 2

You can use direct indexing:

l1 = np.array([0, 4, 5])
x112 = np.array([[0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]])

result = l1[x112.sum(1)]

This works if you're at liberty to prepend the zero to l1 at creation time. If not:

result = np.r_[0, l1][x112.sum(1)]
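
To see why this works, it may help to print the intermediate values: the row sums are the counts 0, 1, or 2, and they are used directly as positions in the lookup table. A short sketch, which assumes every entry of x112 is 0 or 1 so the sums never exceed the last index of the table:

```python
import numpy as np

l1 = np.array([4, 5])
x112 = np.array([[0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]])

counts = x112.sum(1)   # number of ones per row
table = np.r_[0, l1]   # lookup table with 0 prepended
result = table[counts] # one table entry picked per row

print(counts)  # [0 1 2 0 1 2]
print(table)   # [0 4 5]
print(result)  # [0 4 5 0 4 5]

# Column-shaped variant matching the "Required output" in the question.
column = result[:, None]
print(column.shape)  # (6, 1)
```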

Answer 1
1 2021-07-24 00:36:40

Answer 2
0 2021-07-24 00:42:00