Element-wise weaving of two arrays using numpy repeat

Question

I have two arrays of unequal length, val1 and val2, that I am trying to weave together in a specific way defined by the equal-length arrays mult1 and mult2. In general, my arrays are very long (~1e6 elements), and this is a performance-critical bottleneck in my calculation, so I cannot afford a Python for loop; instead I am trying to take advantage of vectorized functions in NumPy. For the sake of being explicit:

import numpy as np

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])

val1 = np.array([1, 2, 3, 4])
val2 = np.array([-1, -2, -3])

desired_final_result = np.array([-1, 1, 2, 3, -2, 4, -3])

The weaving of val1 and val2 is defined by stepping element-wise through the indices of mult1 and mult2. Each entry of the two mult arrays defines how many elements to take from the corresponding val array. For each index i, the value of mult1[i] determines how many entries we take from val1, then the value of mult2[i] determines how many entries we take from val2; the val1 entries always come first for each index i.
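To have something to check faster versions against, the procedure above can be written as a plain loop. This is a minimal sketch; `weave_reference` is a hypothetical name, and the loop is exactly the slow Python approach the question wants to avoid, so it is only meant as a correctness baseline.

```python
import numpy as np

def weave_reference(mult1, mult2, val1, val2):
    """Hypothetical helper: slow loop version, for correctness checks only."""
    out = []
    pos1 = pos2 = 0  # next unread position in val1 / val2
    for m1, m2 in zip(mult1, mult2):
        # Take mult1[i] entries from val1 first ...
        out.extend(val1[pos1:pos1 + m1])
        pos1 += m1
        # ... then mult2[i] entries from val2
        out.extend(val2[pos2:pos2 + m2])
        pos2 += m2
    # Fix the dtype explicitly so an empty result is not silently float
    return np.array(out, dtype=np.result_type(val1, val2))

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])
val1 = np.array([1, 2, 3, 4])
val2 = np.array([-1, -2, -3])
print(weave_reference(mult1, mult2, val1, val2).tolist())  # [-1, 1, 2, 3, -2, 4, -3]
```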

Note that len(val1) = mult1.sum() and len(val2) = mult2.sum(), so we always end up with a final array with len(desired_final_result) = len(val1) + len(val2).
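Because these invariants fix the output length in advance, a vectorized solution can preallocate its result. A small validation sketch (the name `check_weave_inputs` is hypothetical; the non-negativity check is an added assumption, since negative counts make no sense here and np.repeat rejects them):

```python
import numpy as np

def check_weave_inputs(mult1, mult2, val1, val2):
    """Hypothetical helper: check the stated invariants and return the
    length of the woven output."""
    mult1, mult2 = np.asarray(mult1), np.asarray(mult2)
    if mult1.shape != mult2.shape:
        raise ValueError("mult1 and mult2 must have the same length")
    if (mult1 < 0).any() or (mult2 < 0).any():
        raise ValueError("counts must be non-negative")
    if len(val1) != mult1.sum() or len(val2) != mult2.sum():
        raise ValueError("need len(val1) == mult1.sum() and len(val2) == mult2.sum()")
    return len(val1) + len(val2)

print(check_weave_inputs(np.array([0, 1, 2, 1, 0]), np.array([1, 0, 1, 1, 0]),
                         np.array([1, 2, 3, 4]), np.array([-1, -2, -3])))  # 7
```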

Explicit explanation of minimal example

Since entry i=0 of mult1 is 0, we select 0 entries from val1 and move on to entry i=0 of mult2, which is 1, so we select 1 entry from val2. This explains why the first entry of desired_final_result is -1.
Since entry i=1 of mult1 is 1, we select 1 entry from val1 and move on to entry i=1 of mult2, which is 0, so we select 0 entries from val2. This explains why the second entry of desired_final_result is 1.
Since entry i=2 of mult1 is 2, we select the next 2 entries from val1 and move on to entry i=2 of mult2, which is 1, so we select the next 1 entry from val2. This explains why entries 2-4 (zero-based) of desired_final_result are 2, 3, -2.
Since entry i=3 of mult1 is 1, we select the next 1 entry from val1 and move on to entry i=3 of mult2, which is also 1, so we select the next 1 entry from val2. This explains why entries 5-6 of desired_final_result are 4, -3.
Finally, since entry i=4 of both mult1 and mult2 is 0, there is nothing left to do and the array is filled.
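The walkthrough amounts to cutting val1 and val2 into per-index chunks at the cumulative counts and alternating them. This sketch reproduces those chunks with np.cumsum and np.split; it still loops over the chunks in Python, so it only illustrates the structure and is not the vectorized solution being asked for.

```python
import numpy as np

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])
val1 = np.array([1, 2, 3, 4])
val2 = np.array([-1, -2, -3])

# Split points are the running totals; drop the final total so that
# np.split returns exactly len(mult1) pieces (some may be empty).
chunks1 = np.split(val1, np.cumsum(mult1)[:-1])
chunks2 = np.split(val2, np.cumsum(mult2)[:-1])

for i, (c1, c2) in enumerate(zip(chunks1, chunks2)):
    print(i, c1.tolist(), c2.tolist())
# 0 [] [-1]
# 1 [1] []
# 2 [2, 3] [-2]
# 3 [4] [-3]
# 4 [] []

# Alternate val1 chunk, val2 chunk for each i
woven = np.concatenate([c for pair in zip(chunks1, chunks2) for c in pair])
print(woven.tolist())  # [-1, 1, 2, 3, -2, 4, -3]
```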

Question

Is there a way to use vectorized functions such as np.repeat and/or np.choose to solve my problem? Or do I need to resort to coding this calculation in C and wrapping it for Python?

Answer 1

Create a Boolean index into the result array:

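import numpy as np

# Interleave the counts: mult1[0], mult2[0], mult1[1], mult2[1], ...
mult = np.array([mult1, mult2]).ravel('F')
# True marks a slot filled from val1, False a slot filled from val2
tftf = np.tile([True, False], len(mult1))
mask = np.repeat(tftf, mult)

result = np.empty(len(val1) + len(val2), int)
result[ mask] = val1
result[~mask] = val2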
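Wrapped as a function, the mask approach can be checked on the example. For the example, the mask is [False, True, True, True, False, True, False]. One design tweak in this sketch: the snippet above hard-codes `int`, so float inputs would be cast to integers, losing their fractional part; here the output dtype is taken from np.result_type instead. The name `weave_mask` is hypothetical.

```python
import numpy as np

def weave_mask(mult1, mult2, val1, val2):
    """Hypothetical wrapper around the Boolean-mask approach."""
    # Interleaved counts: mult1[0], mult2[0], mult1[1], mult2[1], ...
    mult = np.array([mult1, mult2]).ravel('F')
    # True -> slot filled from val1, False -> slot filled from val2
    mask = np.repeat(np.tile([True, False], len(mult1)), mult)
    # Use a dtype that can hold both inputs instead of hard-coding int
    result = np.empty(len(val1) + len(val2), dtype=np.result_type(val1, val2))
    result[mask] = val1
    result[~mask] = val2
    return result

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])
print(weave_mask(mult1, mult2, np.array([1, 2, 3, 4]), np.array([-1, -2, -3])).tolist())
# [-1, 1, 2, 3, -2, 4, -3]
```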

Edit: I believe this works too:

idx = np.repeat(mult1.cumsum(), mult2)
result = np.insert(val1, idx, val2)
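Why the np.insert version works: mult1.cumsum()[i] counts the val1 entries that precede the val2 entries of step i, so repeating it mult2[i] times gives the insertion position in val1 of each val2 entry. For the example, idx is [0, 3, 4]. np.insert keeps values destined for the same index in the order given, so val2 stays in order. Note that np.insert converts the inserted values to val1's dtype. The wrapper name `weave_insert` is hypothetical.

```python
import numpy as np

def weave_insert(mult1, mult2, val1, val2):
    """Hypothetical wrapper around the np.insert approach."""
    # Insertion position in val1 for every val2 entry
    idx = np.repeat(mult1.cumsum(), mult2)
    # Result dtype follows val1; val2 values are converted to it
    return np.insert(val1, idx, val2)

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])
print(np.repeat(mult1.cumsum(), mult2).tolist())  # [0, 3, 4]
print(weave_insert(mult1, mult2, np.array([1, 2, 3, 4]), np.array([-1, -2, -3])).tolist())
# [-1, 1, 2, 3, -2, 4, -3]
```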

It's short, but it may not be faster.
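Whether the np.insert version is faster is an empirical question. A minimal timing harness, assuming random counts in 0..3 as a stand-in for real data (the names `make_inputs` and `benchmark` are hypothetical); no timings are reported here, since they depend on hardware, NumPy version and the distribution of counts:

```python
import timeit
import numpy as np

def weave_mask(mult1, mult2, val1, val2):
    mask = np.repeat(np.tile([True, False], len(mult1)),
                     np.array([mult1, mult2]).ravel('F'))
    result = np.empty(len(val1) + len(val2), dtype=np.result_type(val1, val2))
    result[mask] = val1
    result[~mask] = val2
    return result

def weave_insert(mult1, mult2, val1, val2):
    return np.insert(val1, np.repeat(mult1.cumsum(), mult2), val2)

def make_inputs(n, max_count=3, seed=0):
    # Assumed synthetic data: counts uniform in 0..max_count
    rng = np.random.default_rng(seed)
    mult1 = rng.integers(0, max_count + 1, size=n)
    mult2 = rng.integers(0, max_count + 1, size=n)
    return mult1, mult2, rng.standard_normal(mult1.sum()), rng.standard_normal(mult2.sum())

def benchmark(n=1_000_000, repeats=3):
    args = make_inputs(n)
    # Both approaches must agree before timing means anything
    assert np.array_equal(weave_mask(*args), weave_insert(*args))
    return {fn.__name__: timeit.timeit(lambda: fn(*args), number=repeats) / repeats
            for fn in (weave_mask, weave_insert)}
```

Usage: `print(benchmark())` prints seconds per call for each approach at the ~1e6 size mentioned in the question.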

Answer 2

This can be done with NumPy routines, but the best I've come up with is pretty clumsy:

import numpy

reps = numpy.empty([len(mult1)*2], dtype=int)
reps[::2] = mult1
reps[1::2] = mult2

to_repeat = numpy.empty_like(reps)
to_repeat[::2] = -1   # Avoid using 0 and 1 in case either of val1 or val2 is empty
to_repeat[1::2] = -2

indices = numpy.repeat(to_repeat, reps)
indices[indices==-1] = numpy.arange(len(val1))
indices[indices==-2] = numpy.arange(len(val1), len(val1) + len(val2))

final_result = numpy.concatenate([val1, val2])[indices]
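The key intermediate in this answer is `indices`, a gather index into concatenate([val1, val2]). This sketch wraps the same steps in a hypothetical `weave_gather` function that also returns that index so it can be inspected; it computes the val1 mask once and reuses its complement, which is equivalent to the two comparisons above because the filled-in positions are non-negative. For the example, indices is [4, 0, 1, 2, 5, 3, 6]: position 4 of the concatenated array is val2[0] = -1, position 0 is val1[0] = 1, and so on.

```python
import numpy as np

def weave_gather(mult1, mult2, val1, val2):
    """Hypothetical wrapper: returns (gather indices, woven result)."""
    n1, n2 = len(val1), len(val2)
    # Interleaved repeat counts: mult1[0], mult2[0], mult1[1], mult2[1], ...
    reps = np.empty(2 * len(mult1), dtype=int)
    reps[::2] = mult1
    reps[1::2] = mult2
    # Sentinel labels: -1 marks a val1 slot, -2 a val2 slot
    labels = np.empty_like(reps)
    labels[::2] = -1
    labels[1::2] = -2
    indices = np.repeat(labels, reps)
    # Replace sentinels with positions in concatenate([val1, val2])
    is_v1 = indices == -1
    indices[is_v1] = np.arange(n1)
    indices[~is_v1] = np.arange(n1, n1 + n2)
    return indices, np.concatenate([val1, val2])[indices]

idx, out = weave_gather(np.array([0, 1, 2, 1, 0]), np.array([1, 0, 1, 1, 0]),
                        np.array([1, 2, 3, 4]), np.array([-1, -2, -3]))
print(idx.tolist())  # [4, 0, 1, 2, 5, 3, 6]
print(out.tolist())  # [-1, 1, 2, 3, -2, 4, -3]
```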

Answer 1: score 4, accepted

Answer 2: score 2, 2016-06-28 21:32:59
