Element-wise weave of two arrays used with numpy repeat

I have two arrays of unequal length, val1 and val2, that I am trying to weave together in a specific way defined by the equal-length arrays mult1 and mult2. In general, my arrays are very long (~1e6 elements), and this is a performance-critical bottleneck in my calculation, so I cannot afford a Python for loop; I am trying to take advantage of vectorized functions in NumPy instead. To be explicit:

import numpy as np

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])

val1 = np.array([1, 2, 3, 4])
val2 = np.array([-1, -2, -3])

desired_final_result = np.array([-1, 1, 2, 3, -2, 4, -3])

The weaving of val1 and val2 is defined by the following element-wise procession through the indices of mult1 and mult2. Each entry of the two mult arrays defines how many elements to take from the corresponding val array. We proceed element-wise through the mult arrays: the value of mult1[i] determines how many entries we take from val1, then the value of mult2[i] determines how many entries we take from val2, so that for each index i the val1 entries always come first.

Note that len(val1) = mult1.sum() and len(val2) = mult2.sum(), so the final array always satisfies len(desired_final_result) = len(val1) + len(val2).

Explicit explanation of minimal example

  • Since entry i=0 of mult1 is 0, we select 0 entries from val1 and move on to entry i=0 of mult2, which is 1, so we select 1 entry from val2. This explains why the first entry of desired_final_result is -1.

  • Since entry i=1 of mult1 is 1, we select 1 entry from val1 and move on to entry i=1 of mult2, which is 0, so we select 0 entries from val2. This explains why the second entry of desired_final_result is 1.

  • Since entry i=2 of mult1 is 2, we select the next 2 entries from val1 and move on to entry i=2 of mult2, which is 1, so we select the next 1 entry from val2. This explains why entries 2-4 (0-indexed) of desired_final_result are 2, 3, -2.

  • Since entry i=3 of mult1 is 1, we select the next 1 entry from val1 and move on to entry i=3 of mult2, which is also 1, so we select the next 1 entry from val2. This explains why entries 5-6 (0-indexed) of desired_final_result are 4, -3.

  • Finally, since entry i=4 of both mult1 and mult2 is 0, there is nothing left to do and our array is filled.
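
For reference, the procedure walked through above can be written as a plain Python loop. This is only a sketch for checking vectorized answers on small inputs; the name weave_loop is hypothetical (it does not appear in the question), and a loop like this is exactly what is too slow for ~1e6 elements.

```python
import numpy as np

def weave_loop(val1, val2, mult1, mult2):
    """Slow reference weave: for each i, take mult1[i] entries
    of val1, then mult2[i] entries of val2."""
    out = []
    i1 = i2 = 0  # current read positions in val1 and val2
    for m1, m2 in zip(mult1, mult2):
        out.extend(val1[i1:i1 + m1])
        i1 += m1
        out.extend(val2[i2:i2 + m2])
        i2 += m2
    return np.array(out)

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])
val1 = np.array([1, 2, 3, 4])
val2 = np.array([-1, -2, -3])

print(weave_loop(val1, val2, mult1, mult2))  # [-1  1  2  3 -2  4 -3]
```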

Question

Is there a way to use vectorized functions such as np.repeat and/or np.choose to solve my problem? Or do I need to resort to coding this calculation in C and wrapping it for Python?

Creating a Boolean index into the result array:

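# Interleave the counts: mult1[0], mult2[0], mult1[1], mult2[1], ...
mult = np.array([mult1, mult2]).ravel('F')
# True marks a slot for val1, False a slot for val2
tftf = np.tile([True, False], len(mult1))
mask = np.repeat(tftf, mult)

result = np.empty(len(val1) + len(val2), int)
result[ mask] = val1
result[~mask] = val2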
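
To see why the mask works, it may help to print the intermediate arrays for the minimal example from the question. The sketch below repeats the code above with one change that is an assumption, not part of the answer: the result dtype is taken from np.result_type(val1, val2) instead of being hard-coded to int, so non-integer values would not be truncated.

```python
import numpy as np

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])
val1 = np.array([1, 2, 3, 4])
val2 = np.array([-1, -2, -3])

# Column-major ravel interleaves the two count arrays
mult = np.array([mult1, mult2]).ravel('F')
print(mult)              # [0 1 1 0 2 1 1 1 0 0]

# Each True/False is repeated by its count; 1 = slot for val1, 0 = slot for val2
tftf = np.tile([True, False], len(mult1))
mask = np.repeat(tftf, mult)
print(mask.astype(int))  # [0 1 1 1 0 1 0]

# Assumption: pick a dtype that can hold both inputs
result = np.empty(len(val1) + len(val2), dtype=np.result_type(val1, val2))
result[mask] = val1
result[~mask] = val2
print(result)            # [-1  1  2  3 -2  4 -3]
```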

Edit - I believe this works too:

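# For each val2 entry, the number of val1 entries that must precede it
idx = np.repeat(mult1.cumsum(), mult2)
# Insert the val2 entries into val1 at those positions
result = np.insert(val1, idx, val2)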

It's short, but it may not be faster.
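
Whether the shorter version is faster is easiest to settle by measuring. The following is a rough timing sketch, not a benchmark from the answer: the random test data, the size n, and the function names weave_mask and weave_insert are all assumptions made for illustration, and timings will vary with machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # assumed size, based on the "~1e6 elements" in the question
mult1 = rng.integers(0, 3, size=n)  # counts in {0, 1, 2}
mult2 = rng.integers(0, 3, size=n)
val1 = rng.integers(0, 100, size=mult1.sum())
val2 = -rng.integers(1, 100, size=mult2.sum())

def weave_mask():
    # Boolean-mask approach
    mult = np.array([mult1, mult2]).ravel('F')
    mask = np.repeat(np.tile([True, False], len(mult1)), mult)
    result = np.empty(len(val1) + len(val2), int)
    result[mask] = val1
    result[~mask] = val2
    return result

def weave_insert():
    # np.insert approach
    idx = np.repeat(mult1.cumsum(), mult2)
    return np.insert(val1, idx, val2)

# Both approaches should produce the same array
assert np.array_equal(weave_mask(), weave_insert())

for f in (weave_mask, weave_insert):
    best = min(timeit.repeat(f, number=1, repeat=5))
    print(f"{f.__name__}: {best * 1e3:.1f} ms")
```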

This can be done with NumPy routines, but the best I've come up with is pretty clumsy:

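import numpy

# Interleaved repeat counts: mult1[0], mult2[0], mult1[1], mult2[1], ...
reps = numpy.empty([len(mult1)*2], dtype=int)
reps[::2] = mult1
reps[1::2] = mult2

# Negative markers, so they cannot clash with the real (non-negative) indices written below
to_repeat = numpy.empty_like(reps)
to_repeat[::2] = -1   # slot to be filled from val1
to_repeat[1::2] = -2  # slot to be filled from val2

# Replace each marker with an index into concatenate([val1, val2])
indices = numpy.repeat(to_repeat, reps)
indices[indices==-1] = numpy.arange(len(val1))
indices[indices==-2] = numpy.arange(len(val1), len(val1) + len(val2))

final_result = numpy.concatenate([val1, val2])[indices]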
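
As an illustration of the marker trick, here is the same computation run on the minimal example from the question, with the indices array printed before and after the markers are replaced. It is a sketch that mirrors the code above (using the np alias), not a different method.

```python
import numpy as np

mult1 = np.array([0, 1, 2, 1, 0])
mult2 = np.array([1, 0, 1, 1, 0])
val1 = np.array([1, 2, 3, 4])
val2 = np.array([-1, -2, -3])

reps = np.empty(2 * len(mult1), dtype=int)
reps[::2] = mult1
reps[1::2] = mult2

to_repeat = np.empty_like(reps)
to_repeat[::2] = -1   # marker: slot for val1
to_repeat[1::2] = -2  # marker: slot for val2

indices = np.repeat(to_repeat, reps)
print(indices)  # [-2 -1 -1 -1 -2 -1 -2]

# Markers become positions in concatenate([val1, val2])
indices[indices == -1] = np.arange(len(val1))
indices[indices == -2] = np.arange(len(val1), len(val1) + len(val2))
print(indices)  # [4 0 1 2 5 3 6]

final_result = np.concatenate([val1, val2])[indices]
print(final_result)  # [-1  1  2  3 -2  4 -3]
```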
