Creating m*n arrays from an array and a boolean "shared" array

The problem: I am trying to generate m vectors of n elements from a "packaged" (shorter) master vector V, which has fewer than m x n elements, and a boolean vector of length n that determines how elements are repeated. (The vectors are explained in more detail below.) How the master vector is created and how the results are used do not matter, except that the format must be respected: a master vector plus a boolean vector, producing an m x n result.

For example, if element 0 has a boolean of False, all m vectors will have the same value for element 0, namely V[0]. If element 1 has a boolean of True, then vector 0 takes element 1 from V[1], but vector 1 takes element 1 from V[6]. Concretely, a master vector V of

(1,2,3,4,5,6,10,30,40,60,100,300,400,600)

and a boolean vector of

1, 0, 1, 1, 0, 1

should produce three resulting vectors:

[1 2 3 4 5 6]
[10.  2. 30. 40.  5. 60.]
[100.   2. 300. 400.   5. 600.]
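The rule in this example can be stated as an index formula. This is a sketch in notation introduced here, not the asker's: zero-based indices, n is the length of the boolean vector b, k the number of True entries, m the number of resulting vectors, and c_j the number of True entries before position j. It assumes |V| - n is a multiple of k.

```latex
\[
k = \sum_{j=0}^{n-1} b_j, \qquad
m = 1 + \frac{|V| - n}{k}, \qquad
c_j = \sum_{i<j} b_i
\]
\[
\mathrm{out}_{r,j} =
\begin{cases}
V_j, & r = 0 \text{ or } b_j = 0,\\[2pt]
V_{\,n + (r-1)k + c_j}, & r \ge 1 \text{ and } b_j = 1,
\end{cases}
\qquad r = 0,\dots,m-1,\quad j = 0,\dots,n-1.
\]
```

For the example, n = 6, k = 4 and m = 1 + 8/4 = 3; the last entry of the third vector is V_{6 + 4 + 3} = V_{13} = 600.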

These share some elements but not others. I have a method for this, but it relies on nested loops and if statements.

What I've tried: a working but inefficient example with 3 resulting vectors of 6 elements:

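import numpy as np

p = np.array((1,2,3,4,5,6,10,30,40,60,100,300,400,600))  # master vector V
genome = np.array((1, 0, 1, 1, 0, 1))  # 1 = new value in each vector, 0 = shared

index = 0  # how many values after the base have been consumed so far
for i in range(0,3):
    if i==0:
        # the first vector is simply the first genome.size values of p
        pBase = p[0:genome.size]
        print(pBase)
    else:
        # np.zeros is float64, which is why later rows print as floats
        extra = np.zeros(genome.size)
        for j in range(0,genome.size):
            if genome[j]==True:
                # fill each "True" position with the next unused value of p
                extra[j] = p[genome.size+index]
                index += 1
        # "False" positions keep the base value
        pSplit = np.where(genome==False, pBase, extra)
        print(pSplit)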

This returns (as expected):

[1 2 3 4 5 6]
[10.  2. 30. 40.  5. 60.]
[100.   2. 300. 400.   5. 600.]

This takes 45.1 µs ± 2.4 µs per loop, which seems unnecessarily verbose and slow for what should be a simple operation, but I don't know any alternative methods. Is there some combination of list comprehensions or other functions that can produce the same result in a faster and more Pythonic way?
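A per-call figure like the one quoted can be measured with the standard-library timeit module. This is a minimal sketch, assuming the loop is wrapped in a hypothetical function loop_version that returns the rows instead of printing them (printing would otherwise dominate the measurement); the numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

p = np.array((1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600))
genome = np.array((1, 0, 1, 1, 0, 1))


def loop_version(p, genome, rows=3):
    """Hypothetical wrapper around the question's nested loop; returns rows."""
    index = 0
    base = p[:genome.size]
    out = [base]
    for _ in range(1, rows):
        extra = np.zeros(genome.size)
        for j in range(genome.size):
            if genome[j]:
                extra[j] = p[genome.size + index]
                index += 1
        out.append(np.where(genome == 0, base, extra))
    return out


# timeit.repeat returns the total seconds for `number` calls, once per repeat.
number = 1_000
times = timeit.repeat(lambda: loop_version(p, genome), number=number, repeat=5)
print(f"best of 5: {min(times) / number * 1e6:.1f} µs per call")
```

The "mean ± std" format in the question looks like IPython's %timeit output, which can be used interactively instead; this sketch reports the best run, which is less sensitive to background noise.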

EDIT: The values of V will not always follow a simple pattern like 10^i; the given vector is just for demonstration. The values should be considered arbitrary (they are generated by another method and follow no reproducible pattern like 10^i).
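Because the values are arbitrary, one way to check any candidate solution is to compare it with a slow but direct implementation of the rule, run on random data rather than the powers-of-ten demo. This is a minimal sketch; the name reference_rows is hypothetical, and it assumes len(V) - len(b) is a non-negative multiple of sum(b) (it raises ValueError otherwise).

```python
import numpy as np


def reference_rows(V, b):
    """Build the m x n result element by element, straight from the rule.

    Hypothetical test helper: row 0 is V[:n]; in later rows, False columns
    repeat row 0 and True columns take the next unused values of V in order.
    """
    V = np.asarray(V)
    b = np.asarray(b, dtype=bool)
    n, k = b.size, int(b.sum())
    if k == 0 or V.size < n or (V.size - n) % k != 0:
        raise ValueError("len(V) - len(b) must be a non-negative multiple of sum(b) > 0")
    m = 1 + (V.size - n) // k
    rank = np.cumsum(b) - 1  # at True positions: number of Trues before j
    out = np.empty((m, n), dtype=V.dtype)
    for r in range(m):
        for j in range(n):
            if r == 0 or not b[j]:
                out[r, j] = V[j]
            else:
                out[r, j] = V[n + (r - 1) * k + rank[j]]
    return out


# Arbitrary integer data in the shape the question describes (6 columns, 4 True).
rng = np.random.default_rng(0)
b = np.array([1, 0, 1, 1, 0, 1], dtype=bool)
V = rng.integers(0, 1000, size=b.size + 2 * int(b.sum()))
print(reference_rows(V, b))
```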

This program works differently, so that it also supports vectors whose values are not powers of 10. It first stores the base vector in vectors, and then appends as many vectors as the remaining values allow. Each new vector is built as follows: if the position in the boolean vector is 1, the program takes the next element from rest, which holds all the elements not used yet, and updates rest. Otherwise the boolean value is 0, so the program takes the value from vectors[0][i], which is the same as taking it from V.

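V=[1,2,3,4,5,6,10,30,40,60,100,300,400,600]
boolean=[1,0,1,1,0,1]
vectors=[V[:len(boolean)]]   # first vector: the first len(boolean) values
rest=V[len(boolean):]        # values not used yet
# keep going while a complete vector can still be built
# (sum(boolean) > 0 prevents an endless loop when every flag is 0)
while sum(boolean) > 0 and len(rest)>=sum(boolean):
    newv=[]
    for i,x in enumerate(boolean):
        if x==1:
            newv.append(rest[0])        # take the next unused value
            rest=rest[1:]
        else:
            newv.append(vectors[0][i])  # shared value from the first vector
    vectors.append(newv)
print(vectors)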
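The same idea of consuming unused values from the front can be written without re-slicing rest at every step (each rest[1:] copies the remaining list): a Python iterator hands out the unused values in order, and a list comprehension builds each row. This is a sketch with a hypothetical function name; rows that cannot be completed are dropped, as in the while loop above.

```python
def expand_with_iterator(V, boolean):
    """Hypothetical helper: row 0 is V[:n]; each later row takes the next
    unused values of V where boolean is 1 and repeats row 0 elsewhere.
    Incomplete trailing rows are dropped."""
    n, k = len(boolean), sum(boolean)
    if k == 0:
        raise ValueError("boolean must contain at least one 1")
    base = list(V[:n])
    n_extra = (len(V) - n) // k  # number of complete extra rows
    it = iter(V[n:])             # yields the unused values in order, like rest
    # The outer loop runs over rows and the inner one over columns,
    # so next(it) is called in row-major order.
    extra_rows = [
        [next(it) if flag else base[j] for j, flag in enumerate(boolean)]
        for _ in range(n_extra)
    ]
    return [base] + extra_rows


V = [1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600]
boolean = [1, 0, 1, 1, 0, 1]
print(expand_with_iterator(V, boolean))
# [[1, 2, 3, 4, 5, 6], [10, 2, 30, 40, 5, 60], [100, 2, 300, 400, 5, 600]]
```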

Here is a simpler approach:

  1. Convert genome to a boolean array so it can be used efficiently
  2. Take the first few elements of p and multiply them elementwise by genome and by 10 to the power i, where i ranges from 0 to n - 1
  3. This gives [1,0,3,4,0,6] for i = 0, [100,0,300,400,0,600] for i = 2, and so on
  4. Then add the product of the same p elements and the inverted boolean ~genome, which is [0,2,0,0,5,0]
  5. Finally, iterate over range(n) and print
import numpy as np

p = np.array((1,2,3,4,5,6,10,30,40,60,100,300,400,600))
genome = np.array((1, 0, 1, 1, 0, 1)).astype(bool)

n = 3

for i in range(n):
    # True positions scaled by 10**i, False positions copied unchanged
    print(p[:genome.size]*genome*(10**i) + p[:genome.size]*~genome)

Output:

[1 2 3 4 5 6]
[10  2 30 40  5 60]
[100   2 300 400   5 600]
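The multiply-by-10**i trick only works when the True positions of row i are exactly 10**i times those of row 0, which the question's EDIT says will not hold for real data. A loop-free variant that works for arbitrary values copies the first row m times and then writes the remaining values, reshaped into rows of genome.sum() elements, into the True columns. This is a minimal sketch; expand_rows is a hypothetical name, and it assumes len(V) - len(genome) is a multiple of genome.sum().

```python
import numpy as np


def expand_rows(V, genome):
    """Hypothetical helper: row 0 is V[:n]; later rows replace the True
    columns with consecutive chunks of V[n:] and keep row 0 elsewhere."""
    V = np.asarray(V)
    genome = np.asarray(genome, dtype=bool)
    n, k = genome.size, int(genome.sum())
    extra = V[n:]
    if k == 0 or V.size < n or extra.size % k != 0:
        raise ValueError("len(V) - len(genome) must be a multiple of genome.sum() > 0")
    m = 1 + extra.size // k
    out = np.tile(V[:n], (m, 1))            # every row starts as a copy of row 0
    out[1:, genome] = extra.reshape(-1, k)  # one chunk of k values per extra row
    return out


p = np.array((1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600))
genome = np.array((1, 0, 1, 1, 0, 1))
print(expand_rows(p, genome))
```

Because np.tile keeps V's dtype, the integer example stays integer, unlike the float rows produced by np.zeros in the question's loop.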

If I understand the question correctly, the following performs the task in a cleaner way, but let me know what you think.

def convert_vectors(master_vector, boolean_vector):
    """
    example:
    master_vector = [1,2,3,4,5,6,10,30,40,60,100,300,400,600]
    boolean_vector = [1,0,1,1,0,1]
    result = [[1, 2, 3, 4, 5, 6],[10, 2, 30, 40, 5, 60],[100, 2, 300, 400, 5, 600]]
    """
    res = []  # result
    curIndexInMaster = 0  # index in master_vector
    while curIndexInMaster < len(master_vector):
        curArray = []  # current array
        for flag in boolean_vector:  # for each element in boolean_vector (avoids shadowing built-in bool)
            if flag:  # should get new element from master_vector
                curArray.append(master_vector[curIndexInMaster])
                curIndexInMaster += 1
            else:
                curArray.append(master_vector[len(curArray)])
                if curIndexInMaster < len(boolean_vector):  # only for first array
                    curIndexInMaster += 1
        res.append(curArray)
    return res


master_vector = [1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600]
boolean_vector = [1, 0, 1, 1, 0, 1]
print(convert_vectors(master_vector, boolean_vector))

Output:

[[1, 2, 3, 4, 5, 6], [10, 2, 30, 40, 5, 60], [100, 2, 300, 400, 5, 600]]

Try the following code:

import numpy as np

p = np.array((1,2,3,4,5,6,10,30,40,60,100,300,400,600))
genome = np.array((1, 0, 1, 1, 0, 1))

gs1 = genome.size             # Number of elements
gs2 = genome.sum()            # Number of "True" values
idx_p = np.r_[gs1 : p.size + 1 : gs2]  # Starting indices in "p"
idx_g = np.where(genome)[0]   # Indices of "true" in "genome"
res = np.tile(p[0:gs1], idx_p.size).reshape(idx_p.size, -1)  # result
iStart = gs1
iRow = 1          # Row number in "res"
for iEnd in idx_p[1:]:
    # res[iRow] is a view, so np.put writes directly into res
    np.put(res[iRow], idx_g, p[iStart : iEnd])
    iRow += 1
    iStart = iEnd
print(res)

For your source data, the result (the res array) is:

[[  1   2   3   4   5   6]
 [ 10   2  30  40   5  60]
 [100   2 300 400   5 600]]

You can try this:

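import numpy as np

V = np.array((1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600))
b = np.array([1, 0, 1, 1, 0, 1]).astype(bool)

nc = len(b)                             # number of columns
nr = (len(V) - len(b)) // b.sum() + 1   # number of complete rows
out = np.tile(V[:nc], reps=(nr, 1))     # every row starts as a copy of the base
# Fill the True positions of rows 1.. in row-major order. Slicing V keeps both
# sides the same size even if len(V) - nc is not a multiple of b.sum().
out[1:][np.tile(b, reps=(nr - 1, 1))] = V[nc:nc + (nr - 1) * b.sum()]
print(out)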

It gives:

[[  1   2   3   4   5   6]
 [ 10   2  30  40   5  60]
 [100   2 300 400   5 600]]
