
Randomly select rows from numpy array based on a condition

Let's say I have 2 arrays of arrays: labels is 1D and data is 5D. Note that both arrays have the same first dimension.

To simplify things, let's say labels contains only 3 arrays:

labels=np.array([[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]], dtype=object)  # dtype=object is needed for ragged input in recent NumPy

And let's say I have a datalist of data arrays (length=3), where each array is 5D and its first dimension matches the length of the corresponding array in labels.

In this example, datalist has 3 arrays of shapes (8,3,100,10,1), (5,3,100,10,1) and (10,3,100,10,1) respectively. Here, the first dimension of each of these arrays is the same as the length of the corresponding array in labels.

Now I want to reduce the number of zeros in each array of labels and keep the other values. Let's say I want to keep only 3 zeros for each array. Therefore, the length of each array in labels, as well as the first dimension of each array in data, will be 6, 4 and 8.

In order to reduce the number of zeros in each array of labels, I want to randomly select 3 of them and keep only those. The same randomly selected indexes will then be used to select the corresponding rows from data.

For this example, the new_labels array will be something like this:

new_labels=np.array([[0,0,1,1,2,0],[4,0,0,0],[0,3,2,1,0,1,7,0]], dtype=object)

Here's what I have tried so far:

all_ind=[] #to store indexes where value=0 for all arrays
indexes_to_keep=[] #to store the random selected indexes
new_labels=[] #to store the final results

for i in range(len(labels)):
    ind=[] #to store indexes where value=0 for one array
    for j in range(len(labels[i])):
        if (labels[i][j]==0):
            ind.append(j)
    all_ind.append(ind)

for k in range(len(labels)):   
    indexes_to_keep.append(np.random.choice(all_ind[i], 3))
    aux= np.zeros(len(labels[i]) - len(all_ind[i]) + 3)
    # ....
    # Here, how can I fill aux with the values?
    # ....
    new_labels.append(aux)

Any suggestions?

Working with numpy arrays of different lengths is not a good idea, so you have to iterate over each item and apply some function to it. Assuming you only want to optimize that function, masking might work pretty well here:

import numpy as np

def specific_choice(x, n):
    '''Keep every nonzero value of x plus n randomly chosen zeros.'''
    x = np.array(x)
    mask = x != 0                # True where the value must be kept
    idx = np.flatnonzero(~mask)  # positions of the zeros
    np.random.shuffle(idx)       # shuffles idx in place, quite fast
    idx = idx[:n]                # the n zeros that survive
    mask[idx] = True
    return x[mask] # or mask if you need it

Iterating over a list is faster than iterating over an array, so an efficient usage would be:

labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
output = [specific_choice(n, 3) for n in labels]

Output (one possible result, since the zeros that are kept are chosen at random):

[array([0, 1, 1, 2, 0, 0]), array([0, 4, 0, 0]), array([0, 3, 0, 2, 1, 1, 7, 0])]
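
The question also asks for the same indexes to be applied to the rows of data. A minimal sketch of that, assuming the function returns the boolean mask instead of the filtered labels: keep_mask, rng and the zero-filled datalist below are illustrative names and placeholder data, not part of the original answer, and the sampling uses numpy's Generator.choice with replace=False in place of the shuffle.

```python
import numpy as np

def keep_mask(x, n, rng):
    """Boolean mask keeping every nonzero entry of x plus n random zeros.
    (Hypothetical helper, a variant of specific_choice that returns the mask.)"""
    x = np.asarray(x)
    mask = x != 0
    zero_idx = np.flatnonzero(~mask)
    # Sample without replacement; if there are fewer than n zeros, keep them all.
    chosen = rng.choice(zero_idx, size=min(n, zero_idx.size), replace=False)
    mask[chosen] = True
    return mask

rng = np.random.default_rng(0)  # seed chosen only to make the run repeatable

labels = [[0,0,0,1,1,2,0,0], [0,4,0,0,0], [0,3,0,2,1,0,0,1,7,0]]
# Placeholder data with the shapes given in the question
datalist = [np.zeros((len(l), 3, 100, 10, 1)) for l in labels]

masks = [keep_mask(l, 3, rng) for l in labels]
new_labels = [np.asarray(l)[m] for l, m in zip(labels, masks)]
# A 1D boolean mask indexes the first axis, so the same rows are kept in data
new_data = [d[m] for d, m in zip(datalist, masks)]

print([d.shape for d in new_data])
# [(6, 3, 100, 10, 1), (4, 3, 100, 10, 1), (8, 3, 100, 10, 1)]
```

Because the mask preserves the original order, the kept labels and the kept data rows stay aligned with each other.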
