How to create a random mask matrix in which we mask a contiguous length?

Question

How do I randomly create a 10000 x 1000 mask matrix such that each row has 3 contiguous masked blocks of length 100? One naive way of doing this is as follows:

import numpy as np
mask = np.ones((10000, 1000))
idx = np.random.choice(mask.shape[1] - 100, 3 * mask.shape[0]).reshape([mask.shape[0], 3])
for i, id in enumerate(idx):
    for j in range(3):
        for k in range(100):
            mask[i][id[j] + k] = 0

However, this is extremely inefficient and takes a lot of time. What would be an efficient implementation? Also, it would be nice if the three blocks in a row were non-overlapping.

Answer 1

You can build the list of indices for each row and apply it to the mask directly instead of using the two inner for loops. For example:

mask = np.ones((10000, 1000))
for i in range(len(mask)):
    start_indices = np.random.choice(900, 3)
    indices = [idx for start_idx in start_indices for idx in range(start_idx, start_idx+100)]
    mask[i][indices] = 0
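
The remaining per-row loop can also be removed with broadcasting. The following is a minimal sketch, not part of the original answer; the function name `mask_with_blocks` and its parameters are made up for illustration. Like the loop above, it allows blocks to overlap.

```python
import numpy as np

def mask_with_blocks(n_rows=10000, n_cols=1000, block_len=100, n_blocks=3, rng=None):
    """Hypothetical helper: zero out n_blocks random blocks per row (overlaps allowed)."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.ones((n_rows, n_cols))
    # One start index per block; valid starts are 0 .. n_cols - block_len inclusive.
    starts = rng.integers(0, n_cols - block_len + 1, size=(n_rows, n_blocks))
    # Expand each start into the column indices of its block:
    # shape (n_rows, n_blocks, block_len).
    cols = starts[:, :, None] + np.arange(block_len)
    # Row indices broadcast against cols, so every block lands in its own row.
    rows = np.arange(n_rows)[:, None, None]
    mask[rows, cols] = 0
    return mask
```

Because blocks may overlap, a row ends up with between 100 and 300 zeros.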

To make sure that the blocks are non-overlapping, add a condition on the start indices as follows:

mask = np.ones((10000, 1000))
for i in range(len(mask)):
    cond = True
    while cond:
        start_indices = sorted(np.random.choice(900, 3))
        # Resample while any two neighbouring blocks overlap or touch.
        cond = any(idx1 + 100 >= idx2 for idx1, idx2 in zip(start_indices, start_indices[1:]))

    indices = [idx for start_idx in start_indices for idx in range(start_idx, start_idx+100)]
    mask[i][indices] = 0
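
One way to check the result is to measure the runs of zeros in a row. The helper below is a sketch added for illustration (the name `zero_run_lengths` is hypothetical). Since the condition above also rejects blocks that touch, every row produced by that loop should report exactly three runs of length 100.

```python
import numpy as np

def zero_run_lengths(row):
    """Return the lengths of the maximal runs of zeros in a 1-D 0/1 array."""
    # Pad with a non-zero marker on both sides so runs at the edges are detected.
    is_zero = np.concatenate(([0], (np.asarray(row) == 0).astype(np.int8), [0]))
    edges = np.diff(is_zero)
    run_starts = np.flatnonzero(edges == 1)   # positions where a zero run begins
    run_ends = np.flatnonzero(edges == -1)    # positions just past where it ends
    return run_ends - run_starts

# Usage on a hand-built row with three separated blocks.
row = np.ones(1000)
row[10:110] = 0
row[300:400] = 0
row[850:950] = 0
print(zero_run_lengths(row))  # [100 100 100]
```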

Timings:

# original
3.42 s ± 153 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# overlaps allowed
1.41 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# no overlaps
2.25 s ± 199 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Answer 2

I got a sizable performance improvement (30-40x faster than the original).

I make sure the zero blocks do not overlap:

Each row contains 700 ones. I split 700 into 4 random integers that sum to 700; these are the sizes of the runs of ones.
I compute the start indices of the zero blocks from the sizes of the runs of ones.

def faster_than_original():
    zeros_size = 100
    n_zeros = 3
    mask = np.ones((10000, 1000))
    indices_weights = np.random.random((mask.shape[0], n_zeros + 1))

    number_of_ones = mask.shape[1] - zeros_size * n_zeros
    ones_sizes = np.round(indices_weights[:, :n_zeros].T
                          * (number_of_ones / np.sum(indices_weights, axis=-1))).T.astype(np.int32)
    ones_sizes[:, 1:] += zeros_size
    zeros_start_indices = np.cumsum(ones_sizes, axis=-1)
    for sample_idx in range(len(mask)):
        for zeros_idx in zeros_start_indices[sample_idx]:
            mask[sample_idx, zeros_idx: zeros_idx + zeros_size] = 0
    return mask
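
The same "place the ones, then derive the zero blocks" idea can be written without any Python loop. This is a sketch under stated assumptions rather than the answer's method: it draws sorted integer offsets instead of rounding random weights, the name `nonoverlapping_mask` is hypothetical, and the resulting distribution of block positions is not claimed to match either answer. Blocks cannot overlap but may be adjacent.

```python
import numpy as np

def nonoverlapping_mask(n_rows=10000, n_cols=1000, block_len=100, n_blocks=3, rng=None):
    """Hypothetical helper: exactly n_blocks non-overlapping zero blocks per row."""
    rng = np.random.default_rng() if rng is None else rng
    n_ones = n_cols - block_len * n_blocks  # 700 for the sizes in the question
    # Sorted offsets in 0 .. n_ones: how many ones precede each block.
    offsets = np.sort(rng.integers(0, n_ones + 1, size=(n_rows, n_blocks)), axis=1)
    # Shift block j right by j * block_len so consecutive blocks cannot overlap.
    starts = offsets + block_len * np.arange(n_blocks)
    cols = starts[:, :, None] + np.arange(block_len)
    rows = np.arange(n_rows)[:, None, None]
    mask = np.ones((n_rows, n_cols))
    mask[rows, cols] = 0
    return mask
```

The largest possible start is 700 + 200 = 900, so the last block always fits inside the row and every row has exactly 300 zeros.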

Profiling:

    42         1    8974014.0 8974014.0     76.2      mask = original()
    43         1     235235.0 235235.0      2.0      mask2 = faster_than_original()
    44         1    2565371.0 2565371.0     21.8      mask3 = shaido_method()
