Numpy - how to convert an array of vector indices to a mask?
Given a np.ndarray named indices with n rows and a variable-length vector in each row, I want to create a boolean mask of n rows and m columns, where m is a value known in advance, determined by the greatest value possible in indices (in the example below the largest index is 7 and m is 8). Take note that the indices specified in indices refer to per-row indices, and not global matrix indices.
For example, given:
indices = np.array([
    [2, 0],
    [0],
    [4, 7, 1]
], dtype=object)  # dtype=object is required for ragged rows in recent NumPy versions

# Expected output
print(mask)
[[ True False  True False False False False False]
 [ True False False False False False False False]
 [False  True False False  True False False  True]]
m is known beforehand (it is the length of each row in mask) and does not need to be inferred from indices.
Note: this is different from converting an array of indices to a mask where the indices refer to the resulting matrix indices.
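To make the distinction concrete, here is a minimal sketch contrasting the two interpretations. The 2x4 shape and the names rows, per_row, flat and global_mask are illustrative assumptions, not part of the question.

```python
import numpy as np

# Hypothetical example: per-row column indices for a 2 x 4 mask
rows = [[1, 3], [0]]
m = 4

# Per-row interpretation: value j in row i sets mask[i, j]
per_row = np.zeros((len(rows), m), dtype=bool)
for i, cols in enumerate(rows):
    per_row[i, cols] = True

# Global interpretation: the same numbers index the flattened matrix
flat = np.zeros(len(rows) * m, dtype=bool)
flat[[1, 3, 0]] = True
global_mask = flat.reshape(len(rows), m)

print(per_row.astype(int))
# [[0 1 0 1]
#  [1 0 0 0]]
print(global_mask.astype(int))
# [[1 1 0 1]
#  [0 0 0 0]]
```

Under the global interpretation the row information is lost, which is why the per-row case needs its own treatment.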
Here's one way:
def mask_from_indices(indices, ncols=None):
    # Extract column indices
    col_idx = np.concatenate(indices)

    # If number of cols is not given, infer it based on max column index
    if ncols is None:
        ncols = col_idx.max()+1

    # Length of indices, to be used as no. of rows in o/p
    n = len(indices)

    # Initialize o/p array
    out = np.zeros((n,ncols), dtype=bool)

    # Lengths of each index element that represents each group of col indices
    lens = np.array(list(map(len,indices)))

    # Use np.repeat to generate all row indices
    row_idx = np.repeat(np.arange(len(lens)),lens)

    # Finally use row, col indices to set True values
    out[row_idx,col_idx] = 1
    return out
Sample run:
In [89]: mask_from_indices(indices)
Out[89]:
array([[ True, False,  True, False, False, False, False, False],
       [ True, False, False, False, False, False, False, False],
       [False,  True, False, False,  True, False, False,  True]])
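As a sketch of what happens inside that function, the intermediate arrays for the question's example can be inspected directly. Plain nested lists are used here as an assumption for simplicity; the variable names mirror those in the function above.

```python
import numpy as np

indices = [[2, 0], [0], [4, 7, 1]]  # the example rows, as plain lists

# Number of column indices in each row
lens = np.array([len(row) for row in indices])
# Repeat each row number once per index in that row
row_idx = np.repeat(np.arange(len(indices)), lens)
# Flatten all per-row column indices into one array
col_idx = np.concatenate(indices)

print(lens)     # [2 1 3]
print(row_idx)  # [0 0 1 2 2 2]
print(col_idx)  # [2 0 0 4 7 1]
```

Pairing row_idx[k] with col_idx[k] gives the coordinates (0, 2), (0, 0), (1, 0), (2, 4), (2, 7), (2, 1), which are exactly the True cells of the expected mask.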
Here is a variant:
def create_mask(indices, m):
    mask = np.zeros((len(indices), m), dtype=bool)
    for i, idx in enumerate(indices):
        mask[i, idx] = True
    return mask
Usage:
>>> create_mask(indices, 8)
array([[ True, False,  True, False, False, False, False, False],
       [ True, False, False, False, False, False, False, False],
       [False,  True, False, False,  True, False, False,  True]])
While there is no direct way of doing this in a fully vectorized way, for larger inputs a single application of mask[full_row_indices, full_col_indices] with the pre-computed full list of indices is faster than multiple applications of mask[partial_row_indices, partial_col_indices]. Memory-wise, on the other hand, the multiple applications are less demanding because no intermediate full_row_indices / full_col_indices need to be built. Of course, which effect dominates would generally depend on the length of indices.
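The memory side of this trade-off can be sketched as follows. The sizes n, m and k and the random input are assumptions chosen only for illustration; the point is that the single-application approach materializes two index arrays with one entry per True cell, while the per-row loop does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 1000, 50, 10  # hypothetical sizes: n rows, m columns, k indices per row
col_indices = [rng.choice(m, size=k, replace=False) for _ in range(n)]

# Single application: two intermediate index arrays of n * k entries each
full_row_indices = np.repeat(np.arange(n), k)
full_col_indices = np.concatenate(col_indices)
mask_single = np.zeros((n, m), dtype=bool)
mask_single[full_row_indices, full_col_indices] = True

# Multiple applications: only one row's indices are in use at a time
mask_multi = np.zeros((n, m), dtype=bool)
for i, cols in enumerate(col_indices):
    mask_multi[i, cols] = True

extra_entries = full_row_indices.size + full_col_indices.size
print(extra_entries)                            # 20000
print(np.array_equal(mask_single, mask_multi))  # True
```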
Just to get some feeling for how much faster the different possible solutions could be, the following functions have been tested:
import numpy as np
import random


def gen_mask_direct(col_indices, cols=None):
    if cols is None:
        cols = np.max(np.concatenate(col_indices)) + 1
    rows = len(col_indices)
    mask = np.zeros((rows, cols), dtype=bool)
    # one assignment per row (partial indices)
    for row_index, col_index in enumerate(col_indices):
        mask[row_index, col_index] = True
    return mask


def gen_mask_loops(col_indices, cols=None):
    rows = len(col_indices)
    row_indices = tuple(i for i, j in enumerate(col_indices) for _ in j)
    # flatten; this requires col_indices to be a tuple of tuples
    col_indices = tuple(sum(col_indices, ()))
    if cols is None:
        cols = np.max(col_indices) + 1
    mask = np.zeros((rows, cols), dtype=bool)
    mask[row_indices, col_indices] = True
    return mask


def gen_mask_np_repeat(col_indices, cols=None):
    rows = len(col_indices)
    lengths = list(map(len, col_indices))
    row_indices = np.repeat(np.arange(rows), lengths)
    col_indices = np.concatenate(col_indices)
    if cols is None:
        cols = np.max(col_indices) + 1
    mask = np.zeros((rows, cols), dtype=bool)
    mask[row_indices, col_indices] = True
    return mask


def gen_mask_np_concatenate(col_indices, cols=None):
    rows = len(col_indices)
    row_indices = tuple(np.full(len(col_index), i) for i, col_index in enumerate(col_indices))
    row_indices = np.concatenate(row_indices)
    col_indices = np.concatenate(col_indices)
    if cols is None:
        cols = np.max(col_indices) + 1
    mask = np.zeros((rows, cols), dtype=bool)
    mask[row_indices, col_indices] = True
    return mask
gen_mask_direct() is basically @Derlin's answer and implements multiple applications of mask[partial_row_indices, partial_col_indices]. All the others implement a single application of mask[full_row_indices, full_col_indices] with different ways of preparing the full_row_indices and the full_col_indices:
- gen_mask_loops() uses direct looping
- gen_mask_np_repeat() uses np.repeat() (and it is substantially the same as @Divakar's answer)
- gen_mask_np_concatenate() uses a combination of np.full() and np.concatenate()
A quick sanity check indicates that all these are equivalent:
funcs = gen_mask_direct, gen_mask_loops, gen_mask_np_repeat, gen_mask_np_concatenate

random.seed(0)
test_inputs = [
    (tuple(
        tuple(sorted(set([random.randint(0, n - 1) for _ in range(random.randint(1, n - 1))])))
        for _ in range(random.randint(1, n - 1))))
    for n in range(5, 6)
]
print(test_inputs)
# [((0, 2, 3, 4), (2, 3, 4), (1, 4), (0, 1, 4))]

for func in funcs:
    print('Func:', func.__name__)
    for test_input in test_inputs:
        print(func(test_input).astype(int))

Func: gen_mask_direct
[[1 0 1 1 1]
 [0 0 1 1 1]
 [0 1 0 0 1]
 [1 1 0 0 1]]
Func: gen_mask_loops
[[1 0 1 1 1]
 [0 0 1 1 1]
 [0 1 0 0 1]
 [1 1 0 0 1]]
Func: gen_mask_np_repeat
[[1 0 1 1 1]
 [0 0 1 1 1]
 [0 1 0 0 1]
 [1 1 0 0 1]]
Func: gen_mask_np_concatenate
[[1 0 1 1 1]
 [0 0 1 1 1]
 [0 1 0 0 1]
 [1 1 0 0 1]]
Here are some benchmarks (using the code from here):
[benchmark plot]
and zooming to the fastest:
[benchmark plot, zoomed to the fastest]
supporting the overall statement that, typically, a single application of mask[...] for full indices is faster than multiple applications of mask[...] for partial indices.
For completeness, the following code was used to generate the inputs, compare the outputs, run the benchmarks and prepare the plots:
def gen_input(n):
    random.seed(0)
    return tuple(
        tuple(sorted(set([random.randint(0, n - 1) for _ in range(random.randint(n // 2, n - 1))])))
        for _ in range(random.randint(n // 2, n - 1)))


def equal_output(a, b):
    return np.all(a == b)


input_sizes = tuple(int(2 ** (2 + (3 * i) / 4)) for i in range(13))
print('Input Sizes:\n', input_sizes, '\n')

# benchmark() and plot_benchmarks() come from the benchmarking code referenced above
runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=gen_input, equal_output=equal_output,
    input_sizes=input_sizes)

plot_benchmarks(runtimes, input_sizes, labels, units='ms')
plot_benchmarks(runtimes, input_sizes, labels, units='ms', zoom_fastest=2)
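Since benchmark() and plot_benchmarks() are defined elsewhere, a rough stand-in using only the standard timeit module might look like the sketch below. The function names mask_multi and mask_single, the input sizes and the random input generator are assumptions for illustration; this is not the benchmarking code used for the plots, and the timings it prints will vary by machine.

```python
import timeit
import numpy as np

def mask_multi(col_indices, cols):
    # One fancy-indexing assignment per row (partial indices)
    mask = np.zeros((len(col_indices), cols), dtype=bool)
    for i, idx in enumerate(col_indices):
        mask[i, idx] = True
    return mask

def mask_single(col_indices, cols):
    # One fancy-indexing assignment overall (full indices)
    lengths = [len(idx) for idx in col_indices]
    row_indices = np.repeat(np.arange(len(col_indices)), lengths)
    mask = np.zeros((len(col_indices), cols), dtype=bool)
    mask[row_indices, np.concatenate(col_indices)] = True
    return mask

rng = np.random.default_rng(0)
for n in (10, 100, 1000):  # hypothetical input sizes
    # n rows, each with a random number of column indices in [0, n)
    data = [rng.integers(0, n, size=int(rng.integers(1, n))) for _ in range(n)]
    # Both approaches must produce the same mask before timing them
    assert np.array_equal(mask_multi(data, n), mask_single(data, n))
    t_multi = timeit.timeit(lambda: mask_multi(data, n), number=10)
    t_single = timeit.timeit(lambda: mask_single(data, n), number=10)
    print(f"n={n:5d}  multi: {t_multi:.5f}s  single: {t_single:.5f}s")
```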