Fastest way to sample many random permutations of a numpy array

Unlike many other numpy/random functions, numpy.random.Generator.permutation() doesn't provide an obvious way to return multiple results in a single function call. Given a 1-D numpy array x, I want to sample n permutations of x (each of length len(x)) and get the result as a numpy array with shape (n, len(x)). A naive way of generating many permutations is np.array([rng.permutation(x) for _ in range(n)]). This is not ideal, mostly because the loop runs in Python rather than inside a compiled numpy function.

import numpy as np

rng = np.random.default_rng(1234)
# x is big enough to not want to enumerate all permutations
x = rng.standard_normal(size=20)
n = 10000
perms = np.array([rng.permutation(x) for _ in range(n)])

My use case is a brute-force search for permutations that minimise a specific property (a "good enough" search solution). I can calculate the property of interest for each permutation using numpy operations that vectorise/broadcast nicely over the resulting matrix of permutations. It turns out that naively generating the matrix of permutations is the bottleneck in my code. Is there a better way?
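To make the use case concrete, here is a minimal sketch of the search described above. The property being minimised is not given in the question, so the score used here (the sum of absolute differences between adjacent elements) is purely a hypothetical stand-in, as are the names scores and best.

```python
import numpy as np

rng = np.random.default_rng(1234)
x = rng.standard_normal(size=20)
n = 10000

# Naive generation: one Python-level call per permutation.
perms = np.array([rng.permutation(x) for _ in range(n)])

# Hypothetical property (an assumption, not from the question):
# sum of absolute differences between neighbouring elements of each row.
scores = np.abs(np.diff(perms, axis=1)).sum(axis=1)

# Keep the sampled permutation with the smallest score.
best = perms[np.argmin(scores)]
```

The scoring step is a handful of vectorised numpy calls over the whole (n, len(x)) matrix, which is why the Python loop that builds perms can end up dominating the run time.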

You can use rng.permuted instead of rng.permutation and combine it with np.tile to repeat x n times and shuffle each replicate independently. Here is how:

perms = rng.permuted(np.tile(x, n).reshape(n,x.size), axis=1)

This is about 10 times faster on my machine than your initial code.
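For reference, a self-contained version of this approach might look like the sketch below. It passes a (n, 1) repetition tuple to np.tile, which should produce the same (n, len(x)) array as the tile-then-reshape form. The argsort-based variant is a different technique, not part of the answer above, included only for comparison; idx and perms_alt are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(1234)
x = rng.standard_normal(size=20)
n = 10000

# Stack n copies of x as rows, then shuffle every row independently.
# Generator.permuted shuffles each slice along the given axis on its own.
perms = rng.permuted(np.tile(x, (n, 1)), axis=1)

# Alternative technique (not from the answer): draw random keys and
# argsort them, giving one random index permutation per row.
idx = np.argsort(rng.random((n, x.size)), axis=1)
perms_alt = x[idx]
```

Both results have shape (n, len(x)), and each row contains exactly the values of x in some order. Relative speed is likely to depend on n, len(x) and the numpy version, so it is worth timing both on your own data.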

Just be aware that Jérome's solution returns an array of n rows that may include repetitions: different rows may contain the same ordering of x (especially if n is large relative to the number of distinct permutations of x).

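If you need to sample without repetition (as was my case), you can deduplicate the rows afterwards and keep only the unique orderings of x. Note that set(list(perms)) raises a TypeError on a 2-D array because numpy arrays are not hashable; np.unique(perms, axis=0) or set(map(tuple, perms)) works instead, though you may end up with fewer than n rows.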
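A minimal sketch of the deduplication step, using a deliberately small x (an assumption made for illustration) so that repeated rows are guaranteed:

```python
import numpy as np

rng = np.random.default_rng(1234)
x = np.arange(4)   # only 4! = 24 distinct orderings exist
n = 100            # more samples than orderings, so repeats must occur

perms = rng.permuted(np.tile(x, (n, 1)), axis=1)

# Unique rows, returned in sorted (lexicographic) order.
unique_perms = np.unique(perms, axis=0)

# Equivalent count via hashable tuples.
n_unique = len(set(map(tuple, perms)))
```

np.unique returns the rows sorted, so the original sampling order is lost. If fewer than n unique rows remain, one option is to sample again and deduplicate until enough have been collected.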

