Fastest way to sample many random permutations of a numpy array
Unlike many other numpy/random functions, numpy.random.Generator.permutation() doesn't provide an obvious way to return multiple results in a single function call. Given a (1d) numpy array x, I want to sample n permutations of x (each of length len(x)), and have the result as a numpy array with shape (n, len(x)). A naive way of generating many permutations is np.array([rng.permutation(x) for _ in range(n)]). This is not ideal, mostly because the loop is in Python rather than inside a compiled numpy function.
import numpy as np
rng = np.random.default_rng(1234)
# x is big enough to not want to enumerate all permutations
x = rng.standard_normal(size=20)
n = 10000
perms = np.array([rng.permutation(x) for _ in range(n)])
My use case is a brute-force search to find permutations that minimise a specific property (constituting a "good enough" search solution). I can calculate the property of interest for each permutation using numpy operations that vectorise/broadcast nicely over the resulting matrix of permutations. It turns out that naively generating the matrix of permutations is the bottleneck in my code. Is there a better way?
You can use rng.permuted instead of rng.permutation and combine it with np.tile to repeat x multiple times and shuffle each replicate independently. Here is how:
perms = rng.permuted(np.tile(x, n).reshape(n,x.size), axis=1)
This is about 10 times faster on my machine than your initial code.
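To check the result and get a rough feel for the speed-up on your own machine, something like the following sketch may help. The helper names perms_loop and perms_permuted are made up for this illustration, and np.tile(x, (n, 1)) is used as an equivalent way of building the (n, len(x)) array before shuffling. The timing ratio will vary with hardware, n and len(x), so no expected numbers are given.

```python
import timeit

import numpy as np

rng = np.random.default_rng(1234)
x = rng.standard_normal(size=20)
n = 10_000


def perms_loop(rng, x, n):
    """Baseline from the question: one rng.permutation call per row."""
    return np.array([rng.permutation(x) for _ in range(n)])


def perms_permuted(rng, x, n):
    """Tile x into n identical rows, then shuffle each row independently."""
    return rng.permuted(np.tile(x, (n, 1)), axis=1)


perms = perms_permuted(rng, x, n)

# Sanity check: every row must hold exactly the values of x, in some order.
is_permutation = bool(np.all(np.sort(perms, axis=1) == np.sort(x)))
print(perms.shape, is_permutation)

# Rough timing comparison (3 repetitions each); results are machine dependent.
t_loop = timeit.timeit(lambda: perms_loop(rng, x, n), number=3)
t_permuted = timeit.timeit(lambda: perms_permuted(rng, x, n), number=3)
print(f"loop: {t_loop:.3f}s, permuted: {t_permuted:.3f}s")
```

Passing axis=1 matters here: it makes rng.permuted shuffle each row on its own, whereas the default axis=None shuffles the flattened array.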
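Just be aware that Jérome's solution provides an array of n rows, but it may include repetitions: different rows may contain the same ordering of x. This becomes likely when n is large relative to the number of distinct permutations, len(x)!, or when x itself contains repeated values. For the x in the question (length 20, so about 2.4e18 permutations) duplicates are very unlikely with n = 10000, but for short arrays they are common.
If you need to sample without repetition (as was my case), you can keep only the unique rows afterwards. Note that set(list(perms)) will not work directly on the 2d array, because the rows are numpy arrays and not hashable; convert each row to a tuple first, or use np.unique(perms, axis=0). Either way you may end up with fewer than n rows.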
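As a minimal sketch of that de-duplication step, assuming a deliberately short x so that repeats are guaranteed to occur (the variable names are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1234)
x = np.arange(4)  # only 4! = 24 distinct permutations exist
n = 100           # so 100 samples must contain repeated rows

perms = rng.permuted(np.tile(x, (n, 1)), axis=1)

# Option 1: np.unique with axis=0 drops duplicate rows.
# The result is sorted lexicographically, not in sampling order.
unique_perms = np.unique(perms, axis=0)

# Option 2: keep first-seen order by hashing each row as a tuple.
seen = set()
ordered_rows = []
for row in perms:
    key = tuple(row.tolist())
    if key not in seen:
        seen.add(key)
        ordered_rows.append(row)
ordered_unique = np.array(ordered_rows)

print(perms.shape, unique_perms.shape, ordered_unique.shape)
```

If exactly n distinct permutations are required, one option is to over-sample and repeat this until enough unique rows have been collected.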