Random Sample of N Distinct Permutations of a List
Suppose I have a Python list of arbitrary length k. Now, suppose I would like a random sample of n (where n <= k!) distinct permutations of that list. I was tempted to try:
import random
import itertools
k = 6
n = 10
mylist = list(range(0, k))
j = random.sample(list(itertools.permutations(mylist)), n)
for i in j:
    print(i)
But, naturally, this code becomes unusably slow when k gets too large. Given that the number of permutations I am looking for, n, is going to be relatively small compared to the total number of permutations, computing all of the permutations is unnecessary. Yet it's important that none of the permutations in the final list are duplicates.
How would you achieve this more efficiently? Remember, mylist could be a list of anything; I just used list(range(0, k)) for simplicity.
You can generate random permutations one at a time and keep track of the ones you have already generated. To make it more versatile, I made a generator function:
import random
k = 6
n = 10
mylist = list(range(0, k))
def perm_generator(seq):
    seen = set()                  # permutations already yielded
    length = len(seq)
    while True:
        perm = tuple(random.sample(seq, length))   # one random permutation
        if perm not in seen:      # skip duplicates
            seen.add(perm)
            yield perm

rand_perms = perm_generator(mylist)

j = [next(rand_perms) for _ in range(n)]

for i in j:
    print(i)
Below is the naïve implementation I wrote (a cleaner version, using a generator and only the Python Standard Library, is given by @Tomothy32):
import numpy as np

mylist = np.array(mylist)
perms = set()
for i in range(n):                        # (1) Draw N samples from the permutation universe U (#U = k!)
    while True:                           # (2) Endless loop
        perm = np.random.permutation(k)   # (3) Generate a random permutation from U
        key = tuple(perm)
        if key not in perms:              # (4) Check whether this permutation has already been drawn (hash table)
            perms.add(key)                # (5) Insert the whole tuple into the set
            break                         # (6) Break the endless loop
    print(i, mylist[perm])
It relies on numpy.random.permutation, which randomly permutes a sequence.
The key idea is: generate a new random permutation of the indices; check whether that permutation has already been drawn, storing it as a tuple of int (because it must hash) to prevent duplicates; then use the index permutation to reorder the original list.

This naïve version does not directly suffer from the factorial complexity O(k!) of the itertools.permutations approach, which generates all k! permutations before sampling from them.
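The idea above can also be sketched with the standard library alone. Because only index tuples are stored in the set, the list elements themselves need not be hashable. The function name sample_index_perms is hypothetical, and the guard on n is an addition that is not in the code above. Note that distinct index permutations give distinct results only when the elements of the list are themselves distinct.

```python
import math
import random

def sample_index_perms(items, n):
    """Return n reorderings of items, each built from a distinct index permutation."""
    k = len(items)
    if n > math.factorial(k):
        raise ValueError("n cannot exceed k!")
    seen = set()
    result = []
    while len(result) < n:
        idx = tuple(random.sample(range(k), k))     # random permutation of the indices
        if idx not in seen:                         # tuples of int are hashable
            seen.add(idx)
            result.append([items[i] for i in idx])  # reorder the original list
    return result

rows = sample_index_perms([[1, 2], {"a": 1}, "x"], 4)  # unhashable elements are fine
for row in rows:
    print(row)
```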
There is something interesting about the algorithm design and complexity...
If we want to be sure that the loop can end, we must enforce N <= k!, but this is not guaranteed. Furthermore, assessing the complexity requires knowing how many times the endless loop will actually iterate before a new random tuple is found and the loop breaks.
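One way to estimate that number, assuming every draw is uniform over all k! permutations: once m distinct permutations have been seen, a draw is new with probability (k! - m)/k!, so the expected number of draws needed to get the next new one is k!/(k! - m). Summing over m = 0, ..., N-1 gives the expected total. The helper name expected_draws is hypothetical.

```python
import math

def expected_draws(k, n):
    """Expected number of uniform random permutations drawn until n distinct ones are seen."""
    total = math.factorial(k)
    if n > total:
        raise ValueError("n cannot exceed k!")
    # With m permutations already seen, the next new one takes total / (total - m) draws on average.
    return sum(total / (total - m) for m in range(n))

print(expected_draws(10, 5000))  # barely above 5000: rejections are rare when N <<< k!
print(expected_draws(6, 720))    # on the order of 5,000 draws to exhaust all 720 permutations
```

The second call is the classic coupon-collector situation: the last few missing permutations dominate the cost.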
Let's encapsulate the function written by @Tomothy32:
import math

def get_perms(seq, N=10):
    rand_perms = perm_generator(seq)   # sample from the sequence passed in
    return [next(rand_perms) for _ in range(N)]
For instance, this call works for very small k < 7:
get_perms(list(range(k)), math.factorial(k))
But it fails as k grows, even before the O(k!) cost (time and memory) of storing every permutation comes into play, because the end of the run boils down to randomly hitting the single missing key when all the other k!-1 keys have already been found.
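When N approaches k!, one alternative (not part of the answers above) is to avoid rejection entirely: sample N distinct ranks from range(k!) and convert each rank into a permutation through the factorial number system. The names unrank and sample_perms_by_rank are hypothetical, and this sketch assumes k! fits in a machine-size integer (k <= 20 on typical 64-bit builds), because random.sample needs len() of the range.

```python
import math
import random

def unrank(items, rank):
    """Return the permutation of items with the given lexicographic rank (0 <= rank < k!)."""
    pool = list(items)
    out = []
    for size in range(len(pool), 0, -1):
        block = math.factorial(size - 1)     # permutations sharing the same leading element
        index, rank = divmod(rank, block)
        out.append(pool.pop(index))
    return tuple(out)

def sample_perms_by_rank(items, n):
    """Draw n distinct permutations without any rejection loop."""
    total = math.factorial(len(items))
    ranks = random.sample(range(total), n)   # distinct ranks; raises ValueError if n > k!
    return [unrank(items, r) for r in ranks]

for p in sample_perms_by_rank(list(range(4)), 24):  # all 4! permutations, in random order
    print(p)
```

Each distinct rank maps to a distinct permutation, so the cost stays proportional to N even when N = k!.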
On the other hand, it seems the method can generate a reasonable number of permuted tuples in a reasonable amount of time when N <<< k!. For example, it is possible to draw more than N=5000 tuples of length k, where 10 < k < 1000, in less than one second.
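A small timing harness makes it possible to check this kind of figure on your own machine. The helper draw_distinct is a hypothetical standard-library stand-in for the rejection loop discussed here; the values of k are arbitrary picks from the stated range, and actual timings depend on hardware and on whether NumPy or the random module is used.

```python
import random
import time

def draw_distinct(seq, n):
    """Keep drawing random permutations of seq until n distinct ones are collected."""
    seen = set()
    while len(seen) < n:
        seen.add(tuple(random.sample(seq, len(seq))))  # duplicates are absorbed by the set
    return seen

for k in (20, 100, 300):
    start = time.perf_counter()
    perms = draw_distinct(list(range(k)), 5000)
    elapsed = time.perf_counter() - start
    print(f"k={k}: {len(perms)} distinct permutations in {elapsed:.3f} s")
```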
When k and N are kept small and N <<< k!, the algorithm seems to have a complexity that can be characterized separately with respect to k and with respect to N. This is somewhat valuable.