
Random Sample of N Distinct Permutations of a List

Suppose I have a Python list of arbitrary length k. Now suppose I would like a random sample of n distinct permutations of that list, where n <= k!. I was tempted to try:

import random
import itertools

k = 6
n = 10

mylist = list(range(0, k))

j = random.sample(list(itertools.permutations(mylist)), n)

for i in j:
  print(i)

But, naturally, this code becomes unusably slow when k gets too large. Since the number of permutations I am looking for, n, is going to be relatively small compared to the total number of permutations, computing all of them is unnecessary. Yet it is important that none of the permutations in the final list are duplicates.

How would you achieve this more efficiently? Remember, mylist could be a list of anything; I just used list(range(0, k)) for simplicity.

You can generate random permutations and keep track of the ones you have already generated. To make it more versatile, I made a generator function:

import random

k = 6
n = 10

mylist = list(range(0, k))

def perm_generator(seq):
    seen = set()
    length = len(seq)
    while True:
        perm = tuple(random.sample(seq, length))
        if perm not in seen:
            seen.add(perm)
            yield perm

rand_perms = perm_generator(mylist)

j = [next(rand_perms) for _ in range(n)]

for i in j:
    print(i)

Naïve implementation

Below is the naïve implementation I wrote (the same idea is well implemented by @Tomothy32 using only the Python Standard Library and a generator):

import numpy as np

# k, n and mylist are the values defined in the question above
mylist = np.array(mylist)
perms = set()
for i in range(n):                          # (1) Draw N samples from permutations Universe U (#U = k!)
    while True:                             # (2) Endless loop
        perm = np.random.permutation(k)     # (3) Generate a random permutation from U
        key = tuple(perm)
        if key not in perms:                # (4) Check if permutation already has been drawn (hash table)
            perms.add(key)                  # (5) Insert the whole tuple into the set
            break                           # (6) Break the endless loop
    print(i, mylist[perm])

It relies on numpy.random.permutation, which randomly permutes a sequence.

The key idea is:

  • generate a new random permutation (of the indices);
  • check whether that permutation has already been drawn, and store it (as a tuple of int, because it must be hashable) to prevent duplicates;
  • then permute the original list using the index permutation.
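The three steps above can also be sketched with the standard library alone. This is a minimal illustration, not the author's code: index_perm_generator is a hypothetical name, and it assumes that "distinct" means distinct index permutations, so a list containing repeated values could still yield two results that look identical. Because only the indices are hashed, the list items themselves do not need to be hashable.

```python
import random

def index_perm_generator(seq):
    """Yield distinct random permutations of seq by permuting its indices.

    Hypothetical helper: only the index tuple is hashed, so the items
    of seq may be unhashable (lists, dicts, ...).
    """
    seen = set()
    indices = list(range(len(seq)))
    while True:
        random.shuffle(indices)            # new random index permutation
        key = tuple(indices)               # tuples are hashable, lists are not
        if key not in seen:                # reject permutations already drawn
            seen.add(key)
            yield [seq[i] for i in key]    # apply the permutation to the items

items = [[1], [2], [3]]                    # unhashable elements
gen = index_perm_generator(items)
for _ in range(4):
    print(next(gen))
```

As with the versions above, the generator loops forever once all k! permutations have been yielded, so the caller should never request more than k! values.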

This naïve version does not directly suffer from the factorial complexity O(k!) of the itertools.permutations approach, which generates all k! permutations before sampling from them.

About Complexity

There is something interesting about the algorithm design and complexity...

If we want to be sure that the loop can end, we must enforce N <= k!, but the code above does not check it. Furthermore, assessing the complexity requires knowing how many times the endless loop actually iterates before a new random tuple is found and the loop is broken.
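One way to estimate how often the endless loop iterates, assuming every draw is uniform over the k! permutations: once i distinct permutations have been found, a draw is new with probability (k! - i)/k!, so the expected number of draws needed for the next one is k!/(k! - i). Summing over i = 0 .. N-1 gives the expected total. The sketch below evaluates that sum; expected_draws is a hypothetical helper name, not part of the original answers.

```python
import math

def expected_draws(k, n):
    """Expected number of random draws needed to collect n distinct
    permutations of k items, assuming uniform independent draws."""
    total = math.factorial(k)
    if n > total:
        raise ValueError("n cannot exceed k!")
    # After i distinct hits, the next new one takes total / (total - i)
    # draws on average (mean of a geometric distribution).
    return sum(total / (total - i) for i in range(n))

print(expected_draws(3, 6))       # all 3! = 6 permutations of 3 items
print(expected_draws(20, 5000))   # N far below k!
```

When N is far below k!, every term is close to 1 and the total is close to N, which means almost no draw is rejected. When N approaches k!, the last terms dominate: finding the final missing permutation alone takes k! draws on average.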

Limitation

Let's encapsulate the function written by @Tomothy32:

import math

def get_perms(seq, N=10):
    rand_perms = perm_generator(seq)    # use the argument, not the global mylist
    return [next(rand_perms) for _ in range(N)]

For instance, this call works for very small k (k < 7):

get_perms(list(range(k)), math.factorial(k))

But it will fail as k grows, even before the O(k!) cost in time and memory becomes the limit, because it boils down to randomly finding the single missing key once all other k! - 1 keys have been found.
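One possible way around this limitation, offered as a sketch rather than as part of the original answers, is to validate N against k! and switch strategy when N is a large fraction of k!. In that regime k is necessarily small, so enumerating every permutation and sampling from the full list is affordable. The name sample_perms and the dense_ratio threshold of 0.5 are assumptions chosen for illustration, and the rejection branch assumes the items are hashable.

```python
import itertools
import math
import random

def sample_perms(seq, n, dense_ratio=0.5):
    """Return n distinct random permutations of seq as tuples.

    dense_ratio is an arbitrary illustrative threshold: above it,
    rejection sampling wastes most draws, so we enumerate instead.
    """
    pool = list(seq)
    total = math.factorial(len(pool))
    if n > total:
        raise ValueError("cannot draw more than k! distinct permutations")
    if n > dense_ratio * total:
        # Dense case: k must be small here, so full enumeration is feasible.
        return random.sample(list(itertools.permutations(pool)), n)
    # Sparse case: rejection sampling, as in the generator above.
    seen = set()
    result = []
    while len(result) < n:
        perm = tuple(random.sample(pool, len(pool)))
        if perm not in seen:
            seen.add(perm)
            result.append(perm)
    return result

print(len(sample_perms(list(range(4)), 24)))   # dense: every permutation of 4 items
print(len(sample_perms(list(range(50)), 10)))  # sparse: 10 out of 50!
```

The check at the top also removes the risk of an endless loop when the caller asks for more than k! permutations.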

Always look on the bright side...

On the other hand, the method seems able to generate a reasonable number of permuted tuples in a reasonable amount of time when N <<< k!. For example, it is possible to draw more than N = 5000 tuples of length k, where 10 < k < 1000, in less than one second.


When k and N are kept small and N <<< k!, the algorithm seems to have a complexity that is:

  • Constant versus k;
  • Linear versus N.

This is somewhat valuable.
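The scaling described above can be checked on your own machine with a small timing sketch like the following. The helper names draw_distinct and time_draws, and the chosen values of k and N, are assumptions for illustration. No timings are quoted here because they depend on hardware, and since a single draw still has to shuffle and hash k items, some growth with k would not be surprising.

```python
import random
import time

def draw_distinct(k, n):
    """Rejection-sample n distinct random permutations of range(k)."""
    seq = list(range(k))
    seen = set()
    while len(seen) < n:
        seen.add(tuple(random.sample(seq, k)))  # duplicates are absorbed by the set
    return seen

def time_draws(ks=(10, 100), ns=(500, 1000)):
    """Return (k, n, seconds) rows for every combination of ks and ns."""
    rows = []
    for k in ks:
        for n in ns:
            start = time.perf_counter()
            draw_distinct(k, n)
            rows.append((k, n, time.perf_counter() - start))
    return rows

for k, n, seconds in time_draws():
    print(f"k={k:4d}  N={n:5d}  {seconds:.4f} s")
```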
