Creating tuples of all possible combinations of items from two lists, without duplicating items within tuples

I would like to take a range of numbers and return a list of triples without duplicates. Each element of the range should appear exactly once in each position of the triples. The goal is to get something like the following:

get_combinations_without_duplicates(3) = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

For range(3) this is just a list rotation, but for larger ranges there are more possible combinations. I would like to be able to randomly generate a list of triples that satisfies these constraints.

Suppose we start by specifying the first element of each triple for the case where n=4:

[(0,), (1,), (2,), (3,)]

The second element of the first triple can be anything other than 0. Once one of these is chosen, it limits the options for the next triple, and so on. The goal is to have a function that takes a number and creates the triples in this manner, but doesn't always create the same set of triples. That is, the end result could be a rotation:

[(0, 1, 2), (1, 2, 3), (2, 3, 0), (3, 0, 1),]

or

[(0, 2, 3), (1, 3, 0), (2, 0, 1), (3, 1, 2)]

Here is an implementation of this function:

import random

def get_combinations_without_duplicates(n):
    output = []
    second = list(range(n))
    third = list(range(n))
    for i in range(n):
        triple = [i]
        # Get the second value of the triple, making sure it does not
        # duplicate the first value in the triple or any value that has
        # already appeared in the second position of another triple
        choices_for_second = [number for number in second if number not in triple]
        # Randomly select a number from the allowed possibilities
        # (random.choice raises IndexError if none are left)
        n_second = random.choice(choices_for_second)
        # Append it to the triple
        triple.append(n_second)
        # Remove that value from second so that it won't be chosen for other triples
        second = [number for number in second if number != n_second]
        # Do the same for the third value
        choices_for_third = [number for number in third if number not in triple]
        n_third = random.choice(choices_for_third)
        triple.append(n_third)
        third = [number for number in third if number != n_third]
        output.append(tuple(triple))
    return output

As pointed out below, this process will sometimes randomly select combinations that don't work, leaving no valid choice for a later triple (random.choice then raises IndexError). That can be handled with something like:

def keep_trying(n):
    try:
        return get_combinations_without_duplicates(n)
    except IndexError:
        return keep_trying(n)

However, I'm wondering if there is a better way to do this in general.

Let's try this again.

A few observations.

  1. The first value will always be zero in a sorted array of your tuples.
  2. The length of the array will always equal the number of tuples in your array.
  3. You want these to be randomly generated.
  4. The tuples are produced in 'sorted' order.

Based on these specifications, we can come up with a procedural method:

  1. Generate 2 lists of serial integers, one to pick from, the other to seed from.
  2. For each number in the seed list, [0, 1, 2, 3], randomly pick a number that's not already in that element, append it to the element, and remove it from the pick list. [01, 13, 20, 32]
  3. Generate another list of serial integers, and repeat. [012, 130, 203, 321]

But this doesn't work. For some iterations, it will back itself into a corner and be unable to generate a number. For instance, [01, 13, 20, 32].. appending [3, 0, 1... crap, I'm stuck.

The only way to fix this is to do a true shuffle over the entire row, and reshuffle until one fits. This may take quite a bit of time, and will only get more painful as the sets get longer.

So, procedurally speaking:

Solution 1: Random generation

  1. Populate a list with your range. [0, 1, 2, 3]
  2. Create another list. [0, 1, 2, 3]
  3. Shuffle the list. [1, 0, 2, 3]
  4. Shuffle until you find one that fits... [1, 2, 3, 0]
  5. Repeat with the third element.

With this procedure, a computer can verify solutions very quickly, but it cannot generate solutions very quickly. However, it is one of two ways to generate a truly random answer.
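The steps of Solution 1 can be sketched roughly as follows. This is only an illustration under my own assumptions: the names `shuffled_column` and `triples_by_reshuffling` are hypothetical, and the sketch makes no claim that every valid set of triples is equally likely.

```python
import random

def shuffled_column(n, previous_columns):
    """Reshuffle range(n) until no row repeats a value from an earlier column."""
    column = list(range(n))
    while True:
        random.shuffle(column)
        fits = all(
            column[row] != previous[row]
            for previous in previous_columns
            for row in range(n)
        )
        if fits:
            return column

def triples_by_reshuffling(n):
    """Build n triples column by column (hypothetical helper, needs n >= 3)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    first = list(range(n))                       # step 1: the seed column
    second = shuffled_column(n, [first])         # steps 2-4: shuffle until it fits
    third = shuffled_column(n, [first, second])  # step 5: repeat for the third element
    return list(zip(first, second, third))

print(triples_by_reshuffling(4))
```

Because each column is a shuffle of the whole range, a value can never repeat within a position. The only check needed is that a row does not repeat a value across positions.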

Therefore, the fastest guaranteed method would make use of a verification procedure rather than a generating procedure. First things first, generate all the possible permutations.

from itertools import permutations

n = 4
candidates = list(permutations(range(n), 3))

Then:

Solution 2: Random verification

  1. Pick a triplet that starts with 0.
  2. Pop, at random, a triplet that does not start with 0.
  3. Verify whether the randomly picked triplet is an intermediate solution.
  4. If not, pop another triplet.
  5. If yes, append the triplet, and REPOPULATE THE TRIPLET QUEUE.
  6. Repeat n times. # or until you exhaust the queue, at which point repeat n times naturally becomes TRUE

A solution for the next triplet is mathematically guaranteed to be in the solution set, so if you just let it exhaust itself, a randomized solution should appear. The problem with this approach is that there's no guarantee that every possible outcome has an equal probability.

Solution 3: Iterative verification

For equal-probability results, get rid of the randomization, generate every possible combination of n 3-tuples, and verify each of those solution candidates.

Write a function that verifies each candidate solution so as to produce every solution, and then randomly pop a solution from that list.

from itertools import combinations

# keep only the candidate sets that pass verify (defined further down)
results = [c for c in combinations(candidates, n) if verify(c)]
# this is 10626 calls to verify for n=4, and about 5.5 million for n=5
# this is not an acceptable solution.
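The call counts in the comments above can be reproduced with a little arithmetic: there are n(n-1)(n-2) candidate triples, and the brute force tries every way of choosing n of them. A small sketch follows; the name `brute_force_checks` is hypothetical, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def brute_force_checks(n):
    """Number of candidate sets the brute-force search has to verify."""
    candidates = n * (n - 1) * (n - 2)  # ordered triples of distinct values
    return comb(candidates, n)          # ways to choose n of those triples

for n in (4, 5, 6):
    print(n, brute_force_checks(n))
# 4 10626
# 5 5461512
# 6 3652745460
```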

Neither Solution 1 nor Solution 3 is very fast (Solution 3 in particular has to check C(n(n-1)(n-2), n) candidate sets, which grows far faster than O(n**2)), but given your criteria, it's possible this is as fast as it'll get if you want a truly random solution. Solution 2 should be the fastest of these three, often beating 1 or 3 significantly, while Solution 3 has the most stable results. Which of these approaches you choose will depend on what you want to do with the output.

Afterword:

Ultimately, the speed of the code will be contingent on exactly how random you want your code to be. An algorithm that spits out the VERY first (and only the very first) instance of a tuple series satisfying your requirement can run supremely quickly, as it just attacks the permutations in order, once, and it will run in O(n) time. However, it will not do anything randomly...

Also, here's some quick code for verify(i). It's based on the observation that no two tuples may have the same number at the same index.

def verify(t):
    """ Verifies that a set of tuples satisfies the combinations without duplicates condition. """
    zipt = zip(*t)
    return all([len(i) == len(set(i)) for i in zipt])
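As a rough sketch of how verify might be combined with the candidate list to enumerate every solution for a small n, consider the following. The helper name `all_solutions` is my own, and verify is restated so that the block runs on its own. The cached list can then be sampled with random.choice.

```python
import random
from itertools import combinations, permutations

def verify(t):
    """True if no value repeats within any position across the tuples in t."""
    return all(len(column) == len(set(column)) for column in zip(*t))

def all_solutions(n):
    """Brute-force every valid set of n triples (hypothetical helper; slow beyond n=5)."""
    candidates = list(permutations(range(n), 3))
    return [c for c in combinations(candidates, n) if verify(c)]

solutions = all_solutions(4)
print(len(solutions))            # 24
print(solutions[0])              # ((0, 1, 2), (1, 0, 3), (2, 3, 0), (3, 2, 1))
print(random.choice(solutions))  # a uniformly random pick from the cached list
```

Since combinations emits each set once, in sorted order, picking from the finished list gives every solution the same probability.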

n = 4 Full Solution Set

((0, 1, 2), (1, 0, 3), (2, 3, 0), (3, 2, 1))
((0, 1, 2), (1, 0, 3), (2, 3, 1), (3, 2, 0))
((0, 1, 2), (1, 2, 3), (2, 3, 0), (3, 0, 1))
((0, 1, 2), (1, 3, 0), (2, 0, 3), (3, 2, 1))
((0, 1, 3), (1, 0, 2), (2, 3, 0), (3, 2, 1))
((0, 1, 3), (1, 0, 2), (2, 3, 1), (3, 2, 0))
((0, 1, 3), (1, 2, 0), (2, 3, 1), (3, 0, 2))
((0, 1, 3), (1, 3, 2), (2, 0, 1), (3, 2, 0))
((0, 2, 1), (1, 0, 3), (2, 3, 0), (3, 1, 2))
((0, 2, 1), (1, 3, 0), (2, 0, 3), (3, 1, 2))
((0, 2, 1), (1, 3, 0), (2, 1, 3), (3, 0, 2))
((0, 2, 1), (1, 3, 2), (2, 0, 3), (3, 1, 0))
((0, 2, 3), (1, 0, 2), (2, 3, 1), (3, 1, 0))
((0, 2, 3), (1, 3, 0), (2, 0, 1), (3, 1, 2))
((0, 2, 3), (1, 3, 2), (2, 0, 1), (3, 1, 0))
((0, 2, 3), (1, 3, 2), (2, 1, 0), (3, 0, 1))
((0, 3, 1), (1, 0, 2), (2, 1, 3), (3, 2, 0))
((0, 3, 1), (1, 2, 0), (2, 0, 3), (3, 1, 2))
((0, 3, 1), (1, 2, 0), (2, 1, 3), (3, 0, 2))
((0, 3, 1), (1, 2, 3), (2, 1, 0), (3, 0, 2))
((0, 3, 2), (1, 0, 3), (2, 1, 0), (3, 2, 1))
((0, 3, 2), (1, 2, 0), (2, 1, 3), (3, 0, 1))
((0, 3, 2), (1, 2, 3), (2, 0, 1), (3, 1, 0))
((0, 3, 2), (1, 2, 3), (2, 1, 0), (3, 0, 1))

n = 5 has 552 unique solutions. Here are the first 20.

((0, 1, 2), (1, 0, 3), (2, 3, 4), (3, 4, 0), (4, 2, 1))
((0, 1, 2), (1, 0, 3), (2, 3, 4), (3, 4, 1), (4, 2, 0))
((0, 1, 2), (1, 0, 3), (2, 4, 0), (3, 2, 4), (4, 3, 1))
((0, 1, 2), (1, 0, 3), (2, 4, 1), (3, 2, 4), (4, 3, 0))
((0, 1, 2), (1, 0, 4), (2, 3, 0), (3, 4, 1), (4, 2, 3))
((0, 1, 2), (1, 0, 4), (2, 3, 1), (3, 4, 0), (4, 2, 3))
((0, 1, 2), (1, 0, 4), (2, 4, 3), (3, 2, 0), (4, 3, 1))
((0, 1, 2), (1, 0, 4), (2, 4, 3), (3, 2, 1), (4, 3, 0))
((0, 1, 2), (1, 2, 0), (2, 3, 4), (3, 4, 1), (4, 0, 3))
((0, 1, 2), (1, 2, 0), (2, 4, 3), (3, 0, 4), (4, 3, 1))
((0, 1, 2), (1, 2, 3), (2, 0, 4), (3, 4, 0), (4, 3, 1))
((0, 1, 2), (1, 2, 3), (2, 0, 4), (3, 4, 1), (4, 3, 0))
((0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 0), (4, 0, 1))
((0, 1, 2), (1, 2, 3), (2, 4, 0), (3, 0, 4), (4, 3, 1))
((0, 1, 2), (1, 2, 3), (2, 4, 1), (3, 0, 4), (4, 3, 0))
((0, 1, 2), (1, 2, 4), (2, 0, 3), (3, 4, 0), (4, 3, 1))
((0, 1, 2), (1, 2, 4), (2, 0, 3), (3, 4, 1), (4, 3, 0))
((0, 1, 2), (1, 2, 4), (2, 3, 0), (3, 4, 1), (4, 0, 3))
((0, 1, 2), (1, 2, 4), (2, 3, 1), (3, 4, 0), (4, 0, 3))
((0, 1, 2), (1, 2, 4), (2, 4, 3), (3, 0, 1), (4, 3, 0))

So, you can generate solutions like this, but it takes time. If you were going to use this, I would cache the solutions as generated, and then randomly pull from them when you need them for whatever number n. Incidentally, n=5 took a little under a minute to complete by brute force. Since the number of candidate sets grows so quickly with n, I expect n=6 to take over an hour, and n=7 over a day. The only way to return a truly random solution is by doing it this way.

Edited: Random Solution without equal distribution:

The following is code I wrote in attempting to solve this problem, an implementation of Solution 2. I figured I would post it, since it is a partial, non-equal-distribution solution that can generate every possible solution, guaranteed, given enough time.

import random

def seeder(n):
    """ Randomly generates the first element in a solution. """
    seed = [0]
    a = list(range(1, n))
    for i in range(1, 3):
        seed.append(a.pop(random.randint(0, len(a) - 1)))
    return [seed]

def extend_seed(seed, n):
    """ Semi-Randomly generates the next element in a solution. """
    next_seed = [seed[-1][0] + 1]
    candidates = list(range(0, n))
    for i in range(1, 3):
        c = candidates[:]
        # drop values already used in this element
        for s in next_seed:
            c.remove(s)
        # drop values already used at position i by earlier elements
        for s in seed:
            try:
                c.remove(s[i])
            except ValueError:
                pass
        next_seed.append(c.pop(random.randint(0, len(c) - 1)))
    seed.append(next_seed)
    return seed

def combo(n):
    """ Extends seed until exhausted. 
    Some random generations return results shorter than n. """
    seed = seeder(n)
    while True:
        try:
            extend_seed(seed, n)
        except (ValueError, IndexError):
            return seed

def combos(n):
    """ Ensures that the final return is of length n. """
    while True:
        result = combo(n)
        if len(result) == n:
            return result

You essentially want a Latin square, an n x n grid of numbers where each row and each column contains each number exactly once, except you only care about the first three numbers in each row (a Latin rectangle).

Update:

I've erased my ineffective code sample, because generating random Latin squares with equal probability is non-trivial, as discussed in a question on math.stackexchange.com.

The SAGE project implements the algorithm mentioned in that question, so you may look at the code for inspiration.

Alternatively, if you really want to get into the details, check out this paper for the specific case of generating random Latin rectangles.
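If equal probability is not a requirement, a randomized backtracking search is one way to fill such a Latin rectangle without getting stuck. The sketch below is my own illustration, not the SAGE algorithm or the method from the paper; the name `random_latin_triples` is hypothetical, and the result is random but not uniformly distributed.

```python
import random

def random_latin_triples(n):
    """Fill an n x 3 Latin rectangle by randomized backtracking (needs n >= 3)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    rows = [[i] for i in range(n)]  # the first position is fixed to 0..n-1
    used = {1: set(), 2: set()}     # values already placed in positions 1 and 2

    def fill(cell):
        # Cells are visited position by position: all of position 1, then position 2.
        if cell == 2 * n:
            return True
        position, row = 1 + cell // n, cell % n
        options = [v for v in range(n)
                   if v not in used[position] and v not in rows[row]]
        random.shuffle(options)
        for value in options:
            rows[row].append(value)
            used[position].add(value)
            if fill(cell + 1):
                return True
            # Dead end further on: undo this choice and try the next option.
            rows[row].pop()
            used[position].remove(value)
        return False

    fill(0)
    return [tuple(row) for row in rows]

print(random_latin_triples(5))
```

Unlike the retry-from-scratch approach, a dead end only undoes the most recent choices, so earlier work is kept where possible.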

Actually itertools has solved this for you already.

import itertools

allp = list(itertools.permutations(range(3)))
print(allp)

mylist = ['A','B','C']
allp2 = list(itertools.permutations(mylist))
print(allp2)

Output:

[(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
[('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'B', 'A')]

Just a different viewpoint on your problem. See if this works for you:

>>> from itertools import chain,combinations
>>> def get_combinations_without_duplicates(iterable):
        return (tuple(chain(*(set(iterable) - set(e) , e))) for e in combinations(iterable,2))

>>> list(get_combinations_without_duplicates(range(3)))
[(2, 0, 1), (1, 0, 2), (0, 1, 2)]

A simple list rotation provides a correct solution for all n >= 3:

Consider the rotation solution for n = 5:

[
    (0, 1, 2),
    (1, 2, 3),
    (2, 3, 4),
    (3, 4, 0),
    (4, 0, 1)
]

Each number appears in each position only once, and for each position all numbers are present.

In general, len(get_combinations_without_duplicates(n)) == n for n >= 3.
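The rotation can be written as a one-line list comprehension. In this sketch, the function name `rotation_triples` and the `shifts` parameter are my own additions. The default shifts (1, 2) reproduce the rotation shown above, and other shifts give different (still deterministic) solutions.

```python
def rotation_triples(n, shifts=(1, 2)):
    """Return n triples (i, i + a, i + b) mod n for the two shifts a and b."""
    a, b = shifts
    # The shifts must be nonzero and distinct mod n, or a triple would repeat a value.
    if n < 3 or a % n == 0 or b % n == 0 or (a - b) % n == 0:
        raise ValueError("need n >= 3 and two distinct nonzero shifts mod n")
    return [(i, (i + a) % n, (i + b) % n) for i in range(n)]

print(rotation_triples(5))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 0), (4, 0, 1)]
print(rotation_triples(4, shifts=(2, 3)))
# [(0, 2, 3), (1, 3, 0), (2, 0, 1), (3, 1, 2)]
```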

Here is an approach that takes advantage of deque.rotate:

>>> datax = []
>>> from collections import deque
>>> data_length = 10
>>> subset_length = 3
>>> for i in range(subset_length):
...     datax.append(deque(range(data_length)))
...
>>> for i in range(subset_length):
...     datax[i].rotate(i)
...
>>> print(list(zip(*datax)))
[(0, 9, 8), (1, 0, 9), (2, 1, 0), (3, 2, 1), (4, 3, 2), (5, 4, 3), (6, 5, 4), (7, 6, 5), (8, 7, 6), (9, 8, 7)]
