
Permutations with Order

I am trying to write a Python function that behaves similarly to itertools.permutations.

import itertools
for s in itertools.permutations("TCGA****"):
    print(s)

The ideal output from such a function would be:

('*','*','*','*','T', 'C','G','A')
('*','*','*','T','*', 'C','G','A')
('*','*','*','T','C', '*','G','A')
('*','*','*','T','C', 'G','*','A')
('*','*','*','T','C', 'G','A','*')
('*','*','T','C','G', 'A','*','*')
('*','*','T','C','G', '*','*','A')
('*','*','T','C','*', '*','G','A')
...
('T', 'C','G','A','*','*','*','*')

The only difference between itertools.permutations and this function is that the order is maintained, i.e. 'T' always precedes 'C', which precedes 'G', which precedes 'A'.

The following is an example that violates this rule:

('*','*','T','*','G','C','A','*','*')

The order of 'C' and 'G' has changed.

How can I produce the permutations for which the order 'TCGA' is maintained among the asterisks?

One idea would be to produce all the possible index sets for your '*' values with itertools.combinations over the list's index range, and then build each arrangement from those indices, filling the positions not in the combination with your 'TCGA' values in order.

Since every arrangement uses all four letters of TCGA exactly once, itertools.cycle is one way to keep getting the appropriate letter for the next position: the cycle is back at 'T' whenever a new arrangement starts. Here perms is implemented as a generator to allow for lazy evaluation.

from itertools import combinations, cycle

char_cyc = cycle('TCGA')
combos = combinations(range(8), 4)

perms = (['*' if i in combo else next(char_cyc) for i in range(8)]
         for combo in combos)

print(list(perms))

Output:

[['*', '*', '*', '*', 'T', 'C', 'G', 'A'], ['*', '*', '*', 'T', '*', 'C', 'G', 'A'], ['*', '*', '*', 'T', 'C', '*', 'G', 'A'], ['*', '*', '*', 'T', 'C', 'G', '*', 'A'], ['*', '*', '*', 'T', 'C', 'G', 'A', '*'], ['*', '*', 'T', '*', '*', 'C', 'G', 'A'], ['*', '*', 'T', '*', 'C', '*', 'G', 'A'], ['*', '*', 'T', '*', 'C', 'G', '*', 'A'], ['*', '*', 'T', '*', 'C', 'G', 'A', '*'], ['*', '*', 'T', 'C', '*', '*', 'G', 'A'], ['*', '*', 'T', 'C', '*', 'G', '*', 'A'], ['*', '*', 'T', 'C', '*', 'G', 'A', '*'], ['*', '*', 'T', 'C', 'G', '*', '*', 'A'], ['*', '*', 'T', 'C', 'G', '*', 'A', '*'], ['*', '*', 'T', 'C', 'G', 'A', '*', '*'], ['*', 'T', '*', '*', '*', 'C', 'G', 'A'], ['*', 'T', '*', '*', 'C', '*', 'G', 'A'], ['*', 'T', '*', '*', 'C', 'G', '*', 'A'], ['*', 'T', '*', '*', 'C', 'G', 'A', '*'], ['*', 'T', '*', 'C', '*', '*', 'G', 'A'], ['*', 'T', '*', 'C', '*', 'G', '*', 'A'], ['*', 'T', '*', 'C', '*', 'G', 'A', '*'], ['*', 'T', '*', 'C', 'G', '*', '*', 'A'], ['*', 'T', '*', 'C', 'G', '*', 'A', '*'], ['*', 'T', '*', 'C', 'G', 'A', '*', '*'], ['*', 'T', 'C', '*', '*', '*', 'G', 'A'], ['*', 'T', 'C', '*', '*', 'G', '*', 'A'], ['*', 'T', 'C', '*', '*', 'G', 'A', '*'], ['*', 'T', 'C', '*', 'G', '*', '*', 'A'], ['*', 'T', 'C', '*', 'G', '*', 'A', '*'], ['*', 'T', 'C', '*', 'G', 'A', '*', '*'], ['*', 'T', 'C', 'G', '*', '*', '*', 'A'], ['*', 'T', 'C', 'G', '*', '*', 'A', '*'], ['*', 'T', 'C', 'G', '*', 'A', '*', '*'], ['*', 'T', 'C', 'G', 'A', '*', '*', '*'], ['T', '*', '*', '*', '*', 'C', 'G', 'A'], ['T', '*', '*', '*', 'C', '*', 'G', 'A'], ['T', '*', '*', '*', 'C', 'G', '*', 'A'], ['T', '*', '*', '*', 'C', 'G', 'A', '*'], ['T', '*', '*', 'C', '*', '*', 'G', 'A'], ['T', '*', '*', 'C', '*', 'G', '*', 'A'], ['T', '*', '*', 'C', '*', 'G', 'A', '*'], ['T', '*', '*', 'C', 'G', '*', '*', 'A'], ['T', '*', '*', 'C', 'G', '*', 'A', '*'], ['T', '*', '*', 'C', 'G', 'A', '*', '*'], ['T', '*', 'C', '*', '*', '*', 'G', 'A'], ['T', '*', 'C', '*', '*', 'G', '*', 'A'], ['T', '*', 'C', '*', 
'*', 'G', 'A', '*'], ['T', '*', 'C', '*', 'G', '*', '*', 'A'], ['T', '*', 'C', '*', 'G', '*', 'A', '*'], ['T', '*', 'C', '*', 'G', 'A', '*', '*'], ['T', '*', 'C', 'G', '*', '*', '*', 'A'], ['T', '*', 'C', 'G', '*', '*', 'A', '*'], ['T', '*', 'C', 'G', '*', 'A', '*', '*'], ['T', '*', 'C', 'G', 'A', '*', '*', '*'], ['T', 'C', '*', '*', '*', '*', 'G', 'A'], ['T', 'C', '*', '*', '*', 'G', '*', 'A'], ['T', 'C', '*', '*', '*', 'G', 'A', '*'], ['T', 'C', '*', '*', 'G', '*', '*', 'A'], ['T', 'C', '*', '*', 'G', '*', 'A', '*'], ['T', 'C', '*', '*', 'G', 'A', '*', '*'], ['T', 'C', '*', 'G', '*', '*', '*', 'A'], ['T', 'C', '*', 'G', '*', '*', 'A', '*'], ['T', 'C', '*', 'G', '*', 'A', '*', '*'], ['T', 'C', '*', 'G', 'A', '*', '*', '*'], ['T', 'C', 'G', '*', '*', '*', '*', 'A'], ['T', 'C', 'G', '*', '*', '*', 'A', '*'], ['T', 'C', 'G', '*', '*', 'A', '*', '*'], ['T', 'C', 'G', '*', 'A', '*', '*', '*'], ['T', 'C', 'G', 'A', '*', '*', '*', '*']]

A good indication that the output is correct is that perms yields 70 arrangements, which is equal to 8C4 (or "8 choose 4"), the number of ways to choose which 4 of the 8 positions hold the asterisks. That is effectively what your problem concerns.
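As a rough sketch, the same idea can be wrapped in a reusable generator and the count checked against "8 choose 4". The function name ordered_placements and its parameters are hypothetical, chosen only for illustration, and math.comb assumes Python 3.8 or later. A fresh iterator is created per arrangement here instead of a shared cycle.

```python
from itertools import combinations
from math import comb  # available in Python 3.8+


def ordered_placements(letters, filler, n_fill):
    """Yield tuples that interleave `letters` (order kept) with `n_fill` fillers."""
    total = len(letters) + n_fill
    for filler_positions in combinations(range(total), n_fill):
        chosen = set(filler_positions)
        remaining = iter(letters)  # fresh iterator for each arrangement
        yield tuple(filler if i in chosen else next(remaining)
                    for i in range(total))


results = list(ordered_placements("TCGA", "*", 4))
print(len(results), comb(8, 4))  # both should be 70
print(results[0])                # fillers first, then T, C, G, A
print(results[-1])               # T, C, G, A first, then fillers
```

Because the filler positions are chosen directly, no duplicates are produced and nothing has to be filtered out afterwards.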

My solution is much less efficient than Mitch's, but it is another way to solve the problem, so it might interest you as well.

Here is my approach: generate all the possible permutations of "****XXXX" (exactly 8! = 40320), then, for each resulting permutation, replace each "X" with the corresponding letter of "TCGA", in the wanted order. The flaw here is that there are not 40320 distinct patterns but only 70, which means:

  • we'll have to execute the "for" loop 40320 times when 70 would have been enough
  • we'll have to store the generated permutations in order to ignore the duplicates
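The two numbers above can be checked with a short sketch. The variable names all_perms and distinct are only illustrative.

```python
from itertools import permutations

# permutations() treats equal characters at different positions as distinct,
# so every pattern is produced many times over.
all_perms = list(permutations("****XXXX"))
distinct = set(all_perms)

print(len(all_perms))  # 8! = 40320
print(len(distinct))   # 8! / (4! * 4!) = 70
```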

But as I said, it's another way of seeing the problem.

>>> import itertools
>>> already_seen_permutations = set()
>>> for s in itertools.permutations("****XXXX"):
...     if s in already_seen_permutations:
...         continue  # duplicate permutation, just ignore it
...     already_seen_permutations.add(s)
...     # time to insert TCGA correctly
...     s = tuple("".join(s).replace("X", "T", 1).replace("X", "C", 1).replace("X", "G", 1).replace("X", "A", 1))
...     print(s)

On my computer, it takes roughly one second to execute the code 100 times. In terms of performance, it's approximately the same as generating all the permutations of "****TCGA" and ignoring the ones that do not follow the "TCGA" order.
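For comparison, the filter-based alternative mentioned above might look like the following sketch. The helper keeps_order is a hypothetical name introduced here, and the set is used because the four identical asterisks make each valid arrangement appear several times.

```python
from itertools import permutations


def keeps_order(arrangement, letters="TCGA", filler="*"):
    """Return True if the non-filler items appear exactly in the order of `letters`."""
    return [c for c in arrangement if c != filler] == list(letters)


seen = set()
for p in permutations("TCGA****"):
    if keeps_order(p):
        seen.add(p)  # the set drops duplicates caused by identical '*'

print(len(seen))  # 70 distinct arrangements
```

Like the "X" replacement approach, this still walks through all 40320 permutations, so it is only reasonable for short inputs.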

