Create all possible combinations with multiple variants from a list

Question

OK, so the problem is as follows:

Let's say I have a list like this: [12R,102A,102L,250L]. What I want is a list of all possible combinations, but with only one variant per number. So for the example above, the output I would like is:

[12R,102A,250L]
[12R,102L,250L]

My actual problem is a lot more complex, with many more sites. Thanks for your help.

Edit: after reading some comments, I guess this is slightly unclear. I have 3 unique numbers here, [12, 102, 250], and for some numbers I have different variations, for example [102A, 102L]. What I need is a way to combine the different positions [12, 102, 250] with all possible variations within them, just like the lists I presented above. They are the only valid solutions: [12R] is not one, and neither is [12R,102A,102L,250L]. So far I have done this with nested loops, but I have a LOT of variation within these numbers, so I can't really do that anymore.

I'll edit this again: it seems there is still some confusion, so I'll expand on the point I made before. What I am dealing with is DNA. 12R means the 12th position in the sequence was changed to an R, so the solution [12R,102A,250L] means that the amino acid at position 12 is R, at 102 is A, and at 250 is L.

This is why a solution like [102L, 102R, 250L] is not usable: the same position cannot be occupied by 2 different amino acids.
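To pin down the rule described above, here is a minimal sketch of a validity check. It assumes every entry is a run of digits followed by a variant letter; `position_of` and `is_valid` are hypothetical helper names, not something from the question.

```python
import re

def position_of(entry):
    # hypothetical helper: the leading digits are the sequence position
    return int(re.match(r"\d+", entry).group())

def is_valid(combo, entries):
    # every entry in the combination must come from the input list
    if not all(e in entries for e in combo):
        return False
    used = [position_of(e) for e in combo]
    # no position may appear twice, and every input position must be covered
    return len(used) == len(set(used)) and set(used) == {position_of(e) for e in entries}

entries = ["12R", "102A", "102L", "250L"]
print(is_valid(["12R", "102A", "250L"], entries))          # True
print(is_valid(["12R"], entries))                          # False: positions missing
print(is_valid(["12R", "102A", "102L", "250L"], entries))  # False: 102 used twice
print(is_valid(["102L", "102R", "250L"], entries))         # False
```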

Thank you.

Answer 1

This works with ["10A","100B","12C","100R"] (case 1) and ['12R','102A','102L','250L'] (case 2), both of which have exactly three distinct positions:

import itertools as it

liste = ['12R','102A','102L','250L']

comb = []
# pick every 3 of the entries (this assumes exactly 3 distinct positions)
for e in it.combinations(range(len(liste)), 3):
    # strip the one-letter suffix to get each entry's position number
    e1 = liste[e[0]][:-1]
    e2 = liste[e[1]][:-1]
    e3 = liste[e[2]][:-1]
    # keep the triple only if all three positions differ
    if e1 != e2 and e2 != e3 and e3 != e1:
        comb.append([liste[e[0]], liste[e[1]], liste[e[2]]])
print(comb)
# case 1 : [['10A', '100B', '12C'], ['10A', '12C', '100R']]
# case 2 : [['12R', '102A', '250L'], ['12R', '102L', '250L']]
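The `3` passed to `combinations` ties this approach to inputs with exactly three distinct positions. A minimal sketch of the same filter-the-combinations idea, assuming (as the answer does) that the position is everything except the last character, takes the combination size from the data instead; `valid_combos` is a hypothetical name.

```python
import itertools as it

def valid_combos(entries):
    # position = everything except the one-letter suffix, as in the answer
    positions = {e[:-1] for e in entries}
    k = len(positions)  # a valid result has one entry per distinct position
    result = []
    for combo in it.combinations(entries, k):
        # keep the combination only if no position repeats
        if len({e[:-1] for e in combo}) == k:
            result.append(list(combo))
    return result

print(valid_combos(['12R', '102A', '102L', '250L']))
# [['12R', '102A', '250L'], ['12R', '102L', '250L']]
print(valid_combos(["10A", "100B", "12C", "100R"]))
# [['10A', '100B', '12C'], ['10A', '12C', '100R']]
```

This still enumerates every k-sized subset of the input and discards most of them, so it slows down quickly when positions have many variants.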

Answer 2

Try this:

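from itertools import groupby
import re

def __genComb(arr, res=None):
    # arr: list of (position, [variants]) groups; res: the combination built so far
    res = [] if res is None else res  # avoid a shared mutable default argument
    for i in range(len(res), len(arr)):
        el = arr[i]
        if len(el[1]) == 1:
            # only one variant at this position: add it (as a new list) and move on
            res = res + el[1]
        else:
            # several variants: branch once per variant, then stop this loop
            for el_2 in el[1]:
                yield from __genComb(arr, res + [el_2])
            break
    if len(res) == len(arr):
        yield res

def genComb(arr):
    # group the entries by their leading digits (the position number)
    res = [(k, list(v)) for k, v in groupby(sorted(arr), key=lambda x: re.match(r"(\d*)", x).group(1))]
    yield from __genComb(res)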

Sample output (using the input you provided):

test=["12R","102A","102L","250L"]

for el in genComb(test):
    print(el)

# returns:

['102A', '12R', '250L']
['102L', '12R', '250L']
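Because `sorted(arr)` compares strings, '102A' sorts before '12R'. If numeric order is preferred, a minimal variation sorts and groups by the integer value of the leading digits. `groupby` only merges adjacent items, so the sort key and the group key have to agree. `position` is a hypothetical helper.

```python
from itertools import groupby
import re

def position(entry):
    # hypothetical helper: the leading digits as an integer
    return int(re.match(r"\d+", entry).group())

test = ["12R", "102A", "102L", "250L"]
# sort and group with the same key so each position forms exactly one group
groups = [(k, list(v)) for k, v in groupby(sorted(test, key=position), key=position)]
print(groups)
# [(12, ['12R']), (102, ['102A', '102L']), (250, ['250L'])]
```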

Answer 3

You can use a recursive generator function:

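from itertools import groupby as gb
import re

def combos(d, c=[]):
    # d: remaining groups of [position, variant] pairs; c: combination so far
    if not d:
        yield c
    else:
        for a, b in d[0]:
            # pick one variant from the first group, then recurse on the rest
            yield from combos(d[1:], c + [a+b])

d = ['12R', '102A', '102L', '250L']
# split each entry into [leading digits, trailing variant]
vals = [re.findall(r'^\d+|\w+$', i) for i in d]
# group the pairs by position (groupby needs input sorted by the same key)
new_d = [list(b) for _, b in gb(sorted(vals, key=lambda x: x[0]), key=lambda x: x[0])]
print(list(combos(new_d)))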

Output:

[['102A', '12R', '250L'], ['102L', '12R', '250L']]
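The recursive `combos` generator picks one pair from each group in turn, which is the Cartesian product of the groups. `itertools.product` computes that directly; a minimal sketch of the swap, rebuilding the same grouping so the block runs on its own:

```python
from itertools import groupby, product
import re

d = ['12R', '102A', '102L', '250L']
vals = [re.findall(r'^\d+|\w+$', i) for i in d]
new_d = [list(b) for _, b in groupby(sorted(vals, key=lambda x: x[0]), key=lambda x: x[0])]

# product picks one [position, variant] pair from each group
result = [[a + b for a, b in choice] for choice in product(*new_d)]
print(result)
# [['102A', '12R', '250L'], ['102L', '12R', '250L']]
```

Either way, the number of results is the product of the group sizes, so it grows quickly as more positions gain variants.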

Answer 4

import re

def get_grouped_options(variants):
    # map each position number to the list of acids seen at that position
    options = {}
    for option in variants:
        m = re.match(r'([\d]+)([A-Z])$', option)
        if m:
            position = int(m.group(1))
            acid = m.group(2)
        else:
            continue  # skip entries that do not look like <digits><letter>
        if position not in options:
            options[position] = []
        options[position].append(acid)
    return options


def yield_all_combos(options):
    # odometer: indices[i] selects which acid is used at positions[i]
    n = len(options)
    positions = list(options.keys())
    indices = [0] * n
    while True:
        yield ["{}{}".format(position, options[position][indices[i]])
               for i, position in enumerate(positions)]
        j = 0
        indices[j] += 1
        while indices[j] == len(options[positions[j]]):
            # carry
            indices[j] = 0
            j += 1
            if j == n:
                # overflow: every combination has been produced
                return
            indices[j] += 1


variants = ['12R', '102A', '102L', '250L']

options = get_grouped_options(variants)

for combo in yield_all_combos(options):
    print("[{}]".format(",".join(combo)))

Gives:

[12R,102A,250L]
[12R,102L,250L]
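The carry loop above is an odometer in which the first position changes fastest. `itertools.product` advances its last argument fastest, so a sketch that reproduces the same order reverses the positions going in and each pick coming out. The `options` dictionary is typed in by hand here, assuming the `{position: [acids]}` shape that `get_grouped_options` returns.

```python
from itertools import product

# same shape as get_grouped_options(['12R', '102A', '102L', '250L'])
options = {12: ['R'], 102: ['A', 'L'], 250: ['L']}

positions = list(options)
combos = []
# reversed so the first position becomes product's fastest-changing argument
for picks in product(*(options[p] for p in reversed(positions))):
    combos.append(["{}{}".format(p, acid) for p, acid in zip(positions, reversed(picks))])

for combo in combos:
    print("[{}]".format(",".join(combo)))
# [12R,102A,250L]
# [12R,102L,250L]
```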

Answer 5

I believe this is what you're looking for!

This works by:

generating a collection of all the postfixes each prefix can have
finding the total number of combinations (multiply the lengths of the sublists together)
treating each result index as a mixed-radix number with one digit per prefix (the base of each digit is that prefix's number of postfixes) and using each digit to pick a postfix, so every index maps to a different combination

import collections
import functools
import operator
import re

# initial input
starting_values = ["12R","102A","102L","250L"]

d = collections.defaultdict(list)  # if duplicates are possible, skip postfixes already in the list
for value in starting_values:
    numeric, postfix = re.match(r"(\d+)(.*)", value).groups()
    d[numeric].append(postfix)  # .* also matches ""; consider (postfix or "_") to make an empty postfix visible

# d is now a dictionary of lists where each key is the prefix
# and each value is a list of possible postfixes


# each set of postfixes multiplies the total combinations by its length
total_combinations = functools.reduce(
    operator.mul,
    (len(sublist) for sublist in d.values()),
    1,  # start value, so an empty input gives 1 instead of raising
)

results = collections.defaultdict(list)
for results_pos in range(total_combinations):
    stride = 1  # product of the lengths of the postfix lists already visited
    for prefix, postfix_set in d.items():
        # this prefix's digit of results_pos, read in mixed radix
        digit = (results_pos // stride) % len(postfix_set)
        results[results_pos].append(
            "{}{}".format(  # recombine the values
                prefix,     # numeric prefix
                postfix_set[digit]
            ))
        stride *= len(postfix_set)

# results is now a dictionary mapping { result index: unique list }

Displaying:

# set the column width from the longest result index
# (other columns would need their own widths, which is beyond the scope of the question)
col_width = max(len(str(k)) for k in results)
for k, v in results.items():
    print("{:<{w}}: {}".format(k, v, w=col_width))


0: ['12R', '102A', '250L']
1: ['12R', '102L', '250L']

With a more advanced input:

["12R","102A","102L","250L","1234","1234A","1234C"]

0: ['12R', '102A', '250L', '1234']
1: ['12R', '102L', '250L', '1234']
2: ['12R', '102A', '250L', '1234A']
3: ['12R', '102L', '250L', '1234A']
4: ['12R', '102A', '250L', '1234C']
5: ['12R', '102L', '250L', '1234C']

You can confirm the values are indeed unique with a set:

final = set(",".join(x) for x in results.values())
for f in final:
    print(f)

12R,102L,250L,1234
12R,102A,250L,1234A
12R,102L,250L,1234C
12R,102A,250L,1234
12R,102L,250L,1234A
12R,102A,250L,1234C
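A set confirms there are no duplicates, but not that nothing is missing. A minimal sketch of a stronger check compares the results against `itertools.product` over the postfix lists. It assumes `d` has the prefix-to-postfix-list shape built above (typed in by hand here for the advanced input), uses the six combinations printed above as test data, and `covers_all` is a hypothetical helper name.

```python
import itertools

def covers_all(d, combos):
    # d maps prefix -> list of postfixes; combos is a list of result lists
    expected = {
        tuple(prefix + postfix for prefix, postfix in zip(d, picks))
        for picks in itertools.product(*d.values())
    }
    got = [tuple(c) for c in combos]
    # no duplicates, and exactly the Cartesian product of the postfix lists
    return len(got) == len(set(got)) and set(got) == expected

d = {"12": ["R"], "102": ["A", "L"], "250": ["L"], "1234": ["", "A", "C"]}
combos = [
    ['12R', '102L', '250L', '1234'],
    ['12R', '102A', '250L', '1234A'],
    ['12R', '102L', '250L', '1234C'],
    ['12R', '102A', '250L', '1234'],
    ['12R', '102L', '250L', '1234A'],
    ['12R', '102A', '250L', '1234C'],
]
print(covers_all(d, combos))       # True
print(covers_all(d, combos[:-1]))  # False: one combination is missing
```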

Notes:

In CPython, regexes are cached after their first compile, so calling re.match with the same pattern inside a loop does not recompile it every time.
The list-member multiplier comes from "How can I multiply all items in a list together with Python?"
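On Python 3.8 and later, `math.prod` multiplies the items of an iterable directly, as an alternative to `functools.reduce(operator.mul, ...)`; it returns 1 for an empty iterable. A minimal sketch with the postfix lists of the advanced input typed in by hand:

```python
import functools
import math
import operator

postfix_lists = [["R"], ["A", "L"], ["L"], ["", "A", "C"]]

lengths = [len(p) for p in postfix_lists]
print(math.prod(lengths))                       # 6
print(functools.reduce(operator.mul, lengths))  # 6
print(math.prod([]))                            # 1
```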

5 answers

Answer 1 0 2020-05-31 14:00:49
Answer 2 0 2020-05-31 14:16:40
Answer 3 0 2020-05-31 15:41:05
Answer 4 0 2020-05-31 16:39:51
Answer 5 0 2020-10-12 19:07:16