Creating all possible combinations from a list with multiple variants

Question

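OK, so the problem is as follows:

Say I have a list like this: [12R,102A,102L,250L]. What I want is a list of all possible combinations, but with only one variant per number. So for the example above, the output I want is: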

[12R,102A,250L]
[12R,102L,250L]

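My actual problem is a lot more complex, with many more sites. Thanks for your help.

Edit: After reading some comments, I guess this is a bit unclear. I have 3 unique numbers here, [12, 102 and 250], and for some numbers I have different variants, for example [102A, 102L]. What I need is a way to combine the different positions [12,102,250] with all possible variants, as in the lists I presented above. Those are the only valid solutions. [12R] is not. Neither is [12R,102A,102L,250L]. So far I have done this with nested loops, but there are a lot of variants within these numbers, so I can't do that anymore.

I'll edit this again: OK, so there still seems to be some confusion, so I'll expand on the point I made before. What I am dealing with is DNA. 12R means that the 12th position in the sequence was changed to R. So the solution [12R,102A,250L] means that the amino acid at position 12 is R, at 102 it is A, and at 250 it is L.

That is why a solution like [102L, 102R, 250L] is not usable: the same position cannot be occupied by 2 different amino acids.

Thank you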

Answer 1

This works for ["10A","100B","12C","100R"] (case 1) and ['12R','102A','102L','250L'] (case 2):

import itertools as it

liste = ['12R','102A','102L','250L']

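comb = []
# try every choice of 3 indices (this version is written for exactly 3 positions)
for e in it.combinations(range(len(liste)), 3):
    # strip the trailing letter to get each item's position number
    e1 = liste[e[0]][:-1]
    e2 = liste[e[1]][:-1]
    e3 = liste[e[2]][:-1]
    # keep the choice only if all three positions differ
    if e1 != e2 and e2 != e3 and e3 != e1:
        comb.append([liste[e[0]], liste[e[1]], liste[e[2]]])
print(comb)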
# case 1 : [['10A', '100B', '12C'], ['10A', '12C', '100R']]
# case 2 : [['12R', '102A', '250L'], ['12R', '102L', '250L']]
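
The snippet above is written for exactly three positions. A minimal sketch of the same filter-the-combinations idea for any number of positions follows. It assumes every item ends in a single-letter variant, as in the question, and the helper name `unique_position_combos` is made up for illustration:

```python
import itertools as it

def unique_position_combos(items):
    """Return every selection that uses each position exactly once."""
    # Assumption: the position is everything except the trailing variant letter.
    positions = {item[:-1] for item in items}
    result = []
    for combo in it.combinations(items, len(positions)):
        # Keep the selection only if no position appears twice.
        if len({item[:-1] for item in combo}) == len(positions):
            result.append(list(combo))
    return result

print(unique_position_combos(['12R', '102A', '102L', '250L']))
print(unique_position_combos(['10A', '100B', '12C', '100R']))
```

This still tests every selection of that size, so it slows down quickly as the number of variants grows.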

Answer 2

Try this:

from itertools import groupby
import re

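def __genComb(arr, res=None):
    # res holds the variants chosen so far; copy it so that the in-place
    # += below never touches a shared default or the caller's list
    res = [] if res is None else list(res)
    for i in range(len(res), len(arr)):
        el = arr[i]
        if len(el[1]) == 1:
            # only one variant at this position: take it
            res += el[1]
        else:
            # branch once per variant; the recursive calls handle the rest
            for el_2 in el[1]:
                yield from __genComb(arr, res + [el_2])
            break
    if len(res) == len(arr):
        yield res

def genComb(arr):
    # group the sorted items by their numeric prefix (the position)
    res = [(k, list(v)) for k, v in groupby(sorted(arr), key=lambda x: re.match(r"(\d*)", x).group(1))]
    yield from __genComb(res)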

Sample output (using the input you provided):

test=["12R","102A","102L","250L"]

for el in genComb(test):
    print(el)

# returns:

['102A', '12R', '250L']
['102L', '12R', '250L']

Answer 3

You can use a recursive generator function:

from itertools import groupby as gb
import re

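def combos(d, c = []):
    # c is the combination built so far; d holds the remaining groups
    if not d:
        yield c
    else:
        for a, b in d[0]:
            yield from combos(d[1:], c + [a+b])

d = ['12R', '102A', '102L', '250L']
# split each item into [position, variant], e.g. ['12', 'R']
vals = [re.findall(r'^\d+|\w+$', i) for i in d]
# group the pairs by position
new_d = [list(b) for _, b in gb(sorted(vals, key=lambda x:x[0]), key=lambda x:x[0])]
print(list(combos(new_d)))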

Output:

[['102A', '12R', '250L'], ['102L', '12R', '250L']]
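
The recursion above amounts to a Cartesian product over the groups of variants. As a sketch, the standard library's `itertools.product` can do that step directly. `variant_combos` is a hypothetical name, and the code assumes every item starts with its position digits:

```python
import re
from itertools import product

def variant_combos(items):
    """Yield one list per way of choosing a single variant at each position."""
    groups = {}  # position -> items at that position, in first-seen order
    for item in items:
        # Assumption: each item begins with the digits of its position.
        position = re.match(r"\d+", item).group()
        groups.setdefault(position, []).append(item)
    # product() picks exactly one item from each group.
    for choice in product(*groups.values()):
        yield list(choice)

print(list(variant_combos(['12R', '102A', '102L', '250L'])))
```

With the question's input this prints [['12R', '102A', '250L'], ['12R', '102L', '250L']], keeping the positions in input order rather than string-sorted order.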

Answer 4

import re

def get_grouped_options(items):
     options = {}
     for option in items:
          # raw string avoids invalid-escape warnings for \d
          m = re.match(r'(\d+)([A-Z])$', option)
          if m:
               position = int(m.group(1))
               acid = m.group(2)
          else:
               continue
          if position not in options:
               options[position] = []
          options[position].append(acid)
     return options


def yield_all_combos(options):
     n = len(options)
     positions = list(options.keys())
     indices = [0] * n
     while True:
          yield ["{}{}".format(position, options[position][indices[i]])
                 for i, position in enumerate(positions)]
          j = 0
          indices[j] += 1
          while indices[j] == len(options[positions[j]]):
               # carry
               indices[j] = 0
               j += 1
               if j == n:
                    # overflow
                    return
               indices[j] += 1


# named items rather than input, to avoid shadowing the builtin
items = ['12R', '102A', '102L', '250L']

options = get_grouped_options(items)

for combo in yield_all_combos(options):
     print("[{}]".format(",".join(combo)))

This gives:

[12R,102A,250L]
[12R,102L,250L]

Answer 5

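I believe this is what you're looking for!

This works by

creating a collection of all the postfixes each prefix can have
finding the total count of positions (multiplying the lengths of each sublist together)
rotating through each postfix by basing the read index off of both its member postfix position in the collection and the absolute result index (known location in final results)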

import collections
import functools
import operator
import re

# initial input
starting_values = ["12R","102A","102L","250L"]

d = collections.defaultdict(list)  # use a set if duplicates are possible
for value in starting_values:
    numeric, postfix = re.match(r"(\d+)(.*)", value).groups()
    d[numeric].append(postfix)  # .* matches ""; consider (postfix or "_") to give value a size

# d is now a dictionary of lists where each key is the prefix
# and each value is a list of possible postfixes


# each set of postfixes multiplies the total combinations by its length
total_combinations = functools.reduce(
    operator.mul,
    (len(sublist) for sublist in d.values())
)

results = collections.defaultdict(list)
for results_pos in range(total_combinations):
    for index, (prefix, postfix_set) in enumerate(d.items()):
        results[results_pos].append(
            "{}{}".format(  # recombine the values
                prefix,     # numeric prefix
                postfix_set[(results_pos + index) % len(postfix_set)]
            ))

# results is now a dictionary mapping { result index: unique list }

Displaying

# set width of column by longest prefix string
# need a collection for intermediate cols, but beyond scope of Q
col_width = max(len(str(k)) for k in results)
for k, v in results.items():
    print("{:<{w}}: {}".format(k, v, w=col_width))


0: ['12R', '102L', '250L']
1: ['12R', '102A', '250L']

With more advanced input

["12R","102A","102L","250L","1234","1234A","1234C"]

0: ['12R', '102L', '250L', '1234']
1: ['12R', '102A', '250L', '1234A']
2: ['12R', '102L', '250L', '1234C']
3: ['12R', '102A', '250L', '1234']
4: ['12R', '102L', '250L', '1234A']
5: ['12R', '102A', '250L', '1234C']

You can confirm that these values are indeed unique with a set

final = set(",".join(x) for x in results.values())
for f in final:
    print(f)

12R,102L,250L,1234
12R,102A,250L,1234A
12R,102L,250L,1234C
12R,102A,250L,1234
12R,102L,250L,1234A
12R,102A,250L,1234C
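
One caveat worth checking: the index `(results_pos + index) % len(postfix_set)` advances every prefix in lockstep, so it reaches every combination only when the group sizes share no common factor (1, 2, 1 and 3 in the input above). With two prefixes of two postfixes each, it yields only 2 of the 4 combinations. A hedged sketch of a mixed-radix variant, which decodes each result index one digit per prefix instead, might look like this. `all_combos` and `stride` are illustrative names, not part of the answer:

```python
import collections
import functools
import operator
import re

def all_combos(values):
    """Decode each result index as a mixed-radix number, one digit per prefix."""
    d = collections.defaultdict(list)
    for value in values:
        numeric, postfix = re.match(r"(\d+)(.*)", value).groups()
        d[numeric].append(postfix)
    # Total number of combinations is the product of the group sizes.
    total = functools.reduce(operator.mul, (len(p) for p in d.values()), 1)
    results = []
    for results_pos in range(total):
        combo = []
        stride = 1  # place value of the current prefix
        for prefix, postfixes in d.items():
            digit = (results_pos // stride) % len(postfixes)
            combo.append(prefix + postfixes[digit])
            stride *= len(postfixes)
        results.append(combo)
    return results

for combo in all_combos(["1A", "1B", "2C", "2D"]):
    print(combo)
```

Each prefix now cycles at its own rate, like the digits of an odometer, so the four results for this input are all distinct.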

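Notes

In CPython, regular expressions are cached after the first compilation
The list-member multiplier comes from "How can I multiply all items in a list together with Python?"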
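
On Python 3.8 and later, `math.prod` from the standard library performs the same multiplication without `functools.reduce`. A small sketch, with the grouped dictionary `d` for the basic input written out by hand:

```python
import math

# Postfixes grouped by prefix, as built for the basic input above.
d = {"12": ["R"], "102": ["A", "L"], "250": ["L"]}

# Multiply the group sizes; math.prod returns 1 for an empty iterable.
total_combinations = math.prod(len(sublist) for sublist in d.values())
print(total_combinations)  # 2
```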

5 answers

Answer 1
0 2020-05-31 14:00:49

Answer 2
0 2020-05-31 14:16:40

Answer 3
0 2020-05-31 15:41:05

Answer 4
0 2020-05-31 16:39:51

Answer 5
0 2020-10-12 19:07:16
