简体   繁体   中英

create all possible combinations with multiple variants from list

Ok so the problem is as follows:

let's say I have a list like this [12R,102A,102L,250L] what I would want is a list of all possible combinations, however for only one combination/number. so for the example above, the output I would like is:

[12R,102A,250L]
[12R,102L,250L]

my actual problem is a lot more complex with many more sites. Thanks for your help

edit: after reading some comments I guess this is slightly unclear. I have 3 unique numbers here, [12, 102, and 250] and for some numbers, I have different variations, for example [102A, 102L]. what I need is a way to combine the different positions[12,102,250] and all possible variations within. just like the lists, I presented above. they are the only valid solutions. [12R] is not. neither is [12R,102A,102L,250L]. so far I have done this with nested loops, but I have a LOT of variation within these numbers, so I can't really do that anymore

ill edit this again: ok so it seems as though there is still some confusion so I might extend the point I made before. what I am dealing with there is DNA. 12R means the 12th position in the sequence was changed to an R. so the solution [12R,102A,250L] means that the amino acid on position 12 is R, 102 is A 250 is L.

this is why a solution like [102L, 102R, 250L] is not usable, because the same position can not be occupied by 2 different amino acids.

thank you

So it works with ["10A","100B","12C","100R"] (case 1) and ['12R','102A','102L','250L'] (case 2)

import itertools as it

liste = ['12R','102A','102L','250L']

comb = []
for e in it.combinations(range(4), 3):
    e1 = liste[e[0]][:-1]
    e2 = liste[e[1]][:-1]
    e3 = liste[e[2]][:-1]
    if e1 != e2 and e2 != e3 and e3 != e1:
        comb.append([e1+liste[e[0]][-1], e2+liste[e[1]][-1], e3+liste[e[2]][-1]])
print(list(comb))
# case 1 : [['10A', '100B', '12C'], ['10A', '12C', '100R']]
# case 2 : [['12R', '102A', '250L'], ['12R', '102L', '250L']]

Try this:

from itertools import groupby
import re

def __genComb(arr, res=[]):
    for i in range(len(res), len(arr)):
        el=arr[i]
        if(len(el[1])==1):
            res+=el[1]
        else:
            for el_2 in el[1]:
                yield from __genComb(arr, res+[el_2])
            break
    if(len(res)==len(arr)): yield res

def genComb(arr):
    res=[(k, list(v)) for k,v in groupby(sorted(arr), key=lambda x: re.match(r"(\d*)", x).group(1))]
    yield from __genComb(res)

Sample output (using the input you provided):

test=["12R","102A","102L","250L"]

for el in genComb(test):
    print(el)

# returns:

['102A', '12R', '250L']
['102L', '12R', '250L']

You can use a recursive generator function:

from itertools import groupby as gb
import re

def combos(d, c = []):
  if not d:
     yield c
  else:
     for a, b in d[0]:
       yield from combos(d[1:], c + [a+b]) 

d = ['12R', '102A', '102L', '250L']
vals = [re.findall('^\d+|\w+$', i) for i in d]
new_d = [list(b) for _, b in gb(sorted(vals, key=lambda x:x[0]), key=lambda x:x[0])]
print(list(combos(new_d)))

Output:

[['102A', '12R', '250L'], ['102L', '12R', '250L']]
import re

def get_grouped_options(input):
     options = {}
     for option in input:
          m = re.match('([\d]+)([A-Z])$', option)
          if m:
               position = int(m.group(1))
               acid = m.group(2)
          else:
               continue
          if position not in options:
               options[position] = []
          options[position].append(acid)
     return options


def yield_all_combos(options):
     n = len(options)
     positions = list(options.keys())
     indices = [0] * n
     while True:
          yield ["{}{}".format(position, options[position][indices[i]])
                 for i, position in enumerate(positions)]
          j = 0
          indices[j] += 1
          while indices[j] == len(options[positions[j]]):
               # carry
               indices[j] = 0
               j += 1
               if j == n:
                    # overflow
                    return
               indices[j] += 1


input = ['12R', '102A', '102L', '250L']

options = get_grouped_options(input)

for combo in yield_all_combos(options):
     print("[{}]".format(",".join(combo)))

Gives:

[12R,102A,250L]
[12R,102L,250L]

I believe this is what you're looking for!

This works by

  • generating a collection of all the postfixes each prefix can have
  • finding the total count of positions (multiply the length of each sublist together)
  • rotating through each postfix by basing the read index off of both its member postfix position in the collection and the absolute result index (known location in final results)
import collections
import functools
import operator
import re

# initial input
starting_values = ["12R","102A","102L","250L"]

d = collections.defaultdict(list)  # use a set if duplicates are possible
for value in starting_values:
    numeric, postfix = re.match(r"(\d+)(.*)", value).groups()
    d[numeric].append(postfix)  # .* matches ""; consider (postfix or "_") to give value a size

# d is now a dictionary of lists where each key is the prefix
# and each value is a list of possible postfixes


# each set of postfixes multiplies the total combinations by its length
total_combinations = functools.reduce(
    operator.mul,
    (len(sublist) for sublist in d.values())
)

results = collections.defaultdict(list)
for results_pos in range(total_combinations):
    for index, (prefix, postfix_set) in enumerate(d.items()):
        results[results_pos].append(
            "{}{}".format(  # recombine the values
                prefix,     # numeric prefix
                postfix_set[(results_pos + index) % len(postfix_set)]
            ))

# results is now a dictionary mapping { result index: unique list }

displaying

# set width of column by longest prefix string
# need a collection for intermediate cols, but beyond scope of Q
col_width = max(len(str(k)) for k in results)
for k, v in results.items():
    print("{:<{w}}: {}".format(k, v, w=col_width))


0: ['12R', '102L', '250L']
1: ['12R', '102A', '250L']

with a more advanced input

["12R","102A","102L","250L","1234","1234A","1234C"]

0: ['12R', '102L', '250L', '1234']
1: ['12R', '102A', '250L', '1234A']
2: ['12R', '102L', '250L', '1234C']
3: ['12R', '102A', '250L', '1234']
4: ['12R', '102L', '250L', '1234A']
5: ['12R', '102A', '250L', '1234C']

You can confirm the values are indeed unique with a set

final = set(",".join(x) for x in results.values())
for f in final:
    print(f)

12R,102L,250L,1234
12R,102A,250L,1234A
12R,102L,250L,1234C
12R,102A,250L,1234
12R,102L,250L,1234A
12R,102A,250L,1234C

notes

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM