
Filtering out a generator

What's the best way to filter out some subsets from a generator? For example, I have the string "1023" and want to produce every possible way of splitting it into contiguous groups of digits. All combinations would be:

['1', '0', '2', '3']
['1', '0', '23']
['1', '02', '3']
['1', '023']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']

I am not interested in a subset that contains a leading 0 on any of the items, so the valid ones are:

['1', '0', '2', '3']
['1', '0', '23']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']

I have two questions.

1) If using a generator, what's the best way to filter out the subsets with leading zeroes? Currently, I generate all combinations, then loop through them afterwards and only continue if the subset is valid. For simplicity I am only printing the subset in the sample code. If the generator is very long or contains a lot of invalid subsets, it is almost a waste to loop through the entire generator. Is there a way to stop the generator when it sees an invalid item (one with a leading zero) and filter it out of 'allCombinations'?

2) If the above doesn't exist, what's a better way to generate these combinations (disregarding combinations with leading zeroes)?

Code using a generator:

import itertools

def isValid(subset):         ## DIGITS WITH LEADING 0 IS NOT VALID
    valid = True
    for num in subset:
        if num[0] == '0' and len(num) > 1:
            valid = False
            break

    return valid

def get_combinations(source, comb):
    res = ""
    for x, action in zip(source, comb + (0,)):
        res += x
        if action == 0:
            yield res
            res = ""

digits = "1023"
allCombinations = [list(get_combinations(digits, c)) for c in itertools.product((0, 1), repeat=len(digits) - 1)]


for subset in allCombinations:   ## LOOPS THROUGH THE ENTIRE GENERATOR
    if isValid(subset):
        print(subset)
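
One way to avoid building the full list first, sketched below, is to chain generator expressions so that each subset is built and tested only when it is requested. It reuses the question's get_combinations and a condensed validity check; the names is_valid and valid_subsets are made up for illustration. Note that this is lazy rather than a shortcut: every candidate is still generated and checked, just without storing them all at once.

```python
import itertools

def get_combinations(source, comb):
    # Same splitter as in the question: action 0 means "cut after this character".
    res = ""
    for x, action in zip(source, comb + (0,)):
        res += x
        if action == 0:
            yield res
            res = ""

def is_valid(subset):
    # A piece is invalid only if it has more than one digit and starts with '0'.
    return not any(len(num) > 1 and num[0] == '0' for num in subset)

def valid_subsets(digits):
    # Hypothetical helper: lazily yields the valid splits one at a time.
    masks = itertools.product((0, 1), repeat=len(digits) - 1)
    candidates = (list(get_combinations(digits, mask)) for mask in masks)
    return (subset for subset in candidates if is_valid(subset))

for subset in valid_subsets("1023"):
    print(subset)
```

Because valid_subsets returns a generator, the caller can stop early (for example with break or itertools.islice) and the remaining candidates are never built.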

For an easy and obvious condition like "no leading zeros", the filtering can be done more efficiently while the combinations are being built.

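def generate_pieces(input_string, predicate):
    # Yield every way to split input_string into pieces that all satisfy predicate.
    if input_string:
        # The whole string, uncut, counts as one split.
        if predicate(input_string):
            yield [input_string]
        for item_size in range(1, len(input_string)+1):
            item = input_string[:item_size]
            # Prune early: skip any prefix that fails the predicate.
            if not predicate(item):
                continue
            rest = input_string[item_size:]
            # An empty rest yields nothing, so the uncut string is not repeated.
            for rest_piece in generate_pieces(rest, predicate):
                yield [item] + rest_piece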

Generating every combination of cuts gives a long list, even for a five-character string:

>>> list(generate_pieces('10002', lambda x: True))
[['10002'], ['1', '0002'], ['1', '0', '002'], ['1', '0', '0', '02'], ['1', '0', '0', '0', '2'], ['1', '0', '00', '2'], ['1', '00', '02'], ['1', '00', '0', '2'], ['1', '000', '2'], ['10', '002'], ['10', '0', '02'], ['10', '0', '0', '2'], ['10', '00', '2'], ['100', '02'], ['100', '0', '2'], ['1000', '2']]

Only those where no fragment has leading zeros:

>>> list(generate_pieces('10002', lambda x: not x.startswith('0')))
[['10002'], ['1000', '2']]

Substrings that start with a zero were never considered for the recursive step.
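
The startswith('0') predicate also rejects a lone '0', which the question treats as valid. As a sketch, assuming the same generate_pieces as above (repeated here so the snippet runs on its own), a predicate that only rejects multi-digit pieces with a leading zero reproduces the six subsets the question lists. The name no_leading_zero is made up for illustration.

```python
def generate_pieces(input_string, predicate):
    # Same recursive generator as in the answer above.
    if input_string:
        if predicate(input_string):
            yield [input_string]
        for item_size in range(1, len(input_string) + 1):
            item = input_string[:item_size]
            if not predicate(item):
                continue
            rest = input_string[item_size:]
            for rest_piece in generate_pieces(rest, predicate):
                yield [item] + rest_piece

def no_leading_zero(piece):
    # Hypothetical predicate: a single '0' is fine, '02' or '023' is not.
    return len(piece) == 1 or not piece.startswith('0')

for pieces in generate_pieces('1023', no_leading_zero):
    print(pieces)
```

The order differs from the question's listing because the recursion yields the uncut string first, but the set of results is the same.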

One common solution is to filter just before the yield. Here is an example:

import itertools

def my_gen(my_string):

    # Create combinations
    for length in range(len(my_string)):
        for my_tuple in itertools.combinations(my_string, length+1):

            # This is the string you would like to output
            output_string = "".join(my_tuple)

            # filter here:
            if output_string[0] != '0':
                yield output_string


my_string = '1023'
print(list(my_gen(my_string)))
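
To see what this pattern produces, the sketch below runs the same filter-before-yield logic under a made-up name, filtered_subsequences. Note that itertools.combinations picks characters in their original order, so each value is a single string (a subsequence such as '13'), not a list of pieces as in the question.

```python
import itertools

def filtered_subsequences(my_string):
    # Same logic as my_gen above, renamed for illustration.
    for length in range(len(my_string)):
        for my_tuple in itertools.combinations(my_string, length + 1):
            output_string = "".join(my_tuple)
            # Filter just before yielding: drop strings that start with '0'.
            if output_string[0] != '0':
                yield output_string

print(list(filtered_subsequences('1023')))
# ['1', '2', '3', '10', '12', '13', '23', '102', '103', '123', '1023']
```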

EDIT: Added a generator expression alternative

import itertools

my_string = '1023'
# Yield the whole joined string, as the function version above does.
my_gen = ("".join(my_tuple) for length in range(len(my_string))
                      for my_tuple in itertools.combinations(my_string, length+1)
                      if "".join(my_tuple)[0] != '0')

