Finding all possible case permutations in Python with variations avoiding duplicates

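Given a list of words and a list of chars, I need to generate the word variations (lowercase, uppercase, etc.) and all possible permutations, avoiding duplicates (case-insensitive).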

Example

words = ["one", "two"]
chars = ["!"]

Word variations:

one, One, ONE, oNE, two, Two, TWO, tWO

...and the possible (incremental) permutations:

one, one!, !one, One, One!, !One, ONE, ONE!, !ONE, two, two! ...,
...onetwo, onetwo!, !onetwo, Onetwo, Onetwo!, !Onetwo, ...
...OneTwo, OneTwo!, !OneTwo, ...
...twoone, twoone!, !twoone, ...etc.

But not:

oneOne, oneONE, oneoNE, ...twoTwo, twoTWO, twotWO...

This is my Python code:


import itertools

words = ["one", "two"]
chars = ["2022", "!", "_"]

file_permuted = "permuted_words.txt"

transformed_words = []
words_to_permute = []
permuted_words = []
counter = 0
total_counter = 0

for word in words:

    # Apply the case transformations: word, WORD, Word, wORD
    lowercase_all = word.lower()
    uppercase_all = word.upper()
    capitalize_first = word.capitalize()
    toggle_case = capitalize_first.swapcase()

    # Add the transformed words to the list
    transformed_words.append(lowercase_all)
    transformed_words.append(uppercase_all)
    transformed_words.append(capitalize_first)
    transformed_words.append(toggle_case)

words_to_permute = transformed_words + chars

print("Generating permutations...")
with open(file_permuted, "w") as f:
    for i in range(1, len(words_to_permute) + 1):
        for permutation in itertools.permutations(words_to_permute, i):
            # Skip permutations that contain the same word twice in different casings
            lowered = {item.lower() for item in permutation}
            if len(lowered) == len(permutation):
                f.write("".join(permutation) + "\n")
                counter += 1
                # Report progress after every 100 lines written
                if counter == 100:
                    total_counter += counter
                    print('Processed {0} items'.format(total_counter))
                    counter = 0

Please suggest a better, more elegant, and more efficient way to do this.

Something like this:

def word_casing (word):
    if 0 == len(word):
        yield ""
    else:
        char = word[0]
        for next_word in word_casing(word[1:]):
            yield char + next_word
            yield char.upper() + next_word

def word_casing_fn (word):
    def inner ():
        yield from word_casing(word)
    return inner

def char_listing_fn (char):
    def inner ():
        yield char
    return inner

def subsets (items):
    if 0 == len(items):
        yield []
    else:
        for s in subsets(items[1:]):
            yield s
            yield [items[0]] + s

def nonempty_subsets (items):
    for s in subsets(items):
        if 0 < len(s):
            yield s

def permutations (items):
    if 0 == len(items):
        yield []
    else:
        for i in range(len(items)):
            if 0 == i:
                for p in permutations(items[1:]):
                    yield [items[0]] + p
            else:
                (items[0], items[i]) = (items[i], items[0])
                for p in permutations(items[1:]):
                    yield [items[0]] + p
                (items[0], items[i]) = (items[i], items[0])

def list_fn_combinations (list_fns):
    if 1 == len(list_fns):
        for item in list_fns[0]():
            yield item
    elif 1 < len(list_fns):
        for item in list_fns[0]():
            for comb in list_fn_combinations(list_fns[1:]):
                yield item + comb

def all_combs (words, chars):
    list_fns = []
    for word in words:
        list_fns.append(word_casing_fn(word))
    for char in chars:
        list_fns.append(char_listing_fn(char))
    for s in nonempty_subsets(list_fns):
        for p in permutations(s):
            for c in list_fn_combinations(p):
                yield c


for w in all_combs(["one", "two"], ["!"]):
    print(w)
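
For comparison, here is a rough sketch of the same subset, then permutation, then casing structure written with itertools. It assumes only the four casings from the question are wanted (word, WORD, Word, wORD), whereas word_casing above yields every per-letter casing. The names four_casings and all_combs_itertools are made up for this illustration.

```python
from itertools import combinations, permutations, product


def four_casings(word):
    # word, WORD, Word, wORD; dict.fromkeys drops repeats (e.g. one-letter words)
    cap = word.capitalize()
    return list(dict.fromkeys([word.lower(), word.upper(), cap, cap.swapcase()]))


def all_combs_itertools(words, chars):
    # Each base item carries its own list of spellings; a char has just one.
    items = [four_casings(w) for w in words] + [[c] for c in chars]
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            for ordering in permutations(subset):
                # Pick one spelling per item, so a word never meets its own variant.
                for choice in product(*ordering):
                    yield "".join(choice)


for w in all_combs_itertools(["one", "two"], ["!"]):
    print(w)
```

Because each word appears at most once per ordering, strings such as oneOne can never be built, so no case-insensitive filtering is needed afterwards.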

You can do this step by step, but watch the data size, which becomes huge very quickly:

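  • Prepare the 4 variants of each of the W words
  • Compute the Cartesian product of the variant lists; this creates T = 4**W tuples
  • Compute the permutations of each tuple; this creates P = T * W! tuples
  • Finally, generate versions of each tuple with one of the C characters placed before or after it, and output the original tuple along with all of its versions; the total number of items will be V = P * (2*C+1)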

So for the minimal example of 2 words and 1 character we get 96 versions; with 4 words and 3 characters we generate some 43k versions: 4**4 * 4! * (2*3+1). For 10 words we would have some 2.66E13 versions, 4**10 * 10! * (2*3+1), or, assuming every word has length 5, over 1 PB of memory. We really need to go with generators!
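
The arithmetic above can be checked with a few lines. This is only a sketch: count_versions is a hypothetical helper name, and the size estimate assumes plain text with one newline per line, ignoring the optional extra character and any per-object overhead.

```python
from math import factorial


def count_versions(num_words, num_chars, variants_per_word=4):
    # V = 4**W * W! * (2*C + 1), following the steps listed above
    tuples = variants_per_word ** num_words
    perms = tuples * factorial(num_words)
    return perms * (2 * num_chars + 1)


print(count_versions(2, 1))   # 96
print(count_versions(4, 3))   # 43008
big = count_versions(10, 3)
print(f"{big:.3g}")           # 2.66e+13

# Assumed size per line: 10 words of 5 letters plus a newline
bytes_per_line = 10 * 5 + 1
print(big * bytes_per_line / 1e15, "PB")
```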

My code:

from itertools import permutations, product

def get_variants(w):
    w_cap = w.capitalize()
    return (w.upper(), w.lower(), w_cap, w_cap.swapcase())

def get_product(words):
    variants = [get_variants(w) for w in words]
    for p in product(*variants):
        yield p

def get_perms(words):
    for prod in get_product(words):
        for perm in permutations(prod):
            yield perm

def get_versions(words, chars):
    for p in get_perms(words):
        s = ''.join(p)
        yield s
        for ch in chars:
            yield ch+s
            yield s+ch
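
A possible way to consume the generator lazily is sketched below. To keep the snippet self-contained, get_versions is restated in condensed form; islice is used to peek at the first few results, and the count is taken without ever holding the full list in memory.

```python
from itertools import islice, permutations, product


def get_versions(words, chars):
    # Condensed restatement of the generator chain above
    variants = [(w.upper(), w.lower(), w.capitalize(), w.capitalize().swapcase())
                for w in words]
    for prod in product(*variants):
        for perm in permutations(prod):
            s = "".join(perm)
            yield s
            for ch in chars:
                yield ch + s
                yield s + ch


# Peek at the first few results without generating the rest
first = list(islice(get_versions(["one", "two"], ["!"]), 5))
print(first)  # ['ONETWO', '!ONETWO', 'ONETWO!', 'TWOONE', '!TWOONE']

# Count everything while keeping only one string in memory at a time
total = sum(1 for _ in get_versions(["one", "two"], ["!"]))
print(total)  # 96
```

Note that this chain always uses every word and adds at most one character at the start or the end, so shorter results from the question's example, such as one! on its own, are not produced.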
