Find all possible case permutations in Python, avoiding duplicates

Question

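Given a list of words and a list of characters, I need to generate the word variants (lowercase, uppercase, etc.) and all possible permutations, avoiding duplicates (case-insensitive).

Example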

words = ["one", "two"]
chars = ["!"]

Word variants:

one, One, ONE, oNE, two, Two, TWO, tWO

...and the possible (incremental) permutations:

one, one!, !one, One, One!, !One, ONE, ONE!, !ONE, two, two! ...,
...onetwo, onetwo!, !onetwo, Onetwo, Onetwo!, !Onetwo, ...
...OneTwo, OneTwo!, !OneTwo, ...
...twoone, twoone!, !twoone, ...etc.

But not:

oneOne, oneONE, oneoNE, ...twoTwo, twoTWO, twotWO...

Here is my Python code:


import itertools

words = ["one", "two"]
chars = ["2022", "!", "_"]

file_permuted = "permuted_words.txt"

transformed_words = []
words_to_permute = []
permuted_words = []
counter = 0
total_counter = 0

for word in words:

    # Apply the case transformations: word, WORD, Word, wORD
    lowercase_all = word.lower()
    uppercase_all = word.upper()
    capitalize_first = word.capitalize()
    toggle_case =  capitalize_first.swapcase()

    # Add the transformed words to the list
    transformed_words.append(lowercase_all)
    transformed_words.append(uppercase_all)
    transformed_words.append(capitalize_first)
    transformed_words.append(toggle_case)

words_to_permute = transformed_words + chars

print("Generating permutations...")
with open(file_permuted, "w") as f:
    for i in range(1, len(words_to_permute) + 1):
        for permutation in itertools.permutations(words_to_permute, i):
            len_set_permutation = len(set(list(map(lambda x: x.lower(), permutation))))
            if (len_set_permutation == len(permutation)):
                f.write("".join(permutation) + "\n")
                if (counter == 100):
                    total_counter += counter
                    print('Processed {0} items'.format(str(total_counter)))
                    counter = 0

                counter += 1

Please suggest a better, more elegant and more efficient approach.

Answer 1

Like this:

def word_casing (word):
    if 0 == len(word):
        yield ""
    else:
        char = word[0]
        for next_word in word_casing(word[1:]):
            yield char + next_word
            yield char.upper() + next_word

def word_casing_fn (word):
    def inner ():
        yield from word_casing(word)
    return inner

def char_listing_fn (char):
    def inner ():
        yield char
    return inner

def subsets (items):
    if 0 == len(items):
        yield []
    else:
        for s in subsets(items[1:]):
            yield s
            yield [items[0]] + s

def nonempty_subsets (items):
    for s in subsets(items):
        if 0 < len(s):
            yield s

def permutations (items):
    if 0 == len(items):
        yield []
    else:
        for i in range(len(items)):
            if 0 == i:
                for p in permutations(items[1:]):
                    yield [items[0]] + p
            else:
                (items[0], items[i]) = (items[i], items[0])
                for p in permutations(items[1:]):
                    yield [items[0]] + p
                (items[0], items[i]) = (items[i], items[0])

def list_fn_combinations (list_fns):
    if 1 == len(list_fns):
        for item in list_fns[0]():
            yield item
    elif 1 < len(list_fns):
        for item in list_fns[0]():
            for comb in list_fn_combinations(list_fns[1:]):
                yield item + comb

def all_combs (words, chars):
    list_fns = []
    for word in words:
        list_fns.append(word_casing_fn(word))
    for char in chars:
        list_fns.append(char_listing_fn(char))
    for s in nonempty_subsets(list_fns):
        for p in permutations(s):
            for c in list_fn_combinations(p):
                yield c


for w in all_combs(["one", "two"], ["!"]):
    print(w)
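Note that word_casing above yields every mixed-case spelling of a word (8 for "one"), not only the four variants listed in the question. As a rough sketch of the same idea built on itertools.product, the helper below (word_casings is a hypothetical name, not part of the answer) picks lower or upper case for each character independently. It assumes each character has single-character lower and upper forms, and it emits characters without case, such as digits, only once.

```python
from itertools import product

def word_casings(word):
    """Lazily yield every upper/lower casing of `word`."""
    # One tuple of choices per character; caseless characters get one choice.
    options = [
        (c.lower(), c.upper()) if c.lower() != c.upper() else (c,)
        for c in word
    ]
    for combo in product(*options):
        yield "".join(combo)

for casing in word_casings("one"):
    print(casing)
```

Because it is a generator, the casings are produced one at a time and never held in memory together.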

Answer 2

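You can do this step by step, but watch the data size, because it grows very quickly:

Prepare the 4 variants of each of the W words
Compute the Cartesian product of the variant lists; this creates T = 4**W tuples
Compute the permutations of each tuple; this creates P = T * W! tuples
Finally, generate a version of each tuple with one of the C characters, either before or after it, and output the original tuple plus all its versions; the total number of items will be V = P * (2*C+1)

So for the minimal example of 2 words and 1 character we get 96 versions; with 4 words and 3 characters we generate about 43k versions: 4**4 * 4! * (2*3+1). For 10 words we would have about 2.66E13 versions, 4**10 * 10! * (2*3+1), or, assuming every word has length 5, more than 1 PB of memory. We really need to go with generators!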
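The counts above can be checked with a few lines of arithmetic. This is only a sketch of the formula V = 4**W * W! * (2*C+1); count_versions is a hypothetical helper name, not part of the answer's code.

```python
from math import factorial

def count_versions(num_words, num_chars, variants_per_word=4):
    """Number of strings produced for W words and C characters."""
    tuples = variants_per_word ** num_words   # T = 4**W
    perms = tuples * factorial(num_words)     # P = T * W!
    return perms * (2 * num_chars + 1)        # V = P * (2*C + 1)

print(count_versions(2, 1))    # 96
print(count_versions(4, 3))    # 43008
print(count_versions(10, 3))   # 26635508121600
```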

My code:

from itertools import permutations, product

def get_variants(w):
    w_cap = w.capitalize()
    return (w.upper(), w.lower(), w_cap, w_cap.swapcase())

def get_product(words):
    variants = [get_variants(w) for w in words]
    for p in product(*variants):
        yield p

def get_perms(words):
    for prod in get_product(words):
        for perm in permutations(prod):
            yield perm

def get_versions(words, chars):
    for p in get_perms(words):
        s = ''.join(p)
        yield s
        for ch in chars:
            yield ch+s
            yield s+ch
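The generator above always uses all W words at once. The question also asks for shorter combinations such as "one!" or "Two", so one possible extension is sketched below. It is an assumption about what the asker wants, not part of the answer: get_versions_all_lengths is a hypothetical name, and the sketch assumes the input words are distinct when compared case-insensitively. Because each base word is used at most once per result, strings such as "oneOne" can never appear.

```python
from itertools import permutations, product

def get_variants(w):
    """The four case variants from the question: word, WORD, Word, wORD."""
    w_cap = w.capitalize()
    return (w.lower(), w.upper(), w_cap, w_cap.swapcase())

def get_versions_all_lengths(words, chars):
    """Lazily yield joined variants for every ordered selection of 1..W words."""
    for r in range(1, len(words) + 1):
        # Ordered selections of r distinct base words.
        for ordered in permutations(words, r):
            # Pick one case variant per selected word.
            for cased in product(*(get_variants(w) for w in ordered)):
                s = "".join(cased)
                yield s
                for ch in chars:
                    yield ch + s
                    yield s + ch

for version in get_versions_all_lengths(["one", "two"], ["!"]):
    print(version)
```

For 2 words and 1 character this yields 120 strings: 24 built from a single word plus the 96 built from both words.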
