Finding all possible case permutations in Python with variations avoiding duplicates
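Given a list of words and a list of characters, I need to generate the word variants (lowercase, uppercase, etc.) and all possible permutations, avoiding duplicates (case-insensitively).
Example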
words = ["one", "two"]
chars = ["!"]
Word variants:
one, One, ONE, oNE, two, Two, TWO, tWO
...and the possible (incremental) permutations:
one, one!, !one, One, One!, !One, ONE, ONE!, !ONE, two, two! ...,
...onetwo, onetwo!, !onetwo, Onetwo, Onetwo!, !Onetwo, ...
...OneTwo, OneTwo!, !OneTwo, ...
...twoone, twoone!, !twoone, ...etc.
But not:
oneOne, oneONE, oneoNE, ...twoTwo, twoTWO, twotWO...
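The exclusion rule above (the same word must not appear twice in different casings) can be sketched as a small check. This is only an illustration; the function name `has_case_insensitive_duplicates` is hypothetical.

```python
def has_case_insensitive_duplicates(parts):
    """Return True if two parts are equal when compared case-insensitively."""
    lowered = [p.lower() for p in parts]
    # A set drops repeated entries, so a shorter set means a repeat exists.
    return len(set(lowered)) != len(lowered)


print(has_case_insensitive_duplicates(("One", "Two", "!")))  # False: allowed
print(has_case_insensitive_duplicates(("one", "ONE")))       # True: rejected
```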
This is my Python code:
import itertools

words = ["one", "two"]
chars = ["2022", "!", "_"]
file_permuted = "permuted_words.txt"
transformed_words = []
words_to_permute = []
permuted_words = []
counter = 0
total_counter = 0

for word in words:
    # Apply the case transformations: word, WORD, Word, wORD
    lowercase_all = word.lower()
    uppercase_all = word.upper()
    capitalize_first = word.capitalize()
    toggle_case = capitalize_first.swapcase()
    # Add the transformed words to the list
    transformed_words.append(lowercase_all)
    transformed_words.append(uppercase_all)
    transformed_words.append(capitalize_first)
    transformed_words.append(toggle_case)

words_to_permute = transformed_words + chars
print("Generating permutations...")
with open(file_permuted, "w") as f:
    for i in range(1, len(words_to_permute) + 1):
        for permutation in itertools.permutations(words_to_permute, i):
            # Keep only permutations with no case-insensitive duplicates
            len_set_permutation = len(set(list(map(lambda x: x.lower(), permutation))))
            if (len_set_permutation == len(permutation)):
                f.write("".join(permutation) + "\n")
                if (counter == 100):
                    total_counter += counter
                    print('Processed {0} items'.format(str(total_counter)))
                    counter = 0
                counter += 1
Please suggest a better, more elegant, and more efficient way to do this.
Something like this:
def word_casing(word):
    # Yield every combination of keeping or upper-casing each letter.
    if 0 == len(word):
        yield ""
    else:
        char = word[0]
        for next_word in word_casing(word[1:]):
            yield char + next_word
            yield char.upper() + next_word

def word_casing_fn(word):
    def inner():
        yield from word_casing(word)
    return inner

def char_listing_fn(char):
    def inner():
        yield char
    return inner

def subsets(items):
    if 0 == len(items):
        yield []
    else:
        for s in subsets(items[1:]):
            yield s
            yield [items[0]] + s

def nonempty_subsets(items):
    for s in subsets(items):
        if 0 < len(s):
            yield s

def permutations(items):
    if 0 == len(items):
        yield []
    else:
        for i in range(len(items)):
            if 0 == i:
                for p in permutations(items[1:]):
                    yield [items[0]] + p
            else:
                # Swap item i to the front, recurse, then swap back.
                (items[0], items[i]) = (items[i], items[0])
                for p in permutations(items[1:]):
                    yield [items[0]] + p
                (items[0], items[i]) = (items[i], items[0])

def list_fn_combinations(list_fns):
    if 1 == len(list_fns):
        for item in list_fns[0]():
            yield item
    elif 1 < len(list_fns):
        for item in list_fns[0]():
            for comb in list_fn_combinations(list_fns[1:]):
                yield item + comb

def all_combs(words, chars):
    list_fns = []
    for word in words:
        list_fns.append(word_casing_fn(word))
    for char in chars:
        list_fns.append(char_listing_fn(char))
    for s in nonempty_subsets(list_fns):
        for p in permutations(s):
            for c in list_fn_combinations(p):
                yield c

for w in all_combs(["one", "two"], ["!"]):
    print(w)
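For comparison, the same kind of enumeration (every ordering of every non-empty selection, with one casing chosen per word) can be sketched with the standard library's `itertools`. This is an alternative sketch, not the code above: the names `case_variants` and `all_combs_itertools` are hypothetical, and it assumes the four casings from the question (word, WORD, Word, wORD) rather than every per-letter casing.

```python
from itertools import permutations, product


def case_variants(word):
    """Return the four casings from the question: word, WORD, Word, wORD."""
    cap = word.capitalize()
    # dict.fromkeys drops repeats (e.g. for one-letter words) and keeps order.
    return list(dict.fromkeys([word.lower(), word.upper(), cap, cap.swapcase()]))


def all_combs_itertools(words, chars):
    """Lazily yield every ordering of every non-empty selection of items."""
    # One slot per item: four alternatives for a word, one for a character.
    slots = [case_variants(w) for w in words] + [[c] for c in chars]
    for r in range(1, len(slots) + 1):
        # Permuting slot indices means each word or character is used at
        # most once, so "oneOne" can never be produced.
        for order in permutations(range(len(slots)), r):
            for choice in product(*(slots[i] for i in order)):
                yield "".join(choice)


results = list(all_combs_itertools(["one", "two"], ["!"]))
print(len(results))  # 153
```

Because a word's casings share a single slot, no case-insensitive filtering is needed afterwards, which avoids generating and discarding the rejected permutations.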
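You can do this in steps, but mind the size of the data, which becomes huge very quickly:
From W words, generate the 4 variants of each word.
Generate the products of the variants: T = 4**W tuples.
Permute the tuples: P = T * W! tuples.
For each one of the C characters, generate a version of every tuple with the character before or after it, and output the original tuple together with all its versions; the total number of items will be V = P * (2*C+1).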
So, for a minimal example of 2 words and 1 character, we get 96 versions; but with 4 words and 3 characters we generate about 43k versions: 4**4 * 4! * (2*3+1). For 10 words we would have about 2.66E13 versions, 4**10 * 10! * (2*3+1), or, assuming all words have length 5, more than 1 PB of memory. We really need to use generators!
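The counts quoted above can be reproduced with a short calculation. This is a sketch only: the function name `count_versions` is hypothetical, and the memory figure assumes one byte per character with no per-string overhead.

```python
from math import factorial


def count_versions(num_words, num_chars, variants_per_word=4):
    """Number of strings produced by the scheme described above."""
    tuples = variants_per_word ** num_words    # T = 4**W
    permuted = tuples * factorial(num_words)   # P = T * W!
    return permuted * (2 * num_chars + 1)      # V = P * (2*C + 1)


print(count_versions(2, 1))   # 96
print(count_versions(4, 3))   # 43008
print(count_versions(10, 3))  # 26635508121600

# Rough size for 10 five-letter words: 50 characters per string.
approx_bytes = count_versions(10, 3) * 10 * 5
print(approx_bytes / 1e15)    # size in petabytes, a little over 1
```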
My code:
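from itertools import permutations, product

def get_variants(w):
    # The four casings of a word: WORD, word, Word, wORD
    w_cap = w.capitalize()
    return (w.upper(), w.lower(), w_cap, w_cap.swapcase())

def get_product(words):
    # One variant per word, in every combination
    variants = [get_variants(w) for w in words]
    for p in product(*variants):
        yield p

def get_perms(words):
    # Every ordering of every combination of variants
    for prod in get_product(words):
        for perm in permutations(prod):
            yield perm

def get_versions(words, chars):
    # Each joined string, plus one version per character before and after it
    for p in get_perms(words):
        s = ''.join(p)
        yield s
        for ch in chars:
            yield ch+s
            yield s+ch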
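One possible way to consume such a generator without holding every result in memory is sketched below. The generator is repeated in condensed form so the snippet runs on its own; the sample inputs come from the question, and using `islice` to preview is just one option.

```python
from itertools import islice, permutations, product


def get_variants(w):
    w_cap = w.capitalize()
    return (w.upper(), w.lower(), w_cap, w_cap.swapcase())


def get_versions(words, chars):
    # Condensed form of the generators above: casings, orderings, then chars.
    variants = [get_variants(w) for w in words]
    for prod in product(*variants):
        for perm in permutations(prod):
            s = "".join(perm)
            yield s
            for ch in chars:
                yield ch + s
                yield s + ch


# Peek at the first few results; the rest are never generated.
first = list(islice(get_versions(["one", "two"], ["!"]), 5))
print(first)  # ['ONETWO', '!ONETWO', 'ONETWO!', 'TWOONE', '!TWOONE']

# Count everything while keeping only one string in memory at a time.
total = sum(1 for _ in get_versions(["one", "two"], ["!"]))
print(total)  # 96
```

The same generator can be passed straight to a file's `writelines` (with a newline appended to each item), so the output is streamed to disk rather than collected in a list.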