
Find all English words that can be made from a list of letters, using each letter no more times than it appears in the list

I am trying to pass random sets of letters to a function so that it returns all the words in a text file that can be made from those letters and are between 4 and 9 characters long. At the moment the code returns only words made up of letters in the set, but in some cases it uses a letter more often than it appears in the set. I want it to output only the words that use each letter no more times than it occurs in the list. For example, 'animal' is returned even though it uses the letter 'a' twice and the list contains only one 'a'.

letterList = ["a", "n", "i", "b", "s", "l", "s", "y", "m"] 

with open('american-english') as f:
    for w in f:
        w = w.strip()
        cond = all(i in letterList for i in w) and letterList[4] in w
        if 9 > len(w) >= 4 and cond:
            print(w)

A simple option might be to keep your existing approach and also compare the count of each letter in the word with its count in the list.
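As a minimal sketch of that counting idea, the check can be written with collections.Counter. The function name can_make and the variable letter_list are illustrative, not part of your code, and the length and required-letter conditions from the question are left out to keep it short.

```python
from collections import Counter

def can_make(word, letters):
    """Return True if word uses each letter no more often than it appears in letters."""
    available = Counter(letters)
    needed = Counter(word)
    # A Counter returns 0 for letters that are missing, so those words are rejected too.
    return all(available[ch] >= n for ch, n in needed.items())

letter_list = ["a", "n", "i", "b", "s", "l", "s", "y", "m"]
print(can_make("animal", letter_list))  # False: needs two 'a', only one is available
print(can_make("mainly", letter_list))  # True
```

Building the Counter for the letter list once outside the loop, rather than on every call, would save work when scanning a whole dictionary file.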

You could also try using itertools.permutations to generate all the possible 'words' from your letters and check if each one is in the dictionary. I suspect this will be slow as the number of permutations will be huge and most of them won't be words.
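To get a rough feel for the size of that search space, the number of tuples itertools.permutations would generate can be counted without generating them. This sketch assumes the nine-letter list and the 4 to 9 length range from the question, and needs Python 3.8 or later for math.perm.

```python
import math

num_letters = 9  # length of the letter list in the question

# itertools.permutations treats letters as distinct by position, so repeated
# letters (the two 's') do not reduce the number of tuples it yields.
total = sum(math.perm(num_letters, n) for n in range(4, 10))
print(total)  # 985824
```

The count grows factorially with the length of the letter list, which is why this approach gets slow quickly.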

A common technique for finding anagrams is to sort the letters of both words alphabetically then do an equality comparison:

sorted(word1)==sorted(word2)

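If this is True, word1 and word2 are anagrams. You could use this to reduce the number of comparisons: with this technique you only need the permutations that are already in sorted order, one for each distinct selection of letters, and you compare the sorted letters of each dictionary word against them.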
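One possible way to get the same sorted-order selections without filtering permutations is to take itertools.combinations of the sorted letters. This swaps in combinations for permutations, so it is a variation on the idea above rather than the script that follows; the names letter_signatures, find_words and sample_words are made up for illustration.

```python
from itertools import combinations

def letter_signatures(letters, min_len=1):
    """Sorted-letter strings for every selection of letters of at least min_len."""
    ordered = sorted(letters)
    # Combinations of a sorted sequence are themselves in sorted order.
    return {
        "".join(combo)
        for n in range(min_len, len(ordered) + 1)
        for combo in combinations(ordered, n)
    }

def find_words(words, letters, min_len=1):
    """Keep the words whose sorted letters match one of the signatures."""
    signatures = letter_signatures(letters, min_len)
    return [w for w in words if "".join(sorted(w)) in signatures]

sample_words = ["cat", "act", "tact", "dart", "cad"]
print(find_words(sample_words, "catd"))  # ['cat', 'act', 'cad']
```

Here 'tact' is rejected because it needs two 't', and 'dart' because 'r' is not available.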

I have written a script to show all three working and allow you to benchmark them. My testing shows that the unrefined itertools method scales very badly as the letter list gets longer. The counting method is mediocre, but the refined itertools method is generally fastest. These could all be optimised further, of course. Have a go with them.

import time
import itertools

letterList = list('catd')

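# Letter counting method: a word qualifies if it uses no letter more often than letterList has it
tic = time.perf_counter()
with open(r'D:/words_alpha.txt', 'r') as f:
    for line in f:
        word = line.strip()  # strip the newline once instead of on every comparison
        if word and all(word.count(letter) <= letterList.count(letter) for letter in set(word)):
            print(word)
toc = time.perf_counter()
print(toc - tic)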

# Permutations with no refinement: compare every word with every permutation of every length
tic = time.perf_counter()
with open(r'D:/words_alpha.txt', 'r') as f:
    for line in f:
        word = line.strip()
        for n in range(1, len(letterList) + 1):
            for pseudoword in itertools.permutations(letterList, n):
                if word == "".join(pseudoword):
                    print(word)
toc = time.perf_counter()
print(toc - tic)

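# Permutations with anagram refinement: keep only the permutations that are already
# in sorted order, then compare each word's sorted letters against them
tic = time.perf_counter()
pwords = set()  # a set gives fast membership tests and drops duplicates
for n in range(1, len(letterList) + 1):
    for pseudoword in itertools.permutations(letterList, n):
        if sorted(pseudoword) == list(pseudoword):
            pwords.add("".join(pseudoword))
print(sorted(pwords))
with open(r'D:/words_alpha.txt', 'r') as f:
    for line in f:
        word = line.strip()
        if "".join(sorted(word)) in pwords:
            print(word)
toc = time.perf_counter()
print(toc - tic)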
