Word list from text file

I need to create a word list from a text file. The list is going to be used in a hangman program and needs to exclude the following:

  1. duplicate words
  2. words containing fewer than 5 letters
  3. words that contain 'xx' as a substring
  4. words that contain upper case letters

The word list then needs to be written to a file, with every word on its own line. The program also needs to output the number of words in the final list.

This is what I have, but it's not working properly.

def MakeWordList():
    infile=open(('possible.rtf'),'r')
    whole = infile.readlines()
    infile.close()

    L=[]
    for line in whole:
        word= line.split(' ')
        if word not in L:
            L.append(word)
            if len(word) in range(5,100):
                L.append(word)
                if not word.endswith('xx'):
                    L.append(word)
                    if word == word.lower():
                        L.append(word)
    print L

MakeWordList()

You're appending the word many times with this code.
You aren't actually filtering out any words at all, just adding them a different number of times depending on how many ifs they pass.

You should combine all the ifs into a single condition:

if word not in L and len(word) >= 5 and 'xx' not in word and word.islower():
    L.append(word)

Or, if you find it more readable, you can split it in two:

if word not in L and len(word) >= 5:
    if 'xx' not in word and word.islower():
        L.append(word)

But don't append after each one.
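One more thing the combined condition needs: in the question's loop, word is the list returned by line.split(' '), not a single word. So len(word) counts the pieces of the line rather than letters, and word.islower(), if reached, raises AttributeError because lists have no islower method. (Using 'xx' in word, as above, also correctly matches the substring anywhere, unlike the original endswith('xx').) Below is a minimal sketch that loops over the individual words of each line. It assumes possible.rtf is actually plain text; a real RTF file also contains formatting control words such as \rtf1 that would have to be stripped first. The path parameter and the sample data are illustrative additions, not from the question.

```python
import os
import tempfile


def MakeWordList(path):
    """Return unique words that pass all four filters, in first-seen order."""
    L = []
    with open(path, 'r') as infile:
        for line in infile:
            # split() with no argument splits on any whitespace and drops the
            # trailing newline, so each item is a single word, not a list
            for word in line.split():
                if (word not in L and len(word) >= 5
                        and 'xx' not in word and word.islower()):
                    L.append(word)
    return L


# Demo on a small hypothetical sample instead of possible.rtf
sample = "apple Apple apple\ncat taxxi grape\nbanana\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
words = MakeWordList(tmp.name)
os.remove(tmp.name)
print(words)  # ['apple', 'grape', 'banana']
```

Here 'Apple' is rejected for its capital letter, the second 'apple' as a duplicate, 'cat' for being too short and 'taxxi' for containing 'xx'. Note that word not in L scans the whole list on every check; for a large file, tracking the words already seen in a set is faster.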

Think about it: with your nested if statements, ANY word that is not already in the list gets appended by the first if. Then, if it is 5 or more characters long, it gets appended again, and once more for every further check it passes. You need to rethink the logic of your if statements.
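To see the effect, here is a minimal trace of the nested structure from the question. It is simplified so that each word is a plain string (in the question it is actually a list from line.split(' ')), and the helper name nested_appends is hypothetical. It counts how many copies of each word end up in L.

```python
def nested_appends(words):
    """Mimic the question's nested ifs: each check passed appends again."""
    L = []
    for word in words:
        if word not in L:
            L.append(word)                  # 1st append: any new word
            if len(word) in range(5, 100):
                L.append(word)              # 2nd: five or more characters
                if not word.endswith('xx'):
                    L.append(word)          # 3rd: does not end in 'xx'
                    if word == word.lower():
                        L.append(word)      # 4th: all lowercase


    return L


sample = ['cat', 'Table', 'apple']
L = nested_appends(sample)
for w in sample:
    print(w, L.count(w))
# prints:
# cat 1
# Table 3
# apple 4
```

The 3-letter word is still kept once and the capitalised word three times, so nothing is actually excluded; the checks only change how many copies are stored.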

Improved code:

def MakeWordList():
    with open('possible.rtf', 'r') as f:
        data = f.read().split()  # a list of words, not one long string
    return set([word for word in data if len(word) >= 5 and word.islower() and 'xx' not in word])

set(iterable) returns a set-type object, which has no duplicates (all set items must be unique). [word for word in data if ...] is a list comprehension, a shorter way of creating simple lists. f.read().split() splits the file contents on any whitespace, so data is a list of words whether they are one per line or several per line; iterating over the plain string returned by f.read() would give single characters instead, and none of those can pass the length test. The condition len(word) >= 5 and word.islower() and 'xx' not in word covers the last three requirements: at least 5 letters, no uppercase letters, and no 'xx'.
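The function above returns the filtered set, but the question also asks for the list to be written to a file, one word per line, and for the number of words to be output. A minimal sketch of that last step follows; save_word_list and the output name wordlist.txt are hypothetical, not from the question.

```python
import os
import tempfile


def save_word_list(words, out_path):
    """Write each word on its own line and return how many were written."""
    ordered = sorted(words)  # a set has no order; sorting gives a stable file
    with open(out_path, 'w') as out:
        out.writelines(word + '\n' for word in ordered)
    return len(ordered)


# Demo with a hard-coded set; in practice you would pass MakeWordList()
out_path = os.path.join(tempfile.gettempdir(), 'wordlist.txt')  # hypothetical name
count = save_word_list({'grape', 'apple', 'banana'}, out_path)
print(count)  # 3

with open(out_path) as f:
    lines = f.read().splitlines()
print(lines)  # ['apple', 'banana', 'grape']
os.remove(out_path)
```

Because MakeWordList() returns a set, duplicates are already gone, so the number of lines written is also the size of the final word list.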
