
Make a word list from any document in Python

I want to output a simple word list from any text document: every word listed, but no duplicates. This is what I have, but it doesn't do anything. I am fairly new to Python. Thanks!

def MakeWordList():
    with open('text.txt','r') as f:
        data = f.read()
    return set([word for word in data])

The loop for word in data iterates over data, which is a string, so the loop variable word receives a single character on each iteration. Use something like data.split() to loop over a list of words instead.
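To make the difference concrete, here is a small sketch. The sample string and the variable names chars and words are made up for illustration; they are not from the question.

```python
data = "the cat saw the dog"

# Iterating over a string yields one character at a time,
# so this set holds single characters (including the space).
chars = set(word for word in data)

# Splitting first yields whole words; the set then drops duplicates.
words = set(data.split())

print(chars)
print(words)
```

Sets are unordered, so the printed order may vary between runs.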

You can't iterate over the data you read like this: it is a string, so you get consecutive characters. However, you can split the string on spaces, which gives you a list of words:

def MakeWordList():
    with open('possible.rtf', 'r') as f:
        data = f.read()
    # Split on spaces, then keep unique lowercase words of 5+ characters without 'xx'
    return {word for word in data.split(' ')
            if len(word) >= 5 and word.islower() and 'xx' not in word}
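For the original goal (every word, no duplicates, no extra filters), a minimal sketch might look like the following. The function name make_word_list, the path parameter, the UTF-8 encoding, and the choice to lowercase words and strip surrounding punctuation are all assumptions for illustration, not part of the answers above. It also assumes a plain-text file; an .rtf file read this way would include RTF control words in the output.

```python
import string


def make_word_list(path):
    """Return the unique words in the file at `path`, sorted alphabetically."""
    # Assumption: the file is plain text encoded as UTF-8.
    with open(path, "r", encoding="utf-8") as f:
        data = f.read()

    words = set()
    # split() with no argument splits on any run of whitespace,
    # including newlines and tabs, not only single spaces.
    for token in data.split():
        # Drop punctuation at either end and normalize case, so that
        # "Cat," and "cat" count as the same word.
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that were only punctuation
            words.add(word)
    return sorted(words)
```

Defining a function does not run it, which is one likely reason the code in the question appears to do nothing. Calling it and printing the result, for example print(make_word_list('text.txt')), shows the list.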

