Python NLTK FreqDist - Listing words with a frequency greater than 1000

I'm trying to output every word that appears in my tokens more than 1000 times (> 1000) and save those words to freq1000.

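from nltk import FreqDist

freq1000 = []

newtokens = []

# tokens is a list of tokenized sentences; flatten it into one list of words
for words in tokens:
    newtokens += words

fd_1 = FreqDist(newtokens)

# FreqDist keys are already unique, so wrapping them in set() is unnecessary
for i in fd_1:
    if fd_1[i] > 1000:  # fd_1[i] is the count of word i
        print(i)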

This is my current code. I'm completely stuck after this, and I'm not sure whether there is a FreqDist function I can use to help. I have saved the FreqDist to fd_1 successfully; I'm just unsure how to output the words that appear more than 1000 times and save them to freq1000.

I would appreciate any help you can provide.

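You can filter the words by their frequency count using FreqDist.items(), which yields (word, count) pairs, like below:

 freq1000 = list(filter(lambda x: x[1] > 1000, fd_1.items()))

This keeps each (word, count) pair; take x[0] from each pair if you only need the words.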
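If only the words themselves should end up in freq1000, a list comprehension over the same items() pairs is an alternative. The sketch below is self-contained under a few assumptions: the toy `tokens` data and the helper name `words_above` are hypothetical, it uses collections.Counter in place of FreqDist (FreqDist subclasses Counter, so items() and indexing behave the same) so that it runs without NLTK, and it flattens the sentences with itertools.chain.from_iterable instead of the += loop.

```python
from collections import Counter
from itertools import chain


def words_above(freq_dist, threshold):
    """Return the words whose count is strictly greater than threshold."""
    return [word for word, count in freq_dist.items() if count > threshold]


# Hypothetical toy data standing in for the question's `tokens`
# (a list of tokenized sentences).
tokens = [["the", "cat", "sat"], ["the", "dog"], ["the", "cat"]]

# Flatten the sentences into a single list of words.
newtokens = list(chain.from_iterable(tokens))

# Counter stands in for nltk.FreqDist here; with NLTK installed you could
# write fd_1 = FreqDist(newtokens) instead.
fd_1 = Counter(newtokens)

print(words_above(fd_1, 1))  # ['the', 'cat']

# On the question's real data this would be:
# freq1000 = words_above(fd_1, 1000)
```

Because the comparison is `count > threshold`, a word that appears exactly 1000 times is excluded, matching the "> 1000" in the question.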

Hope it helps :)
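If you also want the results ordered by frequency, one option is most_common(), which FreqDist inherits from Counter: called with no argument, it returns every (word, count) pair sorted by descending count, so itertools.takewhile can stop at the first count that is not above the threshold. This is a sketch only; the helper name `words_above_sorted` and the sample data are hypothetical, and Counter again stands in for FreqDist so it runs without NLTK.

```python
from collections import Counter
from itertools import takewhile


def words_above_sorted(freq_dist, threshold):
    """Return (word, count) pairs with count > threshold, most frequent first.

    most_common() with no argument lists every entry by descending count,
    so iteration can stop at the first count that is <= threshold.
    """
    return list(takewhile(lambda pair: pair[1] > threshold,
                          freq_dist.most_common()))


# Hypothetical sample words; with NLTK this could be FreqDist(newtokens).
fd_1 = Counter(["the", "cat", "sat", "the", "dog", "the", "cat"])

print(words_above_sorted(fd_1, 1))  # [('the', 3), ('cat', 2)]
```

Sorting costs O(n log n) over all distinct words, so this is mainly worth it when the sorted order is useful; for a plain filter, the comprehension over items() is simpler.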
