How to get a list of words with frequency greater than?

I have created a list of words from a dataframe and removed stop words from it. I want to create a list of words with frequency greater than some value n. How do I do that?

Here is my code to generate the list:

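from itertools import chain
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# wineData (a pandas DataFrame) and stop_words are defined earlier
tokenizer = RegexpTokenizer(r"\w+(?:[-']\w+)?")
wineData['description'] = wineData['description'].apply(str.lower)
wineDataTokenized = wineData['description'].apply(
    lambda x: [el for el in tokenizer.tokenize(x) if el not in stop_words])
filteredList = chain.from_iterable(wineDataTokenized)
frequencyList = FreqDist(filteredList)
highFreq = list(frequencyList.keys())  # currently every word, regardless of frequency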

# Build a list of words from a sample string
wordstring = 'it was the best of times it was the worst of times '
wordstring += 'it was the age of wisdom it was the age of foolishness'
wordlist = wordstring.split()

# For each word, count how many times it occurs in the list
wordfreq = []
for w in wordlist:
    wordfreq.append(wordlist.count(w))

print("String\n" + wordstring + "\n")
print("List\n" + str(wordlist) + "\n")
print("Frequencies\n" + str(wordfreq) + "\n")
# In Python 3, zip() returns an iterator, so convert it to a list before printing
print("Pairs\n" + str(list(zip(wordlist, wordfreq))))

source: https://programminghistorian.org/en/lessons/counting-frequencies
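
The snippet above counts occurrences but stops short of the filtering step the question asks about. A minimal sketch of that step follows, assuming the counts are held in a collections.Counter. The helper name words_above and the threshold values are illustrative and do not come from the original post.

```python
from collections import Counter


def words_above(counts, n):
    """Return the words whose count is strictly greater than n.

    `counts` is assumed to be a Counter (or a Counter subclass).
    most_common() yields (word, count) pairs, highest count first.
    """
    return [word for word, count in counts.most_common() if count > n]


wordstring = ('it was the best of times it was the worst of times '
              'it was the age of wisdom it was the age of foolishness')
counts = Counter(wordstring.split())

# Words that occur more than 3 times in the sample string
print(words_above(counts, 3))
```

Since nltk's FreqDist is a subclass of Counter, the same helper should also work on the question's data as words_above(frequencyList, n), although that has not been run against the wine dataset here.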
