Find most common words from set of sentences in Python
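I have 5 sentences in an np.array and I want to find the n most common words that appear in them. For example, if n is 3, I want the 3 most common words. Here is an example: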
0 oh i am she cool though might off her a brownie lol
1 so trash wouldnt do colors better tweet
2 love monkey brownie as much as a tweet
3 monkey get this tweet around i think
4 saw a brownie to make me some monkey
If n is 3, I would want it to print: brownie, monkey, tweet. Is there a simple way to do something like this?
You can do this with the help of CountVectorizer, as shown below:
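import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

A = np.array(["oh i am she cool though might off her a brownie lol",
              "so trash wouldnt do colors better tweet",
              "love monkey brownie as much as a tweet",
              "monkey get this tweet around i think",
              "saw a brownie to make me some monkey"])
n = 3
# Build the count matrix: one row per sentence, one column per word
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(A)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
vocabulary = vectorizer.get_feature_names_out()
# Sum the counts per word, then take the indices of the n largest totals
# (the order among words with equal counts is not guaranteed)
ind = np.argsort(X.toarray().sum(axis=0))[-n:]
top_n_words = [str(vocabulary[a]) for a in ind]
print(top_n_words)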
['tweet', 'monkey', 'brownie']
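One thing worth knowing: with its default settings, CountVectorizer only keeps tokens of two or more characters, which is why "a" (it also occurs three times) does not show up in the result. If you would rather not pull in scikit-learn for this, here is a rough standard-library sketch of the same counting idea using collections.Counter. The function name `top_n_words` and the `min_len` parameter are made up for illustration, and the tokenization only approximates CountVectorizer's defaults.

```python
import re
from collections import Counter

sentences = [
    "oh i am she cool though might off her a brownie lol",
    "so trash wouldnt do colors better tweet",
    "love monkey brownie as much as a tweet",
    "monkey get this tweet around i think",
    "saw a brownie to make me some monkey",
]


def top_n_words(sentences, n, min_len=2):
    """Return the n most frequent words across all sentences.

    min_len is a hypothetical knob: min_len=2 drops one-character
    tokens such as "a" and "i", roughly like CountVectorizer's defaults.
    """
    counts = Counter()
    for sentence in sentences:
        # Lowercase, then split into runs of word characters
        tokens = re.findall(r"\w+", sentence.lower())
        counts.update(t for t in tokens if len(t) >= min_len)
    # most_common(n) returns (word, count) pairs, highest counts first
    return [word for word, _ in counts.most_common(n)]


print(top_n_words(sentences, 3))
```

For the five sentences above this gives brownie, monkey and tweet, each with a count of 3. Passing min_len=1 would also count "a", which ties with them.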
Hope this helps!