Finding the most common words in a set of sentences in Python

Question

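I have 5 sentences in an np.array and I want to find the n most common words that appear in them. For example, if n is 3, I want the 3 most frequent words. Here is an example: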

0    oh i am she cool though might off her a brownie lol
1    so trash wouldnt do colors better tweet
2    love monkey brownie as much as a tweet
3    monkey get this tweet around i think
4    saw a brownie to make me some monkey

If n is 3, I would like it to print: brownie, monkey, tweet. Is there a simple way to do something like this?

Answer 1

You can do this with the help of CountVectorizer, as shown below:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

A = np.array(["oh i am she cool though might off her a brownie lol", 
              "so trash wouldnt do colors better tweet", 
              "love monkey brownie as much as a tweet",
              "monkey get this tweet around i think",
              "saw a brownie to make me some monkey" ])

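n = 3
# Build a document-term matrix of word counts (one row per sentence).
# The default tokenizer ignores one-character words such as "a" and "i".
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(A)

# get_feature_names() was removed in scikit-learn 1.2;
# get_feature_names_out() is available from scikit-learn 1.0 onwards.
vocabulary = vectorizer.get_feature_names_out()
# Total count of each word across all sentences
counts = X.toarray().sum(axis=0)
# Indices of the n largest counts, most frequent first (stable sort keeps ties deterministic)
ind = np.argsort(counts, kind="stable")[-n:][::-1]

top_n_words = [str(vocabulary[a]) for a in ind]

print(top_n_words)
['tweet', 'monkey', 'brownie']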
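
If you would rather not depend on scikit-learn, a similar result can probably be had with collections.Counter from the standard library. This is a different technique from the CountVectorizer approach above, offered only as a minimal sketch: the helper name top_n_words and its min_len parameter are made up for illustration, and the tokenization is a plain whitespace split. The min_len=2 default is an assumption chosen to mimic CountVectorizer's default behaviour of ignoring one-character words.

```python
from collections import Counter

# Same five sentences as in the question (a plain list here;
# an np.array of strings can be iterated the same way).
sentences = [
    "oh i am she cool though might off her a brownie lol",
    "so trash wouldnt do colors better tweet",
    "love monkey brownie as much as a tweet",
    "monkey get this tweet around i think",
    "saw a brownie to make me some monkey",
]


def top_n_words(sentences, n, min_len=2):
    """Return the n most frequent whitespace-separated words (hypothetical helper)."""
    counts = Counter(
        word
        for sentence in sentences
        for word in sentence.split()
        if len(word) >= min_len  # assumption: skip one-character words
    )
    # most_common(n) returns (word, count) pairs, highest count first
    return [word for word, _ in counts.most_common(n)]


print(top_n_words(sentences, 3))
```

For this data the three words returned are brownie, monkey and tweet, each of which occurs three times. Note that with min_len=1 the word "a" also occurs three times, so it ties with them and which three are returned then depends on how ties are broken.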

Hope this helps!

Solution 1
2 Accepted 2019-07-09 17:24:37
