Counting verbs, nouns, and other parts of speech with Python's NLTK

Question

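I have multiple texts, and I would like to create a profile of each one based on its use of the various parts of speech, such as nouns and verbs. Basically, I need to count how many times each part of speech is used.

I have tokenized and tagged the text, but I am not sure how to go further: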

tokens = nltk.word_tokenize(text.lower())
text = nltk.Text(tokens)
tags = nltk.pos_tag(text)

How can I save the count for each part of speech into a variable?

Answer 1

The pos_tag method gives you back a list of (token, tag) pairs:

tagged = [('the', 'DT'), ('dog', 'NN'), ('sees', 'VB'), ('the', 'DT'), ('cat', 'NN')]

If you are using Python 2.7 or later, you can count the tags with collections.Counter:

>>> from collections import Counter
>>> counts = Counter(tag for word,tag in tagged)
>>> counts
Counter({'DT': 2, 'NN': 2, 'VB': 1})
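
Since the question asks about nouns and verbs as groups, the per-tag counts can also be folded into coarser classes. The following is a minimal sketch, assuming the tags follow the Penn Treebank convention (noun tags start with NN, verb tags start with VB); the helper name coarse_counts and the sample list are made up for illustration and are not part of NLTK.

```python
from collections import Counter

# Illustrative tagged tokens in the same (token, tag) shape that pos_tag returns.
tagged = [('the', 'DT'), ('dogs', 'NNS'), ('saw', 'VBD'),
          ('the', 'DT'), ('cat', 'NN'), ('running', 'VBG')]

def coarse_counts(tagged_tokens):
    """Count tokens per coarse class, keyed on the tag prefix.

    Assumes Penn Treebank style tags: NN* for nouns, VB* for verbs.
    """
    counts = Counter()
    for _word, tag in tagged_tokens:
        if tag.startswith('NN'):
            counts['noun'] += 1
        elif tag.startswith('VB'):
            counts['verb'] += 1
        else:
            counts['other'] += 1
    return counts

coarse = coarse_counts(tagged)
noun_count = coarse['noun']   # each count saved in its own variable
verb_count = coarse['verb']
```

Indexing a Counter with a missing key returns 0, so noun_count and verb_count are safe to read even for a text that contains no nouns or verbs.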

To normalize the counts (giving you the proportion of each tag), do:

>>> total = sum(counts.values())
>>> dict((tag, float(count)/total) for tag,count in counts.items())
{'DT': 0.4, 'VB': 0.2, 'NN': 0.4}
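
On Python 3 the same normalization can be written with a dict comprehension, because / is already true division there. This is a small sketch under that assumption; tag_proportions is a hypothetical helper name, and the guard for an empty Counter is an addition that the original answer does not discuss.

```python
from collections import Counter

def tag_proportions(counts):
    """Return {tag: share of all tokens} for a Counter of tag counts."""
    total = sum(counts.values())
    if total == 0:
        # Empty text: avoid dividing by zero and return an empty profile.
        return {}
    return {tag: count / total for tag, count in counts.items()}

counts = Counter({'DT': 2, 'NN': 2, 'VB': 1})
proportions = tag_proportions(counts)
```

Proportions make texts of different lengths comparable, which is usually what a per-text profile needs.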

Note that in older versions of Python, you will have to implement Counter yourself:

>>> from collections import defaultdict
>>> counts = defaultdict(int)
>>> for word, tag in tagged:
...     counts[tag] += 1
...
>>> counts
defaultdict(<type 'int'>, {'DT': 2, 'VB': 1, 'NN': 2})
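
For the multiple-texts case in the question, the counting step can be wrapped so that each text gets its own profile. What follows is one possible sketch, not the answer author's method: pos_profile, pos_profiles, and the tagger parameter are hypothetical names. When no tagger is passed, it falls back to nltk.pos_tag(nltk.word_tokenize(...)), which assumes NLTK and its tokenizer and tagger data are installed.

```python
from collections import Counter

def pos_profile(text, tagger=None):
    """Return a Counter mapping POS tag -> count for one raw text.

    `tagger` is any callable that turns a string into (token, tag) pairs.
    """
    if tagger is None:
        # Imported lazily so the counting logic can be tried without NLTK.
        import nltk

        def tagger(s):
            return nltk.pos_tag(nltk.word_tokenize(s))
    return Counter(tag for _word, tag in tagger(text))

def pos_profiles(texts, tagger=None):
    """Build one profile per text; `texts` maps a name to its raw string."""
    return {name: pos_profile(body, tagger) for name, body in texts.items()}
```

Calling pos_profiles({'doc1': first_text, 'doc2': second_text}) would then give one Counter per document, and individual counts can be read with profiles['doc1']['NN']. Keeping the tagger as a parameter is a design choice that lets the counting be checked with a stub tagger.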

Answer 1: 30 (accepted), 2012-05-20 15:49:40
