Counting the frequency of a sub-phrase with NLTK

Question

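For the sentence: "I see a tall tree outside. A man is under the tall tree"

How do I count the frequency of "tall tree"? I can use bigrams, as one would for collocations, e.g.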

import nltk

# tokens is assumed to be the already-tokenized sentence (a list of words)
bgs = nltk.bigrams(tokens)
fdist1 = nltk.FreqDist(bgs)
pairs = fdist1.most_common(500)

But all I need is the count of one specific bigram.

Answer 1

@uday1889's answer has some flaws, because str.count() matches substrings rather than whole words:

>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> string.count("tall tree")
2
>>> string = "The see a stall tree outside. A man is under the tall trees"
>>> string.count("tall tree")
2
>>> string = "I would like to install treehouses at my yard"
>>> string.count("tall tree")
1

A cheap trick is to pad the argument to str.count() with spaces:

>>> string = "I would like to install treehouses at my yard"
>>> string.count("tall tree")
1
>>> string.count(" tall tree ")
0
>>> string = "The see a stall tree outside. A man is under the tall trees"
>>> string.count(" tall tree ")
0
>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> string.count(" tall tree ")
1

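But as you can see, this misses matches when the substring is at the start or end of the sentence, or next to punctuation. Tokenizing first and comparing bigrams avoids both problems: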

>>> from nltk.util import ngrams
>>> from nltk import word_tokenize
>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> len([i for i in ngrams(word_tokenize(string),n=2) if i==('tall', 'tree')])
2
>>> string = "I would like to install treehouses at my yard"
>>> len([i for i in ngrams(word_tokenize(string),n=2) if i==('tall', 'tree')])
0

Answer 2

The count() method should do it:

string = "I see a tall tree outside. A man is under the tall tree"
string.count("tall tree")
