NLTK count frequency of sub phrase

Question

For this sentence: "I see a tall tree outside. A man is under the tall tree"

How do I count the frequency of "tall tree"? I can get the counts of all bigrams, as one would for collocations, like this:

import nltk

# tokens: the tokenized sentence (a list of words)
bgs = nltk.bigrams(tokens)
fdist1 = nltk.FreqDist(bgs)
pairs = fdist1.most_common(500)

but all I need is to count a specific sub phrase.

Answer 1

@uday1889's answer has some flaws: str.count() matches raw substrings, so it also counts hits inside longer words:

>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> string.count("tall tree")
2
>>> string = "The see a stall tree outside. A man is under the tall trees"
>>> string.count("tall tree")
2
>>> string = "I would like to install treehouses at my yard"
>>> string.count("tall tree")
1

A cheap hack would be to pad the search string with spaces in str.count():

>>> string = "I would like to install treehouses at my yard"
>>> string.count("tall tree")
1
>>> string.count(" tall tree ")
0
>>> string = "The see a stall tree outside. A man is under the tall trees"
>>> string.count(" tall tree ")
0
>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> string.count(" tall tree ")
1

But as you can see, there are problems when the phrase is at the start or end of a sentence or next to punctuation. Tokenizing the text and counting the matching bigrams avoids this:

>>> from nltk.util import ngrams
>>> from nltk import word_tokenize
>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> len([i for i in ngrams(word_tokenize(string),n=2) if i==('tall', 'tree')])
2
>>> string = "I would like to install treehouses at my yard"
>>> len([i for i in ngrams(word_tokenize(string),n=2) if i==('tall', 'tree')])
0
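
The same idea extends to phrases of any length by comparing a sliding window of tokens against the tokenized phrase. Below is a minimal sketch, not a definitive implementation: count_phrase is a hypothetical helper name, and the regex tokenizer is a stdlib stand-in assumed here so the example runs without NLTK data; nltk.word_tokenize could be swapped in for it.

```python
import re


def tokenize(text):
    # Stand-in tokenizer (an assumption, not NLTK's): runs of word
    # characters, or single punctuation marks, each become one token.
    return re.findall(r"\w+|[^\w\s]", text)


def count_phrase(text, phrase):
    """Count occurrences of `phrase` in `text` as a run of whole tokens."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    # Slide a window of n tokens over the text and compare it to the phrase.
    return sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target
    )


print(count_phrase("I see a tall tree outside. A man is under the tall tree", "tall tree"))
print(count_phrase("I would like to install treehouses at my yard", "tall tree"))
```

The comparison is case-sensitive, so "Tall tree" at the start of a sentence would not be counted unless both the text and the phrase are lowercased first.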

Answer 2

The count() method should do it:

string = "I see a tall tree outside. A man is under the tall tree"
string.count("tall tree")
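
If matches inside longer words (such as "install treehouses") are a concern, one possible variant is a regular expression with word boundaries instead of str.count(). This is only a sketch under that assumption; count_whole_phrase is a hypothetical helper name, and it uses only the standard re module.

```python
import re


def count_whole_phrase(text, phrase):
    """Count occurrences of `phrase` that start and end on word boundaries."""
    # re.escape keeps any special characters in the phrase literal;
    # \b requires a word boundary on both sides of the match.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text))


string = "I see a tall tree outside. A man is under the tall tree"
print(count_whole_phrase(string, "tall tree"))
```

This expects the words of the phrase to be separated exactly as written, so "tall  tree" with two spaces or a line break in between would not be counted.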

2 answers

solution1
2 ACCEPTED 2015-08-06 12:39:01

solution2
1 2015-08-06 08:13:44
