
NLTK: count the frequency of a sub-phrase

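For this sentence: "I see a tall tree outside. A man is under the tall tree"

How do I count the frequency of "tall tree"? I can use bigrams in collocations, for example: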

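import nltk
tokens = nltk.word_tokenize("I see a tall tree outside. A man is under the tall tree")
bgs = nltk.bigrams(tokens)
fdist1 = nltk.FreqDist(bgs)
# The 500 most frequent (bigram, count) pairs
pairs = fdist1.most_common(500)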

But all I need is to count one specific sub-phrase in the sentence.
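Since nltk.FreqDist subclasses collections.Counter, the count of one bigram can be read by indexing the distribution with the tuple, rather than listing the most common pairs. The sketch below shows the same idea with the standard library only. The regex tokenizer is an assumption standing in for nltk.word_tokenize, and bigram_counts is an illustrative name.

```python
import re
from collections import Counter

sentence = "I see a tall tree outside. A man is under the tall tree"

# Assumption: a simple regex tokenizer stands in for nltk.word_tokenize.
# It splits words and punctuation marks into separate tokens.
tokens = re.findall(r"\w+|[^\w\s]", sentence)

# Pair each token with its successor to form bigrams, then count them.
bigram_counts = Counter(zip(tokens, tokens[1:]))

# Index by the tuple to get the frequency of one specific bigram.
print(bigram_counts[("tall", "tree")])
```

A bigram that never occurs returns 0 instead of raising KeyError, which is standard Counter behaviour.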

@uday1889's answer has some flaws:

>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> string.count("tall tree")
2
>>> string = "The see a stall tree outside. A man is under the tall trees"
>>> string.count("tall tree")
2
>>> string = "I would like to install treehouses at my yard"
>>> string.count("tall tree")
1

A cheap hack is to pad the substring with spaces in str.count():

>>> string = "I would like to install treehouses at my yard"
>>> string.count("tall tree")
1
>>> string.count(" tall tree ")
0
>>> string = "The see a stall tree outside. A man is under the tall trees"
>>> string.count(" tall tree ")
0
>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> string.count(" tall tree ")
1

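But as you can see, this breaks when the substring is at the start or end of the sentence, or next to punctuation. Tokenizing the string and comparing bigrams avoids the problem: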

>>> from nltk.util import ngrams
>>> from nltk import word_tokenize
>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> len([i for i in ngrams(word_tokenize(string),n=2) if i==('tall', 'tree')])
2
>>> string = "I would like to install treehouses at my yard"
>>> len([i for i in ngrams(word_tokenize(string),n=2) if i==('tall', 'tree')])
0
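The same comparison can be wrapped in a helper that handles phrases of any length. This is a minimal sketch and not part of the original answer: count_phrase and tokenize are hypothetical names, and the regex tokenizer is an assumption used so the snippet runs without NLTK's tokenizer data. word_tokenize could be swapped in instead.

```python
import re

def tokenize(text):
    # Assumption: words and punctuation marks become separate tokens,
    # a rough stand-in for nltk.word_tokenize.
    return re.findall(r"\w+|[^\w\s]", text)

def count_phrase(text, phrase):
    """Count token-level occurrences of phrase in text."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    # Slide a window of n tokens over the text and compare it to the target.
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

print(count_phrase("I see a tall tree outside. A man is under the tall tree", "tall tree"))
print(count_phrase("I would like to install treehouses at my yard", "tall tree"))
```

Because the comparison is done on whole tokens, "install treehouses" and "stall tree" no longer count as matches, and a phrase at the end of the sentence is still found.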

The count() method should do it:

string = "I see a tall tree outside. A man is under the tall tree"
string.count("tall tree")
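If plain string matching is preferred over tokenizing, a regular expression with \b word boundaries avoids matches inside longer words such as "install treehouses". The following is a sketch, assuming case-sensitive matching and a phrase written exactly as it appears in the text. count_whole_phrase is a hypothetical helper name.

```python
import re

def count_whole_phrase(text, phrase):
    # re.escape guards against regex metacharacters in the phrase;
    # \b requires a word boundary on both sides of the match.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text))

print(count_whole_phrase("I see a tall tree outside. A man is under the tall tree", "tall tree"))
print(count_whole_phrase("I would like to install treehouses at my yard", "tall tree"))
```

Unlike padding the substring with spaces, the boundary check also works at the start or end of the string and next to punctuation.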

