NLTK collocations for specific words

Question

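I know how to get bigram and trigram collocations with NLTK, and I apply them to my own corpora. The code is below.

However, I'm not sure (1) how to get the collocations for a specific word, and (2) whether NLTK has a collocation measure based on the log-likelihood ratio.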

import nltk
from nltk.collocations import *
from nltk.tokenize import word_tokenize

text = "this is a foo bar bar black sheep  foo bar bar black sheep foo bar bar black  sheep shep bar bar black sentence"

trigram_measures = nltk.collocations.TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(word_tokenize(text))

# print every trigram with its PMI score, highest first
for i in finder.score_ngrams(trigram_measures.pmi):
    print(i)

Answer 1

Try this code:

import nltk
from nltk.collocations import *
bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()

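# Filter function: returns True for n-grams that do NOT contain 'creature'
# (apply_ngram_filter removes every n-gram for which it returns True)
creature_filter = lambda *w: 'creature' not in w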


## Bigrams
finder = BigramCollocationFinder.from_words(
   nltk.corpus.genesis.words('english-web.txt'))
# only bigrams that appear 3+ times
finder.apply_freq_filter(3)
# only bigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest likelihood ratio
print(finder.nbest(bigram_measures.likelihood_ratio, 10))


## Trigrams
finder = TrigramCollocationFinder.from_words(
   nltk.corpus.genesis.words('english-web.txt'))
# only trigrams that appear 3+ times
finder.apply_freq_filter(3)
# only trigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest likelihood ratio
print(finder.nbest(trigram_measures.likelihood_ratio, 10))

It uses the likelihood-ratio measure and filters out the n-grams that do not contain the word 'creature'.
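
The same idea can be tried without downloading any corpus by running it on the sample sentence from the question. This is only a minimal sketch: the target word "sheep" and the plain whitespace tokenization are assumptions made for illustration, not part of the original answer.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")
tokens = text.split()  # whitespace split, so no tokenizer data is required

target_word = "sheep"  # hypothetical target word chosen for this sketch

finder = BigramCollocationFinder.from_words(tokens)
# apply_ngram_filter removes every n-gram for which the function returns True,
# so returning True when the target is absent keeps only n-grams containing it.
finder.apply_ngram_filter(lambda *words: target_word not in words)

bigram_measures = BigramAssocMeasures()
# list of ((w1, w2), score) pairs, sorted from highest to lowest score
scored = finder.score_ngrams(bigram_measures.likelihood_ratio)
for ngram, score in scored:
    print(ngram, round(score, 3))
```

Because the filter takes `*words`, the same lambda should work unchanged with a TrigramCollocationFinder.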

Answer 2

For question 1, try:

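# finder and trigram_measures are the objects created in the question's code
target_word = "electronic"  # your choice of word
# remove every trigram that does not contain the target word
finder.apply_ngram_filter(lambda w1, w2, w3: target_word not in (w1, w2, w3))
for i in finder.score_ngrams(trigram_measures.likelihood_ratio):
    print(i)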

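The idea is to filter out whatever you don't want. This method is normally used to filter on words in specific positions of the n-gram, and you can adapt it to your own needs.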
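
As a sketch of the position-specific use mentioned above, the filter can test a single slot of the trigram instead of all three. The sample tokens are taken from the question; the target word "black" and the choice of the middle position are assumptions for illustration only.

```python
from nltk.collocations import TrigramAssocMeasures, TrigramCollocationFinder

tokens = ("this is a foo bar bar black sheep foo bar bar black sheep "
          "foo bar bar black sheep shep bar bar black sentence").split()

target_word = "black"  # hypothetical target word for illustration

finder = TrigramCollocationFinder.from_words(tokens)
# Drop every trigram whose middle word is not the target;
# w1 and w3 are ignored, so only the middle position is constrained.
finder.apply_ngram_filter(lambda w1, w2, w3: w2 != target_word)

# list of ((w1, w2, w3), score) pairs for the surviving trigrams
middle_hits = finder.score_ngrams(TrigramAssocMeasures().likelihood_ratio)
for trigram, score in middle_hits:
    print(trigram, round(score, 3))
```

Testing `w1` or `w3` instead restricts the target to the first or last position.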

Answer 3

As for question #2, yes! NLTK has the likelihood ratio among its association measures. The first question is still unanswered!

http://nltk.org/api/nltk.metrics.html?highlight=likelihood_ratio#nltk.metrics.association.NgramAssocMeasures.likelihood_ratio
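
To give a feel for what a log-likelihood-ratio score measures, here is a rough pure-Python sketch of the textbook G² statistic for one bigram, computed from a 2x2 contingency table. It is not NLTK's code; the function name and argument names are hypothetical, and NLTK's own implementation may differ in details such as smoothing.

```python
import math

def bigram_log_likelihood(n_ii, n_ix, n_xi, n_xx):
    """G^2 = 2 * sum(O * ln(O / E)) over the 2x2 table for a bigram (w1, w2).

    n_ii: count of the bigram (w1, w2)
    n_ix: count of bigrams with w1 in first position
    n_xi: count of bigrams with w2 in second position
    n_xx: total number of bigrams
    """
    # observed cells: (w1, w2), (w1, not w2), (not w1, w2), (not w1, not w2)
    observed = [n_ii, n_ix - n_ii, n_xi - n_ii, n_xx - n_ix - n_xi + n_ii]
    row_totals = [n_ix, n_ix, n_xx - n_ix, n_xx - n_ix]
    col_totals = [n_xi, n_xx - n_xi, n_xi, n_xx - n_xi]
    score = 0.0
    for obs, row, col in zip(observed, row_totals, col_totals):
        expected = row * col / n_xx  # cell count expected under independence
        if obs > 0:  # a zero cell contributes nothing (0 * ln 0 is taken as 0)
            score += obs * math.log(obs / expected)
    return 2 * score

# independent words: observed counts equal expected counts
independent = bigram_log_likelihood(10, 20, 50, 100)
# words that only ever occur together
associated = bigram_log_likelihood(20, 20, 20, 100)
print(independent, associated)
```

A score near zero means the two words co-occur about as often as chance predicts; larger scores indicate a stronger association.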

Answer 1
11 (accepted) 2014-01-17 11:54:31

Answer 2
2 2014-01-17 04:22:01

Answer 3
0 2014-01-17 03:57:58
