How do I check for a specific tag in Python NLTK?

Question

I have been trying for a while to check a tag to see whether it is 'NNP'.

for key in words:
        temp.append(words[key])
        tagger = [key]
        tag = nltk.pos_tag(tagger)
        x = str(tag[0][1].strip())
        print(x is 'NNP')

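The code is supposed to iterate over several keys and check whether each tag is NNP. Even when the tag actually is NNP, my print statement prints False. I checked with type(tag[0][1]) whether it is a str, and it is. I also stripped the string and wrapped it in str() to make sure it is a string. Nothing seems to help. Should I be using a built-in NLTK function, or is there some other suggestion?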

Answer 1

When comparing strings, you should always use the == operator rather than is:

print(x == 'NNP')

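is compares the identity of the string objects themselves, whereas == checks whether the strings are equal in value. Two strings can hold the same characters and still be different objects, which is why the is check prints False here.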

For example:

>>> import nltk
>>> tag = nltk.pos_tag(['Google'])
>>> tag
[('Google', 'NNP')]
>>> tag[0][1]
'NNP'
>>> tag[0][1] is 'NNP'
False
>>> tag[0][1] == 'NNP'
True
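
The same distinction can be reproduced without NLTK. This is only a minimal sketch: the names `literal`, `built`, and `is_proper_noun` are made up for illustration and are not part of NLTK, and the result of the `is` line depends on interpreter internals, so no output is claimed for it.

```python
# Two strings with the same characters, created in different ways.
literal = 'NNP'
built = ''.join(['NN', 'P'])  # assembled at runtime rather than written as a literal

# == compares the contents, which is what a tag check needs.
print(built == literal)

# `is` compares object identity. Whether two equal strings are the same
# object depends on string interning, so this result must not be relied on.
print(built is literal)


def is_proper_noun(tag):
    """Hypothetical helper: True when a POS tag equals 'NNP'."""
    return tag == 'NNP'


print(is_proper_noun(built))
```

Wrapping the comparison in a small function like this keeps the `==` check in one place, so an accidental `is` cannot creep back in elsewhere.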

Answer 2

This is the idiomatic way to check POS tags:

>>> from nltk import pos_tag, word_tokenize
>>> text = 'Google is a friend of Facebook and Yahoo shouts at Microsoft because Stackoverflow is giving out hats.'
>>> for word, pos in pos_tag(word_tokenize(text)):
...     print(word, pos)
... 
Google NNP
is VBZ
a DT
friend NN
of IN
Facebook NNP
and CC
Yahoo NNP
shouts NNS
at IN
Microsoft NNP
because IN
Stackoverflow NNP
is VBZ
giving VBG
out RP
hats NNS
. .
>>> for word, pos in pos_tag(word_tokenize(text)):
...     if pos == 'NNP':
...         print(word)
... 
Google
Facebook
Yahoo
Microsoft
Stackoverflow

Using a list comprehension:

>>> [word for word, pos in pos_tag(word_tokenize(text)) if pos == 'NNP']
['Google', 'Facebook', 'Yahoo', 'Microsoft', 'Stackoverflow']
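
If the same filter is needed for more than one tag, the comprehension can be wrapped in a function. The sketch below assumes the input is already a list of (word, tag) pairs; `words_with_tag` is a hypothetical helper, not an NLTK function, and the sample pairs are copied from the tagger output shown earlier so the block runs without NLTK installed.

```python
def words_with_tag(tagged, wanted):
    """Return the words whose tag is one of `wanted` (hypothetical helper).

    `wanted` may be a single tag string or an iterable of tag strings.
    """
    wanted = {wanted} if isinstance(wanted, str) else set(wanted)
    # Membership in a set uses ==-style equality, never identity.
    return [word for word, pos in tagged if pos in wanted]


# Sample (word, tag) pairs; in real use these would come from
# pos_tag(word_tokenize(text)).
tagged = [
    ('Google', 'NNP'), ('is', 'VBZ'), ('a', 'DT'),
    ('friend', 'NN'), ('of', 'IN'), ('Facebook', 'NNP'),
]

print(words_with_tag(tagged, 'NNP'))          # ['Google', 'Facebook']
print(words_with_tag(tagged, ['NN', 'NNP']))  # ['Google', 'friend', 'Facebook']
```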
