Remove only the 'NN' words from NLTK pos_tag output

Question

I have some code that uses NLTK to find nouns and verbs.

from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk


sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)
print(sent)

It returns:

[('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

How can I remove only the 'NN' entries from the list?

Answer 1

You can use a list comprehension to remove the 'NN' elements:

from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk

sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)
print([s for s in sent if s[1] != 'NN'])
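Note that `s[1] != 'NN'` is an exact comparison, so proper-noun tags such as 'NNP' are kept. A minimal sketch contrasting it with a prefix test that drops the whole Penn Treebank noun family (NN, NNS, NNP, NNPS); it hard-codes the tagged list from the question, so it runs without NLTK or its tagger models:

```python
# Tagged output copied from the question (what pos_tag returned there).
sent = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
        ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

# Exact match: removes only the tuples whose tag is exactly 'NN'.
exact = [s for s in sent if s[1] != 'NN']

# Prefix match: removes every tag that starts with 'NN' (NN, NNS, NNP, NNPS).
no_nouns = [s for s in sent if not s[1].startswith('NN')]

print(exact)     # 'NNP' words survive
print(no_nouns)  # [('my', 'PRP$'), ('is', 'VBZ')]
```

Use the exact form when, as here, only the singular common-noun tag should go.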

Answer 2

a = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

c = [b for b in a if b[-1] != 'NN']
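`b[-1]` indexes from the end, so for a `(word, tag)` pair it is the tag, the same as `b[1]`. A minimal sketch wrapping the same comprehension in a reusable function; the name `remove_tag` is hypothetical (it is not part of NLTK). The comprehension builds a new list, so `a` itself is left unchanged:

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def remove_tag(tagged: Tagged, tag: str) -> Tagged:
    """Hypothetical helper: drop the (word, tag) pairs whose tag equals `tag`.

    b[-1] is the last element of each tuple, i.e. the POS tag.
    """
    return [b for b in tagged if b[-1] != tag]


a = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
     ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

c = remove_tag(a, 'NN')
print(c)
print(remove_tag(a, 'NNP'))  # [('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ')]
```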

Answer 3

I'd use the filter function:

>>> list(filter(lambda pair: pair[1] != 'NN', sent))
[('Hello', 'NNP'), ('my', 'PRP$'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]
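In Python 3, `filter()` returns a lazy iterator rather than a list, and a lambda can no longer unpack a tuple in its parameter list (PEP 3113), so the tag is read by index. A minimal sketch under those assumptions, using the tagged list from the question; `itertools.filterfalse` is added only to show the complementary set of removed items:

```python
from itertools import filterfalse

sent = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
        ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

# filter() keeps the items for which the predicate is true.
# The tag is read as pair[1] because Python 3 lambdas cannot unpack tuples.
kept = filter(lambda pair: pair[1] != 'NN', sent)

# filter() returns a lazy iterator in Python 3; list() materialises it.
result = list(kept)

# filterfalse() yields the items the predicate rejects, i.e. the removed ones.
dropped = list(filterfalse(lambda pair: pair[1] != 'NN', sent))

print(result)
print(dropped)  # [('name', 'NN')]
```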

Answer 4

Here is another approach (taking advantage of tuple unpacking):

from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk

sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)
# ('NN',) is a one-element tuple; ('NN') would just be the string 'NN'
sent_clean = [x for (x,y) in sent if y not in ('NN',)]

print(sent_clean)

Output:

['Hello', 'my', 'is', 'Abhishek', 'Mitra']

Explanation. In the line:

sent_clean = [x for (x,y) in sent if y not in ('NN',)]

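After POS-tagging every word in the sentence, you extract the word (x) from each tuple the tagger produced. The condition you specify applies to the second part of the tuple, the tag y: the word is kept only if its tag is not one of the excluded tags.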
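Be careful with the parentheses: without a trailing comma, `('NN')` is just the string 'NN', not a tuple, and `in` against a string is a substring test. That happens to give the right answer for 'NN' on this sentence, but a minimal sketch using the question's tags shows how the same pattern misfires with 'NNP':

```python
sent = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
        ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

# ('NNP') is the string 'NNP'; `in` does a substring test, so the tag 'NN'
# also matches and 'name' is dropped by accident.
buggy = [x for (x, y) in sent if y not in ('NNP')]

# ('NNP',) is a one-element tuple; `in` compares whole tags.
fixed = [x for (x, y) in sent if y not in ('NNP',)]

print(buggy)  # ['my', 'is']
print(fixed)  # ['my', 'name', 'is']
```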

Similarly, if you want to remove several POS tags:

sent_clean2 = [x for (x,y) in sent if y not in ('PRP$', 'VBZ', 'NN')]

print(sent_clean2)

Output:

['Hello', 'Abhishek', 'Mitra']
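When the exclusion list grows, a set is a natural container: membership is always an exact match on whole tags, and it cannot degrade into a substring test the way a parenthesised single string can. A minimal sketch, assuming the tagged list from the question; the name `EXCLUDE` is illustrative:

```python
sent = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
        ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

# Tags to drop; a set gives exact-match membership tests.
EXCLUDE = {'PRP$', 'VBZ', 'NN'}

# Unpack each (word, tag) pair and keep the word if its tag is not excluded.
sent_clean2 = [word for (word, tag) in sent if tag not in EXCLUDE]

print(sent_clean2)  # ['Hello', 'Abhishek', 'Mitra']
```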

4 answers:

Answer 1: score 3, posted 2013-08-15 11:52:59
Answer 2: score 0, posted 2013-08-15 11:55:36
Answer 3: score 0, posted 2013-08-15 15:40:57
Answer 4: score 0, posted 2019-03-14 15:58:07
