
Remove only 'NN' words from NLTK pos_tag

I have some code that tags the nouns and verbs in a sentence using NLTK.

from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk


sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)
print(sent)

It returns:

[('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

How can I remove only the 'NN' words from the list?

You could use a list comprehension to remove the 'NN' elements:

from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk

sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)
print([s for s in sent if s[1] != 'NN'])

The same filter applied to the tagged list directly:

a = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

c = [b for b in a if b[-1] != 'NN']

I'd use the filter function:

>>> list(filter(lambda pair: pair[1] != 'NN', sent))
[('Hello', 'NNP'), ('my', 'PRP$'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]
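
On Python 3, two details matter for the filter approach: a lambda can no longer unpack a tuple in its parameter list, and filter returns a lazy iterator rather than a list. The sketch below illustrates both, using the tagged output from the question as a literal list so it runs without NLTK; the names tagged, lazy, and without_nn are only illustrative.

```python
# Tagged tokens copied from the question's pos_tag output.
tagged = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
          ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

# Index into the pair instead of unpacking it in the lambda signature.
lazy = filter(lambda pair: pair[1] != 'NN', tagged)

# filter is lazy on Python 3, so materialize it to get a list back.
without_nn = list(lazy)
print(without_nn)
```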

Here's one more way of doing it (taking advantage of tuple unpacking):

from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk

sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence) 
sent_clean = [x for (x, y) in sent if y not in ('NN',)]

print(sent_clean)

Output:

['Hello', 'my', 'is', 'Abhishek', 'Mitra']

Explanation: in the line

sent_clean = [x for (x, y) in sent if y not in ('NN',)]

pos_tag has already produced one (word, tag) tuple per token. The comprehension unpacks each tuple into x (the word) and y (the tag), keeps only x, and filters on y, the second element of the tuple. The trailing comma makes ('NN',) a one-element tuple.
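
A note on single-tag tuples: in Python, ('NN') is just the string 'NN', and a one-element tuple needs a trailing comma, ('NN',). With a string on the right-hand side, in performs a substring test, which happens to give the right answer for Penn Treebank tags here but is not a membership check. A small sketch of the difference (the variable names are only illustrative):

```python
# Parentheses alone do not create a tuple; the comma does.
not_a_tuple = ('NN')
one_tuple = ('NN',)

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple

# Against a string, `in` is a substring test, so a partial tag matches.
print('N' in not_a_tuple)  # True
# Against a tuple, `in` compares whole elements.
print('N' in one_tuple)    # False
```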

Similarly, if you want to eliminate multiple POS tags:

sent_clean2 = [x for (x,y) in sent if y not in ('PRP$', 'VBZ', 'NN')]

print(sent_clean2)

Output:

['Hello', 'Abhishek', 'Mitra']
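
If the list of tags to drop grows, a set keeps the membership test cheap, and a prefix check can drop a whole tag family. The sketch below assumes the Penn Treebank convention that all noun tags start with 'NN' (NN, NNS, NNP, NNPS). It uses the question's tagged output as a literal list so it runs without NLTK, and the names EXCLUDED_TAGS, kept_words, and non_nouns are only illustrative.

```python
# Tagged tokens copied from the question's pos_tag output.
tagged = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
          ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

# Drop an explicit set of tags (exact matches only).
EXCLUDED_TAGS = {'PRP$', 'VBZ', 'NN'}
kept_words = [word for word, tag in tagged if tag not in EXCLUDED_TAGS]
print(kept_words)  # ['Hello', 'Abhishek', 'Mitra']

# Drop every noun tag at once by matching on the 'NN' prefix.
non_nouns = [word for word, tag in tagged if not tag.startswith('NN')]
print(non_nouns)  # ['my', 'is']
```

Note that the prefix version also removes proper nouns such as 'Abhishek', which the question wanted to keep, so it suits a different goal than exact matching on 'NN'.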
