Remove only 'NN' words from NLTK pos_tag

Question

I have some code that finds the nouns and verbs in a sentence using NLTK:

import nltk
from nltk import pos_tag

# Requires NLTK's tokenizer and POS-tagger data (installed via nltk.download).
sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)  # list of (word, tag) tuples
print(sent)

It returns:

[('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

How can I remove only the 'NN' words from the list?

Answer 1

You could use a list comprehension to remove the 'NN' elements:

import nltk
from nltk import pos_tag

sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)
# keep every (word, tag) pair whose tag is not exactly 'NN'
print([s for s in sent if s[1] != 'NN'])
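The comparison `s[1] != 'NN'` is an exact match, which is what the question asks for: words tagged NNP (proper noun), NNS or NNPS stay in the list. For contrast, here is a minimal sketch of both variants on the tagged list from the question (hardcoded, so no NLTK data is needed). The prefix test with `str.startswith` is an assumption about what "all nouns" should mean under the Penn Treebank tag set that `pos_tag` uses by default.

```python
# Tagged output copied from the question (no NLTK call needed here).
sent = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
        ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

# Exact comparison: drops only words tagged exactly 'NN'.
only_nn_removed = [(word, tag) for word, tag in sent if tag != 'NN']

# Prefix test: drops every tag beginning with 'NN' (NN, NNS, NNP, NNPS).
all_nouns_removed = [(word, tag) for word, tag in sent if not tag.startswith('NN')]

print(only_nn_removed)    # [('Hello', 'NNP'), ('my', 'PRP$'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]
print(all_nouns_removed)  # [('my', 'PRP$'), ('is', 'VBZ')]
```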

Answer 2

a = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

c = [b for b in a if b[-1] != 'NN']  # b[-1] is the tag, the last element of each tuple
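The comprehension builds a new list and leaves the original `a` unchanged. If the original list object itself should lose its 'NN' entries (for example, because other names refer to it), slice assignment replaces its contents in place. A minimal sketch; `alias` is a hypothetical second reference added only to show the effect:

```python
# Tagged output copied from the question.
a = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
     ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]
alias = a  # hypothetical second name bound to the same list object

# Slice assignment replaces the contents of the existing list object
# instead of binding `a` to a brand-new list.
a[:] = [b for b in a if b[-1] != 'NN']

print(a)           # [('Hello', 'NNP'), ('my', 'PRP$'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]
print(alias is a)  # True
print(alias)       # the same filtered list
```

The right-hand side is evaluated completely before the assignment, so reading from `a` while replacing its contents is safe here.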

Answer 3

I'd use the filter function (wrapped in list(), since filter returns an iterator in Python 3):

>>> list(filter(lambda pair: pair[1] != 'NN', sent))
[('Hello', 'NNP'), ('my', 'PRP$'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]
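Python 3 removed tuple parameters from lambda and function signatures (PEP 3113), so `lambda (word, tag): ...` is a syntax error there. A small named predicate can unpack the pair in its body instead, and `itertools.filterfalse` with the same predicate returns the complementary selection, i.e. the pairs that were dropped. A sketch; `is_not_nn` is a hypothetical name:

```python
from itertools import filterfalse

# Tagged output copied from the question.
sent = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
        ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]


def is_not_nn(pair):
    """Hypothetical predicate: True for (word, tag) pairs whose tag is not exactly 'NN'."""
    _word, tag = pair  # unpack inside the body instead of in the signature
    return tag != 'NN'


kept = list(filter(is_not_nn, sent))          # pairs for which the predicate is True
removed = list(filterfalse(is_not_nn, sent))  # pairs for which it is False

print(kept)     # [('Hello', 'NNP'), ('my', 'PRP$'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]
print(removed)  # [('name', 'NN')]
```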

Answer 4

Here's one more way of doing it, taking advantage of tuple unpacking:

import nltk
from nltk import pos_tag

sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)
sent_clean = [x for (x, y) in sent if y not in ('NN',)]

print(sent_clean)

Output:

['Hello', 'my', 'is', 'Abhishek', 'Mitra']

Explanation: consider this line:

sent_clean = [x for (x, y) in sent if y not in ('NN',)]

After POS tagging, every word in the sentence is paired with its tag in a (word, tag) tuple. The comprehension unpacks each tuple into x (the word) and y (the tag), keeps only x, and applies the filter condition to y, the second part of the tuple.
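Note that parentheses alone do not make a tuple: `('NN')` is simply the string `'NN'`, so `y not in ('NN')` performs a substring test. It gives the right result for the tags in this sentence, but any substring of 'NN' (for example a hypothetical tag 'N') would also match. A one-element tuple needs a trailing comma, `('NN',)`, or the condition can be written as `y != 'NN'`. A short demonstration:

```python
# Parentheses alone do not create a tuple; the trailing comma does.
print(type(('NN')).__name__)   # str
print(type(('NN',)).__name__)  # tuple

# `in` on a string is a substring test; `in` on a tuple compares elements.
print('N' in ('NN'))   # True: 'N' is a substring of 'NN'
print('N' in ('NN',))  # False: 'N' is not an element of the tuple

# Tagged output copied from the question.
sent = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
        ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]
sent_clean = [x for (x, y) in sent if y not in ('NN',)]
print(sent_clean)  # ['Hello', 'my', 'is', 'Abhishek', 'Mitra']
```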

Similarly, if you want to eliminate multiple POS tags:

sent_clean2 = [x for (x,y) in sent if y not in ('PRP$', 'VBZ', 'NN')]

print(sent_clean2)

Output:

['Hello', 'Abhishek', 'Mitra']
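If the same filtering is needed in several places, it can be wrapped in a small helper. `drop_tags` below is a hypothetical name for illustration, not an NLTK function. It stores the excluded tags in a set (exact, average constant-time membership tests) and treats a bare string as a single tag, so `set('NN')` never silently becomes `{'N'}`:

```python
# Tagged output copied from the question.
sent = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
        ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]


def drop_tags(tagged, excluded):
    """Return the words from (word, tag) pairs whose tag is not in `excluded`.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    if isinstance(excluded, str):
        # Treat a single tag string as one tag, not as a set of characters.
        excluded = {excluded}
    excluded = set(excluded)  # exact membership test for each tag
    return [word for word, tag in tagged if tag not in excluded]


print(drop_tags(sent, {'NN'}))                 # ['Hello', 'my', 'is', 'Abhishek', 'Mitra']
print(drop_tags(sent, {'PRP$', 'VBZ', 'NN'}))  # ['Hello', 'Abhishek', 'Mitra']
print(drop_tags(sent, 'NN'))                   # same as the first call
```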

4 answers

Answer 1: score 3, 2013-08-15 11:52:59
Answer 2: score 0, 2013-08-15 11:55:36
Answer 3: score 0, 2013-08-15 15:40:57
Answer 4: score 0, 2019-03-14 15:58:07
