I've been using a custom-trained NLTK POS tagger, and sometimes obvious verbs (ending in -ing or -ed) come back tagged as NN. How do I get the tagger to run every NN through an additional RegexpTagger, just to catch those extra verbs?
I've included some sample code for the secondary regex tagger.
from nltk.tag.sequential import RegexpTagger
rgt = RegexpTagger([
    (r'.*ing$', 'VBG'),  # gerunds
    (r'.*ed$', 'VBD'),   # past tense verbs
])
Thanks
Here is a trigram tagger that backs off to a bigram tagger, which backs off to a unigram tagger, with a regex tagger as the final backoff. A token is left to the regex rules only when none of the other taggers can tag it. I hope this helps you build a regex tagger with your own rules.
import sys

import nltk
from nltk import ne_chunk
from nltk.corpus import brown
from nltk.tokenize import word_tokenize


def tri_gram():
    # Train on the tagged news sentences from the Brown corpus
    b_t_sents = brown.tagged_sents(categories='news')
    # Last-resort tagger: regex rules tried in order, ending with a catch-all
    default_tagger = nltk.RegexpTagger(
        [(r'^-?[0-9]+(\.[0-9]+)?$', 'CD'),  # cardinal numbers
         (r'(The|the|A|a|An|an)$', 'AT'),   # articles
         (r'.*able$', 'JJ'),                # adjectives
         (r'.*ness$', 'NN'),                # nouns formed from adjectives
         (r'.*ly$', 'RB'),                  # adverbs
         (r'.*s$', 'NNS'),                  # plural nouns
         (r'.*ing$', 'VBG'),                # gerunds
         (r'.*ed$', 'VBD'),                 # past tense verbs
         (r'.*', 'NN')                      # nouns (default)
         ])
    # Backoff chain: trigram -> bigram -> unigram -> regex
    u_gram_tag = nltk.UnigramTagger(b_t_sents, backoff=default_tagger)
    b_gram_tag = nltk.BigramTagger(b_t_sents, backoff=u_gram_tag)
    t_gram_tag = nltk.TrigramTagger(b_t_sents, backoff=b_gram_tag)
    # Tag the text file named on the command line, one sentence at a time
    with open(sys.argv[1], 'r') as f_read:
        given_text = f_read.read()
    segmented_lines = nltk.sent_tokenize(given_text)
    for text in segmented_lines:
        words = word_tokenize(text)
        sent = t_gram_tag.tag(words)
        print(ne_chunk(sent))


tri_gram()
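Note that a backoff tagger is consulted only when the tagger in front of it returns no tag at all, so a word the trained taggers have already labelled NN never reaches the regex rules. To do what the question literally asks (re-check only the NN tokens), one option is a post-processing pass over the tagged output. The following is a minimal sketch of that idea, not part of the original answer: `retag_nouns` and `VERB_PATTERNS` are hypothetical names, and it uses the standard `re` module directly rather than `RegexpTagger`, so it works on any list of `(word, tag)` pairs such as the result of `t_gram_tag.tag(words)`.

```python
import re

# Assumed secondary rules, copied from the question; extend as needed.
VERB_PATTERNS = [
    (r'.*ing$', 'VBG'),  # gerunds
    (r'.*ed$', 'VBD'),   # past tense verbs
]


def retag_nouns(tagged_sent, patterns=VERB_PATTERNS, noun_tags=('NN',)):
    """Return a copy of tagged_sent where tokens tagged as a noun are
    re-tagged by the first pattern that matches the word."""
    compiled = [(re.compile(regex), tag) for regex, tag in patterns]
    result = []
    for word, tag in tagged_sent:
        if tag in noun_tags:
            for regex, new_tag in compiled:
                if regex.match(word):
                    tag = new_tag
                    break  # first matching rule wins
        result.append((word, tag))
    return result


# Hand-written sample input standing in for real tagger output.
tagged = [('He', 'PPS'), ('was', 'BEDZ'), ('running', 'NN'), ('home', 'NN')]
print(retag_nouns(tagged))
```

Tokens with any other tag pass through unchanged. Be aware that suffix rules this broad will also catch genuine nouns such as "king" or "bed", so it may be worth adding a minimum word length or an exception list before relying on it.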