Regex parser for a Spanish text

I am trying to define a grammar to extract the quantity and the fruit from a text using NLTK's RegexpParser. There seems to be a problem in the grammar, because the result only contains the quantity chunk. Below are an example text and the code I am using. The HMM tagger was trained on the cess_esp corpus.

import nltk

# hmm_tagger is the HMM tagger trained on the cess_esp corpus (training code not shown)
grammar = r"""
  fruits: {<NCFP000>}
  quantity: {<Z>}
"""
cp = nltk.RegexpParser(grammar)
example = ['quiero 3 cervezas']

for sent in example:
    tokens = nltk.word_tokenize(sent)
    taggex = hmm_tagger.tag(tokens)
    print(taggex)  # inspect the tags the tagger actually assigns
    result = cp.parse(taggex)
    result.draw()

Try using NLTK's built-in tagger instead of the HMM one:

taggex = nltk.pos_tag(tokens)

I checked it, and it should work with your code as well.
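
Whichever tagger is used, it may help to compare the tags it prints with the tags named in the grammar, since RegexpParser matches tag strings literally and case-sensitively. The sketch below is only an illustration: the tagged tokens are hand-written assumptions (not real tagger output), and `chunk_labels` is a hypothetical helper. It shows how a rule written as `<NCFP000>` would miss a token tagged `ncfp000`, leaving only the quantity chunk.

```python
import nltk

def chunk_labels(grammar, tagged):
    """Return (label, words) pairs for every chunk the grammar finds."""
    tree = nltk.RegexpParser(grammar).parse(tagged)
    return [(st.label(), [word for word, _ in st.leaves()])
            for st in tree.subtrees(filter=lambda t: t.label() != "S")]

# Hand-written tags, assumed for illustration; not real tagger output.
tagged = [("quiero", "vmip1s0"), ("3", "Z"), ("cervezas", "ncfp000")]

# Grammar as in the question: the noun tag is written in upper case.
upper = r"""
  fruits: {<NCFP000>}
  quantity: {<Z>}
"""
# Same grammar with the noun tag in lower case, matching the assumed tags.
lower = r"""
  fruits: {<ncfp000>}
  quantity: {<Z>}
"""

upper_chunks = chunk_labels(upper, tagged)
lower_chunks = chunk_labels(lower, tagged)
print(upper_chunks)  # only the quantity chunk is found
print(lower_chunks)  # both the quantity and the fruits chunks are found
```

If the printed `taggex` shows different tags from the ones assumed here, the grammar rules would need to name those tags instead.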

