Regex parser for a Spanish text

Question

I am trying to define a grammar to extract the quantity and the fruit from a text with NLTK's RegexpParser. There seems to be a problem in the grammar, because the result only contains the quantity. Below are an example text and the code I am using. The HMM tagger was trained on the cess_esp corpus.

import nltk

grammar = r"""
  fruits: {<NCFP000>}
  quantity: {<Z>}
"""
cp = nltk.RegexpParser(grammar)
example = ['quiero 3 cervezas']

# hmm_tagger is the HMM tagger trained on cess_esp (training code not shown)
for sent in example:
    tokens = nltk.word_tokenize(sent)
    taggex = hmm_tagger.tag(tokens)
    print(taggex)
    result = cp.parse(taggex)
    result.draw()

Answer 1

Try using NLTK's default tagger instead of the HMM one:

taggex = nltk.pos_tag(tokens)

I checked it, and it should work with your code as well.
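
Whichever tagger is used, a tag pattern in RegexpParser only matches the tag strings that tagger actually emits, and the match is case-sensitive, so it is worth comparing the printed taggex output with the grammar. The sketch below uses hand-written tags instead of a real tagger: vmip1s0 and ncfp000 are assumptions about what a cess_esp-trained tagger might emit (the cess_esp noun tags shipped with NLTK appear to be lowercase), and extract_chunks is a hypothetical helper, not part of NLTK. Note also that nltk.pos_tag defaults to an English model with Penn Treebank tags such as CD, so switching to it means rewriting the grammar for that tagset.

```python
import nltk


def extract_chunks(grammar, tagged_tokens):
    """Return (label, words) pairs for every chunk the grammar finds."""
    parser = nltk.RegexpParser(grammar)
    tree = parser.parse(tagged_tokens)
    return [
        (subtree.label(), [word for word, _tag in subtree.leaves()])
        for subtree in tree.subtrees()
        if subtree.label() != "S"  # skip the root node, keep only the chunks
    ]


# Hand-written (word, tag) pairs; the tags are assumed, not tagger output.
tagged = [("quiero", "vmip1s0"), ("3", "Z"), ("cervezas", "ncfp000")]

# Grammar from the question: the noun tag is written in uppercase.
upper_grammar = r"""
  fruits: {<NCFP000>}
  quantity: {<Z>}
"""

# Same grammar with the noun tag in lowercase, matching the assumed tags.
lower_grammar = r"""
  fruits: {<ncfp000>}
  quantity: {<Z>}
"""

print(extract_chunks(upper_grammar, tagged))
print(extract_chunks(lower_grammar, tagged))
```

With these assumed tags, only the lowercase grammar should pick up cervezas as a fruits chunk. If that turns out to be the cause, a broader pattern such as <nc.*> may be a more forgiving choice, since it would match any tag starting with nc rather than one exact gender and number combination.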

solution1
1 ACCEPTED 2021-01-11 12:08:23
