How do I extract noun/verb phrases in Portuguese?

I've found various tools for extracting verb and noun phrases in English, including in some questions here on Stack Overflow. However, the techniques I've found only seem to work for English text. I've tried spaCy and TextBlob, but they don't return anything for Portuguese text (they work perfectly for English).

Here is what I've tried for Portuguese, following the question "Spacy to extract specific noun phrase": iterating with for chunk in doc.noun_chunks works perfectly for English, but does anyone know of an existing technique for Portuguese? I've searched everywhere I can think of.

noun_chunks is implemented separately for each language because base noun phrases look different from one language to another: the order in which determiners and adjectives appear, which dependency relations and part-of-speech tags are relevant, and so on.

Some minor details may differ, but I would guess that Portuguese noun chunks are fairly similar to Spanish ones, so you could use the Spanish noun chunks iterator as a starting point. Both Spanish and Portuguese use Universal Dependencies relations and simple (universal) POS tags, so I hope it would be easy to adapt.
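To make the idea concrete, here is a minimal pure-Python sketch of a noun-chunk iterator over tokens annotated with UD POS tags and dependency relations. It is not spaCy's Spanish iterator: the Token class, the label sets and the hand-written annotation of the example sentence are all assumptions for illustration. What it shows is the word-order point above: determiners come before the Portuguese noun, but adjectives usually follow it, so the chunk has to grow to the right as well as to the left.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Token:
    text: str
    pos: str   # Universal POS tag, e.g. "NOUN"
    dep: str   # Universal Dependencies relation, e.g. "nsubj"
    head: int  # index of the head token; the root points to itself


# Assumed label sets, loosely modeled on UD relations (not spaCy's exact lists).
NP_HEAD_DEPS = {"nsubj", "nsubj:pass", "obj", "iobj", "obl", "nmod", "appos", "root"}
LEFT_MODS = {"det", "nummod", "amod"}                 # e.g. "o", "dois" before the noun
RIGHT_MODS = {"amod", "flat", "flat:name", "fixed"}   # e.g. "preto" in "gato preto"


def noun_chunks(tokens: List[Token]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) spans of base noun phrases, end exclusive."""
    prev_end = -1
    for i, tok in enumerate(tokens):
        if tok.pos not in {"NOUN", "PROPN", "PRON"} or tok.dep not in NP_HEAD_DEPS:
            continue
        children = [j for j, t in enumerate(tokens) if t.head == i and j != i]
        # Walk left over adjacent determiners, numerals and pre-nominal adjectives.
        start = i
        for j in sorted((c for c in children if c < i), reverse=True):
            if j == start - 1 and tokens[j].dep in LEFT_MODS:
                start = j
            else:
                break
        # Walk right over adjacent post-nominal adjectives and name parts.
        end = i
        for j in sorted(c for c in children if c > i):
            if j == end + 1 and tokens[j].dep in RIGHT_MODS:
                end = j
            else:
                break
        if start <= prev_end:  # skip chunks overlapping the previous one
            continue
        prev_end = end
        yield start, end + 1


def chunk_texts(tokens: List[Token]) -> List[str]:
    return [" ".join(t.text for t in tokens[s:e]) for s, e in noun_chunks(tokens)]


# Hypothetical hand annotation (not parser output) of
# "A menina levou um livro novo para a escola ."
TOKENS = [
    Token("A", "DET", "det", 1),
    Token("menina", "NOUN", "nsubj", 2),
    Token("levou", "VERB", "root", 2),
    Token("um", "DET", "det", 4),
    Token("livro", "NOUN", "obj", 2),
    Token("novo", "ADJ", "amod", 4),
    Token("para", "ADP", "case", 8),
    Token("a", "DET", "det", 8),
    Token("escola", "NOUN", "obl", 2),
    Token(".", "PUNCT", "punct", 2),
]

print(chunk_texts(TOKENS))  # ['A menina', 'um livro novo', 'a escola']
```

Note that the preposition "para" stays outside the last chunk because case is not in LEFT_MODS. Porting this into spaCy would mean reading token.pos_, token.dep_ and token.head from a parsed Doc instead of the hand-built list; check the labels your Portuguese model actually produces (spaCy, for instance, labels the root "ROOT" rather than UD's "root") before reusing these sets.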

spaCy doesn't have a built-in verb phrase extractor, but the basic idea would be similar to noun chunks: define patterns over POS tags and the dependency tree that identify the phrases you want to extract.
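A verb-phrase pattern can follow the same recipe. The sketch below is again plain Python over hand-annotated UD tokens; the Token class, the label set and the negation word list are assumptions. It treats a "verb phrase" narrowly as a verb plus the adjacent auxiliaries, clitics and negation attached to it; whether objects and complements should also be included depends on what you want to extract.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Token:
    text: str
    pos: str   # Universal POS tag, e.g. "VERB"
    dep: str   # Universal Dependencies relation, e.g. "aux"
    head: int  # index of the head token; the root points to itself


# Assumed pattern: auxiliaries, reflexive clitics and a few negation words.
GROUP_DEPS = {"aux", "aux:pass", "expl", "expl:pv"}
NEGATION = {"não", "nunca"}


def in_group(tok: Token) -> bool:
    """True if the token belongs to the verb group of its head."""
    return tok.dep in GROUP_DEPS or (tok.dep == "advmod" and tok.text.lower() in NEGATION)


def verb_groups(tokens: List[Token]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) spans of verb groups, end exclusive."""
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        children = [j for j, t in enumerate(tokens) if t.head == i and j != i]
        # Extend left over adjacent auxiliaries / negation / clitics.
        start = i
        for j in sorted((c for c in children if c < i), reverse=True):
            if j == start - 1 and in_group(tokens[j]):
                start = j
            else:
                break
        # Extend right over adjacent members (e.g. post-verbal clitics).
        end = i
        for j in sorted(c for c in children if c > i):
            if j == end + 1 and in_group(tokens[j]):
                end = j
            else:
                break
        yield start, end + 1


def group_texts(tokens: List[Token]) -> List[str]:
    return [" ".join(t.text for t in tokens[s:e]) for s, e in verb_groups(tokens)]


# Hypothetical hand annotation (not parser output) of
# "Ele está lendo e ela não vai dormir ."
TOKENS = [
    Token("Ele", "PRON", "nsubj", 2),
    Token("está", "AUX", "aux", 2),
    Token("lendo", "VERB", "root", 2),
    Token("e", "CCONJ", "cc", 7),
    Token("ela", "PRON", "nsubj", 7),
    Token("não", "ADV", "advmod", 7),
    Token("vai", "AUX", "aux", 7),
    Token("dormir", "VERB", "conj", 2),
    Token(".", "PUNCT", "punct", 2),
]

print(group_texts(TOKENS))  # ['está lendo', 'não vai dormir']
```

Clauses without a VERB head, such as copular sentences like "A casa é bonita" (where UD makes the adjective the head and "é" a cop dependent), produce no group here and would need a pattern of their own.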
