How to get POS tags of compound words with Stanford POS Tagger

I used the Stanford POS Tagger to tag the parts of speech in a sentence, with the following code:

private static MaxentTagger tagger = new MaxentTagger(".../english-left3words-distsim.tagger");
String tags = tagger.tagString(st);   // st is the input string

That works when the words are not compound. But what I want is the POS tag of compound words such as "go back", "computer science", or "picking up".

Any ideas?

According to the documentation for the tagString method:

"This method tokenizes the input into words"

The models are also trained to identify and tag individual words (tokens), so a multi-word compound is not tagged as a single unit out of the box. Suggested solutions:

  1. Write a custom annotator that depends on (runs after) the POS tagger. When it finds a compound pattern, e.g. "go back", it can mark the first token with your custom annotation. You can identify these patterns with a dictionary and/or by matching grammar patterns. The latter may additionally require the dependency parser.
  2. Use TokensRegex. It lets you write regular expressions that operate on tokens and their annotations instead of on characters.
  3. Train new models that can identify multi-token or compound words.
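
The dictionary idea from option 1 can be sketched without writing a full annotator, by post-processing the tagger's output. This is only an illustration: it assumes tagString returns space-separated word_TAG pairs (the usual output for the english-left3words-distsim model), and the class CompoundTagMerger, its merge method, and the dictionary entries are hypothetical names made up for this example, not part of the Stanford library. It only handles two-word compounds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Stanford NLP): merges two-token compounds
// in "word_TAG word_TAG ..." text using a hand-built dictionary.
class CompoundTagMerger {

    // tagged:    space-separated word_TAG pairs (assumed tagString output format).
    // compounds: lower-cased "first second" -> tag to assign to the merged pair.
    static String merge(String tagged, Map<String, String> compounds) {
        String[] tokens = tagged.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (i + 1 < tokens.length) {
                String first = word(tokens[i]);
                String second = word(tokens[i + 1]);
                String tag = compounds.get((first + " " + second).toLowerCase());
                if (tag != null) {
                    // Emit the pair as one token carrying the dictionary tag.
                    out.add(first + "_" + second + "_" + tag);
                    i += 2;
                    continue;
                }
            }
            // Not the start of a known compound: keep the token unchanged.
            out.add(tokens[i]);
            i++;
        }
        return String.join(" ", out);
    }

    // Returns the word part of a word_TAG token (text before the last underscore).
    private static String word(String token) {
        int sep = token.lastIndexOf('_');
        return sep < 0 ? token : token.substring(0, sep);
    }
}
```

With a dictionary such as Map.of("go back", "VB", "computer science", "NN"), you would call CompoundTagMerger.merge(tagger.tagString(st), dictionary). A plain dictionary cannot tell "go back" the phrasal verb from an accidental adjacency of the same two words, which is where the grammar patterns or the dependency parser mentioned above would come in.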
