How to get POS tags of compound words with Stanford POS Tagger

Question

I used the Stanford POS Tagger to tag the parts of speech in a sentence, with the following code:

private static MaxentTagger tagger = new MaxentTagger(".../english-left3words-distsim.tagger");
String tags = tagger.tagString(st);   // st is the input sentence (a String)

That works when the words are not compound. But what I want is to get the POS tag of compound words like "go back", "computer science", "picking up".

Any ideas?

Answer 1

According to the documentation for the tagString method:

"This method tokenizes the input into words"

The models are also trained to identify and tag individual words (tokens), so a compound is never tagged as a single unit. Suggested solutions:

Write a custom annotator that depends on (runs after) the POS tagger. When it finds a compound pattern, e.g. "go back", it can attach your custom annotation to the first token. You can identify these patterns with a dictionary and/or by matching grammar patterns. The latter may additionally require the dependency parser.
Use TokensRegex. It lets you write regular expressions that operate on tokens and their annotations instead of on characters.
Train new models that can identify multi-token or compound words.
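
The dictionary idea from the first suggestion can be sketched without a custom annotator by post-processing the tagger's output. This is only a minimal illustration, not the Stanford API: the class CompoundTagMerger, its merge method and the small COMPOUNDS dictionary are hypothetical names. It assumes the tagged string has the form "word_TAG word_TAG ..." (check the separator your model actually emits), and the tags in the sample string are written by hand rather than produced by the tagger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CompoundTagMerger {

    // Hypothetical dictionary of two-word compounds (lower-case, space-separated).
    static final Set<String> COMPOUNDS = new HashSet<>(Arrays.asList(
            "go back", "computer science", "picking up"));

    // Merges adjacent tokens whose words form a dictionary compound.
    // Assumption: input looks like "word_TAG word_TAG ...".
    static List<String> merge(String tagged, Set<String> compounds) {
        List<String> out = new ArrayList<>();
        if (tagged.trim().isEmpty()) {
            return out;
        }
        String[] tokens = tagged.trim().split("\\s+");
        int i = 0;
        while (i < tokens.length) {
            if (i + 1 < tokens.length) {
                String w1 = word(tokens[i]);
                String w2 = word(tokens[i + 1]);
                if (compounds.contains((w1 + " " + w2).toLowerCase())) {
                    // Keep both original tags, joined with '+', on the merged token.
                    out.add(w1 + " " + w2 + "_" + tag(tokens[i]) + "+" + tag(tokens[i + 1]));
                    i += 2;
                    continue;
                }
            }
            out.add(tokens[i]);
            i++;
        }
        return out;
    }

    // Part of the token before the last '_' (the word).
    static String word(String token) {
        int sep = token.lastIndexOf('_');
        return sep < 0 ? token : token.substring(0, sep);
    }

    // Part of the token after the last '_' (the tag), or "" if there is none.
    static String tag(String token) {
        int sep = token.lastIndexOf('_');
        return sep < 0 ? "" : token.substring(sep + 1);
    }

    public static void main(String[] args) {
        // Hand-written sample in the assumed format, not real tagger output.
        String tagged = "I_PRP study_VBP computer_NN science_NN";
        System.out.println(merge(tagged, COMPOUNDS));
    }
}
```

In practice you would pass the result of tagger.tagString(st) to merge. Joining the two tags with "+" is just one possible convention; you could instead map each compound in the dictionary to a single tag of your choice.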

1 answers

solution1
0 2015-09-08 12:47:15