I am trying lemmatization with Stanford CoreNLP, following this question. My environment is:-
My code snippet is:-
//...........lemmatization starts........................
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";
Annotation document = pipeline.process(text);
List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);
for (edu.stanford.nlp.util.CoreMap sentence : sentences)
{
    for (CoreLabel token : sentence.get(TokensAnnotation.class))
    {
        String word = token.get(TextAnnotation.class);
        String lemma = token.get(LemmaAnnotation.class);
        System.out.println("lemmatized version :" + lemma);
    }
}
//...........lemmatization ends.........................
The output I get is:-
lemmatized version :painting
whereas I expect
lemmatized version :paint
Please enlighten me.
The problem in this example is that the word "painting" can be either the present participle of "to paint" or a noun, and the output of the lemmatizer depends on the part-of-speech tag assigned to the original word.
If you run the tagger on the fragment "painting" alone, there is no context that could help the tagger (or a human) decide how the word should be tagged. In this case it picked the tag NN, and the lemma of the noun "painting" is in fact "painting".
If you run the same code with the sentence "I am painting a flower." the tagger should correctly tag "painting" as VBG, and the lemmatizer should return "paint".
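To make the dependence on the tag concrete, here is a minimal toy sketch. It is not CoreNLP's lemmatizer: the class ToyLemmatizer and its two-entry lookup table are hypothetical and exist only to show that the same surface form maps to different lemmas under different part-of-speech tags.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only: this is NOT CoreNLP's lemmatizer, just a tiny
// hand-written lookup keyed on (word, POS tag).
final class ToyLemmatizer {
    private static final Map<String, String> LEMMAS = new HashMap<>();
    static {
        LEMMAS.put("painting/NN", "painting"); // noun reading
        LEMMAS.put("painting/VBG", "paint");   // present participle reading
    }

    // Returns the lemma for the given word and Penn Treebank tag,
    // falling back to the word itself when the pair is unknown.
    static String lemma(String word, String tag) {
        return LEMMAS.getOrDefault(word + "/" + tag, word);
    }

    public static void main(String[] args) {
        System.out.println("lemmatized version :" + lemma("painting", "NN"));
        System.out.println("lemmatized version :" + lemma("painting", "VBG"));
    }
}
```

In the real pipeline you can check which tag was assigned by also printing token.get(PartOfSpeechAnnotation.class) inside the token loop, next to the lemma.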