Stanford CoreNLP returning wrong results

Question

I am trying lemmatization with Stanford CoreNLP, following this question. My environment is:

Java 1.7
Eclipse 3.4.0
Stanford CoreNLP version 3.4.1 (downloaded from here).

My code snippet is:

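    // Imports required by the snippet (these go at the top of the file).
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    //...........lemmatization starts........................

    // The lemma annotator depends on tokenize, ssplit and pos.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);

    // A single word, with no surrounding context.
    String text = "painting";
    Annotation document = pipeline.process(text);

    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            String word = token.get(TextAnnotation.class);   // surface form
            String lemma = token.get(LemmaAnnotation.class); // lemma chosen for the assigned POS tag
            System.out.println("lemmatized version :" + lemma);
        }
    }

    //...........lemmatization ends.........................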

The output I get is:

lemmatized version :painting

whereas I expected:

lemmatized version :paint

Please enlighten me.

Answer 1

The problem in this example is that the word painting can be either the present participle of to paint or a noun, and the output of the lemmatizer depends on the part-of-speech tag assigned to the original word.

If you run the tagger on the fragment painting alone, there is no context that could help the tagger (or a human) decide how the word should be tagged. In this case it picked the tag NN, and the lemma of the noun painting is in fact painting.

If you run the same code on the sentence "I am painting a flower.", the tagger should correctly tag painting as VBG and the lemmatizer should return paint.
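To make the dependency on the tag concrete, here is a toy sketch in plain Java. It is not how CoreNLP works internally: the class PosLemmaSketch, its lookup table, and the lemma() helper are all made up for illustration. It only shows that a lemmatizer is effectively a function of the (word, tag) pair, so the same surface form can come back with different lemmas.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only. This is NOT CoreNLP code: the class name, the
// table and the lemma() helper are hypothetical.
class PosLemmaSketch {
    // Keyed by "word/TAG" so one surface form can map to different lemmas.
    private static final Map<String, String> LEMMAS = new HashMap<String, String>();
    static {
        LEMMAS.put("painting/VBG", "paint");    // verb reading: "I am painting a flower."
        LEMMAS.put("painting/NN", "painting");  // noun reading: "a painting"
    }

    // Returns the lemma for a (word, tag) pair, falling back to the word itself.
    static String lemma(String word, String tag) {
        String found = LEMMAS.get(word + "/" + tag);
        return found != null ? found : word;
    }

    public static void main(String[] args) {
        System.out.println("NN  -> " + lemma("painting", "NN"));   // prints "NN  -> painting"
        System.out.println("VBG -> " + lemma("painting", "VBG"));  // prints "VBG -> paint"
    }
}
```

In the real pipeline the tag is not passed in by hand. It comes from the pos annotator, which is why giving the tagger a full sentence changes the lemma you get back.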

solution1
2 ACCEPTED 2015-02-23 18:58:30
