Stanford CoreNLP returns wrong result

Question

Following up on this question, I am trying to do lemmatization with Stanford CoreNLP. My environment is:

Java 1.7
Eclipse 3.4.0
Stanford CoreNLP version 3.4.1 (downloaded from here).

My code snippet is:

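import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

//...........lemmatization starts........................

    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
    String text = "painting";
    Annotation document = pipeline.process(text);

    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            String word = token.get(TextAnnotation.class);
            String lemma = token.get(LemmaAnnotation.class);
            System.out.println("lemmatized version :" + lemma);
        }
    }

//...........lemmatization ends.........................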

The output I get is:

lemmatized version :painting

whereas I expect:

lemmatized version :paint

Please advise.

Answer 1

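The problem in this case is that the word painting can be either the present participle of paint or a noun, and the lemmatizer's output depends on the part-of-speech tag assigned to the original word.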
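To see that the lemma really is a function of the tag, you can call CoreNLP's Morpha-based analyzer directly with the same word and two different tags. This is a minimal sketch, assuming the CoreNLP jar is on the classpath and that `edu.stanford.nlp.process.Morphology` exposes `lemma(String word, String tag)` in your version; the class name `LemmaByTag` is hypothetical.

```java
import edu.stanford.nlp.process.Morphology;

// Hypothetical demo class: lemmatizes one word under two different POS tags.
public class LemmaByTag {
    public static void main(String[] args) {
        Morphology morphology = new Morphology();

        // Same surface form, two different Penn Treebank tags.
        String asNoun = morphology.lemma("painting", "NN");  // noun reading
        String asVerb = morphology.lemma("painting", "VBG"); // present participle reading

        System.out.println("NN  -> " + asNoun);
        System.out.println("VBG -> " + asVerb);
    }
}
```

With the noun tag the word should come back unchanged, and with VBG it should be reduced to the verb stem, which mirrors the actual and expected outputs in the question.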

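If you run the tagger on the fragment painting alone, there is no context to help the tagger (or a human) decide how the word should be tagged. In this case it chose the tag NN, and the lemma of the noun painting is indeed painting.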

If you run the same code on the sentence "I am painting flowers.", the tagger should correctly tag painting as VBG, and the lemmatizer should return paint.
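One way to check this is to print the POS tag next to each lemma and compare the bare word with the full sentence. A minimal sketch, assuming CoreNLP 3.4.x with its default English POS model on the classpath; the class name `LemmaInContext` and the helpers `buildPipeline` and `taggedLemmas` are hypothetical, and the exact tags depend on the model you load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class LemmaInContext {

    // Same annotator chain as in the question; "lemma" relies on the tags from "pos".
    static StanfordCoreNLP buildPipeline() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
        return new StanfordCoreNLP(props);
    }

    // Returns "word/TAG/lemma" for every token of every sentence in text.
    static List<String> taggedLemmas(StanfordCoreNLP pipeline, String text) {
        List<String> result = new ArrayList<>();
        Annotation document = pipeline.process(text);
        for (CoreMap sentence : document.get(SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                result.add(token.get(TextAnnotation.class) + "/"
                        + token.get(PartOfSpeechAnnotation.class) + "/"
                        + token.get(LemmaAnnotation.class));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StanfordCoreNLP pipeline = buildPipeline();
        // No context: the tagger has to guess a tag for the lone word.
        System.out.println(taggedLemmas(pipeline, "painting"));
        // "am" gives the tagger evidence for a progressive verb form.
        System.out.println(taggedLemmas(pipeline, "I am painting flowers."));
    }
}
```

Printing the tag alongside the lemma makes it obvious whether an unexpected lemma comes from the lemmatizer itself or from the tag it was given.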

Answer 1 (accepted, score 2), 2015-02-23 18:58:30