How do I use BagOfWordsAnnotation in the Stanford NLP parser?

Question

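I can't find anything related to bag of words in the list of annotators. I did find an annotation class for getting the bag of words, which I assume is used like this: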

coreMap.get(CoreAnnotations.BagOfWordsAnnotation.class);

But I don't know which annotator I need to enable. So far I have tried:

tokenize, ssplit, pos, lemma, ner, parse, sentiment, natlog, openie

but with no luck.

How can I use BagOfWordsAnnotation in the Stanford NLP parser?

Answer 1

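Isn't a bag of words just the output of the tokenize annotator? Or, depending on your use case, the more elaborate lemmatized output? For example, something like this: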

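import java.util.LinkedList;
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class Lemmatizer {
    // Build the pipeline once; loading the models is expensive.
    private static final StanfordCoreNLP pipeline;
    static {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
        pipeline = new StanfordCoreNLP(props);
    }

    // Returns the lemma of every token in the text, in document order.
    public static List<String> lemmatize(String documentText) {
        List<String> lemmas = new LinkedList<String>();
        Annotation document = new Annotation(documentText);

        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                lemmas.add(token.get(LemmaAnnotation.class));
            }
        }
        return lemmas;
    }
}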

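I have never heard of an annotator that fills in that annotation, and I would be a little surprised if one existed. A bag of words is basically tokenization, perhaps with some stop-word stripping added, which you can easily do yourself or with other packages that are less NLP-oriented and more IR-oriented, such as Lucene.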
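As a rough illustration of the "do it yourself" route, here is a minimal sketch that turns a list of tokens or lemmas (for instance, the output of a lemmatizer like the one above) into a bag of words with counts. The class name `BagOfWords`, the method `bagOfWords`, and the tiny stop-word list are all assumptions made up for this example, not part of CoreNLP; a real application would probably want a proper stop-word list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class BagOfWords {
    // Tiny illustrative stop-word list (an assumption; use a real list in practice).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "be", "of", "and", "to", "in"));

    // Counts how often each term occurs, ignoring case and skipping stop words.
    // Keys appear in the order in which each term was first seen.
    public static Map<String, Integer> bagOfWords(List<String> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            if (term == null) {
                continue; // a token without a lemma would show up as null
            }
            String key = term.toLowerCase(Locale.ROOT);
            if (key.isEmpty() || STOP_WORDS.contains(key)) {
                continue;
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lemmas =
                Arrays.asList("the", "cat", "chase", "the", "cat", "and", "the", "dog");
        System.out.println(bagOfWords(lemmas)); // prints {cat=2, chase=1, dog=1}
    }
}
```

If you only need the set of distinct words rather than their frequencies, `bagOfWords(lemmas).keySet()` gives you that. The method takes plain strings so it works the same whether you feed it raw tokens or lemmas.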

Solution 1
0 2017-07-17 15:06:33