How do I use BagOfWordsAnnotation in the Stanford NLP parser?

Question

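I couldn't find anything related to bag of words in the list of annotators. I did find an annotation class for getting a bag of words, and I assume it is used like this: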

coreMap.get(CoreAnnotations.BagOfWordsAnnotation.class);

But I don't know which annotator I need to enable. So far I have tried:

tokenize, ssplit, pos, lemma, ner, parse, sentiment, natlog, openie

but had no luck.

How can I use BagOfWordsAnnotation in the Stanford NLP parser?

Answer 1

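Isn't that just the output of the tokenize annotator? Or, slightly more elaborate, the lemmatized output? (That depends on your use case.) Something like this, for example: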

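import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class Lemmatizer {
    // Build the pipeline once: loading the models is expensive.
    private static final StanfordCoreNLP pipeline;
    static {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
        pipeline = new StanfordCoreNLP(props);
    }

    // Returns the lemma of every token in the text, in document order.
    public static List<String> lemmatize(String documentText) {
        List<String> lemmas = new ArrayList<>();
        Annotation document = new Annotation(documentText);
        pipeline.annotate(document);
        for (CoreMap sentence : document.get(SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                lemmas.add(token.get(LemmaAnnotation.class));
            }
        }
        return lemmas;
    }
}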

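I've never heard of that annotator, and I'd be a bit surprised if it existed, since a bag of words is basically just tokenization, perhaps with some stopword removal added. You can easily do that yourself, or use a less NLP-oriented, more IR-oriented package such as Lucene.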
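The "tokenize, then optionally drop stopwords" idea can be sketched in plain Java, without any dedicated annotator. This is a minimal sketch, not a CoreNLP API: the class name BagOfWords, the method bagOfWords and the tiny stopword set are hypothetical, and the input token list stands in for the output of a method like lemmatize (or for plain token words). It assumes Java 9+ for List.of and Set.of.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class BagOfWords {
    // Counts how often each token occurs, ignoring case, stopwords and
    // tokens that contain no letter or digit (e.g. punctuation tokens).
    public static Map<String, Integer> bagOfWords(List<String> tokens, Set<String> stopwords) {
        // LinkedHashMap keeps the words in the order they were first seen.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            String word = token.toLowerCase(Locale.ROOT);
            if (stopwords.contains(word)
                    || !word.codePoints().anyMatch(Character::isLetterOrDigit)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input; in practice this could be the list returned by lemmatize().
        List<String> tokens = List.of("The", "cat", "see", "the", "cat", ".");
        // Hypothetical stopword list; a real one would be much longer.
        Set<String> stopwords = Set.of("the", "a", "an");
        System.out.println(bagOfWords(tokens, stopwords)); // prints {cat=2, see=1}
    }
}
```

Whether you feed it raw token words or lemmas is the same use-case decision as above: lemmas merge inflected forms such as "cats" and "cat" into one entry.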

Answer 1 (score 0), 2017-07-17 15:06:33