CoreNLP coreference resolution takes too long

Question

I want to resolve coreference in a document, and I am trying to run the example provided at the following link:

import edu.stanford.nlp.hcoref.CorefCoreAnnotations;
import edu.stanford.nlp.hcoref.data.CorefChain;
import edu.stanford.nlp.hcoref.data.Mention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

public class CorefExample {

  public static void main(String[] args) throws Exception {

    Annotation document = new Annotation("Barack Obama was born in Hawaii.  He is the president.  Obama was elected in 2008.");
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);
    System.out.println("---");
    System.out.println("coref chains");
    for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
      System.out.println("\t"+cc);
    }
    for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
      System.out.println("---");
      System.out.println("mentions");
      for (Mention m : sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class)) {
        System.out.println("\t"+m);
      }
    }
  }
}

It only has to resolve a single sentence, yet my program runs for about an hour before I get a result. Is that normal?

I ran the program with the option -Xmx4g.

Answer 1

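Have you tried giving it 6 GB of memory? The documentation mentions that newer versions of CoreNLP use a neural network for coreference resolution, so it is slower than the rule-based algorithm and needs more RAM. In my case it was slow, and with two sentences and 4 GB of RAM it ran out of memory.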

The example command for the English neural system in CoreNLP 3.7.0 uses 5 GB:

java -Xmx5g -cp stanford-corenlp-3.7.0.jar:stanford-corenlp-models-3.7.0.jar:* edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,mention,coref -coref.algorithm neural -file example_file.txt
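To rule out a heap setting that did not take effect, a hypothetical helper like the one below prints the maximum heap the JVM actually received and warns when it is under the 5 GB used in the command above. The class name HeapCheck, the 5 GB threshold, and the 10% slack are assumptions for illustration, not CoreNLP requirements.

```java
// Hypothetical helper (not part of CoreNLP): checks whether the JVM's
// maximum heap, set with -Xmx, reaches a chosen threshold.
final class HeapCheck {

    private static final long GIB = 1024L * 1024L * 1024L;

    // Returns true if maxHeapBytes is at least ~90% of requiredGib GiB.
    // The slack exists because Runtime.maxMemory() can report slightly
    // less than the -Xmx value, depending on the garbage collector.
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredGib) {
        return maxHeapBytes >= requiredGib * GIB * 9 / 10;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        long requiredGib = 5; // assumption: mirrors -Xmx5g in the example command
        System.out.printf("Max heap: %.2f GiB%n", max / (double) GIB);
        if (!hasEnoughHeap(max, requiredGib)) {
            System.err.println("Warning: max heap is below " + requiredGib
                    + " GiB; the neural coref pipeline may be slow or run out of memory.");
        }
    }
}
```

Running it with the same JVM flags as the real program (for example `java -Xmx5g HeapCheck.java` on Java 11 or later) shows what the JVM reports.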

You can also choose the algorithm explicitly with the coref.algorithm property, for example:

Properties props = PropertiesUtils.asProperties("annotators", "your annotators", "coref.algorithm", "neural");

There are three algorithms to choose from: deterministic, statistical, and neural.
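As a sketch, the three coref.algorithm values (deterministic, statistical, neural) can be kept in an enum and turned into pipeline properties with plain java.util.Properties. The names CorefAlgorithmProps, Algorithm, and propsFor are hypothetical, and the value strings assume CoreNLP 3.7.0 or later; check the coref documentation for your version.

```java
import java.util.Properties;

// Sketch: builds pipeline properties for each coref.algorithm value.
// Assumption: these strings match the options of your CoreNLP version.
final class CorefAlgorithmProps {

    enum Algorithm {
        DETERMINISTIC("deterministic"),
        STATISTICAL("statistical"),
        NEURAL("neural");

        final String value;

        Algorithm(String value) {
            this.value = value;
        }
    }

    // Same annotator list as in the question; the "coref" annotator reads coref.algorithm.
    static final String ANNOTATORS = "tokenize,ssplit,pos,lemma,ner,parse,mention,coref";

    static Properties propsFor(Algorithm algorithm) {
        Properties props = new Properties();
        props.setProperty("annotators", ANNOTATORS);
        props.setProperty("coref.algorithm", algorithm.value);
        return props;
    }

    public static void main(String[] args) {
        for (Algorithm a : Algorithm.values()) {
            // With CoreNLP on the classpath, these would be passed to
            // new StanfordCoreNLP(propsFor(a)).
            System.out.println(a + " -> " + propsFor(a).getProperty("coref.algorithm"));
        }
    }
}
```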

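In the basic example code, they use "dcoref" instead of "coref" as the coreference annotator. That is the deterministic approach, which is faster but less accurate.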
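For comparison, here is a sketch of the two annotator lists as plain java.util.Properties; the class name CorefPipelines is hypothetical. The usual dcoref pipeline does not list the separate mention annotator, and its chains may be stored under a different annotation class than the hcoref one read in the question's code, so the reading code may need adjusting for your version.

```java
import java.util.Properties;

// Sketch contrasting the older deterministic "dcoref" annotator with the
// newer "coref" annotator used in the question.
final class CorefPipelines {

    // Older deterministic system: "dcoref" replaces "mention,coref".
    static Properties dcorefProps() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref");
        return props;
    }

    // Newer system, exactly as configured in the question.
    static Properties corefProps() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("dcoref: " + dcorefProps().getProperty("annotators"));
        System.out.println("coref:  " + corefProps().getProperty("annotators"));
    }
}
```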

Answer 2

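I did not hit this exact problem, but I also ran out of heap space when using CoreNLP with coref in the annotators property. The cause was that I was calling new StanfordCoreNLP(props) multiple times instead of reusing the same object.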
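A minimal sketch of this fix, assuming one shared pipeline per JVM is acceptable: the expensive object is created lazily, exactly once, and then reused for every document. ExpensivePipeline and PipelineHolder are hypothetical stand-ins so the example runs without CoreNLP; in real code the holder field would be new StanfordCoreNLP(props).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of "build once, reuse". ExpensivePipeline is a hypothetical
// stand-in for StanfordCoreNLP, whose construction loads large models.
final class PipelineHolder {

    // Counts constructions so the reuse is observable.
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    static final class ExpensivePipeline {
        ExpensivePipeline() {
            CONSTRUCTIONS.incrementAndGet(); // stands in for slow model loading
        }

        String annotate(String text) {
            return "annotated: " + text;
        }
    }

    // Initialization-on-demand holder: the JVM initializes Lazy (and thus
    // INSTANCE) once, on first access, with thread-safe class initialization.
    private static final class Lazy {
        static final ExpensivePipeline INSTANCE = new ExpensivePipeline();
    }

    static ExpensivePipeline get() {
        return Lazy.INSTANCE;
    }

    public static void main(String[] args) {
        for (String doc : new String[] {"First document.", "Second document."}) {
            // Reuses the same pipeline instead of constructing one per document.
            System.out.println(get().annotate(doc));
        }
        System.out.println("constructions = " + CONSTRUCTIONS.get()); // prints 1
    }
}
```

The holder idiom relies on the JVM's class-initialization guarantees, so no explicit locking is needed; a plain static final field works just as well if loading the models eagerly at startup is acceptable.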

2 solutions

Solution 1 (score: 0, posted 2017-04-07 14:18:24)

Solution 2 (score: 0, posted 2021-02-04 09:54:59)
