Stanford CoreNLP - understanding coreference resolution

Question

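I'm having some trouble understanding the changes made to the coref resolver in the latest version of the Stanford NLP tools. As an example, below is a sentence and the corresponding CorefChainAnnotation: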

The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.

{1=[1 1, 1 2], 5=[1 3], 7=[1 4], 9=[1 5]}

I'm not sure I understand what these numbers mean. Looking at the source code didn't help either.

Thank you

Answer 1

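I've been working with the coreference dependency graph, and I started out using the other answer to this question. After a while, though, I realized that the algorithm in that answer is not quite correct. The output it produced was not even close to what my modified version gives.

For anyone else who uses this post, here is the algorithm I ended up with. It also filters out self references, because every representative mention also mentions itself, and a lot of mentions only reference themselves.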

Map<Integer, CorefChain> coref = document.get(CorefChainAnnotation.class);

for(Map.Entry<Integer, CorefChain> entry : coref.entrySet()) {
    CorefChain c = entry.getValue();

    //this is because it prints out a lot of self references which aren't that useful
    if(c.getCorefMentions().size() <= 1)
        continue;

    CorefMention cm = c.getRepresentativeMention();
    String clust = "";
    List<CoreLabel> tks = document.get(SentencesAnnotation.class).get(cm.sentNum-1).get(TokensAnnotation.class);
    for(int i = cm.startIndex-1; i < cm.endIndex-1; i++)
        clust += tks.get(i).get(TextAnnotation.class) + " ";
    clust = clust.trim();
    System.out.println("representative mention: \"" + clust + "\" is mentioned by:");

    for(CorefMention m : c.getCorefMentions()){
        String clust2 = "";
        tks = document.get(SentencesAnnotation.class).get(m.sentNum-1).get(TokensAnnotation.class);
        for(int i = m.startIndex-1; i < m.endIndex-1; i++)
            clust2 += tks.get(i).get(TextAnnotation.class) + " ";
        clust2 = clust2.trim();
        //don't need the self mention
        if(clust.equals(clust2))
            continue;

        System.out.println("\t" + clust2);
    }
}

The final output for your example sentence is as follows:

representative mention: "a basic unit of matter" is mentioned by:
    The atom
    it

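Usually "the atom" ends up being the representative mention, but surprisingly it doesn't in this case. Another example, with slightly more accurate output, is the following sentence:

The Revolutionary War occurred during the 1700s and it was the first war in the United States.

which produces the following output: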

representative mention: "The Revolutionary War" is mentioned by:
    it
    the first war in the United States
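One detail of the filter above is that it recognizes the self mention by comparing surface text, so a second mention with the same wording as the representative one would be dropped as well. The following is a minimal sketch of a position-based alternative. It uses a hypothetical `Mention` class as a stand-in for the library's mention type (it is not part of CoreNLP) and assumes 1-based indices with an exclusive end, as in the loops above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SelfMentionFilterDemo {
    /** Hypothetical stand-in for a coref mention: 1-based sentence number and token span, end exclusive. */
    static final class Mention {
        final int sentNum;
        final int startIndex;
        final int endIndex;
        final String text;

        Mention(int sentNum, int startIndex, int endIndex, String text) {
            this.sentNum = sentNum;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
            this.text = text;
        }

        /** Two mentions are the same occurrence only if sentence and span both match. */
        boolean samePosition(Mention other) {
            return sentNum == other.sentNum
                    && startIndex == other.startIndex
                    && endIndex == other.endIndex;
        }
    }

    /** Returns the text of every mention in the chain except the representative one. */
    static List<String> otherMentions(Mention representative, List<Mention> chain) {
        List<String> result = new ArrayList<>();
        if (chain.size() <= 1) {
            return result; // singleton chain: it only refers to itself
        }
        for (Mention m : chain) {
            if (!m.samePosition(representative)) {
                result.add(m.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Mention rep = new Mention(1, 4, 9, "a basic unit of matter");
        List<Mention> chain = Arrays.asList(
                new Mention(1, 1, 3, "The atom"), rep, new Mention(1, 10, 11, "it"));
        System.out.println("representative mention: \"" + rep.text + "\" is mentioned by:");
        for (String text : otherMentions(rep, chain)) {
            System.out.println("\t" + text);
        }
    }
}
```

Comparing positions keeps a repeated "it" in a later sentence, whereas a text comparison would silently discard it whenever the representative mention is also "it".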

Answer 2

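The first number is a cluster id (it identifies the tokens that stand for the same entity); see the source code of SieveCoreferenceSystem#coref(Document). The pairs of numbers are the output of CorefChain#toString():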

public String toString(){
    return position.toString();
}

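where position is a set of position pairs for the mentions of the entity (to get them, use CorefChain.getCorefMentions()). Here is a complete code example (in Groovy) that shows how to get from positions to tokens:

// Package names as in the dcoref module; adjust them for your CoreNLP version.
import edu.stanford.nlp.dcoref.CorefChain
import edu.stanford.nlp.dcoref.CorefChain.CorefMention
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation
import edu.stanford.nlp.pipeline.Annotation
import edu.stanford.nlp.pipeline.StanfordCoreNLP

class Example {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        props.put("dcoref.score", true);
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String aText = "The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.";
        Annotation document = new Annotation(aText);

        pipeline.annotate(document);
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);

        println aText

        for (Map.Entry<Integer, CorefChain> entry : graph.entrySet()) {
            CorefChain c = entry.getValue();
            println "ClusterId: " + entry.getKey();
            CorefMention cm = c.getRepresentativeMention();
            // Caution: startIndex/endIndex are 1-based token indices, not character offsets.
            // Slicing aText with them yields the text fragments seen in the output below.
            println "Representative Mention: " + aText.subSequence(cm.startIndex, cm.endIndex);

            List<CorefMention> cms = c.getCorefMentions();
            println "Mentions:  ";
            cms.each { it ->
                print aText.subSequence(it.startIndex, it.endIndex) + "|";
            }
        }
    }
}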

Output (I don't understand where the 's' comes from):

The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.
ClusterId: 1
Representative Mention: he
Mentions: he|atom |s|
ClusterId: 6
Representative Mention:  basic unit 
Mentions:  basic unit |
ClusterId: 8
Representative Mention:  unit 
Mentions:  unit |
ClusterId: 10
Representative Mention: it 
Mentions: it |
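The fragments in this output, including the stray 's', are consistent with the mention indices being used as character offsets when they are in fact token indices (1-based, end exclusive, as Answer 1 treats them). The sketch below reproduces both readings without CoreNLP. The token list is hand-written and the three spans (1,3), (4,9) and (10,11) are inferred from the printed output for cluster 1, so both are assumptions rather than values read from the library.

```java
import java.util.Arrays;
import java.util.List;

final class TokenSpanDemo {
    static final String TEXT =
            "The atom is a basic unit of matter, it consists of a dense central "
            + "nucleus surrounded by a cloud of negatively charged electrons.";

    // Hand-written tokenization (an assumption): punctuation marks are separate tokens.
    static final List<String> TOKENS = Arrays.asList(
            "The", "atom", "is", "a", "basic", "unit", "of", "matter", ",", "it",
            "consists", "of", "a", "dense", "central", "nucleus", "surrounded", "by",
            "a", "cloud", "of", "negatively", "charged", "electrons", ".");

    /** Misreads the span as 0-based character offsets into the raw text. */
    static String asCharOffsets(int start, int end) {
        return TEXT.substring(start, end);
    }

    /** Reads the span as 1-based token indices with an exclusive end. */
    static String asTokenSpan(int start, int end) {
        return String.join(" ", TOKENS.subList(start - 1, end - 1));
    }

    public static void main(String[] args) {
        int[][] spans = { {1, 3}, {4, 9}, {10, 11} };
        for (int[] span : spans) {
            System.out.println("as chars: \"" + asCharOffsets(span[0], span[1])
                    + "\"  as tokens: \"" + asTokenSpan(span[0], span[1]) + "\"");
        }
    }
}
```

Read as character offsets, the three spans give "he", "atom " and "s"; read as token spans, they give "The atom", "a basic unit of matter" and "it", which matches the cluster reported in Answer 1.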

Answer 3

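These are the most recent results from the annotator.

[1,1] 1 The atom
[1,2] 1 a basic unit of matter
[1,3] 1 it
[1,6] 6 negatively charged electrons
[1,5] 5 a cloud of negatively charged electrons

The notation is as follows:

[Sentence number,'id']  Cluster_no  Text_Associated

Text belonging to the same cluster refers to the same context.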
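To make the notation concrete, here is a small sketch that parses the printed form from the question, `{1=[1 1, 1 2], 5=[1 3], 7=[1 4], 9=[1 5]}`, into cluster ids and number pairs. The class name `ChainStringParser` is hypothetical, and reading each pair as [sentence number, id] follows the answers in this thread rather than anything verified against the library source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChainStringParser {
    // Matches one "clusterId=[pair, pair, ...]" group in the printed map.
    private static final Pattern CLUSTER = Pattern.compile("(\\d+)=\\[([^\\]]*)\\]");

    /** Parses the printed map into cluster id -> list of {first, second} number pairs. */
    static Map<Integer, List<int[]>> parse(String printed) {
        Map<Integer, List<int[]>> clusters = new LinkedHashMap<>();
        Matcher matcher = CLUSTER.matcher(printed);
        while (matcher.find()) {
            int clusterId = Integer.parseInt(matcher.group(1));
            List<int[]> pairs = new ArrayList<>();
            for (String pair : matcher.group(2).split(",")) {
                if (pair.trim().isEmpty()) {
                    continue; // empty cluster body
                }
                String[] parts = pair.trim().split("\\s+");
                pairs.add(new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) });
            }
            clusters.put(clusterId, pairs);
        }
        return clusters;
    }

    public static void main(String[] args) {
        Map<Integer, List<int[]>> clusters = parse("{1=[1 1, 1 2], 5=[1 3], 7=[1 4], 9=[1 5]}");
        for (Map.Entry<Integer, List<int[]>> entry : clusters.entrySet()) {
            for (int[] pair : entry.getValue()) {
                // Same column order as the notation above: [sentence, id] then cluster number.
                System.out.println("[" + pair[0] + "," + pair[1] + "] " + entry.getKey());
            }
        }
    }
}
```

Under that reading, cluster 1 in the question's output holds two entries from sentence 1, and every other cluster holds a single entry.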

Answer 1
17 2011-12-16 13:43:58

Answer 2
8 accepted 2011-07-06 12:42:35

Answer 3
0 2017-07-18 07:00:50