繁体   English   中英

从NER获取全名

[英]Getting full names from NER

通过阅读文档并使用API​​,看起来CoreNLP会告诉我每个令牌的NER标签,但这无助于我从句子中提取全名。 例如:

Input: John Wayne and Mary have coffee
CoreNLP Output: (John,PERSON) (Wayne,PERSON) (and,O) (Mary,PERSON) (have,O) (coffee,O)
Desired Result: list of PERSON ==> [John Wayne, Mary]

除非我错过了一些标志,否则我相信要做到这一点,我将需要解析标记并将随后标记为PERSON的连续标记粘合在一起。

有人可以确认这确实是我需要做的吗? 我最想知道的是CoreNLP中是否有某些标志或实用程序可以为我做类似的事情。 如果有人拥有可以执行此操作并希望共享的实用程序(理想情况下为Java,因为我正在使用Java API),即可获得奖励积分:

谢谢!

PS:有一个非常类似的问题在这里 ,这似乎表明答案是“滚你自己”,但它从来没有得到任何人的证实。

您可能正在寻找实体提及,而不是NER标记。 例如,使用简单API

new Sentence("Jimi Hendrix was the greatest").nerTags()
[PERSON, PERSON, O, O, O]

new Sentence("Jimi Hendrix was the greatest").mentions()
[Jimi Hendrix]

上面的链接提供了一个使用传统的StanfordCoreNLP管道的传统非简单API的示例

这在此链接的基本Java API示例中显示:

https://stanfordnlp.github.io/CoreNLP/api.html

这是完整的Java API示例,其中有关于实体提及的部分:

import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ie.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import java.util.*;


public class BasicPipelineExample {

  public static String text = "Joe Smith was born in California. " +
      "In 2017, he went to Paris, France in the summer. " +
      "His flight left at 3:00pm on July 10th, 2017. " +
      "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
      "He sent a postcard to his sister Jane Smith. " +
      "After hearing about Joe's trip, Jane decided she might go to France one day.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
    // set a property for an annotator, in this case the coref annotator is being set to use the neural algorithm
    props.setProperty("coref.algorithm", "neural");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = new CoreDocument(text);
    // annnotate the document
    pipeline.annotate(document);
    // examples

    // 10th token of the document
    CoreLabel token = document.tokens().get(10);
    System.out.println("Example: token");
    System.out.println(token);
    System.out.println();

    // text of the first sentence
    String sentenceText = document.sentences().get(0).text();
    System.out.println("Example: sentence");
    System.out.println(sentenceText);
    System.out.println();

    // second sentence
    CoreSentence sentence = document.sentences().get(1);

    // list of the part-of-speech tags for the second sentence
    List<String> posTags = sentence.posTags();
    System.out.println("Example: pos tags");
    System.out.println(posTags);
    System.out.println();

    // list of the ner tags for the second sentence
    List<String> nerTags = sentence.nerTags();
    System.out.println("Example: ner tags");
    System.out.println(nerTags);
    System.out.println();

    // constituency parse for the second sentence
    Tree constituencyParse = sentence.constituencyParse();
    System.out.println("Example: constituency parse");
    System.out.println(constituencyParse);
    System.out.println();

    // dependency parse for the second sentence
    SemanticGraph dependencyParse = sentence.dependencyParse();
    System.out.println("Example: dependency parse");
    System.out.println(dependencyParse);
    System.out.println();

    // kbp relations found in fifth sentence
    List<RelationTriple> relations =
        document.sentences().get(4).relations();
    System.out.println("Example: relation");
    System.out.println(relations.get(0));
    System.out.println();

    // entity mentions in the second sentence
    List<CoreEntityMention> entityMentions = sentence.entityMentions();
    System.out.println("Example: entity mentions");
    System.out.println(entityMentions);
    System.out.println();

    // coreference between entity mentions
    CoreEntityMention originalEntityMention = document.sentences().get(3).entityMentions().get(1);
    System.out.println("Example: original entity mention");
    System.out.println(originalEntityMention);
    System.out.println("Example: canonical entity mention");
    System.out.println(originalEntityMention.canonicalEntityMention().get());
    System.out.println();

    // get document wide coref info
    Map<Integer, CorefChain> corefChains = document.corefChains();
    System.out.println("Example: coref chains for document");
    System.out.println(corefChains);
    System.out.println();

    // get quotes in document
    List<CoreQuote> quotes = document.quotes();
    CoreQuote quote = quotes.get(0);
    System.out.println("Example: quote");
    System.out.println(quote);
    System.out.println();

    // original speaker of quote
    // note that quote.speaker() returns an Optional
    System.out.println("Example: original speaker of quote");
    System.out.println(quote.speaker().get());
    System.out.println();

    // canonical speaker of quote
    System.out.println("Example: canonical speaker of quote");
    System.out.println(quote.canonicalSpeaker().get());
    System.out.println();

  }

}

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM