简体   繁体   English

stanford core nlp java输出

[英]stanford core nlp java output

I'm a newbie with Java and Stanford NLP toolkit and trying to use them for a project. 我是Java和Stanford NLP工具包的新手,并尝试将它们用于项目。 Specifically, I'm trying to use Stanford Corenlp toolkit to annotate a text (with Netbeans and not command line) and I tried to use the code provided on http://nlp.stanford.edu/software/corenlp.shtml#Usage (Using the Stanford CoreNLP API).. question is: can anybody tell me how I can get the output in a file so that I can further process it? 具体来说,我正在尝试使用Stanford Corenlp工具包来注释文本(使用Netbeans而不是命令行),我尝试使用http://nlp.stanford.edu/software/corenlp.shtml#Usage上提供的代码(使用Stanford CoreNLP API)..问题是:有人能告诉我如何在文件中获取输出以便我可以进一步处理它吗?

I've tried printing the graphs and the sentence to the console, just to see the content. 我已经尝试将图形和句子打印到控制台,只是为了查看内容。 That works. 这样可行。 Basically what I'd need is to return the annotated document, so that I can call it from my main class and output a text file (if that's possible). 基本上我需要的是返回带注释的文档,这样我就可以从我的主类中调用它并输出一个文本文件(如果可能的话)。 I'm trying to look in the API of stanford corenlp, but I don't really know what is the best way to return such kind of information, given my lack of experience. 我正在尝试查看stanford corenlp的API,但由于缺乏经验,我不知道返回此类信息的最佳方法是什么。

Here is the code: 这是代码:

Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = "the quick fox jumps over the lazy dog";

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for(CoreMap sentence: sentences) {
      // traversing the words in the current sentence
      // a CoreLabel is a CoreMap with additional token-specific methods
      for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);       
      }

      // this is the parse tree of the current sentence
      Tree tree = sentence.get(TreeAnnotation.class);

      // this is the Stanford dependency graph of the current sentence
      SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    }

    // This is the coreference link graph
    // Each chain stores a set of mentions that link to each other,
    // along with a method for getting the most representative mention
    // Both sentence and token offsets start at 1!
    Map<Integer, CorefChain> graph = 
      document.get(CorefChainAnnotation.class);

Once you have any or all of the natural language analyses shown in your code example, all you need to do is send them to a file in the normal Java fashion, eg, with a FileWriter for text format output. 一旦您拥有代码示例中显示的任何或所有自然语言分析,您需要做的就是以普通的Java方式将它们发送到文件,例如,使用FileWriter进行文本格式输出。 Concretely, here's a simple complete example that shows output sent to files (if you give it appropriate command-line arguments): 具体来说,这是一个简单的完整示例,显示发送到文件的输出(如果您给它适当的命令行参数):

import java.io.*;
import java.util.*;

import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

public class StanfordCoreNlpDemo {

  public static void main(String[] args) throws IOException {
    PrintWriter out;
    if (args.length > 1) {
      out = new PrintWriter(args[1]);
    } else {
      out = new PrintWriter(System.out);
    }
    PrintWriter xmlOut = null;
    if (args.length > 2) {
      xmlOut = new PrintWriter(args[2]);
    }

    StanfordCoreNLP pipeline = new StanfordCoreNLP();
    Annotation annotation;
    if (args.length > 0) {
      annotation = new Annotation(IOUtils.slurpFileNoExceptions(args[0]));
    } else {
      annotation = new Annotation("Kosgi Santosh sent an email to Stanford University. He didn't get a reply.");
    }

    pipeline.annotate(annotation);
    pipeline.prettyPrint(annotation, out);
    if (xmlOut != null) {
      pipeline.xmlPrint(annotation, xmlOut);
    }
    // An Annotation is a Map and you can get and use the various analyses individually.
    // For instance, this gets the parse tree of the first sentence in the text.
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    if (sentences != null && sentences.size() > 0) {
      CoreMap sentence = sentences.get(0);
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      out.println();
      out.println("The first sentence parsed is:");
      tree.pennPrint(out);
    }
  }

}

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM