
How to obtain the overall sentiment of a review using Stanford CoreNLP

I am using Stanford CoreNLP to run sentiment analysis on 25,000 movie reviews. So far I have managed to get the sentiment of each sentence in each review, but does anyone know how I could get the sentiment of the overall review instead of each sentence in the review?

The code I'm using is:

import java.io.*;
import java.util.*;

import edu.stanford.nlp.coref.CorefCoreAnnotations;

import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

/** This class demonstrates building and using a Stanford CoreNLP pipeline. */
public class sentimentMain {

  /** Usage: java -cp "*" sentimentMain [inputFile [outputTextFile [outputXmlFile]]] */
  public static void main(String[] args) throws IOException {
    // set up optional output files
    PrintWriter out;
    if (args.length > 1) {
      out = new PrintWriter(args[1]);
    } else {
      out = new PrintWriter(System.out);
    }
    PrintWriter xmlOut = null;
    if (args.length > 2) {
      xmlOut = new PrintWriter(args[2]);
    }

    // Create a CoreNLP pipeline. To build the default pipeline, you can just use:
    //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // Here's a more complex setup example:
    //   Properties props = new Properties();
    //   props.put("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
    //   props.put("ner.model", "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz");
    //   props.put("ner.applyNumericClassifiers", "false");
    //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Add in sentiment
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref, sentiment");

    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    File[] files = new File("C:/stanford-corenlp-full-2016-10-31/dataset").listFiles();

    String line = null;

    try{
        for (File file : files) {
            if (file.exists()) {
                BufferedReader in = new BufferedReader(new FileReader(file));
                while((line = in.readLine()) != null)
                {
                    Annotation document = new Annotation(line);

                    // run all the selected Annotators on this text
                    pipeline.annotate(document);

                    // this prints out the results of sentence analysis to file(s) in good formats
                    pipeline.prettyPrint(document, out);
                    if (xmlOut != null) {
                      pipeline.xmlPrint(document, xmlOut);
                    }

                    // An Annotation is a Map with Class keys for the linguistic analysis types.
                    // You can get and use the various analyses individually.
                    // For instance, this gets the parse tree of the first sentence in the text.
                    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
                    if (sentences != null && ! sentences.isEmpty()) {
                      CoreMap sentence = sentences.get(0);
                      /*out.println("The keys of the first sentence's CoreMap are:");
                      out.println(sentence.keySet());
                      out.println();
                      out.println("The first sentence is:");
                      out.println(sentence.toShorterString());
                      out.println();
                      out.println("The first sentence tokens are:");*/
                      for (CoreMap token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                        //out.println(token.toShorterString());
                      }
                      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
                      //out.println();
                      //out.println("The first sentence parse tree is:");
                      tree.pennPrint(out);
                      //out.println();
                      //out.println("The first sentence basic dependencies are:");
                      //out.println(sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class).toString(SemanticGraph.OutputFormat.LIST));
                      //out.println("The first sentence collapsed, CC-processed dependencies are:");
                      SemanticGraph graph = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
                      //out.println(graph.toString(SemanticGraph.OutputFormat.LIST));

                      // Access coreference. In the coreference link graph,
                      // each chain stores a set of mentions that co-refer with each other,
                      // along with a method for getting the most representative mention.
                      // Both sentence and token offsets start at 1!
                      //out.println("Coreference information");
                      Map<Integer, CorefChain> corefChains =
                          document.get(CorefCoreAnnotations.CorefChainAnnotation.class);
                      // Skip coreference for this line if it is absent, rather than
                      // returning and silently aborting the rest of the files.
                      if (corefChains != null) {
                        for (Map.Entry<Integer,CorefChain> entry: corefChains.entrySet()) {
                          //out.println("Chain " + entry.getKey());
                          for (CorefChain.CorefMention m : entry.getValue().getMentionsInTextualOrder()) {
                            // We need to subtract one since the indices count from 1 but the Lists start from 0
                            List<CoreLabel> tokens = sentences.get(m.sentNum - 1).get(CoreAnnotations.TokensAnnotation.class);
                            // We subtract two for end: one for 0-based indexing, and one because we want last token of mention not one following.
                            /*out.println("  " + m + ", i.e., 0-based character offsets [" + tokens.get(m.startIndex - 1).beginPosition() +
                                    ", " + tokens.get(m.endIndex - 2).endPosition() + ")");*/
                          }
                        }
                      }
                      //out.println();
                      out.println("The first sentence's sentiment rating is " + sentence.get(SentimentCoreAnnotations.SentimentClass.class));
                    }
                }
                in.close();
                //showFiles(file.listFiles()); // Calls same method again.
            } else {
                System.out.println("File not found: " + file.toString());
            }
        }
    }catch(NullPointerException e){
        e.printStackTrace();
    }
    IOUtils.closeIgnoringExceptions(out);
    IOUtils.closeIgnoringExceptions(xmlOut);
  }

}

NOTE: most of the code is commented out so that only the relevant output is shown on the console.

The sentiment model is only designed to run over individual sentences and return each sentence's sentiment; we don't have any method for getting the sentiment of a full document.
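Since there is no document-level sentiment API, a common workaround is to aggregate the per-sentence predictions yourself, for example by averaging them. The sketch below assumes the standard `sentiment` annotator and uses `RNNCoreAnnotations.getPredictedClass`, which returns an integer class from 0 (very negative) to 4 (very positive); the class name `ReviewSentiment`, the plain average, and the neutral fallback of 2 are illustrative choices, not part of CoreNLP.

```java
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

public class ReviewSentiment {

  // Plain average of the per-sentence classes; 2 (neutral) is an arbitrary
  // fallback for an empty review.
  static double average(int[] classes) {
    if (classes.length == 0) return 2.0;
    double sum = 0;
    for (int c : classes) sum += c;
    return sum / classes.length;
  }

  // Annotates one review and aggregates sentence-level sentiment into one score.
  static double overallSentiment(StanfordCoreNLP pipeline, String review) {
    Annotation document = new Annotation(review);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    int[] classes = new int[sentences.size()];
    for (int i = 0; i < sentences.size(); i++) {
      Tree tree = sentences.get(i).get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
      classes[i] = RNNCoreAnnotations.getPredictedClass(tree); // 0 = very negative ... 4 = very positive
    }
    return average(classes);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    System.out.println("Overall: " + overallSentiment(pipeline, "I loved it. The ending was weak."));
  }
}
```

A plain average treats every sentence equally; depending on the data you might instead take a majority vote over the classes, or weight each sentence by its token count, so that one short throwaway sentence does not drag down a long positive review.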
