
Is Stanford Sentiment Analysis biased towards negative?

I am researching existing sentiment analysis tools. I am currently looking at Stanford CoreNLP's sentiment analysis (version 3.8.0), and on my test data the predictions seem to be biased towards negative. Here are a few examples that come back Negative:

  1. NY is where I ultimately want to spend my teaching career and the opportunity was too good to refuse. - Negative
  2. I understand it is a duty to be an effective and influential teacher yet I am eager to put forth the hours before, during and after school hours to make certain I am an available resource to my students. - Negative
  3. From my personal experience, I've learned many necessary life skills in the classroom and my most influential teachers were my motivators and supporters. - Negative

I checked, and there seems to be just one model available (so I don't think there are any levers to pull there, and I don't want to train my own). I could use a different (maybe better) POS tagger, which might give me a different prediction, but I am a bit mystified: all the blogs and comments I have read about Stanford's library are positive, yet my results are so bad. Am I missing something?

The code:

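    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.util.CoreMap;
    import org.ejml.simple.SimpleMatrix;

    public class SentimentTest {
        public static void main(String[] args) {
            String text = args[0];
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            // process() creates the Annotation and runs all annotators on it,
            // so a separate pipeline.annotate(document) call is redundant
            Annotation document = pipeline.process(text);

            List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
            for (CoreMap sentence : sentences) {
                // Label string for the sentence, e.g. "Negative"
                String s_sentiment = sentence.get(SentimentCoreAnnotations.SentimentClass.class);

                // Sentiment-annotated parse tree; its root holds the sentence-level prediction
                Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
                int sentiment = RNNCoreAnnotations.getPredictedClass(tree);    // 0..4
                SimpleMatrix matrix = RNNCoreAnnotations.getPredictions(tree); // class probabilities

                System.out.println(sentence);
                System.out.println(sentiment + "-" + s_sentiment + "\t" + matrix.elementMaxAbs());
            }
        }
    }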

Possible values for the scores: 0 = Very Negative, 1 = Negative, 2 = Neutral, 3 = Positive, 4 = Very Positive.
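Rather than looking only at the winning class, it can help to inspect the whole five-class probability vector: a sentence whose probability is split almost evenly between Negative and Neutral is a much weaker "Negative" than one at 0.9. A minimal sketch, assuming you have already copied the values of the SimpleMatrix returned by RNNCoreAnnotations.getPredictions(tree) into a plain double[] (for example by looping over matrix.get(i)); the class and method names are hypothetical, and the argmax should correspond to what getPredictedClass reports:

```java
import java.util.Arrays;

public class SentimentScores {
    // Label order follows the 0..4 scale of the five-class model.
    static final String[] LABELS = {
        "Very Negative", "Negative", "Neutral", "Positive", "Very Positive"
    };

    // Index of the most probable class.
    static int argmax(double[] probs) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) {
                best = i;
            }
        }
        return best;
    }

    // Probability-weighted average class index on the 0..4 scale;
    // 2.0 means the distribution is centred on Neutral.
    static double expectedScore(double[] probs) {
        double score = 0.0;
        for (int i = 0; i < probs.length; i++) {
            score += i * probs[i];
        }
        return score;
    }

    public static void main(String[] args) {
        // Hypothetical distribution: Negative wins, but only narrowly over Neutral.
        double[] probs = {0.05, 0.40, 0.35, 0.15, 0.05};
        int cls = argmax(probs);
        System.out.println(Arrays.toString(probs));
        System.out.printf("%s (p=%.2f), expected score %.2f%n",
                LABELS[cls], probs[cls], expectedScore(probs));
    }
}
```

With this made-up distribution the argmax label is Negative, while the expected score of roughly 1.75 sits between Negative and Neutral, which is the kind of borderline case worth checking in your own examples.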

If you are using this library in a production application, are you finding the results reliable enough to base actions on?

First of all, as of version 3.3.1 there is not just one model to pass as an argument to the option sentiment.model but rather two (sadly, this doesn't seem to be mentioned anywhere on the site):

  • A five-class model (Very negative, Negative, Neutral, Positive, Very positive): edu/stanford/nlp/models/sentiment/sentiment.ser.gz
  • A coarser-grained model (Negative, Neutral, Positive): edu/stanford/nlp/models/sentiment/sentiment.binary.ser.gz

The binary model file is not part of the standard model set but of the additional models-english artifact, so you need to add that artifact to your project before you can use it (this could be documented a bit better). The appropriate Maven dependency is:

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>${stanford-corenlp.version}</version>
        <classifier>models-english</classifier>
        <scope>runtime</scope>
    </dependency>
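Once the models-english artifact is on the classpath, selecting the binary model should only require setting the sentiment.model option mentioned above before the pipeline is constructed. A minimal sketch that builds just the Properties object (plain java.util, so it runs without CoreNLP); the class and method names are hypothetical, and in a real application you would pass the result to new StanfordCoreNLP(props) as in the question's code:

```java
import java.util.Properties;

public class BinarySentimentConfig {
    // Classpath location of the binary model, as listed above.
    static final String BINARY_MODEL =
            "edu/stanford/nlp/models/sentiment/sentiment.binary.ser.gz";

    // Same annotators as the question's pipeline, plus the model override.
    static Properties binarySentimentProps() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        props.setProperty("sentiment.model", BINARY_MODEL);
        return props;
    }

    public static void main(String[] args) {
        Properties props = binarySentimentProps();
        // In a real application: StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        props.list(System.out);
    }
}
```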

As described in their 2013 paper, they trained their model(s) on a corpus of movie reviews, and it is quite possible that this data is a poor fit for the kind of language you are analyzing. For example, searching their corpus for "too good to refuse" returns no results at all, even though it is a relatively common phrase.
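Checking whether phrases from your own data occur in a training corpus is easy to automate. A minimal sketch, assuming the corpus is available as plain sentence text with one sentence per line (tree-formatted files would need their brackets and labels stripped first); the class and method names are hypothetical, and matching is case-insensitive with runs of whitespace collapsed:

```java
import java.util.List;
import java.util.Locale;

public class PhraseCoverage {
    // Lowercase and collapse whitespace so "Too  good to refuse" matches "too good to refuse".
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
    }

    // Number of corpus lines that contain the phrase at least once.
    static long linesContaining(List<String> corpusLines, String phrase) {
        String needle = normalize(phrase);
        return corpusLines.stream()
                .map(PhraseCoverage::normalize)
                .filter(line -> line.contains(needle))
                .count();
    }

    public static void main(String[] args) {
        // Made-up stand-in lines, NOT taken from the real corpus; in practice you
        // would load the corpus with Files.readAllLines(path).
        List<String> corpus = List.of(
                "The plot is too good to be true .",
                "An offer too  good to refuse , and the film knows it .",
                "A tedious , overlong mess .");
        System.out.println(linesContaining(corpus, "too good to refuse"));
    }
}
```

Running the same check over a handful of phrases from your own texts gives a quick, rough picture of how much of your vocabulary the training data actually covers.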

I have also tried using their pre-trained models to analyze conversational language, with results that weren't bad but weren't amazing either: the accuracy of simply creating a list of positive and negative patterns and looking for them in my texts was not significantly different from that of the sentiment analyzer.
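A pattern-list baseline like the one described above can be as simple as counting hits from two hand-written lists. A minimal sketch; the patterns are hypothetical placeholders you would replace with phrases drawn from your own data, and ties (including no matches at all) are reported as Neutral:

```java
import java.util.List;
import java.util.Locale;

public class PatternBaseline {
    // Hypothetical placeholder patterns: replace with phrases observed in your own texts.
    static final List<String> POSITIVE = List.of("eager", "too good to refuse", "love", "great");
    static final List<String> NEGATIVE = List.of("disappointed", "boring", "waste of", "terrible");

    // Counts how many patterns from the list occur in the (already lowercased) text.
    static int hits(String text, List<String> patterns) {
        int count = 0;
        for (String p : patterns) {
            if (text.contains(p)) {
                count++;
            }
        }
        return count;
    }

    // Positive if more positive patterns match, Negative if more negative ones do, else Neutral.
    static String classify(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        int diff = hits(lower, POSITIVE) - hits(lower, NEGATIVE);
        if (diff > 0) {
            return "Positive";
        }
        if (diff < 0) {
            return "Negative";
        }
        return "Neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify(
                "NY is where I ultimately want to spend my teaching career and the opportunity was too good to refuse."));
    }
}
```

This uses plain substring matching, so "love" would also match "glove"; matching on word boundaries (for example with java.util.regex and \b) would be a natural refinement.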
