
Stanford coreNLP sentiment without splitting sentences

I have files that I'm feeding to CoreNLP's sentiment tagger. I have already broken the files up into individual sentences, so I want one tag returned per file. How can I make the java command return a single tag?

The command looks like this:

java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin

and it outputs the following:

Annotation pipeline timing information:
TokenizerAnnotator: 0.0 sec.
WordsToSentencesAnnotator: 0.0 sec.
TOTAL: 0.0 sec. for 8 tokens at 296.3 tokens/sec.
Pipeline setup: 0.0 sec.
Total time for StanfordCoreNLP pipeline: 8.7 sec.

C:\stanford-corenlp-full-2015-04-20>java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.4 sec].
Adding annotator sentiment
Reading in text from stdin.
Please enter one sentence per line.
Processing will end when EOF is reached.

Computer is fun. Not too fun.
  Positive
  Neutral

How could I make the output a single tag, similar to what I got below by removing the punctuation?

Computer is fun Not too fun.
  Positive  

It seems I should be able to do this easily, since there is the -ssplit.isOneSentence option and, to my understanding, the sentiment tagger uses ssplit, but I don't know how to rework my command to incorporate it (I have read the command line documentation).

It looks like there was a bug in SentimentPipeline: it shouldn't split sentences within a line when you use the -stdin option. I have fixed that now, but unless you compile your own version, the fix won't help you until we release the next version of CoreNLP.

But there is also an alternative (and presumably better) way to get sentiment labels for sentences using a CoreNLP pipeline.

The following command runs the same code as your command, but it also lets you specify more options (including the -ssplit.eolonly option) for the individual annotators.

java -cp "*" -mx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators "tokenize,ssplit,parse,sentiment" -ssplit.eolonly
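
If you would rather drive the pipeline from Java than from the command line, the same options can be expressed as a java.util.Properties object. The sketch below only builds and prints those properties, so it runs without the CoreNLP jars. It assumes that a command-line flag such as -ssplit.eolonly corresponds to the property ssplit.eolonly set to "true"; the class name SentimentProps is made up for illustration.

```java
import java.util.Properties;

// Hypothetical helper class (not part of CoreNLP): it mirrors the
// command-line flags above as pipeline properties.
class SentimentProps {

    // Builds properties equivalent to:
    //   -annotators "tokenize,ssplit,parse,sentiment" -ssplit.eolonly
    // Assumption: a flag given without a value maps to the value "true".
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,parse,sentiment");
        // Split sentences only at line ends, so each input line is one sentence.
        props.setProperty("ssplit.eolonly", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " = " + props.getProperty(name));
        }
        // With the CoreNLP jars on the classpath, these properties would be
        // handed to the pipeline constructor, e.g. new StanfordCoreNLP(props).
    }
}
```

With ssplit.eolonly set, a line such as "Computer is fun. Not too fun." should be treated as a single sentence and therefore receive a single sentiment label.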
