
Speed up CoreNLP Sentiment Analysis

Can anybody think of a way to speed up my CoreNLP Sentiment Analysis (below)?

I initialize the CoreNLP pipeline once on server startup:

// Initialize the CoreNLP text processing pipeline once
public static Properties props = new Properties();
public static StanfordCoreNLP pipeline;

static {
    // Set the pipeline's annotators
    props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment");
    // Use shift-reduce constituency parsing (O(n),
    // http://nlp.stanford.edu/software/srparser.shtml) instead of CoreNLP's
    // default probabilistic context-free grammar (PCFG) parsing (O(n^3))
    props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
    pipeline = new StanfordCoreNLP(props);
}
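
One setting not shown above that may be worth benchmarking is CoreNLP's global `threads` property, which lets annotators that support multithreading process sentences in parallel; the gain for short, single-sentence texts may be small, so measure before relying on it:

// Hypothetical addition to the static initializer above: allow
// multithreading-capable annotators to use several threads
// (most useful when a document contains many sentences).
props.setProperty("threads", "4");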

Then I call the pipeline from my Controller:

String text = "A sample string.";
Annotation annotation = pipeline.process(text);
List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
    int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
    ...
}

I've profiled the code: the line `Annotation annotation = pipeline.process(text)`, which is CoreNLP's main processing call, is very slow. A request that makes 100 calls to my controller takes 1.07 seconds on average, with the annotation itself taking ~7 ms per call. I need to reduce that to ~2 ms.
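
If the per-call latency can't be cut further, wall-clock time for a 100-call request can still be reduced by annotating the texts concurrently: StanfordCoreNLP pipelines are generally reported to be thread-safe, so a single shared pipeline can serve a thread pool. Below is a minimal stdlib-only sketch; the `analyze` method is a placeholder standing in for `pipeline.process(...)` plus the sentiment lookup, not real CoreNLP code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchSentiment {

    // Placeholder for the real per-text work:
    // pipeline.process(text) + RNNCoreAnnotations.getPredictedClass(tree).
    static int analyze(String text) {
        return text.length() % 5; // fake "sentiment class" for the sketch
    }

    // Annotate all texts of one request on a fixed thread pool instead of
    // sequentially, amortizing the ~7 ms per-call cost across cores.
    static List<Integer> analyzeAll(List<String> texts, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String t : texts) {
                futures.add(pool.submit(() -> analyze(t)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // preserves input order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> texts = List.of("A sample string.", "Another one.");
        System.out.println(analyzeAll(texts, 4)); // prints [1, 2]
    }
}
```

This doesn't make any single annotation faster, but with 4 cores it can bring the total time for 100 calls much closer to the per-request budget.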

I can't remove any of the annotators because sentiment relies on all of them. I'm already using the Shift-Reduce Constituency Parser because it is much faster than the default Context-Free Grammar Parser.

Are there any other parameters I can tune to significantly speed this up?

I'm having the same issue. I've also tried the SR beam model, which was even slower than the PCFG, even though Stanford's benchmarks show SR beam should be much faster than PCFG and only slightly slower than plain SR.

I guess other than using the SR parser instead of the PCFG, the only remaining way to improve the speed might be playing with the tokenizer options...
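
The tokenizer options alluded to here are PTBTokenizer flags passed through the `tokenize.options` property. For example, disabling some normalization passes (option names from the PTBTokenizer documentation; the actual speed impact is unverified and should be benchmarked):

props.setProperty("tokenize.options",
    "americanize=false,normalizeParentheses=false,normalizeOtherBrackets=false");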
