
Use StanfordCoreNLP in parallel

This thread contains a nice example of how to use a wrapper for Stanford's CoreNLP library. Here is the example I am using:

from pycorenlp import StanfordCoreNLP

# Connect to a CoreNLP server already running on port 9000
nlp = StanfordCoreNLP('http://localhost:9000')

# Annotate the text; the server splits it into sentences and scores each one
res = nlp.annotate("I love you. I hate him. You are nice. He is dumb",
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })

# Print the index, text, and sentiment of every sentence
for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

Say I have 10,000+ sentences that I want to analyze as in this example. Is it possible to process these in parallel, with multiple threads?

Not sure about this exact approach, but the general idea works. In Java I have a singleton class set up with CoreNLP and the pipeline I want to use. I then call a method on that singleton from multiple threads, all sharing the same instance; the method takes a few sentences, annotates them, and does some work with the result. So this type of multithreading does work. I have been doing it for a few years with no issues.

Could you refactor your code to do the same? That is, set up your pipeline once, then call annotate on a few sentences at a time from a thread pool, as in the sketch below? It shouldn't be too much effort.
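Here is a minimal sketch of that refactoring in Python, assuming a CoreNLP server is already running on localhost:9000 as in the question. The batch size, worker count, and the chunks() helper are illustrative choices, not part of the original answer:

from concurrent.futures import ThreadPoolExecutor

from pycorenlp import StanfordCoreNLP

# One shared client; the heavy lifting happens in the CoreNLP server,
# which can serve several requests concurrently.
nlp = StanfordCoreNLP('http://localhost:9000')

props = {
    'annotators': 'sentiment',
    'outputFormat': 'json',
    'timeout': 50000,  # assumption: large batches need more than 1000 ms
}

def annotate_batch(sentences):
    # Join a batch into one request; the server re-splits it into sentences
    res = nlp.annotate(" ".join(sentences), properties=props)
    return [(s["sentimentValue"], s["sentiment"],
             " ".join(t["word"] for t in s["tokens"]))
            for s in res["sentences"]]

def chunks(items, size):
    # Hypothetical helper: split the sentence list into fixed-size batches
    for i in range(0, len(items), size):
        yield items[i:i + size]

sentences = ["I love you.", "I hate him.", "You are nice.", "He is dumb."] * 2500

with ThreadPoolExecutor(max_workers=4) as pool:
    for batch in pool.map(annotate_batch, chunks(sentences, 100)):
        for value, sentiment, text in batch:
            print("%s %s: '%s'" % (value, sentiment, text))

Each thread just issues its own HTTP request, so the parallelism really comes from the server handling several requests at once; the client only needs to keep enough requests in flight.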

Hope that makes sense.
