
How can I run command-line CoreNLP with multiple threads?

I need to parse a lot of documents (around 300,000). As suggested on the Stanford CoreNLP website, I created a file named filelist.txt that contains the paths of all the files to be parsed.

https://stanfordnlp.github.io/CoreNLP/cmdline.html
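A file list like this can be generated with find instead of being written by hand. The following is a minimal sketch that assumes the documents are plain-text files ending in .txt somewhere under one directory; the function name make_filelist is hypothetical and not part of CoreNLP. It writes one absolute path per line so the list still works if CoreNLP is started from a different working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one absolute path per line for every *.txt
# file under DIR (searched recursively) into OUT (default: filelist.txt).
make_filelist() {
  local dir="$1" out="${2:-filelist.txt}" abs
  # Resolve DIR to an absolute path; fail early if it does not exist.
  abs="$(cd "$dir" && pwd)" || return 1
  # Sort so the processing order is stable between runs.
  find "$abs" -type f -name '*.txt' | LC_ALL=C sort > "$out"
}
```

Usage: make_filelist /path/to/docs filelist.txt. With about 300,000 files, find streams the paths and avoids the "argument list too long" error that a shell glob such as ls docs/*.txt can hit.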

Then I ran CoreNLP as follows:

java -mx20g -cp "$SCRIPT/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse -ssplit.eolonly true -tokenize.whitespace true -filelist filelist.txt -outputDirectory "$OUTDIR"

But CPU usage stays at 100% (that is, a single core), which suggests that CoreNLP is not using multiple threads. As a result, parsing is too slow (roughly 10 seconds per document).

When I run CoreNLP without the -filelist option, it does use multiple threads.

Are there any options or other ways to make CoreNLP use multiple threads?

I believe the command-line argument -threads k should make CoreNLP annotate the files in the file list using k threads.
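A minimal sketch of the file-list run with -threads added, assuming $SCRIPT points at the directory holding the CoreNLP jars and $OUTDIR already exists. The two boolean options are written as -ssplit.eolonly true -tokenize.whitespace true. The variable names THREADS and RUN, the placeholder jar path, and the fallback of 4 threads are hypothetical choices made for this example. By default the script only prints the command; set RUN=1 to execute it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed locations; override via the environment.
SCRIPT="${SCRIPT:-/path/to/stanford-corenlp}"   # hypothetical placeholder path
OUTDIR="${OUTDIR:-out}"
# One thread per core; nproc is GNU coreutils, fall back to 4 if it is missing.
THREADS="${THREADS:-$(nproc 2>/dev/null || echo 4)}"

cmd=(java -mx20g -cp "$SCRIPT/*" edu.stanford.nlp.pipeline.StanfordCoreNLP
  -annotators tokenize,ssplit,pos,lemma,ner,parse
  -ssplit.eolonly true -tokenize.whitespace true
  -filelist filelist.txt
  -outputDirectory "$OUTDIR"
  -threads "$THREADS")

# Print the fully quoted command for inspection.
printf '%q ' "${cmd[@]}"; echo

# Execute only when explicitly requested (RUN=1).
if [[ "${RUN:-0}" == 1 ]]; then
  "${cmd[@]}"
fi
```

Memory use may grow with the thread count, since each thread works on its own document at the same time, so -mx20g may need to be adjusted when parse runs on many threads.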
