Stanford-Core-NLP giving Java errors for text tokenization
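So I am trying to use Stanford CoreNLP to tokenize text so that I can use this git repo for text summarization. I have set the environment variables for Java 8, and I am using Python 2.7. When I run this command: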
echo "This is text tokenization" | java -cp C:\Users\Harshit\Downloads\stanford-corenlp-full-2016-10-31\stanford-corenlp-full-2016-10-31\stanford-corenlp-3.7.0.jar\ edu.stanford.nlp.process.PTBTokenizer.class
It works fine and gives the following output:
"This
is
text
tokenization"
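The working pipeline above can be reproduced from Python, which is roughly what a tokenization script has to do. This is a minimal sketch, not code from the summarization repo: `build_ptb_command` and `tokenize` are hypothetical helper names, and it assumes Python 3.5+ (`subprocess.run` does not exist in Python 2.7). Each argument is its own list element, and the main class is given by its fully qualified name, without the `.class` suffix that `java` does not expect.

```python
import subprocess


def build_ptb_command(jar_path):
    """Return the argument list for running PTBTokenizer from a CoreNLP jar.

    "java", "-cp" and the jar path are separate elements, so no shell
    quoting is involved. The main class is named without ".class".
    """
    return ["java", "-cp", jar_path, "edu.stanford.nlp.process.PTBTokenizer"]


def tokenize(text, jar_path):
    """Pipe `text` into PTBTokenizer on stdin and return its output lines."""
    result = subprocess.run(
        build_ptb_command(jar_path),
        input=text,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode: str in, str out
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()
```

For example, `tokenize("This is text tokenization", r"C:\path\to\stanford-corenlp-3.7.0.jar")` (a placeholder path) should return one token per line, like the command-line run; if `java` itself cannot be found, `subprocess.run` raises `FileNotFoundError` instead.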
But when I run the command:
python make_datafiles.py /path/to/cnn/stories /path/to/dailymail/stories
I get this error:
'"java -cp"' is not recognized as an internal or external command,
operable program or batch file.
Exception: The tokenized stories directory cnn_stories_tokenized contains 0 files, but it should contain the same number as C:\Users\Harshit\Downloads\cnn_stories_tokenized\cnn_stories_tokenized (which has 92579 files). Was there an error during tokenization?
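The first line of the error names `"java -cp"`, with the double quotes, as the program Windows tried to run. That is consistent with `java -cp` having been passed as a single argument. On Windows, `subprocess` joins an argument list into one command line using `subprocess.list2cmdline`, which wraps any element containing a space in double quotes. The snippet below only illustrates that quoting rule; it assumes (we have not seen the edited script) that its argument list contains `"java -cp"` as one element.

```python
import subprocess

# An element containing a space is wrapped in double quotes when the list is
# joined into a single Windows command line, so "java -cp" becomes one token.
bad = ["java -cp", "stanford-corenlp-3.7.0.jar", "edu.stanford.nlp.process.PTBTokenizer"]
good = ["java", "-cp", "stanford-corenlp-3.7.0.jar", "edu.stanford.nlp.process.PTBTokenizer"]

print(subprocess.list2cmdline(bad))
# "java -cp" stanford-corenlp-3.7.0.jar edu.stanford.nlp.process.PTBTokenizer
print(subprocess.list2cmdline(good))
# java -cp stanford-corenlp-3.7.0.jar edu.stanford.nlp.process.PTBTokenizer
```

Because the tokenizer never starts, no tokenized files are written. That is why the follow-up exception reports that `cnn_stories_tokenized` contains 0 files.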
How can I fix this and tokenize the data files?
Can you check whether your Java path is configured correctly?
Steps to check the Java path:
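The individual steps are missing from the answer as captured here. As one possible way to run the check from Python, the sketch below reports whether a `java` executable is found on `PATH`. As an assumption, it also reports whether the CoreNLP jar is listed in the `CLASSPATH` environment variable, since the script's `java` call may rely on it (the working command above passes the jar with `-cp` instead). `find_java`, `jar_on_classpath` and `report` are hypothetical names; `shutil.which` needs Python 3.3+.

```python
import os
import shutil


def find_java(path=None):
    """Return the full path of the `java` executable on PATH, or None.

    `path` overrides the PATH environment variable when given.
    """
    return shutil.which("java", path=path)


def jar_on_classpath(jar_name, classpath=None):
    """Return True if some CLASSPATH entry points at a file named `jar_name`.

    Entries are split on os.pathsep (";" on Windows, ":" elsewhere); a
    trailing slash or backslash on an entry is ignored.
    """
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    for entry in classpath.split(os.pathsep):
        name = entry.rstrip("\\/").replace("\\", "/").split("/")[-1]
        if entry and name == jar_name:
            return True
    return False


def report(jar_name="stanford-corenlp-3.7.0.jar"):
    """Print a short summary of the Java setup for this process."""
    print("java executable:", find_java() or "NOT FOUND on PATH")
    print("%s on CLASSPATH: %s" % (jar_name, jar_on_classpath(jar_name)))
```

Calling `report()` from the same console that runs `make_datafiles.py` shows what that process actually sees, which can differ from a console opened before the environment variables were changed.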