Stanford CoreNLP gives a Java error when tokenizing text

Question

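So I'm trying to use Stanford CoreNLP to tokenize text so that I can do text summarization with this git repo. I have set the environment variables for Java 8, and I'm using Python 2.7. When I run this command: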

echo "This is text tokenization" | java -cp C:\Users\Harshit\Downloads\stanford-corenlp-full-2016-10-31\stanford-corenlp-full-2016-10-31\stanford-corenlp-3.7.0.jar\ edu.stanford.nlp.process.PTBTokenizer.class

It works fine and gives output like this:

"This
is
text
tokenization"
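For comparison, here is a minimal Python 3.7+ sketch of the same tokenizer call made through subprocess (the question uses Python 2.7, which has no subprocess.run). It assumes the jar path from the question and passes every token (java, -cp, the jar path, the class name) as its own list element. It uses the fully qualified class name edu.stanford.nlp.process.PTBTokenizer without a .class suffix, because the java launcher expects a class name rather than a file name. The helper names build_tokenizer_command and tokenize_text are hypothetical.

```python
import subprocess

# Jar path taken from the question; adjust it to your own install location.
JAR = r"C:\Users\Harshit\Downloads\stanford-corenlp-full-2016-10-31\stanford-corenlp-full-2016-10-31\stanford-corenlp-3.7.0.jar"


def build_tokenizer_command(jar_path):
    """Hypothetical helper: argv for PTBTokenizer, one token per list element."""
    return ["java", "-cp", jar_path, "edu.stanford.nlp.process.PTBTokenizer"]


def tokenize_text(text, jar_path=JAR):
    """Pipe text to PTBTokenizer on stdin and return its output lines (one token per line)."""
    result = subprocess.run(
        build_tokenizer_command(jar_path),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if java exits with an error
    )
    return result.stdout.splitlines()
```

With java on PATH and the jar at JAR, calling tokenize_text("This is text tokenization") should return the tokens as a list of strings. If java cannot be found at all, subprocess.run raises FileNotFoundError instead of silently producing no output.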

But when I run this command:

python make_datafiles.py /path/to/cnn/stories /path/to/dailymail/stories

I get this error:

'"java -cp"' is not recognized as an internal or external command,
operable program or batch file.
Exception: The tokenized stories directory cnn_stories_tokenized contains 0 files, but it should contain the same number as C:\Users\Harshit\Downloads\cnn_stories_tokenized\cnn_stories_tokenized (which has 92579 files). Was there an error during tokenization?
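The double quotes around java -cp in the first error line are worth checking. On Windows, subprocess turns an argument list into a single command line using the quoting rules shown by subprocess.list2cmdline (an undocumented helper, used here only to illustrate the quoting), and it wraps any element that contains a space in double quotes. If, and this is only an assumption about how the script was edited, "java -cp" was passed as one list element, cmd.exe would look for a program literally named "java -cp", which matches this error; the tokenizer would then write no files, which matches the 0-file exception. The placeholder jar path below is hypothetical.

```python
import subprocess

JAR = r"C:\path\to\stanford-corenlp-3.7.0.jar"  # hypothetical placeholder path
CLASS = "edu.stanford.nlp.process.PTBTokenizer"

# Hypothetical broken argv: "java -cp" passed as ONE list element.
broken = ["java -cp", JAR, CLASS]
# Working argv: every token is its own element.
fixed = ["java", "-cp", JAR, CLASS]

# list2cmdline applies the Windows command-line quoting that subprocess
# uses on Windows when it is given a list of arguments.
broken_line = subprocess.list2cmdline(broken)
fixed_line = subprocess.list2cmdline(fixed)

print(broken_line)  # "java -cp" C:\path\to\stanford-corenlp-3.7.0.jar edu.stanford.nlp.process.PTBTokenizer
print(fixed_line)   # java -cp C:\path\to\stanford-corenlp-3.7.0.jar edu.stanford.nlp.process.PTBTokenizer
```

Only the first element is quoted in the broken version, so Windows treats the whole string "java -cp" as the program name.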

How can I fix this and tokenize the data files?
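As a starting point for the directory-level run, here is a minimal Python 3 sketch of one way to tokenize a whole folder of story files in a single JVM call: write a mapping file with one input/output pair per line, pass it to PTBTokenizer with -ioFileList and -preserveLines, then compare file counts the way the exception above does. This is an assumption about how such a script can work, not the actual code of make_datafiles.py; the function names and the mapping.txt file name are hypothetical. The sketch passes the jar with -cp; setting the CLASSPATH environment variable to the jar path is an alternative.

```python
import os
import subprocess

TOKENIZER_CLASS = "edu.stanford.nlp.process.PTBTokenizer"


def write_mapping(stories_dir, tokenized_dir, mapping_path):
    """Write one 'input<TAB>output' line per story file; return the number of files."""
    names = sorted(os.listdir(stories_dir))
    with open(mapping_path, "w") as f:
        for name in names:
            src = os.path.join(stories_dir, name)
            dst = os.path.join(tokenized_dir, name)
            f.write("%s\t%s\n" % (src, dst))
    return len(names)


def check_counts(stories_dir, tokenized_dir):
    """Raise if the tokenized directory does not hold one file per input story."""
    n_in = len(os.listdir(stories_dir))
    n_out = len(os.listdir(tokenized_dir))
    if n_in != n_out:
        raise RuntimeError(
            "%s has %d files but %s has %d; tokenization probably failed"
            % (tokenized_dir, n_out, stories_dir, n_in)
        )


def tokenize_stories(stories_dir, tokenized_dir, jar_path, mapping_path="mapping.txt"):
    """Tokenize every file in stories_dir into tokenized_dir with one java call."""
    os.makedirs(tokenized_dir, exist_ok=True)
    write_mapping(stories_dir, tokenized_dir, mapping_path)
    cmd = ["java", "-cp", jar_path, TOKENIZER_CLASS,
           "-ioFileList", "-preserveLines", mapping_path]
    try:
        # check=True surfaces a failing java call immediately, instead of
        # discovering it later only through the file-count check.
        subprocess.run(cmd, check=True)
    finally:
        os.remove(mapping_path)
    check_counts(stories_dir, tokenized_dir)
```

Because every argument is a separate list element, nothing here depends on shell quoting, and a missing java executable shows up as a FileNotFoundError from subprocess.run.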

Answer 1

Could you check whether your Java path is configured correctly?

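Steps to check the Java path:

1. Open cmd.
2. Run java -version.
3. The Java version should be printed, e.g. "java version 1.x.xxx".
4. If it is not, configure the Java path. For help, see the guide "Environment variables for Java installation".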
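The same check can be done from Python, which shows whether the interpreter running the script sees the same java as cmd does. This is a minimal Python 3.7+ sketch (shutil.which and subprocess.run do not exist in Python 2.7); the helper names find_java and java_version_line are hypothetical. Note that java -version writes its version text to stderr, not stdout.

```python
import shutil
import subprocess


def find_java(path=None):
    """Return the full path of the java executable found on PATH (or on `path`), else None."""
    return shutil.which("java", path=path)


def java_version_line(java_exe):
    """Run `java -version` and return the first line of its output (Java prints it to stderr)."""
    result = subprocess.run([java_exe, "-version"], capture_output=True, text=True)
    lines = (result.stderr or result.stdout).splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("java not found on PATH; configure the Java environment variables")
    else:
        print(java, "->", java_version_line(java))
```

If find_java() returns None here while java -version works in cmd, the Python process was probably started with a different PATH (for example, from a shell opened before the environment variables were changed).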

Answer 1: score 0, posted 2018-11-12 12:16:06