
Set options in the Stanford NLP Parser in Java

I am trying to use the Stanford NLP Parser to parse POS-tagged data. Since my data is already tokenized and tagged, I am trying to use the setOptionFlags() method to tell the parser so:

LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
lp.setOptionFlags(new String[]{"-sentences", "newline", "-tokenized", "-tagSeparator", "_", "-tokenizerFactory", "edu.stanford.nlp.process.WhitespaceTokenizer", "-tokenizerMethod", "newCoreLabelTokenizerFactory"});

However, I keep getting an exception:

Exception in thread "main" java.lang.IllegalArgumentException: Unknown option: -sentences

I have searched the Javadocs online, and this is how it is done in their examples. Please help!

The options for tokenization, tag separator, etc. are not options for the parser, sensu stricto, but for the DocumentPreprocessor that is used to build input to the parser in the main method of LexicalizedParser. For the actual parser, the input is a list of tokens, and these are parsed. Hence, you can't give these options as parser options with setOptionFlags().

If you've got a List of tokens, you can put them straight into the parser with this method in LexicalizedParser: public Tree parse(List<? extends HasWord> lst). If the items in the list implement HasTag (e.g., a TaggedWord or a CoreLabel) and have a non-null tag, then that tag will be used by the parser in parsing the sentence.
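As a minimal sketch of this first route, the code below builds a list of TaggedWord objects from one line of word_TAG tokens and hands it to parse(). The class name TaggedParseSketch, the helper toTaggedWords, and the sample sentence are made up for illustration; the model path is the one from the question, and the Stanford parser and models jars are assumed to be on the classpath.

```java
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

import java.util.ArrayList;
import java.util.List;

public class TaggedParseSketch {

    // Hypothetical helper (not part of the Stanford API): splits a line such as
    // "The_DT dog_NN" on whitespace, then splits each token at the LAST separator
    // so that words containing the separator keep it.
    static List<TaggedWord> toTaggedWords(String line, String separator) {
        List<TaggedWord> sentence = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input line
            }
            int idx = token.lastIndexOf(separator);
            if (idx > 0) {
                String word = token.substring(0, idx);
                String tag = token.substring(idx + separator.length());
                sentence.add(new TaggedWord(word, tag));
            } else {
                // No separator found: leave the tag null so the parser tags it itself.
                sentence.add(new TaggedWord(token));
            }
        }
        return sentence;
    }

    public static void main(String[] args) {
        LexicalizedParser lp = LexicalizedParser.loadModel(
                "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

        // Illustrative pre-tokenized, pre-tagged input.
        List<TaggedWord> sentence = toTaggedWords("The_DT dog_NN barks_VBZ ._.", "_");

        // TaggedWord implements HasWord and HasTag, so the supplied tags are used.
        Tree tree = lp.parse(sentence);
        tree.pennPrint();
    }
}
```

No option flags are involved here: the tokenization and tagging decisions are already encoded in the list passed to parse().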

If you want to use a DocumentPreprocessor to split up text containing tokenized, tagged words, then you need to create a DocumentPreprocessor and set things up (a bit manually, sorry) with methods like setTagDelimiter(String s).
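A rough sketch of the DocumentPreprocessor route follows, mirroring the flags from the question (one sentence per line, whitespace tokenization, "_" as tag separator). The class name PreprocessorSketch, the helper taggedLineReader, and the sample text are invented for illustration. The setter names are taken from the DocumentPreprocessor Javadoc as I understand it, so check them against the version you have installed.

```java
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.WhitespaceTokenizer;

import java.io.StringReader;
import java.util.List;

public class PreprocessorSketch {

    // Hypothetical helper: configures a DocumentPreprocessor for text that is
    // already tokenized and tagged, one sentence per line, in word_TAG form.
    static DocumentPreprocessor taggedLineReader(String text) {
        DocumentPreprocessor dp = new DocumentPreprocessor(new StringReader(text));
        // CoreLabel tokens implement HasTag, so the split-off tag can be stored.
        dp.setTokenizerFactory(WhitespaceTokenizer.newCoreLabelTokenizerFactory());
        dp.setSentenceDelimiter("\n"); // counterpart of "-sentences newline"
        dp.setTagDelimiter("_");       // counterpart of "-tagSeparator _"
        return dp;
    }

    public static void main(String[] args) {
        LexicalizedParser lp = LexicalizedParser.loadModel(
                "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

        // Illustrative input: two pre-tagged sentences.
        String text = "The_DT dog_NN barks_VBZ ._.\nIt_PRP is_VBZ loud_JJ ._.\n";

        // DocumentPreprocessor is Iterable over sentences (lists of tokens).
        for (List<HasWord> sentence : taggedLineReader(text)) {
            lp.parse(sentence).pennPrint();
        }
    }
}
```

The preprocessor does the splitting and tag extraction that the command-line flags would otherwise trigger, and the parser only ever sees finished token lists.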
