
JAVA: How to use gazettes with Stanford NLP?

I read this FAQ but I do not understand it. I tried the following code:

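   import java.util.Properties;
   import edu.stanford.nlp.pipeline.StanfordCoreNLP;

   Properties pp = new Properties();
   pp.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
   pp.put("ner.useSUTime", "false");

   // Attempt to enable the gazette for the NER step
   pp.put("useGazettes", "true");
   pp.put("gazette", "C:\\gaz.txt");

   StanfordCoreNLP s = new StanfordCoreNLP(pp);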

This is the input string: "Dan became a member of the Music friends association in 2008"

The gazette file is:

  CLASS Music friends association 

But "Music friends association" is not recognized by NER.

Where am I going wrong?

The answer is given in that FAQ:

If a gazette is used, this does not guarantee that words in the gazette are always used as a member of the intended class, and it does not guarantee that words outside the gazette will not be chosen. It simply provides another feature for the CRF to train against. If the CRF has higher weights for other features, the gazette features may be overwhelmed.

So there is no guarantee that your phrase will be tagged in any way. An alternative is to use either the regexner or the tokensregex tool included in Stanford CoreNLP.
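As a rough illustration of the regexner route, here is a minimal sketch that writes a one-line mapping file and builds the pipeline properties. The class name `RegexNerSetup`, the temporary file name, and the `ORGANIZATION` label are assumptions chosen for this example; the question's gazette used `CLASS` instead. The sketch only prepares the configuration, since actually running the pipeline requires the CoreNLP jars and models on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Properties;

public class RegexNerSetup {

    // Writes a single RegexNER rule: token pattern, a TAB, then the label.
    // The label ORGANIZATION is an assumption made for this example.
    static void writeMapping(Path file) throws IOException {
        String rule = "Music friends association\tORGANIZATION";
        Files.write(file, Collections.singletonList(rule), StandardCharsets.UTF_8);
    }

    // Builds pipeline properties that run regexner after the statistical ner step.
    static Properties buildProperties(Path mappingFile) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
        props.setProperty("ner.useSUTime", "false");
        props.setProperty("regexner.mapping", mappingFile.toString());
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temporary location for the mapping file.
        Path mapping = Files.createTempFile("regexner-mapping", ".tab");
        writeMapping(mapping);
        Properties props = buildProperties(mapping);
        System.out.println("regexner.mapping = " + props.getProperty("regexner.mapping"));
        // With CoreNLP and its models available, the properties would be used as:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

Unlike a gazette feature, a RegexNER rule is applied deterministically to matching token sequences. By default it may leave tokens alone that the statistical model has already labeled; the mapping file accepts an optional third tab-separated column listing the labels that a rule is allowed to overwrite.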
