
Java: How do I use gazettes with Stanford NLP?

I read this FAQ but I don't understand it. I tried this code:

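   import java.util.Properties;
   import edu.stanford.nlp.pipeline.Annotation;
   import edu.stanford.nlp.pipeline.StanfordCoreNLP;

   Properties pp = new Properties();
   pp.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
   pp.setProperty("ner.useSUTime", "false"); // disable SUTime date/time tagging

   // Gazette settings as described in the FAQ
   pp.setProperty("useGazettes", "true");
   pp.setProperty("gazette", "C:\\gaz.txt");

   StanfordCoreNLP s = new StanfordCoreNLP(pp);
   Annotation doc = new Annotation("Dan became a member of the Music friends association in 2008");
   s.annotate(doc);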

The input string is: "Dan became a member of the Music friends association in 2008"

The gazette file contains:

  CLASS Music friends association 

But "Music friends association" is not recognized by NER.

Where am I wrong?

The answer is given in the same FAQ:

If a gazette is used, this does not guarantee that words in the gazette are always used as a member of the intended class, and it does not guarantee that words outside the gazette will not be chosen. It simply provides another feature for the CRF to train against. If the CRF has higher weights for other features, the gazette features may be overwhelmed.
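To see why a gazette match can lose, here is a toy sketch of how a linear model such as a CRF combines features for one token. The feature names (GAZ=ORGANIZATION, WORD=Music, PREV=the) and all weights are hypothetical, not taken from any Stanford model, and the sketch leaves out the label-transition scores a real CRF also uses. The gazette feature fires and favors ORGANIZATION, but the other features favor O more strongly, so O wins.

```java
import java.util.Map;

// Toy linear scorer with HYPOTHETICAL weights (not from any Stanford model).
// A label's score is the sum of the weights of the features firing on a token;
// the highest-scoring label wins. Real CRFs also score label transitions (omitted).
public class GazetteFeatureDemo {

    // Hypothetical per-label feature weights.
    static final Map<String, Map<String, Double>> WEIGHTS = Map.of(
        "ORGANIZATION", Map.of("GAZ=ORGANIZATION", 1.5, "WORD=Music", -0.4, "PREV=the", 0.2),
        "O", Map.of("WORD=Music", 1.2, "PREV=the", 0.9));

    // Sum the weights of the active features for one label (missing feature = 0).
    static double score(String label, String... features) {
        double total = 0.0;
        for (String f : features) {
            total += WEIGHTS.get(label).getOrDefault(f, 0.0);
        }
        return total;
    }

    // Pick the label with the highest total score.
    static String bestLabel(String... features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : WEIGHTS.keySet()) {
            double s = score(label, features);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Features firing on "Music" in "... the Music friends association ...".
        String[] features = {"GAZ=ORGANIZATION", "WORD=Music", "PREV=the"};
        System.out.println("ORGANIZATION: " + score("ORGANIZATION", features)); // roughly 1.3
        System.out.println("O: " + score("O", features));                       // roughly 2.1
        System.out.println("Chosen label: " + bestLabel(features));             // O
    }
}
```

Calling bestLabel with only GAZ=ORGANIZATION flips the result to ORGANIZATION, which is the sense in which the gazette is just one feature among many: it wins only when the other features do not outweigh it.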

So there is no guarantee that your phrase will be tagged at all. The alternative is to use either the regexner or the tokensregex tools included in Stanford CoreNLP.
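A minimal sketch of the regexner route, assuming CoreNLP's regexner annotator and its regexner.mapping property: the mapping file is tab-separated, with a token pattern in the first column and the label to assign in the second. The class RegexNerConfig, its helper methods, and the choice of ORGANIZATION as the label are illustrative only. The code just builds the configuration so it runs with the JDK alone; with the CoreNLP jars on the classpath you would pass the Properties to new StanfordCoreNLP(props) as in the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;

// Sketch, ASSUMING CoreNLP's "regexner" annotator and "regexner.mapping" property.
// RegexNerConfig and its helpers are hypothetical names for illustration.
public class RegexNerConfig {

    // One rule per line: the token pattern, a TAB, then the NER label to assign.
    static String mappingLine(String phrase, String label) {
        return phrase + "\t" + label;
    }

    static Properties buildProperties(Path mappingFile) {
        Properties props = new Properties();
        // regexner runs after the statistical ner step and applies deterministic rules.
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
        props.setProperty("ner.useSUTime", "false");
        props.setProperty("regexner.mapping", mappingFile.toString());
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path mapping = Files.createTempFile("regexner", ".tab");
        // ORGANIZATION is an illustrative label; use whatever class you need.
        Files.write(mapping,
            List.of(mappingLine("Music friends association", "ORGANIZATION")),
            StandardCharsets.UTF_8);
        Properties props = buildProperties(mapping);
        props.list(System.out);
        // With CoreNLP available: StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

One point worth checking in the RegexNER documentation: by default regexner should only relabel tokens that the statistical ner step left as O, and an optional third column in the mapping file lists existing labels it is allowed to overwrite.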
