How to use the NNDEP parser in the Stanford parser to process Chinese data

We are currently using the NNDEP parser in the Stanford parser to process Chinese data, expecting to obtain useful syntax trees. Below is the command we used:

java -cp "./*" edu.stanford.nlp.parser.nndep.DependencyParser -language chinese -model edu/stanford/nlp/models/parser/nndep/CTB_CoNLL_params.txt.gz -tagger.model edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger -escaper edu.stanford.nlp.trees.international.pennchinese.ChineseEscaper -textFile INPUT_FILE

However, the output does not match the grammatical relations described in the paper "Discriminative reordering with Chinese grammatical relations features". For two sentences, (1) 我把他打了 and (2) 我打了他, we obtained the following results:

   SUB(把-2, 我-1)
   root(ROOT-0, 把-2)
   SUB(打了。-4, 他-3)
   VMOD(把-2, 打了。-4)


   SUB(打了-2, 我-1)
   root(ROOT-0, 打了-2)
   OBJ(打了-2, 他。-3)

This is similar to the output of the default English parser.
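To compare the labels across sentences, the relation lines can be turned into structured records. The following is only a minimal sketch in Python and is not part of the Stanford tools: `Dependency` and `parse_dependencies` are hypothetical names, and it assumes every relation line has the `relation(head-index, dependent-index)` shape shown in the output above.

```python
import re
from typing import List, NamedTuple


class Dependency(NamedTuple):
    """One typed dependency, e.g. SUB(把-2, 我-1). Hypothetical helper type."""
    relation: str
    head: str
    head_index: int
    dependent: str
    dependent_index: int


# Assumed line format: relation(head-index, dependent-index)
_DEP_LINE = re.compile(
    r"^\s*(?P<rel>[^\s(]+)\((?P<head>.+)-(?P<hi>\d+), (?P<dep>.+)-(?P<di>\d+)\)\s*$"
)


def parse_dependencies(text: str) -> List[Dependency]:
    """Parse relation lines; lines that do not match the format are skipped."""
    deps = []
    for line in text.splitlines():
        m = _DEP_LINE.match(line)
        if m is None:
            continue
        deps.append(
            Dependency(m["rel"], m["head"], int(m["hi"]), m["dep"], int(m["di"]))
        )
    return deps


sample = """
   SUB(把-2, 我-1)
   root(ROOT-0, 把-2)
   SUB(打了。-4, 他-3)
   VMOD(把-2, 打了。-4)
"""

# Collect the distinct relation labels used in the first sentence.
labels = sorted({d.relation for d in parse_dependencies(sample)})
print(labels)  # ['SUB', 'VMOD', 'root']
```

Listing the distinct labels this way makes it easy to see at a glance which label set a given parser configuration is producing.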

We consulted the manual and read the source code, but could not find any clue. Could anyone please tell us which parameters to set in order to process Chinese data correctly? Many thanks!

This command will generate the dependencies referenced in that paper:

java -Xmx2g -cp "*:." edu.stanford.nlp.pipeline.StanfordCoreNLP -file sample_chinese_text.txt -props StanfordCoreNLP-chinese.properties -outputFormat text -parse.originalDependencies

The key is using the "parse" annotator instead of "depparse".

Note: StanfordCoreNLP-chinese.properties can be found in stanford-corenlp-3.5.2-models-chinese.jar or stanford-chinese-corenlp-2015-04-20-models.jar if you want to examine the settings.

Note: We distribute some more models that work with the "parse" annotator, and they can be found in stanford-parser-3.5.2-models.jar on Maven or the models jar distributed with the standard parser:

http://nlp.stanford.edu/software/lex-parser.shtml

The issue here is that the NN dependency parser does not output the Stanford Dependencies for Chinese referenced in the paper you mention. The NN dependency parser uses a different type of dependencies.

Here are some relevant papers that discuss what the NN dependency parser creates:

http://cs.stanford.edu/~danqi/papers/emnlp2014.pdf

http://stp.lingfil.uu.se/nodalida/2007/pdf/NODALIDA16.pdf
