
How can I stop Stanford CoreNLP from segmenting my sentence

I have text that is already segmented, and resources (models) that match my segmentation.

How can I stop Stanford CoreNLP from segmenting my sentence before it generates the parse tree?

I am working on Chinese.

Your description is not very precise, so I'm not sure I am interpreting your question correctly. It sounds like you want to feed the parser a list of tokens without having CoreNLP do any tokenisation of its own, right? If so, it would help to know which parser you are using: the constituency parser (LexicalizedParser) or the neural dependency parser (DependencyParser). With either one, though, you can pass in a list of tokens directly and CoreNLP will not step in and change your tokenisation. I haven't worked with the Chinese resources, but the following might help, assuming you have already tokenised the text and splitting on whitespace gives you the correct tokens:

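    import java.util.ArrayList;
    import java.util.List;

    import edu.stanford.nlp.ling.HasWord;
    import edu.stanford.nlp.ling.TaggedWord;
    import edu.stanford.nlp.ling.Word;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.parser.nndep.DependencyParser;
    import edu.stanford.nlp.tagger.maxent.MaxentTagger;
    import edu.stanford.nlp.trees.GrammaticalStructure;
    import edu.stanford.nlp.trees.Tree;

    public class PretokenizedParse {
        public static void main(String[] args) throws Exception {
            // Already tokenised input: tokens are separated by single spaces.
            String sentence = "I can't do that .";
            List<HasWord> hwl = new ArrayList<>();
            for (String t : sentence.split(" ")) {
                hwl.add(new Word(t));
            }

            // Constituency parse: parse(List<? extends HasWord>) uses the tokens as given.
            LexicalizedParser lexParser = LexicalizedParser.loadModel("<path to chinese lex parsing here>", "-maxLength", "70");
            Tree cTree = lexParser.parse(hwl);
            System.out.println("c tree:" + cTree);

            // Dependency parse: the neural parser expects POS-tagged tokens, so tag the same list first.
            DependencyParser parser = DependencyParser.loadFromModelFile("<chinese model for dep parsing here>");
            MaxentTagger tagger = new MaxentTagger("<path to your tagger file goes here>");
            List<TaggedWord> tagged = tagger.tagSentence(hwl);
            GrammaticalStructure gs = parser.predict(tagged);
            System.out.println("dep tree:" + gs.typedDependencies());
        }
    }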

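Ignoring the log lines written to stderr, this prints the output below. (The tags in it, such as VVFIN and ADJD, come from the German STTS tagset, so this output appears to have been produced with the German models; the Chinese models will produce Chinese tags.)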

    c tree:(ROOT (S (MPN (FM I) (FM can't)) (VVFIN do) (ADJD that) ($. .)))
    dep tree:[nsubj(can't-2, I-1), root(ROOT-0, can't-2), xcomp(can't-2, do-3), dobj(do-3, that-4), punct(can't-2, .-5)]
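The loop above splits on a single ASCII space, which only works if the segmented input is perfectly clean: two spaces in a row produce an empty token, and the ideographic (full-width) space U+3000 that often appears in Chinese text is not treated as a separator at all (Java's \s does not match it by default). A minimal sketch of a more forgiving splitter, using only the JDK, follows. The class name PresegmentedSplitter is hypothetical, and the sketch assumes your segmentation marks token boundaries with any mix of ASCII whitespace and U+3000.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: splits an already-segmented line into tokens. */
public final class PresegmentedSplitter {
    // ASCII whitespace plus the ideographic (full-width) space U+3000,
    // which Java's \s does not match by default.
    private static final Pattern SEPARATORS = Pattern.compile("[\\s\\u3000]+");

    private PresegmentedSplitter() {}

    /** Returns the non-empty tokens of a whitespace-segmented line. */
    public static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : SEPARATORS.split(line)) {
            // A leading separator makes Pattern.split return one empty string first.
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Double ASCII space and a full-width space between tokens.
        System.out.println(split("我  爱\u3000北京 。")); // prints [我, 爱, 北京, 。]
    }
}
```

Each returned string can then be wrapped in a Word, as in the loop above, before the list is passed to parse or tagSentence.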

Hope this helps.
