
How to keep punctuation in Stanford dependency parser

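I am using Stanford CoreNLP (the 01.2016 version) and I would like to keep punctuation in the dependency relations. I have found some ways to do this when running it from the command line, but I have not found anything about doing it in Java code that extracts the dependency relations.

Here is my current code. It works, but the output does not include punctuation: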

Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.put("parse.model", modelPath);
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
        "-maxLength", "200", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();

List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
    Tree parse = lp.apply(words);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection<TypedDependency> td = gs.typedDependencies();
    parsedText += td.toString() + "\n";
}

Any type of dependencies is fine for me: basic, typed, collapsed, etc. I just want the punctuation to be included.

Thanks in advance,

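You are doing a lot of extra work here: you run the parser once through CoreNLP and then run it a second time by calling lp.apply(words).

The easiest way to get a dependency tree/graph with punctuation is to use the CoreNLP option parse.keepPunct, as follows.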

Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.setProperty("parse.model", modelPath);
props.setProperty("parse.keepPunct", "true");

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

pipeline.annotate(document);

// The sentences are only available after the pipeline has annotated the document
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

for (CoreMap sentence : sentences) {
   // Pick whichever representation you want
   SemanticGraph basicDeps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
   SemanticGraph collapsed = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
   SemanticGraph ccProcessed = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
}

The sentence annotation object stores the dependency tree/graph as a SemanticGraph. If you need the TypedDependency objects instead, use the typedDependencies() method. For example,

Collection<TypedDependency> dependencies = basicDeps.typedDependencies();
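If you want to compare the output with and without punctuation, the settings above can be wrapped in a small helper. What follows is only a minimal sketch that uses nothing but java.util.Properties: the names KeepPunctConfig and pipelineProps are made up for illustration and are not part of CoreNLP, and the pos.model and parse.model paths from the snippets above are left out, so add them back if you load custom models.

```java
import java.util.Properties;

class KeepPunctConfig {

    // Hypothetical helper (not a CoreNLP API): collects the pipeline settings
    // from the answer in one place so that parse.keepPunct can be toggled.
    static Properties pipelineProps(boolean keepPunct) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
        props.setProperty("ssplit.newlineIsSentenceBreak", "always");
        props.setProperty("ssplit.eolonly", "true");
        // The option suggested in the answer; stored as the string "true" or "false".
        props.setProperty("parse.keepPunct", Boolean.toString(keepPunct));
        return props;
    }

    public static void main(String[] args) {
        Properties props = pipelineProps(true);
        // Print the value that would be handed to the pipeline.
        System.out.println("parse.keepPunct = " + props.getProperty("parse.keepPunct"));
    }
}
```

The returned Properties object can then be passed to new StanfordCoreNLP(props) exactly as in the code above.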

