How to keep punctuation in Stanford dependency parser
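I am using Stanford CoreNLP (the 01.2016 release) and I would like to keep punctuation in the dependency relations. I found some ways to do this when running it from the command line, but nothing about doing it in the Java code that extracts the dependencies.
Here is my current code. It works, but the output does not include punctuation: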
Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.put("parse.model", modelPath);
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

// A second, standalone parser is loaded here in addition to the pipeline's parser
LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
        "-maxLength", "200", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();

List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
    Tree parse = lp.apply(words);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection<TypedDependency> td = gs.typedDependencies();
    parsedText += td.toString() + "\n";
}
Any type of dependencies is fine for me: basic, typed, collapsed, and so on. I just want punctuation to be included.
Thanks in advance,
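You are doing a lot of extra work here: you run the parser once through CoreNLP and then run it a second time by calling lp.apply(words).
The easiest way to get a dependency tree/graph that includes punctuation is to set the CoreNLP option parse.keepPunct, as follows.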
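import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.setProperty("parse.model", modelPath);
// Keep punctuation tokens in the dependency output
props.setProperty("parse.keepPunct", "true");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

// The pipeline has already parsed each sentence; no second parser is needed
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // Pick whichever representation you want
    SemanticGraph basicDeps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
    SemanticGraph collapsed = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
    SemanticGraph ccProcessed = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
}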
The sentence annotation object stores each dependency tree/graph as a SemanticGraph. If you need the TypedDependency objects themselves, use the typedDependencies() method. For example,
Collection<TypedDependency> dependencies = basicDeps.typedDependencies();
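If you want to compare the output with and without punctuation, it may help to gather the pipeline settings in one place. The following is only a minimal sketch using plain java.util.Properties: the class name ParserProps and the method build are hypothetical names chosen for illustration, the property keys are the ones used above, and the model paths are left out for brevity.

```java
import java.util.Properties;

public class ParserProps {
    /**
     * Builds the pipeline settings used above.
     * keepPunct controls the value written to "parse.keepPunct".
     * (ParserProps and build are illustrative names, not part of CoreNLP.)
     */
    public static Properties build(boolean keepPunct) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
        props.setProperty("ssplit.newlineIsSentenceBreak", "always");
        props.setProperty("ssplit.eolonly", "true");
        props.setProperty("parse.keepPunct", Boolean.toString(keepPunct));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(true);
        // In real code, these props would be passed to new StanfordCoreNLP(props).
        // Here we only list the settings, sorted by key.
        props.stringPropertyNames().stream()
                .sorted()
                .forEach(key -> System.out.println(key + " = " + props.getProperty(key)));
    }
}
```

Calling build(false) yields the same settings with parse.keepPunct set to "false", so the two configurations differ only in that one property.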