Stanford NLP Parser. How to split the Tree?
If I take the example sentence from the parser's homepage:
The strongest rain ever recorded in India shut down
the financial hub of Mumbai, snapped communication
lines, closed airports and forced thousands of people
to sleep in their offices or walk home during the night,
officials said today.
and parse it with the Stanford Parser:
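```java
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreePrint;

String text = "The strongest rain ever recorded in India shut down the financial hub of Mumbai, snapped communication lines, closed airports and forced thousands of people to sleep in their offices or walk home during the night, officials said today.";
LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
Tree parse = lexicalizedParser.parse(text);
TreePrint treePrint = new TreePrint("penn, typedDependencies");
treePrint.printTree(parse);
```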
it produces the following tree:
```
(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP-TMP (NN today)))
    (. .)))
```
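Since `TreePrint` writes the tree in plain Penn bracket notation, the structure above can also be explored without reloading the parser model. The following is a minimal, self-contained sketch of reading such a bracketed string back into a tree, assuming well-formed input in which every `(` is immediately followed by a label. `SimpleTree`, `parse` and `words` are hypothetical names used only for illustration; in real code, the CoreNLP `Tree` returned by `lexicalizedParser.parse(text)` is what should be traversed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal tree node; a stand-in for edu.stanford.nlp.trees.Tree, not that class. */
final class SimpleTree {
    final String label;
    final List<SimpleTree> children = new ArrayList<>();

    SimpleTree(String label) { this.label = label; }

    boolean isLeaf() { return children.isEmpty(); }

    /** Reads one Penn-bracketed tree such as "(NP (DT the) (NN night))"; assumes well-formed input. */
    static SimpleTree parse(String penn) {
        return read(tokenize(penn), new int[] {0});
    }

    /** Splits the text into "(", ")" and whitespace-separated atoms (labels or words). */
    private static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder atom = new StringBuilder();
        for (char c : s.toCharArray()) {
            boolean paren = c == '(' || c == ')';
            if (paren || Character.isWhitespace(c)) {
                if (atom.length() > 0) {
                    tokens.add(atom.toString());
                    atom.setLength(0);
                }
                if (paren) tokens.add(String.valueOf(c));
            } else {
                atom.append(c);
            }
        }
        if (atom.length() > 0) tokens.add(atom.toString());
        return tokens;
    }

    /** Recursive descent: "(" LABEL child* ")" is an internal node, a bare atom is a leaf. */
    private static SimpleTree read(List<String> tokens, int[] pos) {
        String tok = tokens.get(pos[0]++);
        if (!tok.equals("(")) return new SimpleTree(tok);
        SimpleTree node = new SimpleTree(tokens.get(pos[0]++)); // assumes a label follows "("
        while (!tokens.get(pos[0]).equals(")")) {
            node.children.add(read(tokens, pos));
        }
        pos[0]++; // consume the matching ")"
        return node;
    }

    /** The words (leaf labels) under this node, left to right. */
    List<String> words() {
        List<String> out = new ArrayList<>();
        if (isLeaf()) {
            out.add(label);
        } else {
            for (SimpleTree child : children) out.addAll(child.words());
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleTree tree = SimpleTree.parse(
            "(ROOT (S (NP (NNS officials)) (VP (VBD said) (NP-TMP (NN today))) (. .)))");
        System.out.println(tree.label);   // ROOT
        System.out.println(tree.words()); // [officials, said, today, .]
    }
}
```

The reader keeps the same shape as the printed tree: `(NNS officials)` becomes an `NNS` node with a single leaf child `officials`, so the words are always leaves and the part-of-speech tags are their parents.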
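Now I want to split the Tree according to its structure in order to get the clauses. So, in this example, I want to split the tree into the following parts: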
My first idea was to use a recursive algorithm that prints all root-to-leaf paths.
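Printing every root-to-leaf path is a standard backtracking walk: push the current node's label, emit a copy of the path when a leaf is reached, otherwise recurse into each child, and pop the label on the way back up. A minimal sketch of that pattern, written against a hypothetical stand-in `Node` class rather than CoreNLP's `Tree` (`RootToLeafPaths` and `collectPaths` are likewise illustrative names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RootToLeafPaths {
    /** Hypothetical minimal tree node; a stand-in for edu.stanford.nlp.trees.Tree. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Depth-first walk that records every root-to-leaf path as a list of labels. */
    static void collectPaths(Node node, List<String> path, List<List<String>> out) {
        path.add(node.label);                  // push every node, not only leaves
        if (node.children.isEmpty()) {
            out.add(new ArrayList<>(path));    // copy, because path keeps changing
        } else {
            for (Node child : node.children) {
                collectPaths(child, path, out);
            }
        }
        path.remove(path.size() - 1);          // pop by position, not by value
    }

    public static void main(String[] args) {
        // Toy tree in the shape of the "officials said today ." part of the parse.
        Node tree = new Node("ROOT",
            new Node("S",
                new Node("NP", new Node("NNS", new Node("officials"))),
                new Node("VP",
                    new Node("VBD", new Node("said")),
                    new Node("NP-TMP", new Node("NN", new Node("today")))),
                new Node(".", new Node("."))));
        List<List<String>> paths = new ArrayList<>();
        collectPaths(tree, new ArrayList<>(), paths);
        paths.forEach(System.out::println);
        // [ROOT, S, NP, NNS, officials]
        // [ROOT, S, VP, VBD, said]
        // [ROOT, S, VP, NP-TMP, NN, today]
        // [ROOT, S, ., .]
    }
}
```

Popping by index matters because labels repeat in real parses (for example `NP` inside `NP`), so removing by value could delete the wrong entry.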
Here is the code I tried:
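```java
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SplitTree { // wrapper class name is arbitrary
    public static void main(String[] args) throws IOException {
        LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
        Tree tree = lexicalizedParser.parse("The strongest rain ever recorded in India shut down the financial hub of Mumbai, snapped communication lines, closed airports and forced thousands of people to sleep in their offices or walk home during the night, officials said today.");
        printAllRootToLeafPaths(tree, new ArrayList<String>());
    }

    private static void printAllRootToLeafPaths(Tree tree, List<String> path) {
        if (tree != null) {
            // Only leaves (the words) are ever added, so each printed path holds one word.
            if (tree.isLeaf()) {
                path.add(tree.nodeString());
            }
            if (tree.children().length == 0) {
                System.out.println(path);
            } else {
                for (Tree child : tree.children()) {
                    printAllRootToLeafPaths(child, path);
                }
            }
            // Removes the word just added; at inner nodes the path is already empty.
            path.remove(tree.nodeString());
        }
    }
}
```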
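Of course, this code makes no sense: a node is only added to the path if it is a leaf, and a leaf has no children, so no recursive call ever happens once something has been added. The problem is that all the actual words are leaves, so the algorithm just prints each leaf word on its own: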
```
[The]
[strongest]
[rain]
[ever]
[recorded]
[in]
[India]
[shut]
[down]
[the]
[financial]
[hub]
[of]
[Mumbai]
[,]
[snapped]
[communication]
[lines]
[,]
[closed]
[airports]
[and]
[forced]
[thousands]
[of]
[people]
[to]
[sleep]
[in]
[their]
[offices]
[or]
[walk]
[home]
[during]
[the]
[night]
[,]
[officials]
[said]
[today]
[.]
```
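Because every word is a leaf, a word-by-word walk can never group words into clauses; any grouping has to be read off internal nodes. One minimal sketch, assuming that a "clause" means any constituent labelled `S` or `SBAR` (an assumption; the desired split may need finer rules, such as separating coordinated `VP`s), collects the words under each such node in pre-order. `ClauseYields`, `Node`, `addYield` and `collectClauses` are hypothetical names on a stand-in class, not CoreNLP API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ClauseYields {
    /** Hypothetical minimal tree node; a stand-in for edu.stanford.nlp.trees.Tree. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Appends the words (leaf labels) under node, left to right. */
    static void addYield(Node node, List<String> words) {
        if (node.children.isEmpty()) {
            words.add(node.label);
            return;
        }
        for (Node child : node.children) {
            addYield(child, words);
        }
    }

    /** Pre-order walk: for each internal node whose label is in clauseLabels, records the text it spans. */
    static void collectClauses(Node node, List<String> clauseLabels, List<String> out) {
        if (!node.children.isEmpty() && clauseLabels.contains(node.label)) {
            List<String> words = new ArrayList<>();
            addYield(node, words);
            out.add(String.join(" ", words));
        }
        for (Node child : node.children) {
            collectClauses(child, clauseLabels, out);
        }
    }

    public static void main(String[] args) {
        // Toy tree shaped like the "forced ... to sleep" part of the parse above.
        Node tree = new Node("ROOT",
            new Node("S",
                new Node("NP", new Node("NNS", new Node("officials"))),
                new Node("VP",
                    new Node("VBD", new Node("forced")),
                    new Node("NP", new Node("NNS", new Node("people"))),
                    new Node("S",
                        new Node("VP",
                            new Node("TO", new Node("to")),
                            new Node("VP", new Node("VB", new Node("sleep"))))))));
        List<String> clauses = new ArrayList<>();
        collectClauses(tree, Arrays.asList("S", "SBAR"), clauses);
        clauses.forEach(System.out::println);
        // officials forced people to sleep
        // to sleep
    }
}
```

On the full tree above, this rule would report three spans: the whole sentence, the first inner `S` ("The strongest rain ... during the night"), and the infinitival `S` ("to sleep in their offices or walk home during the night"). Splitting the coordinated `VP`s under the first inner `S` into separate parts would need an additional rule on top of this.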