Stanford NLP Parser: how to split the tree?

If I take the example sentence from the parser's home page:

The strongest rain ever recorded in India shut down 
the financial hub of Mumbai, snapped communication 
lines, closed airports and forced thousands of people 
to sleep in their offices or walk home during the night, 
officials said today.

and parse it with the Stanford Parser:

LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

Tree parse = lexicalizedParser.parse(text);
TreePrint treePrint = new TreePrint("penn, typedDependencies");

treePrint.printTree(parse);

I get the following tree:

(ROOT
(S
  (S
    (NP
      (NP (DT The) (JJS strongest) (NN rain))
      (VP
        (ADVP (RB ever))
        (VBN recorded)
        (PP (IN in)
          (NP (NNP India)))))
    (VP
      (VP (VBD shut)
        (PRT (RP down))
        (NP
          (NP (DT the) (JJ financial) (NN hub))
          (PP (IN of)
            (NP (NNP Mumbai)))))
      (, ,)
      (VP (VBD snapped)
        (NP (NN communication) (NNS lines)))
      (, ,)
      (VP (VBD closed)
        (NP (NNS airports)))
      (CC and)
      (VP (VBD forced)
        (NP
          (NP (NNS thousands))
          (PP (IN of)
            (NP (NNS people))))
        (S
          (VP (TO to)
            (VP
              (VP (VB sleep)
                (PP (IN in)
                  (NP (PRP$ their) (NNS offices))))
              (CC or)
              (VP (VB walk)
                (NP (NN home))
                (PP (IN during)
                  (NP (DT the) (NN night))))))))))
  (, ,)
  (NP (NNS officials))
  (VP (VBD said)
    (NP-TMP (NN today)))
  (. .)))

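Now I would like to split the tree according to its structure in order to get the clauses. For this example, I would like to split the tree into the following parts:

  • The strongest rain ever recorded in India
  • The strongest rain shut down the financial hub of Mumbai
  • The strongest rain snapped communication lines
  • The strongest rain closed airports
  • The strongest rain forced thousands of people to sleep in their offices
  • The strongest rain forced thousands of people to walk home during the night

How can I do that?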
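One possible way to approach such a split, sketched here under stated assumptions rather than as a confirmed solution: treat a phrase that has a CC child and at least two children of its own category as a coordination, expand each conjunct separately, and prefix the result with the head NP of the subject. The `Node` class, the bracket reader `read`, and the helpers `expand`, `isCoordination` and `clauses` are all hypothetical names written for this illustration; they are not part of the Stanford parser API, and the heuristic is tuned to the shape of the inner S shown above.

```java
import java.util.ArrayList;
import java.util.List;

final class ClauseSplitSketch {
    // Hypothetical stand-in for a parse-tree node; this is not the Stanford Tree class.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    // The inner S of the tree from the question, in bracketed Penn notation.
    static final String EXAMPLE =
        "(S (NP (NP (DT The) (JJS strongest) (NN rain))"
        + " (VP (ADVP (RB ever)) (VBN recorded) (PP (IN in) (NP (NNP India)))))"
        + " (VP (VP (VBD shut) (PRT (RP down))"
        + " (NP (NP (DT the) (JJ financial) (NN hub)) (PP (IN of) (NP (NNP Mumbai)))))"
        + " (, ,) (VP (VBD snapped) (NP (NN communication) (NNS lines)))"
        + " (, ,) (VP (VBD closed) (NP (NNS airports))) (CC and)"
        + " (VP (VBD forced) (NP (NP (NNS thousands)) (PP (IN of) (NP (NNS people))))"
        + " (S (VP (TO to) (VP (VP (VB sleep) (PP (IN in) (NP (PRP$ their) (NNS offices))))"
        + " (CC or) (VP (VB walk) (NP (NN home))"
        + " (PP (IN during) (NP (DT the) (NN night))))))))))";

    // Reads a bracketed tree such as "(NP (DT the) (NN rain))" into Node objects.
    static Node read(String penn) {
        String[] tokens = penn.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(tokens, new int[] {0});
    }

    private static Node read(String[] t, int[] pos) {
        if (!t[pos[0]].equals("(")) {
            return new Node(t[pos[0]++]);      // a bare token is a word (leaf)
        }
        pos[0]++;                              // skip "("
        Node node = new Node(t[pos[0]++]);     // phrase or tag label
        while (!t[pos[0]].equals(")")) {
            node.children.add(read(t, pos));
        }
        pos[0]++;                              // skip ")"
        return node;
    }

    // Heuristic: a CC child plus at least two children with the parent's own label.
    static boolean isCoordination(Node node) {
        boolean hasCc = false;
        int same = 0;
        for (Node c : node.children) {
            if (c.label.equals("CC")) hasCc = true;
            if (c.label.equals(node.label)) same++;
        }
        return hasCc && same >= 2;
    }

    // All word sequences under a node, with each coordination split into alternatives.
    static List<String> expand(Node node) {
        List<String> result = new ArrayList<>();
        if (node.children.isEmpty()) {
            result.add(node.label);
        } else if (isCoordination(node)) {
            for (Node c : node.children) {
                if (c.label.equals(node.label)) result.addAll(expand(c));
            }
        } else {
            result.add("");
            for (Node c : node.children) {
                List<String> next = new ArrayList<>();
                for (String head : result) {
                    for (String tail : expand(c)) {
                        next.add(head.isEmpty() ? tail : head + " " + tail);
                    }
                }
                result = next;
            }
        }
        return result;
    }

    // Full subject NP first, then head NP of the subject combined with every predicate.
    static List<String> clauses(Node s) {
        Node subject = null, predicate = null;
        for (Node c : s.children) {
            if (subject == null && c.label.equals("NP")) subject = c;
            if (predicate == null && c.label.equals("VP")) predicate = c;
        }
        List<String> out = new ArrayList<>();
        if (subject == null || predicate == null) return out;
        out.addAll(expand(subject));
        Node first = subject.children.get(0);
        Node head = first.label.equals("NP") ? first : subject;
        for (String subj : expand(head)) {
            for (String pred : expand(predicate)) out.add(subj + " " + pred);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String clause : clauses(read(EXAMPLE))) System.out.println(clause);
    }
}
```

Run on the inner S, this sketch prints the six parts listed above, in the same order. With the real parser, the same traversal could presumably be written against `Tree.children()`, `Tree.isLeaf()` and `Tree.value()` instead of the stand-in `Node`; sentences with a different structure would need additional rules.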


The first suggestion I received was to use a recursive algorithm that prints all root-to-leaf paths.

Here is the code I tried:

public static void main(String[] args) throws IOException {
    LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

    Tree tree = lexicalizedParser.parse("In a ceremony that was conspicuously short on pomp and circumstance at a time of austerity, Felipe, 46, took over from his father, King Juan Carlos, 76.");

    printAllRootToLeafPaths(tree, new ArrayList<String>());
}

private static void printAllRootToLeafPaths(Tree tree, List<String> path) {
    if(tree != null) {
        if(tree.isLeaf()) {
            path.add(tree.nodeString());
        }

        if(tree.children().length == 0) {
            System.out.println(path);
        } else {
            for(Tree child : tree.children()) {
                printAllRootToLeafPaths(child, path);
            }
        }

        path.remove(tree.nodeString());
    }
}

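Of course, this code makes no sense: only leaves are ever added to the path, and a leaf has no children, so the path never holds more than one node. Since all the actual words are leaves, the algorithm just prints each word on its own: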

[The]
[strongest]
[rain]
[ever]
[recorded]
[in]
[India]
[shut]
[down]
[the]
[financial]
[hub]
[of]
[Mumbai]
[,]
[snapped]
[communication]
[lines]
[,]
[closed]
[airports]
[and]
[forced]
[thousands]
[of]
[people]
[to]
[sleep]
[in]
[their]
[offices]
[or]
[walk]
[home]
[during]
[the]
[night]
[,]
[officials]
[said]
[today]
[.]
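For comparison, here is a minimal sketch of a root-to-leaf traversal that does keep the whole path. It pushes every node (not only leaves) before descending and pops by index afterwards, since `path.remove(tree.nodeString())` removes the first element equal to that string, not necessarily the last one added. The nested `Node` class is a hypothetical stand-in so that the example runs without the parser on the classpath; with the Stanford `Tree` the same shape should apply using `isLeaf()`, `children()` and `value()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RootToLeafPaths {
    // Hypothetical stand-in for a parse-tree node; this is not the Stanford Tree class.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Collects one path per leaf, each running from the root label down to the word.
    static List<List<String>> collect(Node root) {
        List<List<String>> paths = new ArrayList<>();
        walk(root, new ArrayList<>(), paths);
        return paths;
    }

    private static void walk(Node node, List<String> path, List<List<String>> out) {
        path.add(node.label);                  // push every node, not only leaves
        if (node.isLeaf()) {
            out.add(new ArrayList<>(path));    // copy, because path keeps changing
        } else {
            for (Node child : node.children) {
                walk(child, path, out);
            }
        }
        path.remove(path.size() - 1);          // pop by index, not by value
    }

    public static void main(String[] args) {
        Node tree = new Node("NP",
                new Node("DT", new Node("The")),
                new Node("JJS", new Node("strongest")),
                new Node("NN", new Node("rain")));
        for (List<String> path : collect(tree)) {
            System.out.println(path);
        }
        // prints:
        // [NP, DT, The]
        // [NP, JJS, strongest]
        // [NP, NN, rain]
    }
}
```

Root-to-leaf paths alone do not yield clauses, however: each path ends in a single word, so some grouping by phrase (S, NP, VP) is still needed to get the parts listed in the question.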

