Creating an adjacency matrix from XML in Python

Question

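I am trying to create an adjacency matrix of the t_lemma elements, i.e. a matrix that records which t_lemma is the parent of which. The other elements (nodetype, ord, etc.) can be ignored; I include them only for completeness. The source is the XML document below, which represents the syntactic analysis of a (Czech) sentence; t_lemma holds the neutral (base) form of a particular word.

At the moment I am using Python's cElementTree library, but I am open to other approaches if what I am asking for is impossible or hard to compute with cElementTree.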

<t_tree id="t_tree-cs-s1-root">
    <atree.rf>a_tree-cs-s1-root</atree.rf>
    <ord>0</ord>
    <children id="t_tree-cs-s1-n107">
        <children>
            <LM id="t_tree-cs-s1-n108">
                <nodetype>complex</nodetype>
                <ord>1</ord>
                <t_lemma>muž</t_lemma>
                <functor>ACT</functor>
                <formeme>n:1</formeme>
                <is_clause_head>0</is_clause_head>
                <clause_number>1</clause_number>
                <a>
                    <lex.rf>a_tree-cs-s1-n1</lex.rf>
                </a>
                <gram>
                    <sempos>n.denot</sempos>
                    <gender>anim</gender>
                    <number>sg</number>
                    <negation>neg0</negation>
                </gram>
            </LM>
            <LM id="t_tree-cs-s1-n109">
                <nodetype>complex</nodetype>
                <ord>3</ord>
                <t_lemma>strom</t_lemma>
                <functor>PAT</functor>
                <formeme>n:4</formeme>
                <is_clause_head>0</is_clause_head>
                <clause_number>1</clause_number>
                <a>
                    <lex.rf>a_tree-cs-s1-n3</lex.rf>
                </a>
                <gram>
                    <sempos>n.denot</sempos>
                    <gender>inan</gender>
                    <number>sg</number>
                    <negation>neg0</negation>
                </gram>
            </LM>
        </children>
        <nodetype>complex</nodetype>
        <ord>2</ord>
        <t_lemma>zasadit</t_lemma>
        <functor>PRED</functor>
        <formeme>v:fin</formeme>
        <sentmod>enunc</sentmod>
        <is_clause_head>1</is_clause_head>
        <clause_number>1</clause_number>
        <a>
            <lex.rf>a_tree-cs-s1-n2</lex.rf>
        </a>
        <gram>
            <sempos>v</sempos>
            <verbmod>ind</verbmod>
            <deontmod>decl</deontmod>
            <tense>ant</tense>
            <aspect>cpl</aspect>
            <resultative>res0</resultative>
            <dispmod>disp0</dispmod>
            <iterativeness>it0</iterativeness>
            <negation>neg0</negation>
            <diathesis>act</diathesis>
        </gram>
    </children>
</t_tree>

This XML represents a tree that looks like this:

[image: t_tree]

What I want to obtain is a matrix like this:

        muž     strom   zasadit
muž     1       0       -1
strom   0       1       -1
zasadit 1       1       1
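As a starting point, the row and column labels of such a matrix can be collected with the standard xml.etree.ElementTree module. The sketch below is only an illustration: SAMPLE is a trimmed copy of the document above (an assumption made to keep the example short), and the variable names are not part of the original post.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document: only the elements used here are kept
SAMPLE = """
<t_tree id="t_tree-cs-s1-root">
    <ord>0</ord>
    <children id="t_tree-cs-s1-n107">
        <children>
            <LM id="t_tree-cs-s1-n108">
                <ord>1</ord>
                <t_lemma>muž</t_lemma>
            </LM>
            <LM id="t_tree-cs-s1-n109">
                <ord>3</ord>
                <t_lemma>strom</t_lemma>
            </LM>
        </children>
        <ord>2</ord>
        <t_lemma>zasadit</t_lemma>
    </children>
</t_tree>
"""

root = ET.fromstring(SAMPLE.strip())

# iter() visits matching elements in document order
lemmas = [node.text for node in root.iter("t_lemma")]
print(lemmas)  # ['muž', 'strom', 'zasadit']
```

Document order happens to match the label order of the matrix shown above, but it does not by itself say which lemma is the parent of which; that requires walking the children/LM nesting.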

Answer 1

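I came up with an answer that works on the large trees I have tested it on, although I had to take the <ord> element into account (it gives the position of the word in the sentence). Doing so eliminates a problem that would otherwise arise with a sentence like this: "A man and a woman are walking, day and night."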

       walking
      /       \
   and        and
  /   \       /  \
man  woman  day  night

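Taking only <t_lemma> into account makes the (child -> parent) mapping ambiguous: there are two "and" nodes acting as parents of the words man, woman, day and night, and nothing tells them apart, as shown here: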

element  parent
_______________
man      and
woman    and
day      and
night    and
and      walking
and      walking

With the <ord> value appended to each lemma, the previous table becomes the following:

element  parent
_______________
man:1    and:2
woman:3  and:2
day:5    and:6
night:7  and:6
and:2    walking:4
and:6    walking:4
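The effect of the two keying schemes can be checked with a small sketch. The edges list simply restates the table above as (lemma, ord) pairs; the names edges, by_lemma and by_lemma_ord are illustrative assumptions, not part of the original answer.

```python
# (child, parent) pairs as (lemma, ord) tuples, copied from the table above
edges = [
    (("man", 1), ("and", 2)),
    (("woman", 3), ("and", 2)),
    (("day", 5), ("and", 6)),
    (("night", 7), ("and", 6)),
    (("and", 2), ("walking", 4)),
    (("and", 6), ("walking", 4)),
]

# Keyed by lemma only: the two "and" rows share one key, so one is overwritten
by_lemma = {child[0]: parent[0] for child, parent in edges}

# Keyed by "lemma:ord": every node keeps its own entry
by_lemma_ord = {
    f"{child[0]}:{child[1]}": f"{parent[0]}:{parent[1]}"
    for child, parent in edges
}

print(len(by_lemma))      # 5
print(len(by_lemma_ord))  # 6
```

With plain lemmas the six rows shrink to five dictionary entries and the two distinct "and" parents can no longer be told apart; with the ord suffix all six rows survive.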

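So, here is the working Python code:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

# Maps the "lemma:ord" key of each node to the key of its parent
parentDictionary = {}

def getchildlemma(element, parent):
    """Record the parent of element (if it is a node) and recurse into it."""
    lemma = element.find("t_lemma")
    if lemma is not None:
        key = lemma.text
        position = element.find("ord")
        if position is not None:
            # Append the word position so that repeated lemmas stay distinct
            key = key + ":" + position.text
        parentDictionary[key] = parent
        # This node becomes the parent of everything nested below it
        parent = key
    for child in element:
        # Only <children> and <LM> elements can contain further nodes
        if child.tag in ("children", "LM"):
            getchildlemma(child, parent)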
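The dictionary above is a child -> parent mapping, not yet a matrix. One way to turn it into one is sketched below. It rests on several assumptions: the sign convention is my reading of the example in the question (1 on the diagonal and where the row node is the parent of the column node, -1 where the row node is the child, 0 otherwise), the function name build_matrix is hypothetical, and parent_of is a hand-written mapping of the shape parentDictionary would have for the sample document if None were passed as the parent of the root.

```python
def build_matrix(parent_of):
    """Return (labels, matrix) for a child -> parent mapping."""
    # Order the labels by the word position after the last colon
    labels = sorted(parent_of, key=lambda key: int(key.rsplit(":", 1)[1]))
    index = {label: i for i, label in enumerate(labels)}
    size = len(labels)
    matrix = [[0] * size for _ in range(size)]
    for child, parent in parent_of.items():
        c = index[child]
        matrix[c][c] = 1           # every node is related to itself
        if parent in index:        # the top node has no parent in the mapping
            p = index[parent]
            matrix[p][c] = 1       # row is the parent of the column
            matrix[c][p] = -1      # row is the child of the column
    return labels, matrix


# Hand-written mapping for the sample sentence (assumed shape, see above)
parent_of = {
    "zasadit:2": None,
    "muž:1": "zasadit:2",
    "strom:3": "zasadit:2",
}

labels, matrix = build_matrix(parent_of)
for label, row in zip(labels, matrix):
    print(label, row)
```

Here the labels come out in sentence order (muž:1, zasadit:2, strom:3), so the columns are arranged differently from the matrix in the question, but the entries follow the same convention.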
