Get the corresponding XML nodes with xpath

I have an XML file (actually an XLIFF file) in which a node has two child nodes with identical substructure. That substructure is not known in advance, can be very complex, and changes for each <trans-unit>. I'm working with Python and the lxml library. Example:

<trans-unit id="tu4" xml:space="preserve">
    <seg-source>
        <mrk mid="0" mtype="seg">
            <g id="1">...</g>
            <g id="2">...</g>
            <g id="3">...</g>
            <bx id="7"/>...
        </mrk>
        <mrk mid="1" mtype="seg">...</mrk>
        <mrk mid="2" mtype="seg">...
            <ex id="7"/>
            <g id="8"> FROM HERE </g>
        </mrk>
    </seg-source>
    <target xml:lang="en">
        <mrk mid="0" mtype="seg">
            <g id="1">...</g>
            <g id="2">...</g>
            <g id="3">...</g>
            <bx id="7"/>...
        </mrk>
        <mrk mid="1" mtype="seg">...</mrk>
        <mrk mid="2" mtype="seg">...
            <ex id="7"/>
            <g id="8"> TO HERE </g>
        </mrk>
    </target>
</trans-unit>

As you can see, the two nodes <seg-source> and <target> have exactly the same substructure. My goal is to navigate to each node of <seg-source>, get the text and the tail of that node (I know how to do that with xpath), translate them, and finally assign the translation to the corresponding node in <target>. That last step is what I don't know how to do.

In other words, suppose I have the node "FROM HERE": how can I get the node "TO HERE"?

If you want to pair them all, you can zip the nodes together so that you can access the matching nodes from each side:

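from lxml import etree

# x is the XML string shown above
tree = etree.fromstring(x)
# Passing the same iterator to zip twice yields consecutive pairs,
# i.e. (seg-source, target) in document order.
nodes = iter(tree.xpath("//*[self::seg-source or self::target]"))
for seq, tar in zip(nodes, nodes):
    # seq is a seg-source element and tar is the target that follows it
    print(seq.xpath(".//*"))
    print(tar.xpath(".//*"))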

If those two are the only children of each trans-unit, as in your sample, you can just use nodes = iter(tree.xpath("//trans-unit/*")), so the names of the child nodes don't matter.

nodes = iter(tree.xpath("//trans-unit/*"))
for seq, tar in zip(nodes, nodes):
    print(seq.xpath(".//*"))
    print(tar.xpath(".//*"))
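
The flat pairing above relies on every trans-unit having exactly those two children. As a hedged sketch for files where that may not hold (for example a unit that also carries a <source> child, or one with no <target> yet), the pair can be looked up by name inside each unit instead. The helper name paired_segments and the sample document are made up for illustration, and the elements are assumed to have no XML namespace, as in the question's sample.

```python
try:
    from lxml import etree
except ImportError:  # the standard library offers the same find/iter API
    import xml.etree.ElementTree as etree

# Made-up sample: tu1 has an extra <source> child, tu2 has no <target>.
XML = """<body>
  <trans-unit id="tu1">
    <source>one</source>
    <seg-source><mrk mid="0" mtype="seg">one</mrk></seg-source>
    <target xml:lang="en"><mrk mid="0" mtype="seg">one</mrk></target>
  </trans-unit>
  <trans-unit id="tu2">
    <source>two</source>
    <seg-source><mrk mid="0" mtype="seg">two</mrk></seg-source>
  </trans-unit>
</body>"""


def paired_segments(root):
    """Yield (seg-source, target) for every trans-unit that has both."""
    for unit in root.iter("trans-unit"):
        seg_source = unit.find("seg-source")
        target = unit.find("target")
        if seg_source is not None and target is not None:
            yield seg_source, target


root = etree.fromstring(XML)
pairs = list(paired_segments(root))
for seg_source, target in pairs:
    print(seg_source.tag, target.tag)
```

Looking the two children up by name keeps a unit with a missing or extra child from shifting every later pair by one, which is what would happen with zip(nodes, nodes) over a flat node list.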

If we run the code on your sample and print the text of the g node with id 8 from each side, you can see that the output takes one from each:

In [2]: from lxml import etree

In [3]: tree = etree.fromstring(x)

In [4]: nodes = iter(tree.xpath("//trans-unit/*"))

In [5]: for seq, tar in zip(nodes, nodes):
   ...:         print(seq.xpath(".//g[@id='8']/text()"))
   ...:         print(tar.xpath(".//g[@id='8']/text()"))
   ...:     
[' FROM HERE ']
[' TO HERE ']
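
For the specific case in the question (holding the "FROM HERE" node and wanting "TO HERE"), the same positional idea can be turned into a lookup. This is only a sketch: counterpart is a hypothetical helper, the XML is a reduced version of the sample, and it assumes both subtrees really do have identical structure.

```python
try:
    from lxml import etree
except ImportError:  # the standard library offers the same find/iter API
    import xml.etree.ElementTree as etree

# Reduced version of the sample from the question.
XML = """<trans-unit id="tu4">
  <seg-source>
    <mrk mid="2" mtype="seg"><ex id="7"/><g id="8"> FROM HERE </g></mrk>
  </seg-source>
  <target xml:lang="en">
    <mrk mid="2" mtype="seg"><ex id="7"/><g id="8"> TO HERE </g></mrk>
  </target>
</trans-unit>"""


def counterpart(node, source_root, target_root):
    """Return the element of target_root that sits at the same
    document-order position as node does inside source_root."""
    source_nodes = list(source_root.iter("*"))
    target_nodes = list(target_root.iter("*"))
    if len(source_nodes) != len(target_nodes):
        raise ValueError("subtrees differ in size; positional pairing is unsafe")
    for s, t in zip(source_nodes, target_nodes):
        if s is node:  # match the element by identity
            return t
    raise ValueError("node is not inside source_root")


unit = etree.fromstring(XML)
seg_source = unit.find("seg-source")
target = unit.find("target")
from_here = seg_source.find(".//g[@id='8']")
to_here = counterpart(from_here, seg_source, target)
print(to_here.text)
```

The size check is a cheap guard: if the two subtrees ever diverge, positional pairing would silently return the wrong element, so failing loudly seems the safer choice.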

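Each node is paired with the corresponding node from the other child of trans-unit. Note that nodes has to be re-created before this loop, because the previous loop exhausted the iterator, and that the tuple-style output comes from Python 2's print: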

In [7]: for seq, tar in zip(nodes, nodes):
   ...:         print(seq.tag, tar.tag)
   ...:         for n1, n2 in zip(seq.xpath(".//*"),tar.xpath(".//*")):
   ...:                 print(n1.tag, n2.tag)
   ...:         
('seg-source', 'target')
('mrk', 'mrk')
('g', 'g')
('g', 'g')
('g', 'g')
('bx', 'bx')
('mrk', 'mrk')
('mrk', 'mrk')
('ex', 'ex')
('g', 'g')
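
Putting it together for the goal stated in the question (translate the text and tail of every source node and assign them to the matching target node), a minimal sketch could look like the following. The translate function is a hypothetical placeholder (it just upper-cases), fill_target is an illustrative name, and the sample XML is made up; the structure check assumes that equal tag sequences are enough to call the two subtrees identical.

```python
try:
    from lxml import etree
except ImportError:  # the standard library offers the same find/iter API
    import xml.etree.ElementTree as etree

XML = """<trans-unit id="tu1">
  <seg-source><mrk mid="0" mtype="seg">Hello <g id="1">world</g>!</mrk></seg-source>
  <target xml:lang="en"><mrk mid="0" mtype="seg"><g id="1"/></mrk></target>
</trans-unit>"""


def translate(text):
    """Hypothetical placeholder for a real translation call."""
    return text.upper()


def translated(text):
    # Leave missing or whitespace-only strings untouched.
    return translate(text) if text and text.strip() else text


def fill_target(seg_source, target):
    """Copy translated text and tail onto the positionally matching nodes."""
    src_nodes = list(seg_source.iter("*"))[1:]  # drop seg-source itself
    tgt_nodes = list(target.iter("*"))[1:]      # drop target itself
    if [n.tag for n in src_nodes] != [n.tag for n in tgt_nodes]:
        raise ValueError("seg-source and target do not share the same structure")
    for s, t in zip(src_nodes, tgt_nodes):
        t.text = translated(s.text)
        t.tail = translated(s.tail)


unit = etree.fromstring(XML)
fill_target(unit.find("seg-source"), unit.find("target"))
target_mrk = unit.find("target/mrk")
print(etree.tostring(target_mrk, encoding="unicode"))
```

The two container elements are skipped on purpose, so any text sitting directly inside seg-source before its first child is not copied; that would need separate handling if it matters for your files.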
