
python lxml: loop through all tags

I have a dict that maps each XML tag to a dict key. I want to loop through every tag and its text in the XML and compare it with the associated dict key value, which is in turn a key in another dict.

<2gMessage>
    <Request>
        <pid>daemon</pid>
        <emf>123456</emf>
        <SENum>2041788209</SENum>
        <MM>
            <MID>jbr1</MID>
            <URL>http://jimsjumbojoint.com</URL>
        </MM>
        <AppID>reddit</AppID>
        <CCS>
            <Mode>
                <CardPresent>true</CardPresent>
                <Recurring>false</Recurring>
            </Mode>
            <Date>
                <ASCII>B4788250000028291^RRR^15121015432112345601</ASCII>
            </Date>
            <Amount>100.00</Amount>
        </CCS>
    </Request>
</2gMessage>

The code I have so far:

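from lxml import etree

def handle_request(strRequest):  # hypothetical wrapper name; the original snippet was a function body
    parser = etree.XMLParser(ns_clean=True, remove_blank_text=True)
    tree = etree.fromstring(strRequest, parser)
    for tag in tree.xpath('//Request'):
        # Iterating an element yields only its direct children.
        for subfield in tag:
            print(subfield.tag, subfield.text)
    return strRequest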

However, this only prints the tags that are direct children of Request. I want to reach the nested children in the same loop whenever a child has children of its own. I don't want to hardcode tag names, since the tags and structure may change.

You could try the iter() method. It traverses the element and all of its descendants, at every depth, in document order. The len() check limits the output to elements that have no children, so container tags such as Request or MM are skipped.

A complete script looks like this:

from lxml import etree
tree = etree.parse('xmlfile')
for tag in tree.iter():
    if not len(tag):
        print (tag.tag, tag.text)

Yields:

pid daemon
emf 123456
SENum 2041788209
MID jbr1
URL http://jimsjumbojoint.com
AppID reddit
CardPresent true
Recurring false
ASCII B4788250000028291^RRR^15121015432112345601
Amount 100.00
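
To tie this back to the dict comparison described in the question, here is a minimal sketch of one possible reading of it: each leaf tag is mapped to a key, and that key is looked up in a second dict holding the value to compare against. The names SAMPLE, TAG_TO_KEY, EXPECTED and compare_leaves are hypothetical, and so are the values in both dicts. The sample is a trimmed copy of the request with the root renamed to Message, because XML element names cannot start with a digit. The fallback import is also an assumption: it only lets the sketch run where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the standard library parser
    import xml.etree.ElementTree as etree

# Hypothetical sample: a trimmed copy of the request with a valid root name.
SAMPLE = """
<Message>
    <Request>
        <pid>daemon</pid>
        <MM>
            <MID>jbr1</MID>
        </MM>
        <CCS>
            <Amount>100.00</Amount>
        </CCS>
    </Request>
</Message>
"""

# Hypothetical mappings: XML tag -> key, and key -> value to compare against.
TAG_TO_KEY = {"pid": "process_id", "MID": "merchant_id", "Amount": "amount"}
EXPECTED = {"process_id": "daemon", "merchant_id": "jbr1", "amount": "99.00"}


def compare_leaves(xml_text, tag_to_key, expected):
    """Return {key: (xml_value, expected_value, matches)} for mapped leaf tags."""
    root = etree.fromstring(xml_text)
    results = {}
    for element in root.iter():
        # Skip comments and processing instructions, whose tag is not a string.
        if not isinstance(element.tag, str):
            continue
        # Skip containers: only elements without children carry a value.
        if len(element):
            continue
        key = tag_to_key.get(element.tag)
        if key is None:  # tag is not covered by the mapping
            continue
        wanted = expected.get(key)
        results[key] = (element.text, wanted, element.text == wanted)
    return results


if __name__ == "__main__":
    for key, row in compare_leaves(SAMPLE, TAG_TO_KEY, EXPECTED).items():
        print(key, *row)
```

Because the lookup is driven by the mapping and not by the tree structure, tags can move to a different nesting level without any change to the loop. Tags that are missing from the mapping are skipped silently.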
