
How do you walk an Element tree parsed using Python's ElementTree module?

So I spent about half the day yesterday playing around in the interactive Python command line trying to figure out how to navigate this ElementTree, and it's confusing the crap out of me. As per this site, https://docs.python.org/2/library/xml.etree.elementtree.html#module-xml.etree.ElementTree, I loaded the tree by doing:

import xml.etree.ElementTree as ET
tree = ET.parse('nmaptest.xml')
root = tree.getroot()
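For experimenting in the interactive shell, it can help to parse a small snippet instead of the whole file. The sketch below is an illustration, not part of the original post: it wraps a trimmed copy of the sample host entry (shown further down) in a placeholder `<nmaprun>` root element and shows the difference between the ElementTree returned by `ET.parse()` and the root Element you actually navigate.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the sample <host> entry, wrapped in a placeholder
# <nmaprun> root element (an assumption; adjust to match your own file).
SAMPLE = b"""<nmaprun>
  <host starttime="1408488852" endtime="1408499159">
    <status state="up" reason="user-set" reason_ttl="0"/>
    <address addr="X.X.X.X" addrtype="ipv4"/>
  </host>
</nmaprun>"""

# ET.parse() accepts a filename or a file-like object and returns an ElementTree.
tree = ET.parse(io.BytesIO(SAMPLE))
root = tree.getroot()  # the top-level Element, which is what you iterate over

# ET.fromstring() skips the ElementTree wrapper and returns the root Element directly.
root2 = ET.fromstring(SAMPLE)

print(type(tree).__name__, root.tag, root2.tag)  # ElementTree nmaprun nmaprun
```

Either `root` or `root2` can then be used with `iter()`, `find()` and `findall()` exactly as with the file-based tree.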

Then I went through the examples, trying to figure out how to access and iterate through each element. Just as soon as I think I'm starting to understand how it's pieced together, I can't get it to do what I want.

Ultimately I'd like to parse it and dump the pertinent data into a database for later comparison (or maybe write a script that simply compares two XML documents, but that looks beyond my abilities for now).

I've tried things like the following:

for host in root.iter('host'):
    print(host.attrib['name'])
    for address in host.iter('address'):
        print(address.attrib['addr'])
        for port in host.iter('port'):
            print(port.attrib['portid'])

I was trying to print out the hostname, IP address, and open ports for each host. It doesn't quite work; it's almost like the hostname and address are in completely different worlds, though I can't see why that would be. I also found out that you can access the address simply by doing

print(host[1].attrib['addr'])

But I can't find any consistency in when things are indexed by an integer, as above (host[3] doesn't appear to be the hostname like you'd logically expect, and host[2] seems to be the hostnames but has no .attrib or anything), when they're an attribute, and when they're a dictionary key. Sometimes when I THINK I've found what I'm looking for, instead of seeing something like

for host in root.iter('host'):
    print(host[1].attrib)

{'addrtype': 'ipv4', 'addr': '10.1.102.255'}

I'll do a .attrib on something and see empty braces {}, like when I do

for host in root.iter('host'):
    print(host[2].attrib)

So I'm not understanding how it parses the document at all... Can anyone help clear this up, or point me to some documentation that might help?

Here's a sample entry from the XML output:

<host starttime="1408488852" endtime="1408499159"><status state="up" reason="user-set" reason_ttl="0"/>
  <address addr="X.X.X.X" addrtype="ipv4"/>
  <hostnames>
      <hostname name="computername.domainname.com" type="PTR"/>
  </hostnames>
  <ports>
    <extraports state="filtered" count="986">
      <extrareasons reason="no-responses" count="986"/>
    </extraports>
    <port protocol="tcp" portid="X"><state state="open" reason="syn-ack" reason_ttl="127"/>    <service name="X" method="table" conf="3"/></port>
    <port protocol="tcp" portid="X"><state state="open" reason="syn-ack" reason_ttl="127"/>    <service name="X" method="table" conf="3"/></port>
    <port protocol="tcp" portid="X"><state state="open" reason="syn-ack" reason_ttl="127"/>    <service name="X" method="table" conf="3"/></port>
  </ports>
  <times srtt="332" rttvar="164" to="100000"/>
</host>    
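Integer indexing on an Element, as in host[1] and host[2] above, selects its direct children in document order, so the index depends on the XML layout rather than on what the child contains. In the sample entry, host[0] is `<status>`, host[1] is `<address>`, and host[2] is `<hostnames>`, a wrapper element with no attributes of its own, which is why its `.attrib` is `{}`; the hostname is one level deeper. This sketch (using a trimmed copy of the sample with its placeholder values) prints each child's index, tag, and attributes:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample <host> entry (placeholder values from the question).
HOST_XML = """<host starttime="1408488852" endtime="1408499159">
  <status state="up" reason="user-set" reason_ttl="0"/>
  <address addr="X.X.X.X" addrtype="ipv4"/>
  <hostnames><hostname name="computername.domainname.com" type="PTR"/></hostnames>
  <ports><port protocol="tcp" portid="X"/></ports>
  <times srtt="332" rttvar="164" to="100000"/>
</host>"""

host = ET.fromstring(HOST_XML)

# host[i] is the i-th direct child; .tag is its element name, .attrib a dict of its attributes.
for i, child in enumerate(host):
    print(i, child.tag, child.attrib)

# .get() reads one attribute and returns None (instead of raising KeyError) if it is missing.
print(host[1].get("addr"))     # X.X.X.X
print(host[2].get("addr"))     # None: <hostnames> has no attributes
print(host[2][0].get("name"))  # computername.domainname.com
```

In short: children are reached by index or iteration, attributes live in the `.attrib` dictionary (or via `.get()`), and the element name is `.tag`.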

With this code,

for host in root.iter('host'):
    print(host.attrib['name'])

you are trying to access the name attribute of the host element. But it is the hostname element that has that attribute.
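One way to reach that attribute directly is to give `find()` a path relative to the host element. A minimal sketch, using a trimmed copy of the sample entry:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample <host> entry (placeholder values from the question).
HOST_XML = """<host starttime="1408488852" endtime="1408499159">
  <address addr="X.X.X.X" addrtype="ipv4"/>
  <hostnames><hostname name="computername.domainname.com" type="PTR"/></hostnames>
</host>"""

host = ET.fromstring(HOST_XML)

# <host> itself only carries starttime/endtime, so host.attrib['name'] raises KeyError.
print(sorted(host.attrib))  # ['endtime', 'starttime']

# find() takes a path relative to this element and returns the first match, or None.
hostname = host.find("hostnames/hostname")
if hostname is not None:
    print(hostname.get("name"))  # computername.domainname.com
```

Checking for None matters because a host without a reverse DNS name may have an empty or missing hostnames element.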

Here is one way to get the data you wanted to extract (assuming that nmaptest.xml contains one or more host elements as children of a common root element):

import xml.etree.ElementTree as ET
tree = ET.parse('nmaptest.xml')

# Find every <host> element anywhere below the root.
hosts = tree.findall(".//host")

for host in hosts:
    # iter() walks the host element and all of its descendants in document order.
    for elem in host.iter():
        if elem.tag == "hostname":
            print(elem.attrib['name'])
        elif elem.tag == "address":
            print(elem.attrib['addr'])
        elif elem.tag == "port":
            print(elem.attrib['portid'])

Output:

X.X.X.X
computername.domainname.com
X
X
X
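The output follows document order, which is why the address prints before the hostname. For the goal of dumping the data into a database, it is usually easier to keep each host's values together. The sketch below is an illustration under several assumptions: the XML is a trimmed placeholder with hypothetical port numbers in place of the "X" values, the `open_ports` table and its columns are made up for the example, an in-memory sqlite3 database stands in for whatever database you use, and `host.find("address")` returns only the first address if a host has more than one.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Placeholder document: one <host> trimmed from the question's sample, with
# hypothetical port numbers (22, 80, 443) instead of the "X" placeholders.
NMAP_XML = """<nmaprun>
  <host starttime="1408488852" endtime="1408499159">
    <address addr="X.X.X.X" addrtype="ipv4"/>
    <hostnames><hostname name="computername.domainname.com" type="PTR"/></hostnames>
    <ports>
      <extraports state="filtered" count="986"/>
      <port protocol="tcp" portid="22"><state state="open"/></port>
      <port protocol="tcp" portid="80"><state state="open"/></port>
      <port protocol="tcp" portid="443"><state state="open"/></port>
    </ports>
  </host>
</nmaprun>"""


def host_records(root):
    """Yield one (hostname, addr, portid) tuple per open port of each <host>."""
    for host in root.iter("host"):
        addr_elem = host.find("address")              # first <address> child only
        name_elem = host.find("hostnames/hostname")
        addr = addr_elem.get("addr") if addr_elem is not None else None
        name = name_elem.get("name") if name_elem is not None else None
        # "ports/port" matches only <port> elements, so <extraports> is skipped.
        for port in host.findall("ports/port"):
            state = port.find("state")
            if state is not None and state.get("state") == "open":
                yield name, addr, port.get("portid")


root = ET.fromstring(NMAP_XML)
rows = list(host_records(root))

# Hypothetical table layout; adapt the schema to what you want to compare later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE open_ports (hostname TEXT, addr TEXT, portid TEXT)")
conn.executemany("INSERT INTO open_ports VALUES (?, ?, ?)", rows)
for row in conn.execute("SELECT * FROM open_ports ORDER BY CAST(portid AS INTEGER)"):
    print(row)
```

With the real file, `ET.parse('nmaptest.xml').getroot()` would replace the `ET.fromstring(NMAP_XML)` call, and a file path would replace `":memory:"` so the rows persist between scans.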
