How to parse and fetch only the desired XML elements from an XML file using Python?

I have an XML file which looks like this:

<rpc-reply xmlns:junos="http://xml.juniper.net/junos/15.1R5/junos">
    <vlan-information xmlns="http://xml.juniper.net/junos/15.1R5/junos-esw" junos:style="brief">
        <vlan-terse/>
        <vlan>
            <vlan-instance>0</vlan-instance>
            <vlan-name>ACRS-Dev2</vlan-name>
            <vlan-create-time>Fri Jan  1 00:37:59 2010
            </vlan-create-time>
            <vlan-status>Enabled</vlan-status>
            <vlan-owner>static</vlan-owner>
            <vlan-tag>0</vlan-tag>
            <vlan-index>2</vlan-index>
            <vlan-l3-interface>vlan.15 (UP)</vlan-l3-interface>
            <vlan-l3-interface-address>10.8.25.1/24</vlan-l3-interface-address>
            <vlan-protocol-port>Port Mode</vlan-protocol-port>
            <vlan-members-count>7</vlan-members-count>
            <vlan-members-upcount>6</vlan-members-upcount>
        </vlan>
        <vlan>
            <vlan-instance>0</vlan-instance>
            <vlan-name>default</vlan-name>
            <vlan-create-time>Fri Jan  1 00:37:59 2010
            </vlan-create-time>
            <vlan-status>Enabled</vlan-status>
            <vlan-owner>static</vlan-owner>
            <vlan-tag>0</vlan-tag>
            <vlan-index>3</vlan-index>
            <vlan-l3-interface>vlan.11 (UP)</vlan-l3-interface>
            <vlan-l3-interface-address>10.8.27.1/24</vlan-l3-interface-address>
            <vlan-protocol-port>Port Mode</vlan-protocol-port>
            <vlan-members-count>12</vlan-members-count>
            <vlan-members-upcount>2</vlan-members-upcount>
        </vlan>
    </vlan-information>
</rpc-reply>

From this, I only want the <vlan-name> and <vlan-l3-interface-address> elements, parsed and saved in a dict/JSON-like variable in this format:

{'Vlan-Name' : vlan_name, 'Interface-Address' : interface_addr}

and then add one such dict per VLAN to a list of dicts. This is my code for parsing the file and adding the dicts to the list:

root = tree.getroot()
nw_pool = []
nw_json = {}
for child in root:
    for items in child:
        for item1 in items:
            if 'vlan-l3-interface-address' in item1.tag:
                interface_addr = item1.text
                nw_json['Interface-Address'] = interface_addr
            elif 'vlan-name' in item1.tag:
                vlan_name = item1.text
                nw_json['Vlan-Name'] = vlan_name
                nw_pool.append(nw_json)
print(nw_pool)

But when I print nw_pool, the output repeats the dict of the last element found instead of giving me a distinct dict for each element.

Output:

[{'Vlan-Name': 'default', 'Interface-Address': '10.8.27.1/24'}, {'Vlan-Name': 'default', 'Interface-Address': '10.8.27.1/24'}]

Whereas my desired output is:

[{'Vlan-Name': 'ACRS-Dev2', 'Interface-Address': '10.8.25.1/24'}, {'Vlan-Name': 'default', 'Interface-Address': '10.8.27.1/24'}] 

Can somebody help me with this? Thanks in advance.

You are overwriting the existing dict, while you need a new one for every iteration. Because the same dict object is appended each time, every entry in the list ends up showing the last values written. So, you need to move nw_json = {} inside the loop:

root = tree.getroot()
nw_pool = []
for child in root:
    for items in child:
        nw_json = {}   # Work with new dict
        for item1 in items:
            if 'vlan-l3-interface-address' in item1.tag:
                interface_addr = item1.text
                nw_json['Interface-Address'] = interface_addr
            elif 'vlan-name' in item1.tag:
                vlan_name = item1.text
                nw_json['Vlan-Name'] = vlan_name
                nw_pool.append(nw_json)
print(nw_pool)
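
The substring checks on item1.tag are needed because ElementTree stores each tag together with its namespace URI. As an alternative, here is a minimal sketch that passes a namespace mapping to the search methods instead. It is not the code from the answer above: the prefix "esw", the helper name extract_vlans, and the trimmed inline XML are assumptions made for illustration; only the namespace URI is taken from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the reply from the question, inlined so the sketch runs on its own.
XML = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/15.1R5/junos">
    <vlan-information xmlns="http://xml.juniper.net/junos/15.1R5/junos-esw" junos:style="brief">
        <vlan-terse/>
        <vlan>
            <vlan-name>ACRS-Dev2</vlan-name>
            <vlan-l3-interface-address>10.8.25.1/24</vlan-l3-interface-address>
        </vlan>
        <vlan>
            <vlan-name>default</vlan-name>
            <vlan-l3-interface-address>10.8.27.1/24</vlan-l3-interface-address>
        </vlan>
    </vlan-information>
</rpc-reply>"""

# "esw" is an arbitrary prefix chosen for this sketch; only the URI must match the document.
NS = {"esw": "http://xml.juniper.net/junos/15.1R5/junos-esw"}


def extract_vlans(root):
    """Return one new dict per <vlan> element (hypothetical helper)."""
    pool = []
    for vlan in root.iterfind(".//esw:vlan", NS):
        name = vlan.findtext("esw:vlan-name", default="", namespaces=NS)
        addr = vlan.findtext("esw:vlan-l3-interface-address", default="", namespaces=NS)
        pool.append({"Vlan-Name": name.strip(), "Interface-Address": addr.strip()})
    return pool


nw_pool = extract_vlans(ET.fromstring(XML))
print(nw_pool)
```

With a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring(XML). Building the dict inside the loop also means an element such as <vlan-terse/> never contributes an entry.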

The problem in your code is that you initialized the dict object before the loop, so every iteration wrote into the same object and overwrote the earlier data.
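
The effect can be reproduced without any XML. The following is a small illustration, with variable names invented for this sketch, of why appending one shared dict gives repeated entries while creating a dict per iteration does not:

```python
names = ["ACRS-Dev2", "default"]

# One dict created before the loop: the list receives the same object twice.
shared = {}
pool_shared = []
for name in names:
    shared["Vlan-Name"] = name
    pool_shared.append(shared)  # appends a reference, not a copy

# A new dict created inside the loop: each list entry is a distinct object.
pool_fresh = []
for name in names:
    fresh = {"Vlan-Name": name}
    pool_fresh.append(fresh)

print(pool_shared)                       # both entries show the last name written
print(pool_shared[0] is pool_shared[1])  # True: same object
print(pool_fresh)                        # one entry per name
```

list.append stores a reference to the object, so later writes to the shared dict are visible through every entry of the list.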

@Hoenie's answer explains your mistake clearly.

Adding to that, I would suggest trying BeautifulSoup for parsing XML, as it is simple and easy to understand. Try the code below.

from bs4 import BeautifulSoup

# Read the XML file; the with-block closes it afterwards
with open('test.xml') as file_obj:
    soup = BeautifulSoup(file_obj.read(), 'lxml')

nw_pool = []
for vlan in soup.find_all('vlan'):  # one new dict per <vlan> element
    nw_json = {}
    nw_json['Interface-Address'] = vlan.find('vlan-l3-interface-address').text
    nw_json['Vlan-Name'] = vlan.find('vlan-name').text
    nw_pool.append(nw_json)
print(nw_pool) # O/P [{'Interface-Address': '10.8.25.1/24', 'Vlan-Name': 'ACRS-Dev2'}, {'Interface-Address': '10.8.27.1/24', 'Vlan-Name': 'default'}]

Cheers!
