How to parse XML elements into Python from a very large XML file?
I am currently working on a program made up of 20 or so scripts, all launched from a single Python file that uses the subprocess library to call them. Each script takes 3 parameters, which the user must currently enter via argparse: the IP address, the username, and the password. These scripts automate the testing of networking devices and such.
Now, instead of having the user enter these parameters on the command line, I want to extract the values from an XML file of about 5,000 lines that my company has generated. What is the best way to extract the info I need so the user doesn't have to type in the parameters manually?
I have done some research, but unfortunately I have not been able to work out the best way to do this. Here is a sample excerpt of the XML file:
<sheet>
<name>7_managementHosts</name>
<data>
<name>MgtHosts</name>
<key>
<name>Rack U-Location</name>
<value>U30</value>
<value>U29</value>
<value>U28</value>
</key>
<key>
<name>Default Component Name</name>
<value>sms01</value>
<value>sms02</value>
<value>sms03</value>
</key>
<key>
<name>DNS hostname (FQDN)</name>
<value>sms01.de1000.local</value>
<value>sms02.de1000.local</value>
<value>sms03.de1000.local</value>
</key>
<key>
<name>DNS suffix for management interface</name>
<value>de1000.local</value>
<value>de1000.local</value>
<value>de1000.local</value>
</key>
<key>
<name>Keyboard layout</name>
<value>US Default</value>
<value>US Default</value>
<value>US Default</value>
</key>
<key>
<name>root user password</name>
<value>myPassword</value>
<value>myPassword</value>
<value>myPassword</value>
</key>
It is a really long XML file, but the tree looks like this throughout, and I really don't know the best way to go about this. Thanks for the help!
Using the Python standard XML library (and assuming you would like to collect the data under the 'key' elements):
import xml.etree.ElementTree as ET
import pprint
xml = '''<sheet>
<name>7_managementHosts</name>
<data>
<name>MgtHosts</name>
<key>
<name>Rack U-Location</name>
<value>U30</value>
<value>U29</value>
<value>U28</value>
</key>
<key>
<name>Default Component Name</name>
<value>sms01</value>
<value>sms02</value>
<value>sms03</value>
</key>
<key>
<name>DNS hostname (FQDN)</name>
<value>sms01.de1000.local</value>
<value>sms02.de1000.local</value>
<value>sms03.de1000.local</value>
</key>
<key>
<name>DNS suffix for management interface</name>
<value>de1000.local</value>
<value>de1000.local</value>
<value>de1000.local</value>
</key>
<key>
<name>Keyboard layout</name>
<value>US Default</value>
<value>US Default</value>
<value>US Default</value>
</key>
<key>
<name>root user password</name>
<value>myPassword</value>
<value>myPassword</value>
<value>myPassword</value>
</key>
</data>
</sheet>'''
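data = {}
root = ET.fromstring(xml)
# Every <key> under <data> holds one <name> and several <value> children
keys = root.findall('.//data/key')
for key in keys:
    # Map the key's name to the list of its values
    data[key.find('name').text] = [v.text for v in key.findall('value')]
pprint.pprint(data)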
Output:
{'DNS hostname (FQDN)': ['sms01.de1000.local',
'sms02.de1000.local',
'sms03.de1000.local'],
'DNS suffix for management interface': ['de1000.local',
'de1000.local',
'de1000.local'],
'Default Component Name': ['sms01', 'sms02', 'sms03'],
'Keyboard layout': ['US Default', 'US Default', 'US Default'],
'Rack U-Location': ['U30', 'U29', 'U28'],
'root user password': ['myPassword', 'myPassword', 'myPassword']}
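The dictionary above stores each field as a list with one entry per host. To feed the scripts, it is probably more convenient to regroup those lists into one record per host. Here is a minimal sketch, assuming the values inside every 'key' element appear in the same host order (the sample suggests this, but nothing in the file guarantees it). The names `records_from_columns`, `build_command` and `check_device.py`, the `--ip`/`--username`/`--password` flags, and the `root` username are all hypothetical; substitute whatever your scripts actually expect.

```python
import sys


def records_from_columns(columns):
    """Turn {'field': [v1, v2, ...]} into [{'field': v1}, {'field': v2}, ...]."""
    names = list(columns)
    rows = zip(*(columns[name] for name in names))
    return [dict(zip(names, row)) for row in rows]


def build_command(script, host, username='root'):
    """Build an argument list for one host (flag names are assumptions)."""
    return [
        sys.executable, script,
        '--ip', host['DNS hostname (FQDN)'],
        '--username', username,
        '--password', host['root user password'],
    ]


# A subset of the dictionary produced by the ElementTree code above
columns = {
    'Default Component Name': ['sms01', 'sms02', 'sms03'],
    'DNS hostname (FQDN)': ['sms01.de1000.local',
                            'sms02.de1000.local',
                            'sms03.de1000.local'],
    'root user password': ['myPassword', 'myPassword', 'myPassword'],
}

hosts = records_from_columns(columns)
commands = [build_command('check_device.py', host) for host in hosts]
for command in commands:
    # Each list could be handed to subprocess.run(command) by the launcher
    print(command[1:])
```

Each list in `commands` can be passed to `subprocess.run` as-is. Keep in mind that a password given on the command line is visible in the process list, so an environment variable or a file may be a safer channel if that matters in your setup.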
Example with BeautifulSoup, just to get you started with the module:
data = '''
<sheet>
<name>7_managementHosts</name>
<data>
<name>MgtHosts</name>
<key>
<name>Rack U-Location</name>
<value>U30</value>
<value>U29</value>
<value>U28</value>
</key>
<key>
<name>Default Component Name</name>
<value>sms01</value>
<value>sms02</value>
<value>sms03</value>
</key>
<key>
<name>DNS hostname (FQDN)</name>
<value>sms01.de1000.local</value>
<value>sms02.de1000.local</value>
<value>sms03.de1000.local</value>
</key>
<key>
<name>DNS suffix for management interface</name>
<value>de1000.local</value>
<value>de1000.local</value>
<value>de1000.local</value>
</key>
<key>
<name>Keyboard layout</name>
<value>US Default</value>
<value>US Default</value>
<value>US Default</value>
</key>
<key>
<name>root user password</name>
<value>myPassword</value>
<value>myPassword</value>
<value>myPassword</value>
</key>
'''
from bs4 import BeautifulSoup
# The 'lxml' parser requires the lxml package to be installed
data = BeautifulSoup(data, 'lxml')
# One list per <key>: its name followed by its values
parsed = [[v.text for v in key.select('name, value')] for key in data.select('key')]
# just for pretty printing, all the data are in `parsed` variable
from textwrap import shorten
for row_num, row in enumerate(zip(*parsed), 0):
    if row_num == 0:
        print(''.join('{: ^25}'.format(shorten(d, 25)) for d in ['Row Number'] + list(row)))
    else:
        print(''.join('{: ^25}'.format(shorten(d, 25)) for d in [str(row_num)] + list(row)))
Prints:
       Row Number             Rack U-Location      Default Component Name     DNS hostname (FQDN)     DNS suffix for [...]        Keyboard layout        root user password
            1                       U30                     sms01             sms01.de1000.local          de1000.local              US Default               myPassword
            2                       U29                     sms02             sms02.de1000.local          de1000.local              US Default               myPassword
            3                       U28                     sms03             sms03.de1000.local          de1000.local              US Default               myPassword
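Both answers load the whole document into memory, which is fine for a file of about 5,000 lines. If the file were genuinely large, the standard library's `ET.iterparse` can process it one element at a time instead. The following is only a sketch under that assumption: the helper name `iter_keys` is made up, and the in-memory `io.BytesIO` sample stands in for the real file (a filename can be passed to `iterparse` in its place).

```python
import io
import xml.etree.ElementTree as ET


def iter_keys(source):
    """Yield (name, values) for each <key> element as soon as it is complete."""
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'key':
            name = elem.findtext('name')
            values = [v.text for v in elem.findall('value')]
            yield name, values
            # Free the children that have already been read
            elem.clear()


# Shortened, well-formed stand-in for the real file
sample = io.BytesIO(b'''<sheet>
<name>7_managementHosts</name>
<data>
<name>MgtHosts</name>
<key>
<name>Rack U-Location</name>
<value>U30</value>
<value>U29</value>
</key>
<key>
<name>root user password</name>
<value>myPassword</value>
<value>myPassword</value>
</key>
</data>
</sheet>''')

data = dict(iter_keys(sample))
print(data)
```

Note that `elem.clear()` empties each processed 'key' element but leaves the now-empty element attached to its parent, so memory use still grows slowly with the number of keys.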