Parsing XML Attributes with Python
I am trying to parse out all the green-highlighted attributes (some sensitive things have been blacked out). I have a bunch of XML files, all with similar formats. I already know how to loop through all of them individually, but I am having trouble parsing out the specific attributes.
I need the text in these attributes:

name="text1" from
<project logLevel="verbose" version="2.0" mainModule="Main" name="text1">

destinationDir="/text2" from
<put label="Put Files" destinationDir="/Trigger/FPDMMT_INBOUND">

destDir="/text3" from
<copy disabled="false" version="1.0" label="Archive Files" destDir="/text3" suffix="">
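One way to collect all three values at once is to walk every element with root.iter() and keep the ones that carry a given attribute. The sketch below runs against a small hypothetical sample modeled on the attributes listed above (the real files are not shown), and the helper name attribute_values is illustrative, not part of ElementTree. Note that name also matches the <module> element, so results come back as lists in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the attributes described in the question
SAMPLE = """<project logLevel="verbose" version="2.0" mainModule="Main" name="text1">
  <module name="Main">
    <ftp version="1.0" label="FTP connection">
      <put label="Put Files" destinationDir="/text2"/>
    </ftp>
    <copy disabled="false" version="1.0" label="Archive Files" destDir="/text3" suffix=""/>
  </module>
</project>"""


def attribute_values(root, attr_name):
    """Hypothetical helper: return attr_name's value for every element
    (root included) that has that attribute, in document order."""
    return [el.get(attr_name) for el in root.iter() if attr_name in el.attrib]


root = ET.fromstring(SAMPLE)
wanted = {name: attribute_values(root, name) for name in ("name", "destinationDir", "destDir")}
print(wanted)
```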
I am using:
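import csv
import os
import re
import xml.etree.ElementTree as ET

# XMLfile_path holds the path of the current file in the loop over all XML files
tree = ET.parse(XMLfile_path)
root = tree.getroot()   # the <project> element
item = root[0]          # first child of <project>: the <module> element
print(item.get("name"))
print(root.get("name"))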
This outputs:
Main
text1
item.get reads from the element at index [0], the first child of the root, which is <module>. root.get reads from the root element itself, the first line, <project>.
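To make the indexing concrete, here is a small sketch against a trimmed, hypothetical version of the file. It shows that the parsed root already is <project>, that integer indexing moves exactly one level down per [ ], and that whitespace between tags does not create extra children.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed sample; only the structure matters here
SAMPLE = """<project mainModule="Main" name="text1">
  <module name="Main">
    <ftp label="FTP connection"><put destinationDir="/text2"/></ftp>
  </module>
</project>"""

root = ET.fromstring(SAMPLE)   # the <project> element itself
item = root[0]                 # first child of <project>: the <module> element

print(root.tag, root.get("name"))   # project text1
print(item.tag, item.get("name"))   # module Main

# Each [ ] goes one level deeper: root[0][0] is <ftp>, root[0][0][0] is <put>
print(root[0][0][0].get("destinationDir"))  # /text2
```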
I know there's a way to search for exactly the right part of the root/tree with something like:
test = root.find('./project/module/ftp/put')
print (test.get("destinationDir"))
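A likely reason a path like the one above comes back empty: ElementTree evaluates a path relative to the element you call find() on, and root already is <project>. So "./project/..." looks for a <project> child of <project> and find() returns None, which makes the following .get() raise AttributeError. A sketch on a hypothetical trimmed sample:

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed sample
SAMPLE = """<project name="text1">
  <module name="Main">
    <ftp><put label="Put Files" destinationDir="/text2"/></ftp>
  </module>
</project>"""

root = ET.fromstring(SAMPLE)  # root IS the <project> element

# Repeating "project" searches for a <project> child of <project>: nothing found
print(root.find("./project/module/ftp/put"))       # None

# Starting the path at root's children works
put = root.find("./module/ftp/put")
print(put.get("destinationDir"))                   # /text2

# ".//" searches all descendants, regardless of depth
print(root.find(".//put").get("destinationDir"))   # /text2
```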
I need to be able to jump directly to the element I need and output the attributes I need.
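For jumping straight to an element, ElementTree's limited XPath support includes attribute predicates: [@attr] keeps elements that have the attribute, and [@attr='value'] also matches its value. The sample below is hypothetical, and matching on label='Archive Files' is just one illustrative way to disambiguate elements that share a tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed sample
SAMPLE = """<project name="text1">
  <module name="Main">
    <ftp><put label="Put Files" destinationDir="/text2"/></ftp>
    <copy disabled="false" label="Archive Files" destDir="/text3" suffix=""/>
  </module>
</project>"""

root = ET.fromstring(SAMPLE)

# Any descendant element (*) that has a destinationDir attribute
put = root.find(".//*[@destinationDir]")
# A <copy> descendant whose label attribute equals 'Archive Files'
archive = root.find(".//copy[@label='Archive Files']")

print(put.get("destinationDir"))   # /text2
print(archive.get("destDir"))      # /text3
```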
Any help would be appreciated.
Thanks.
Simplified copy of your XML:
xml = '''<project logLevel="verbose" version="2.0" mainModule="Main" name="hidden">
<module name="Main">
<createWorkspace version="1.0"/>
<ftp version="1.0" label="FTP connection to PRD">
<put label="Put Files" destinationDir="destination1">
</put>
</ftp>
<ftp version="1.0" label="FTP connection to PRD">
<put label="Put Files" destinationDir="destination2">
</put>
</ftp>
<copy disabled="false" destDir="destination3">
</copy>
</module>
</project>
'''
# solution using ETree
from xml.etree import ElementTree as ET
root = ET.fromstring(xml)
name = root.get('name')
ftp_destination_dir1 = root.findall('./module/ftp/put')[0].get('destinationDir')
ftp_destination_dir2 = root.findall('./module/ftp/put')[1].get('destinationDir')
copy_destination_dir = root.find('./module/copy').get('destDir')
print(name)
print(ftp_destination_dir1)
print(ftp_destination_dir2)
print(copy_destination_dir)
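Indexing findall() results with [0] and [1] assumes every file has exactly two <put> elements. A variant that may suit files with a varying number of FTP steps is to call findall() once and read the attribute from every match; the sample repeats the simplified XML above.

```python
import xml.etree.ElementTree as ET

# Same structure as the simplified XML above
SAMPLE = """<project name="hidden">
<module name="Main">
<ftp><put label="Put Files" destinationDir="destination1"></put></ftp>
<ftp><put label="Put Files" destinationDir="destination2"></put></ftp>
<copy disabled="false" destDir="destination3"></copy>
</module>
</project>"""

root = ET.fromstring(SAMPLE)

# One findall() call; works whether a file has zero, one, or many <put> elements
ftp_destination_dirs = [put.get("destinationDir") for put in root.findall("./module/ftp/put")]
print(ftp_destination_dirs)  # ['destination1', 'destination2']
```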
# solution using lxml
from lxml import etree as et
root = et.fromstring(xml)
name = root.get('name')
ftp_destination_dirs = root.xpath('./module/ftp/put/@destinationDir')
copy_destination_dir = root.xpath('./module/copy/@destDir')[0]
print(name)
print(ftp_destination_dirs[0])
print(ftp_destination_dirs[1])
print(copy_destination_dir)
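Since the question mentions looping over many similar files, the ElementTree approach can be wrapped in a per-file function. This is a sketch under assumptions: extract_attributes is a hypothetical name, and the two in-memory documents stand in for real files (ET.parse accepts either a filename or a file object, so real paths, for example from glob.glob("*.xml"), can be passed directly). Missing elements give None or an empty list instead of an AttributeError.

```python
import io
import xml.etree.ElementTree as ET


def extract_attributes(source):
    """Hypothetical helper: pull the wanted attributes from one XML file.

    `source` may be a filename or a file object.
    """
    root = ET.parse(source).getroot()
    copy = root.find("./module/copy")
    return {
        "name": root.get("name"),
        "destinationDirs": [p.get("destinationDir") for p in root.findall("./module/ftp/put")],
        "destDir": copy.get("destDir") if copy is not None else None,
    }


# In-memory stand-ins for two files (hypothetical content)
files = {
    "a.xml": '<project name="a"><module><ftp><put destinationDir="/in"/></ftp>'
             '<copy destDir="/arch"/></module></project>',
    "b.xml": '<project name="b"><module/></project>',
}
results = {fname: extract_attributes(io.StringIO(text)) for fname, text in files.items()}
for fname, attrs in results.items():
    print(fname, attrs)
```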