Writing with lxml emitting no whitespace even when pretty_print=True
I'm using the lxml library to read an XML template, insert/change some elements, and save the resulting XML. One of the elements, which I'm creating on the fly using the etree.Element and etree.SubElement methods:
tree = etree.parse(r'xml_archive\templates\metadata_template_pts.xml')
root = tree.getroot()
stream = []
for element in root.iter():
    if isinstance(element.tag, basestring):
        stream.append(element.tag)
        # Find "keywords" element and insert a new "theme" element
        if element.tag == 'keywords' and 'theme' not in stream:
            theme = etree.Element('theme')
            themekt = etree.SubElement(theme, 'themekt').text = 'None'
            for tk in themekeys:
                themekey = etree.SubElement(theme, 'themekey').text = tk
            element.insert(0, theme)
prints nicely to the screen with print etree.tostring(theme, pretty_print=True):
<theme>
  <themekt>None</themekt>
  <themekey>Hydrogeology</themekey>
  <themekey>Stratigraphy</themekey>
  <themekey>Floridan aquifer system</themekey>
  <themekey>Geology</themekey>
  <themekey>Regional Groundwater Availability Study</themekey>
  <themekey>USGS</themekey>
  <themekey>United States Geological Survey</themekey>
  <themekey>thickness</themekey>
  <themekey>altitude</themekey>
  <themekey>extent</themekey>
  <themekey>regions</themekey>
  <themekey>upper confining unit</themekey>
  <themekey>FAS</themekey>
  <themekey>base</themekey>
  <themekey>geologic units</themekey>
  <themekey>geology</themekey>
  <themekey>extent</themekey>
  <themekey>inlandWaters</themekey>
</theme>
However, when I use etree.ElementTree(root).write(out_xml_file, method='xml', pretty_print=True) to write out the XML, this element gets flattened in the output file:
<theme><themekt>None</themekt><themekey>Hydrogeology</themekey><themekey>Stratigraphy</themekey><themekey>Floridan aquifer system</themekey><themekey>Geology</themekey><themekey>Regional Groundwater Availability Study</themekey><themekey>USGS</themekey><themekey>United States Geological Survey</themekey><themekey>thickness</themekey><themekey>altitude</themekey><themekey>extent</themekey><themekey>regions</themekey><themekey>upper confining unit</themekey><themekey>FAS</themekey><themekey>base</themekey><themekey>geologic units</themekey><themekey>geology</themekey><themekey>extent</themekey><themekey>inlandWaters</themekey></theme>
The rest of the file is written nicely, but this particular element is causing (purely aesthetic) trouble. Any ideas what I'm doing wrong?
Below is a snippet of markup from the template XML file (save it as "template.xml" to run with the code snippet at the bottom). The flattening of tags only occurs when I parse an existing file and insert a new element, not when the XML is created from scratch using lxml.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="fgdc_classic.xsl"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://water.usgs.gov/GIS/metadata/usgswrd/fgdc-std-001-1998.xsd">
  <keywords>
    <theme>
      <themekt>ISO 19115 Topic Categories</themekt>
      <themekey>environment</themekey>
      <themekey>geoscientificInformation</themekey>
      <themekey>inlandWaters</themekey>
    </theme>
    <place>
      <placekt>None</placekt>
      <placekey>Florida</placekey>
      <placekey>Georgia</placekey>
      <placekey>Alabama</placekey>
      <placekey>South Carolina</placekey>
    </place>
  </keywords>
</metadata>
Below is a snippet of code to be used with the snippet of markup above:
from lxml import etree

# Create new theme element to insert into root
themekeys = ['Hydrogeology', 'Stratigraphy', 'inlandWaters']
tree = etree.parse(r'template.xml')
root = tree.getroot()
stream = []
for element in root.iter():
    if isinstance(element.tag, basestring):
        stream.append(element.tag)
        # Edit theme keywords
        if element.tag == 'keywords':
            theme = etree.Element('theme')
            themekt = etree.SubElement(theme, 'themekt').text = 'None'
            for tk in themekeys:
                themekey = etree.SubElement(theme, 'themekey').text = tk
            element.insert(0, theme)
# Write XML to new file
out_xml_file = 'test.xml'
etree.ElementTree(root).write(out_xml_file, method='xml', pretty_print=True)
# Re-read the file and prepend the XML declaration
with open(out_xml_file, 'r') as f:
    lines = f.readlines()
with open(out_xml_file, 'w') as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    for line in lines:
        f.write(line)
If you replace this line:
tree = etree.parse(r'template.xml')
with these lines:
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(r'template.xml', parser)
then it will work as expected. The trick is to use an XMLParser that has the remove_blank_text option set to True. Any existing ignorable whitespace will be removed and will therefore not disrupt the subsequent pretty-printing.
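To see why this helps, here is a minimal sketch. It is written for Python 3 and uses a shortened inline template rather than the file from the question; the helper names build_theme and insert_and_serialize are made up for illustration. With the default parser, the whitespace between tags in the template is kept as text on the parsed elements, and the serializer appears to leave any element that already contains such text alone instead of re-indenting it, which would explain the flattened output. With remove_blank_text=True those whitespace-only nodes are dropped, so pretty_print is free to indent everything.

```python
from lxml import etree

# Shortened stand-in for template.xml (an assumption for this sketch).
TEMPLATE = b"""<metadata>
  <keywords>
    <place>
      <placekt>None</placekt>
      <placekey>Florida</placekey>
    </place>
  </keywords>
</metadata>"""


def build_theme(keys):
    """Build a <theme> element like the one in the question."""
    theme = etree.Element('theme')
    etree.SubElement(theme, 'themekt').text = 'None'
    for key in keys:
        etree.SubElement(theme, 'themekey').text = key
    return theme


def insert_and_serialize(parser=None):
    """Parse TEMPLATE, insert <theme> into <keywords>, return pretty-printed text."""
    root = etree.fromstring(TEMPLATE, parser)
    root.find('keywords').insert(0, build_theme(['Hydrogeology', 'Stratigraphy']))
    return etree.tostring(root, pretty_print=True).decode()


# Default parser: the whitespace after <keywords> survives as its .text.
keywords_text = etree.fromstring(TEMPLATE).find('keywords').text
flat = insert_and_serialize()

# Parser that discards whitespace-only text between tags.
clean = insert_and_serialize(etree.XMLParser(remove_blank_text=True))

print(repr(keywords_text))  # whitespace only, but not empty
print(flat)                 # <theme> and its children on a single line
print(clean)                # every element on its own indented line
```

Note that remove_blank_text only affects whitespace-only text between tags; text such as "Florida" inside an element is left untouched.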