
python - how to write empty tree node as empty string to xml file

I want to remove elements with a certain tag and then write out the .xml file WITHOUT any tags for those deleted elements. Is my only option to create a new tree?

There are two options to remove/delete an element:

clear() Resets an element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.

At first I used this, and it does remove the data from the element, but I'm still left with an empty element:

# Remove all elements from the tree that are NOT "job" or "make" or "build" elements
log = open("debug.log", "w")
for el in root.iter("*"):
    if el.tag != "job" and el.tag != "make" and el.tag != "build":
        print("removed = ", el.tag, el.attrib, file=log)
        el.clear()
    else:
        print("NOT", el.tag, el.attrib, file=log)

log.close()
tree.write("make_and_job_tree.xml", short_empty_elements=False)

The problem is that xml.etree.ElementTree.ElementTree.write() still writes out empty tags no matter what:

...The keyword-only short_empty_elements parameter controls the formatting of elements that contain no content. If True (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags.
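A minimal sketch of what that parameter does, using a short inline document rather than the file from the question (the sample string is an assumption made for illustration). After clear(), the element is still attached to its parent, so the serializer has to emit it in one of the two forms:

```python
import xml.etree.ElementTree as ET

# Inline sample document (illustrative only).
root = ET.fromstring('<country><rank>1</rank><neighbor name="Austria"/></country>')

# clear() empties the element but leaves it attached to its parent.
root.find("neighbor").clear()

short_form = ET.tostring(root, encoding="unicode")
long_form = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(short_form)  # neighbor is emitted as a single self-closed tag
print(long_form)   # neighbor is emitted as a start/end tag pair
```

Either way the tag is written, because the parameter only chooses between two spellings of an empty element.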

Why isn't there an option to just not print out those empty tags! Whatever.

So then I thought I might try

remove(subelement) Removes subelement from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents.

But this only operates on the child elements.

So I'd have to do something like:

for el in root.iter("*"):
    for subel in el:
        if subel.tag != "make" and subel.tag != "job" and subel.tag != "build":
            el.remove(subel)

But there's a big problem here: I'm invalidating the iterator by removing elements, right?

Is it enough to simply check whether the element is empty by adding if subel?

if subel and subel.tag != "make" and subel.tag != "job" and subel.tag != "build":

Or do I have to get a new iterator to the tree elements every time I invalidate it?
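To make the iterator concern concrete, here is a small sketch on an inline document (the sample string and the helper name strip_children are assumptions for illustration). Removing children from the element being looped over shifts the remaining children, so some siblings are skipped; iterating over a list(...) snapshot avoids that. Note also that the truth value of an element reflects whether it has children, so `if subel` does not detect removed or cleared elements.

```python
import xml.etree.ElementTree as ET

SAMPLE = "<root><a/><b/><c/><job/></root>"
KEEP = {"job", "make", "build"}

# Pitfall: removing from the element being iterated shifts the remaining
# children left, so the loop skips siblings.
naive = ET.fromstring(SAMPLE)
for child in naive:
    if child.tag not in KEEP:
        naive.remove(child)


def strip_children(root, keep):
    """Remove every child whose tag is not in `keep` (hypothetical helper)."""
    # list(...) takes a snapshot, so removals do not disturb either loop.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in keep:
                parent.remove(child)


safe = ET.fromstring(SAMPLE)
strip_children(safe, KEEP)

print([c.tag for c in naive])  # still contains an unwanted tag
print([c.tag for c in safe])   # only the kept tag remains
```

Keep in mind that removing an element also drops its whole subtree, including any descendants whose tags are in the keep set.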

Remember: I just want to write out the xml file with no tags for the empty elements.

Here's an example.

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Let's say I want to remove any mention of neighbor. Ideally, I'd want this output after the removal:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
    </country>
</data>

The problem is that when I run the code using clear() (see the first code block above) and write it to a file, I get this:

<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor></neighbor><neighbor></neighbor></country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor></neighbor></country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor></neighbor><neighbor></neighbor></country>
</data>

Notice neighbor still appears.

I know I could easily run a regex over the output, but there's gotta be a way (or another Python API) that does this on the fly instead of requiring me to touch my .xml file again.

import lxml.etree as et

xml  = et.parse("test.xml")

for node in xml.xpath("//neighbor"):
    node.getparent().remove(node)


xml.write("out.xml",encoding="utf-8",xml_declaration=True)

Using ElementTree, we need to find the parents of the neighbor nodes, then find the neighbor nodes inside each parent and remove them:

from xml.etree import ElementTree as et

xml  = et.parse("test.xml")


for parent in xml.getroot().findall(".//neighbor/.."):
    for child in parent.findall("./neighbor"):
        parent.remove(child)


xml.write("out.xml",encoding="utf-8",xml_declaration=True)

Both will give you:

<?xml version='1.0' encoding='utf-8'?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        </country>
</data>
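The closing </country> tags in that output sit one level too deep because the whitespace that preceded the removed neighbor elements stays behind as the tail of the last remaining child. If tidy output matters, one option is xml.etree.ElementTree.indent, which is only available from Python 3.9 onward. A sketch on an inline sample (the sample string is an assumption for illustration):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<data>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)

# Same removal approach as above: find each parent, then drop its neighbors.
for parent in root.findall(".//neighbor/.."):
    for child in parent.findall("neighbor"):
        parent.remove(child)

# Rewrite the text/tail whitespace so nesting is indented consistently.
ET.indent(root, space="    ")  # requires Python 3.9 or newer
result = ET.tostring(root, encoding="unicode")
print(result)
```

With lxml, passing pretty_print=True when serializing serves a similar purpose, although it may leave existing whitespace alone unless the document was parsed with blank text removed.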

Using your attribute logic and modifying the xml a bit, as below:

x = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
           <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""

Using lxml:

import lxml.etree as et

xml = et.fromstring(x)

for node in xml.xpath("//neighbor[not(@make) and not(@job) and not(@build)]"):
    node.getparent().remove(node)
print(et.tostring(xml))

Would give you:

<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        </country>
</data>

The same logic in ElementTree:

from xml.etree import ElementTree as et

xml = et.parse("test.xml").getroot()

atts = {"build", "job", "make"}

for parent in xml.findall(".//neighbor/.."):
    # Search direct children only: remove() raises ValueError for deeper descendants.
    for child in parent.findall("neighbor"):
        if not atts.issubset(child.attrib):
            parent.remove(child)
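One subtlety: the two filters are not strictly identical. The lxml XPath removes a neighbor only when none of the listed attributes is present, while atts.issubset(child.attrib) keeps a neighbor only when all three are present. They agree on the sample data above. A small sketch of the difference, where the `partial` element and the helper names has_all and has_any are hypothetical and not part of the original data:

```python
import xml.etree.ElementTree as ET

atts = {"build", "job", "make"}

full = ET.fromstring('<neighbor make="foo" build="bar" job="blah"/>')
partial = ET.fromstring('<neighbor name="X" make="foo"/>')  # hypothetical case
bare = ET.fromstring('<neighbor name="Austria" direction="E"/>')


def has_all(el):
    """True when every attribute in `atts` is present (the issubset test)."""
    return atts.issubset(el.attrib)


def has_any(el):
    """True when at least one attribute in `atts` is present."""
    return not atts.isdisjoint(el.attrib)


for label, el in (("full", full), ("partial", partial), ("bare", bare)):
    print(label, has_all(el), has_any(el))
```

Which test is appropriate depends on whether an element carrying only some of the attributes should survive.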

If you are using iter:

from xml.etree import ElementTree as et

xml = et.parse("test.xml")

for parent in xml.getroot().iter("*"):
    parent[:] = (child for child in parent if child.tag != "neighbor")

You can see we get the exact same output:

In [30]: !cat /home/padraic/untitled6/test.xml
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">#
      <neighbor name="Austria" direction="E"/>
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <year>2008</year>
      <neighbor name="Austria" direction="E"/>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
In [31]: paste
def test():
    import lxml.etree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for node in xml.xpath("//neighbor"):
        node.getparent().remove(node)
    a = et.tostring(xml)
    from xml.etree import ElementTree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for parent in xml.getroot().iter("*"):
        parent[:] = (child for child in parent if child.tag != "neighbor")
    b = et.tostring(xml.getroot())
    assert  a == b

## -- End pasted text --

In [32]: test()

Whenever you need to modify XML documents, also consider XSLT, the special-purpose transformation language of the XSL family, which also includes XPath. XSLT is designed specifically to transform XML files. Pythoners are not quick to recommend it, but it avoids the need for loops or nested if/then logic in general-purpose code. Python's lxml module can run XSLT 1.0 scripts using the libxslt processor.

The transformation below runs the identity transform to copy the document as is, and then an empty template match on <neighbor> removes it:

XSLT Script (save as an .xsl file to be loaded just like the source .xml; both are well-formed XML files)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM TO COPY XML AS IS -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- EMPTY TEMPLATE TO REMOVE NEIGHBOR WHEREVER IT EXISTS -->  
  <xsl:template match="neighbor"/>

</xsl:transform>

Python Script

import lxml.etree as et

# LOAD XML AND XSL DOCUMENTS
xml  = et.parse("Input.xml")
xslt = et.parse("Script.xsl")

# TRANSFORM TO NEW TREE
transform = et.XSLT(xslt)
newdom = transform(xml)

# CONVERT TO STRING
tree_out = et.tostring(newdom, encoding='UTF-8', pretty_print=True,  xml_declaration=True)

# OUTPUT TO FILE
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)

The trick here is to find the parent (the country node) and delete the neighbor from there. In this example, I am using ElementTree because I am somewhat familiar with it:

import xml.etree.ElementTree as ET

if __name__ == '__main__':
    with open('debug.log') as f:
        doc = ET.parse(f)

        for country in doc.findall('.//country'):
            for neighbor in country.findall('neighbor'):
                country.remove(neighbor)

        ET.dump(doc)  # Display
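If the parent tag is not known in advance, one common standard-library idiom is to build a child-to-parent map first, since ElementTree elements have no getparent() method like lxml's. A sketch under that assumption, with remove_tag as a hypothetical helper name and an inline sample instead of a file:

```python
import xml.etree.ElementTree as ET


def remove_tag(root, tag):
    """Detach every element named `tag` from its parent (hypothetical helper)."""
    # Map each child to its parent; elements are hashable by identity.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Snapshot the matches first so removals do not disturb the iteration.
    for el in list(root.iter(tag)):
        parent = parent_of.get(el)
        if parent is not None:  # the root itself has no parent
            parent.remove(el)


root = ET.fromstring(
    '<data><country name="Panama"><rank>68</rank>'
    '<neighbor name="Colombia"/><region><neighbor name="Costa Rica"/></region>'
    "</country></data>"
)
remove_tag(root, "neighbor")
remaining = [el.tag for el in root.iter()]
print(remaining)
```

Unlike the country-specific loop, this also catches neighbor elements nested at other depths, at the cost of one extra pass over the tree to build the map.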

