
Merging XML elements while keeping the contents using Python

I've been looking for a way to remove an element from an XML document while keeping its contents, using Python, but I haven't been able to find an answer that works.

Basically, I received an XML document in the following format (example):

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

What I have to do is merge element2 and element3 into a single element1, so that the output XML document looks like this:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

I would appreciate some tips on my (hopefully) simple problem.

Note: I am somewhat new to Python as well, so bear with me.

This might not be the prettiest of solutions, but since there's no other answer yet...

You could just search for a closing </element1> immediately followed by an opening <element1> (allowing whitespace in between) and replace the match with the empty string.

xml = """<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>"""

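import re
# Python 3: print is a function. The leading \s* also removes the
# indentation in front of the dropped closing tag.
print(re.sub(r"\s*</element1>\s*<element1>", "", xml))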

Or, more generally, use re.sub(r"\s*</([a-zA-Z0-9_]+)>\s*<\1>", "", xml) to merge all consecutive instances of the same element: the pattern captures the name in the closing tag as a group, and the backreference \1 then requires an opening tag with that same name.

Output, in both cases:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>
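As a self-contained illustration of the general pattern, here is a minimal sketch. The helper name `merge_adjacent` and the compact sample string are made up for this example. Note that the pattern only matches tags without attributes, and it joins any adjacent same-named elements, including leaf elements.

```python
import re

# General pattern from the answer: a closing tag, optional whitespace,
# then an opening tag with the same name (backreference \1).
ADJACENT_SAME_TAG = re.compile(r"\s*</([a-zA-Z0-9_]+)>\s*<\1>")

def merge_adjacent(xml_text):
    """Hypothetical helper: drop each '</name> <name>' pair in the text."""
    return ADJACENT_SAME_TAG.sub("", xml_text)

sample = "<root><a><b>1</b></a> <a><c>2</c></a></root>"
merged = merge_adjacent(sample)
print(merged)  # <root><a><b>1</b><c>2</c></a></root>

# Caveat: adjacent leaf elements are joined as well.
print(merge_adjacent("<r><b>1</b><b>2</b></r>"))  # <r><b>12</b></r>
```

If only one element name should be merged, the specific pattern with the literal tag name is the safer choice.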

For more complex documents, you might want to use one of Python's many XML libraries instead.
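For instance, a rough sketch with the standard library's xml.etree.ElementTree could look like the following. The function name `merge_adjacent_siblings` is hypothetical, and the sketch assumes that only direct children of the given parent need merging and that the merged elements carry no attributes or text of their own that must be preserved.

```python
import xml.etree.ElementTree as ET

def merge_adjacent_siblings(parent):
    """Fold each child into the preceding sibling when both share a tag."""
    previous = None
    for child in list(parent):  # iterate over a copy, since parent is mutated
        if previous is not None and child.tag == previous.tag:
            previous.extend(list(child))  # move the child's elements over
            parent.remove(child)          # drop the now redundant wrapper
        else:
            previous = child

# Compact version of the question's document (no indentation whitespace).
sample = (
    "<root>"
    "<element1><element2><text> random text </text></element2></element1>"
    "<element1><element3><text> random text </text></element3></element1>"
    "</root>"
)

root = ET.fromstring(sample)
merge_adjacent_siblings(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

With indented input, the whitespace between tags is kept as text, so the result may be indented unevenly; on Python 3.9 or later, calling ET.indent(root) before serializing should tidy that up.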
