How to remove the root tag and keep all the row tags in an XML file using Python
I have the following XML file.
<root>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
<catalog>
<book id="bk102">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>45.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
<catalog>
<book id="bk103">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>46.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
</root>
I want to create another XML by removing the <root> tag, so that my new XML looks like this:
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
<catalog>
<book id="bk102">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>45.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
<catalog>
<book id="bk103">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>46.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
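Below is my code. By removing the <root> tag and keeping all the necessary row tags, I can build a bytes object. However, I cannot convert that bytes object back into XML, and I get the following error:
xml.etree.ElementTree.ParseError: junk after document element: line 11, column 0
Can you help?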
import xml.etree.ElementTree as ET

base_tree = ET.parse('input.xml')
catalog = list(base_tree.getroot())
elemList = []
for elem in catalog:
    getele = ET.tostring(elem, 'utf-8')
    elemList.append(getele)
byt = b''.join(elemList)
print(byt)
# The next line raises the ParseError shown above
mytree = ET.ElementTree(ET.fromstring(byt))
dis = str(ET.tostring(mytree.getroot()), 'utf-8')
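You can use a list for this.
import xml.etree.ElementTree as ET

with open('input.xml') as input_file:
    text = input_file.read()
# Take the first <catalog> child of the root element
catalog = list(ET.fromstring(text))[0]
ET.tostring(catalog, encoding='utf8', method='xml')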
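The snippet above keeps only the first <catalog> element. If every child of the root is wanted, one possible approach is to serialize each child and join the results as text, without parsing the joined text again. The following is a minimal sketch under that assumption; the inline SAMPLE string and the strip_root helper are hypothetical names used for illustration, standing in for input.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the contents of input.xml
SAMPLE = """<root>
<catalog><book id="bk101"><price>44.95</price></book></catalog>
<catalog><book id="bk102"><price>45.95</price></book></catalog>
</root>"""


def strip_root(xml_text):
    """Serialize every child of the root element and join them as text."""
    root = ET.fromstring(xml_text)
    # encoding="unicode" makes tostring return str instead of bytes
    return "".join(ET.tostring(child, encoding="unicode") for child in root)


fragment = strip_root(SAMPLE)
print(fragment)
```

The result is meant to be written to a file or passed on as plain text; feeding it back to ET.fromstring would fail, because it has more than one top-level element.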
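Note, however, that the resulting string will not be well-formed XML.
An XML document must have exactly one root element, which is why the parser reports "junk after document element" when it reaches the second <catalog>.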
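To illustrate the point, here is a small sketch showing that ElementTree rejects text with several top-level elements, and that the same text parses once it is wrapped in a temporary root. The fragment string and the <wrapper> tag name are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with two top-level elements and no root
fragment = (
    "<catalog><book id='bk101'/></catalog>"
    "<catalog><book id='bk102'/></catalog>"
)

# Parsing the bare fragment fails because there is no single root element
try:
    ET.fromstring(fragment)
    parsed_ok = True
except ET.ParseError as err:
    parsed_ok = False
    print("ParseError:", err)

# Wrapping the fragment in a temporary root makes it parseable again
wrapped = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
ids = [book.get("id") for book in wrapped.iter("book")]
print(ids)
```

So the root-less output can still be produced as text, but any later XML processing would need a wrapper like this added back.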
If only text processing is needed, perhaps we could do something like this:
import re

# Match the opening or closing root tag
pattern = re.compile(r"</?root>")
removed = re.sub(pattern, '', "<root>something</root>")
print(removed)