How to remove the root tag and keep all row tags in XML using Python

Question

I have the XML file below.

<root>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
<catalog>
   <book id="bk102">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>45.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
<catalog>
   <book id="bk103">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>46.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
</root>

I want to create another XML file by eliminating the <root> tag, so my new XML would look like this:

<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
<catalog>
   <book id="bk102">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>45.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
<catalog>
   <book id="bk103">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>46.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>

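Below is my code. By eliminating <root> and keeping all the necessary row tags, I can generate a bytes object. But in the end I cannot convert that bytes object back to XML, and I get the following error:

xml.etree.ElementTree.ParseError: junk after document element: line 11, column 0

Can you help?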

import xml.etree.ElementTree as ET

base_tree = ET.parse('input.xml')
catalog = list(base_tree.getroot())
elemList = []
for elem in catalog:
  getele = ET.tostring(elem, 'utf-8')
  elemList.append(getele)

byt = b''.join(elemList)
print(byt)

mytree = ET.ElementTree(ET.fromstring(byt))
dis = str(ET.tostring(mytree.getroot()), 'utf-8')

Answer 1

You can use a list for this.

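import xml.etree.ElementTree as ET

with open('input.xml') as input_file:
    text = input_file.read()

# Take the first <catalog> child of <root> and serialize it
catalog = list(ET.fromstring(text))[0]
result = ET.tostring(catalog, encoding='utf8', method='xml')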

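Note, though, that a string holding several top-level <catalog> elements will not be valid XML, since it has no single root element.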
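
To keep every <catalog> element rather than only the first, one possibility is to serialize each child of the root and join the pieces as text. The following is a minimal sketch, not a definitive implementation: the sample data is inlined and shortened, and the helper name strip_root is hypothetical. The result is an XML fragment, so it is treated as plain text instead of being parsed again.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for input.xml (assumption: same shape as the question's file)
SAMPLE = """<root>
<catalog>
   <book id="bk101"><price>44.95</price></book>
</catalog>
<catalog>
   <book id="bk102"><price>45.95</price></book>
</catalog>
</root>"""


def strip_root(xml_text):
    """Return the serialized children of the root element, joined as text."""
    root = ET.fromstring(xml_text)
    # encoding="unicode" makes tostring return str without an XML declaration;
    # strip() drops the whitespace that follows each child in the source
    parts = [ET.tostring(child, encoding="unicode").strip() for child in root]
    return "\n".join(parts)


fragment = strip_root(SAMPLE)
print(fragment)
```

The fragment can then be written to a file with an ordinary text write; passing it back to ET.fromstring would raise the same ParseError as in the question.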

Answer 2

A root element is required in XML.
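
This is why the ET.fromstring(byt) call in the question fails: the parser accepts exactly one top-level element. Here is a small sketch that reproduces the error on a two-element fragment and shows that wrapping it in a temporary root parses fine. The sample fragment and the element name wrapper are arbitrary choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Two top-level elements, like the joined bytes in the question
fragment = "<catalog><book id='bk101'/></catalog>\n<catalog><book id='bk102'/></catalog>"

try:
    ET.fromstring(fragment)
    parsed_ok = True
except ET.ParseError as exc:
    # The parser rejects everything after the first top-level element
    parsed_ok = False
    print("ParseError:", exc)

# Wrapping the fragment in a single temporary root makes it parseable
wrapped = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
print([child.tag for child in wrapped])
```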

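For text-only processing, perhaps we could do something like this?

import re

# Match either the opening <root> or the closing </root> tag
pattern = re.compile(r"</?root>")
removed = re.sub(pattern, '', "<root>something</root>")

print(removed)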

Solution 1
1 Accepted 2018-11-22 05:16:06

Solution 2
0 2018-11-22 04:36:58
