How to remove all occurrences of an element in an XML file?

Question

I'd like to edit a KML file and remove all occurrences of ExtendedData elements, wherever they are located in the file.

Here's the input XML file:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>

  <Style id="placemark-red">
    <IconStyle>
      <Icon>
        <href>http://maps.me/placemarks/placemark-red.png</href>
      </Icon>
    </IconStyle>
  </Style>

  <name>My track</name>

  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:name>
      <mwm:lang code="default">Blah</mwm:lang>
    </mwm:name>
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>

  <Placemark>
    <name></name>
        …
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
      <mwm:visibility>1</mwm:visibility>
    </ExtendedData>
  </Placemark>
</Document>
</kml>

And here's my code, which 1) only removes the outermost occurrence, and 2) requires adding the namespace to find it:

from lxml import etree
from pykml import parser
from pykml.factory import KML_ElementMaker as KML

with open("input.xml") as f:
  doc = parser.parse(f)
root = doc.getroot()

ns = "{http://earth.google.com/kml/2.2}"

for pm in root.Document.getchildren():
    #No way to get rid of namespace, for easier search?
    if pm.tag==f"{ns}ExtendedData":
        root.Document.remove(pm)

    #How to remove innermost occurrence of ExtendedData?

print(etree.tostring(doc, pretty_print=True))

Is there a way to remove all occurrences in one go, or should I parse the whole tree?

Thank you.

Edit: The BeautifulSoup solution below needs an explicit parser, e.g. BeautifulSoup(my_xml, features="lxml"), to avoid the warning "No parser was explicitly specified".

Answer 1

Here's a solution using BeautifulSoup:

from bs4 import BeautifulSoup

# my_xml holds your XML; features="lxml" selects lxml's HTML parser
soup = BeautifulSoup(my_xml, features="lxml")

while True: 
    elem = soup.find("extendeddata")
    if not elem:
        break
    elem.decompose()

Here's the output for your data:

<?xml version="1.0" encoding="UTF-8"?>
<html>
 <body>
  <kml xmlns="http://earth.google.com/kml/2.2">
   <document>
    <style id="placemark-red">
     <IconStyle>
      <Icon>
        <href>http://maps.me/placemarks/placemark-red.png</href>
      </Icon>
    </IconStyle>
    </style>
    <name>
     My track
    </name>
    <placemark>
     <name>
     </name>
    </placemark>
   </document>
  </kml>
 </body>
</html>
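
The html and body wrappers and the lowercased tag names in that output come from parsing the file with an HTML parser. A minimal sketch of the same idea using BeautifulSoup's XML mode (features="xml", which needs lxml installed) together with find_all is shown here. The SAMPLE string is a shortened, illustrative stand-in for the asker's file, not the full document:

```python
from bs4 import BeautifulSoup

# Shortened, illustrative stand-in for the asker's KML file.
SAMPLE = """<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <name>My track</name>
  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>
  <Placemark>
    <name></name>
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
    </ExtendedData>
  </Placemark>
</Document>
</kml>"""

# features="xml" selects the lxml-based XML parser, which keeps tag case
# and does not wrap the document in <html>/<body>.
soup = BeautifulSoup(SAMPLE, features="xml")

# find_all returns a list of matches up front, so removing them in the loop is safe.
for elem in soup.find_all("ExtendedData"):
    elem.decompose()

cleaned = str(soup)
print(cleaned)
```

With the XML parser the element names keep their original case, so the search string is "ExtendedData" rather than "extendeddata".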

Answer 2

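If you know the XML structure, try:

from xml.etree import ElementTree
ns = {"kml": "http://earth.google.com/kml/2.2"}  # the file's default namespace
xml_root = ElementTree.parse(filename_path).getroot()
parent = xml_root.find("kml:Document", ns)  # ExtendedData is a child of Document, not of the root
parent.remove(parent.find("kml:ExtendedData", ns))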

or

from xml.etree import ElementTree
ns = {"kml": "http://earth.google.com/kml/2.2"}  # the file's default namespace
xml_root = ElementTree.parse(filename_path).getroot()
p_elem = xml_root.find("kml:Document/kml:Placemark", ns)
c_elem = p_elem.find("kml:ExtendedData", ns)
p_elem.remove(c_elem)

Play with these ideas :)

If you don't know the XML structure, I think you need to parse the whole tree.
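
Walking the whole tree can be sketched with the standard library alone. The version below is only an illustration under a few assumptions: the helper names local_name and remove_all are made up for this example, the SAMPLE string is a shortened stand-in for the real file, and elements are matched by local name so the namespace does not have to be spelled out in the search:

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.2"

# Shortened, illustrative stand-in for the asker's KML file.
SAMPLE = """<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <name>My track</name>
  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>
  <Placemark>
    <name></name>
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
    </ExtendedData>
  </Placemark>
</Document>
</kml>"""


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag (hypothetical helper)."""
    return tag.rsplit("}", 1)[-1]


def remove_all(root, name):
    """Detach every descendant of root whose local name equals name; return the count."""
    removed = 0
    # Snapshot parents and children first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if isinstance(child.tag, str) and local_name(child.tag) == name:
                parent.remove(child)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
count = remove_all(root, "ExtendedData")

# Keep KML as the default namespace when serializing (otherwise ns0: prefixes appear).
ET.register_namespace("", KML_NS)
print(count)
print(ET.tostring(root, encoding="unicode"))
```

ElementTree elements do not know their parent, which is why the sketch loops over candidate parents and inspects their children instead of searching for ExtendedData directly.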

Answer 3

Simply run an empty template alongside the Identity Transform using XSLT 1.0, which Python's lxml can run. No for/while loops or if logic are needed. To handle the default namespace, define a prefix such as doc:

XSLT (save as a .xsl file, which is a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:doc="http://earth.google.com/kml/2.2">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>

    <!-- REMOVE ALL OCCURRENCES OF NODE -->
    <xsl:template match="doc:ExtendedData"/>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL SOURCES
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# TRANSFORM INPUT
transform = et.XSLT(xsl)
result = transform(xml)

# PRINT TO SCREEN
print(result)

# SAVE TO FILE
with open('Output.kml', 'wb') as f:
    f.write(result)
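
To try the transform without creating any files, the stylesheet and a small document can also be parsed from strings. This is a sketch under stated assumptions: XSL is a trimmed copy of the stylesheet above (without the xsl:output and xsl:strip-space settings), and SAMPLE is a shortened, illustrative stand-in for the real KML file:

```python
import lxml.etree as et

KML_NS = "http://earth.google.com/kml/2.2"

# Trimmed copy of the stylesheet above: identity transform plus one empty template.
XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="http://earth.google.com/kml/2.2">
  <xsl:template match="node()|@*">
    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
  </xsl:template>
  <xsl:template match="doc:ExtendedData"/>
</xsl:stylesheet>"""

# Shortened, illustrative stand-in for the asker's KML file.
SAMPLE = b"""<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <name>My track</name>
  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>
  <Placemark>
    <name></name>
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
    </ExtendedData>
  </Placemark>
</Document>
</kml>"""

transform = et.XSLT(et.fromstring(XSL))
result = transform(et.fromstring(SAMPLE))

# Confirm that no ExtendedData element survived at any depth.
remaining = result.xpath("//k:ExtendedData", namespaces={"k": KML_NS})
output = str(result)
print(len(remaining))
print(output)
```

The match pattern doc:ExtendedData is tied to the namespace URI bound to doc. A KML file that declares a different namespace, such as http://www.opengis.net/kml/2.2, would need that URI in the stylesheet, or a namespace-agnostic pattern such as *[local-name()='ExtendedData'].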

3 answers

Answer 1
1 Accepted 2020-06-19 14:59:42

Answer 2
0 2020-06-19 15:44:54

Answer 3
0 2020-06-19 21:41:49
