
How to remove all occurrences of an element in an XML file?

I'd like to edit a KML file and remove all occurrences of ExtendedData elements, wherever they are located in the file.

Here's the input XML file:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>

  <Style id="placemark-red">
    <IconStyle>
      <Icon>
        <href>http://maps.me/placemarks/placemark-red.png</href>
      </Icon>
    </IconStyle>
  </Style>

  <name>My track</name>

  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:name>
      <mwm:lang code="default">Blah</mwm:lang>
    </mwm:name>
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>

  <Placemark>
    <name></name>
        …
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
      <mwm:visibility>1</mwm:visibility>
    </ExtendedData>
  </Placemark>
</Document>
</kml>

And here's the code, which 1) removes only the outermost occurrence, and 2) requires adding the namespace to find it:

from lxml import etree
from pykml import parser
from pykml.factory import KML_ElementMaker as KML

with open("input.xml") as f:
  doc = parser.parse(f)
root = doc.getroot()

ns = "{http://earth.google.com/kml/2.2}"

for pm in root.Document.getchildren():
    # No way to get rid of the namespace, for easier search?
    if pm.tag == f"{ns}ExtendedData":
        root.Document.remove(pm)

    # How to remove the innermost occurrence of ExtendedData?

# tostring() returns bytes; decode for readable output
print(etree.tostring(doc, pretty_print=True).decode())

Is there a way to remove all occurrences in one go, or should I parse the whole tree?

Thank you.


Edit: The BeautifulSoup solution below requires adding the option features="lxml", as in BeautifulSoup(my_xml, features="lxml"), to avoid the warning "No parser was explicitly specified".

Here's a solution using BeautifulSoup:

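from bs4 import BeautifulSoup

# my_xml is your XML text; features="lxml" selects the lxml HTML parser,
# which lowercases tag names (hence "extendeddata" below)
soup = BeautifulSoup(my_xml, features="lxml")

# find_all() returns a list, so removing elements while looping is safe
for elem in soup.find_all("extendeddata"):
    elem.decompose()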

Here's the output for your data:

<?xml version="1.0" encoding="UTF-8"?>
<html>
 <body>
  <kml xmlns="http://earth.google.com/kml/2.2">
   <document>
    <style id="placemark-red">
     <IconStyle>
      <Icon>
        <href>http://maps.me/placemarks/placemark-red.png</href>
      </Icon>
    </IconStyle>
    </style>
    <name>
     My track
    </name>
    <placemark>
     <name>
     </name>
    </placemark>
   </document>
  </kml>
 </body>
</html>
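
Note that the output above is wrapped in html and body elements and has lowercased tag names, because the document went through an HTML parser. If the result has to stay valid KML, one option is Beautiful Soup's "xml" feature, which needs lxml installed. The following is only a sketch under that assumption; the sample data is a shortened stand-in for the file in the question.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the KML file from the question (assumed sample data).
my_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <name>My track</name>
  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>
  <Placemark>
    <name></name>
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
    </ExtendedData>
  </Placemark>
</Document>
</kml>"""

# "xml" selects the lxml XML parser: tag names keep their case and
# no <html>/<body> wrapper is added.
soup = BeautifulSoup(my_xml, features="xml")

# find_all() returns a list, so removing elements while looping is safe.
for elem in soup.find_all("ExtendedData"):
    elem.decompose()

cleaned = str(soup)
print(cleaned)
```

With this parser the search string is case-sensitive, so it has to be "ExtendedData" rather than "extendeddata".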

If you know the XML structure, try:

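from xml.etree import ElementTree
ns = {"kml": "http://earth.google.com/kml/2.2"}  # the file's default namespace
xml_root = ElementTree.parse(filename_path).getroot()
doc = xml_root.find("kml:Document", ns)  # ExtendedData sits under Document, not the root
doc.remove(doc.find("kml:ExtendedData", ns))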

or

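from xml.etree import ElementTree
ns = {"kml": "http://earth.google.com/kml/2.2"}  # the file's default namespace
xml_root = ElementTree.parse(filename_path).getroot()
p_elem = xml_root.find("kml:Document/kml:Placemark", ns)  # relative path; absolute paths are rejected
c_elem = p_elem.find("kml:ExtendedData", ns)
p_elem.remove(c_elem)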

Play with these ideas :)

If you don't know the XML structure, I think you need to parse the whole tree.
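
Walking the whole tree does not take much code. Here is a minimal sketch using only the standard library, assuming Python 3.8 or later for the "{*}" namespace wildcard. The helper name remove_all and the shortened sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the KML file from the question (assumed sample data).
KML_TEXT = """<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <name>My track</name>
  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>
  <Placemark>
    <name></name>
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
    </ExtendedData>
  </Placemark>
</Document>
</kml>"""


def remove_all(root, local_name):
    """Remove every element called local_name, in any namespace; return the count."""
    removed = 0
    # Snapshot the elements first so the tree is not modified while iterating.
    for parent in list(root.iter()):
        # "{*}name" matches direct children in any namespace (Python 3.8+).
        for child in parent.findall("{*}" + local_name):
            parent.remove(child)
            removed += 1
    return removed


root = ET.fromstring(KML_TEXT)
count = remove_all(root, "ExtendedData")

# Keep KML as the default namespace when serializing.
ET.register_namespace("", "http://earth.google.com/kml/2.2")
result = ET.tostring(root, encoding="unicode")
print(count)
print(result)
```

The standard library's elements have no parent pointer, which is why the loop visits each potential parent and removes matching children, rather than searching for the elements themselves.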

Simply run an identity transform together with an empty template, using XSLT 1.0, which Python's lxml can run. No for/while loops or if logic is needed. To handle the default namespace, define a prefix such as doc:

XSLT (save as a .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:doc="http://earth.google.com/kml/2.2">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>

    <!-- REMOVE ALL OCCURRENCES OF NODE -->
    <xsl:template match="doc:ExtendedData"/>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL SOURCES
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# TRANSFORM INPUT
transform = et.XSLT(xsl)
result = transform(xml)

# PRINT TO SCREEN
print(result)

# SAVE TO FILE
with open('Output.kml', 'wb') as f:
    f.write(result)
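
For a quick experiment it can be handy to keep the stylesheet in the Python source instead of a separate .xsl file. The sketch below assumes lxml is installed and uses a shortened stand-in for the question's file. It is a variation on the stylesheet above, not the same one: the empty template matches on local-name(), so no namespace prefix has to be declared.

```python
import lxml.etree as et

# Variation on the stylesheet above, kept inline (illustrative only).
XSLT_TEXT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform: copy everything as-is -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- empty template: drop ExtendedData regardless of its namespace -->
  <xsl:template match="*[local-name()='ExtendedData']"/>
</xsl:stylesheet>"""

# Shortened stand-in for the KML file from the question (assumed sample data).
KML_TEXT = b"""<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <name>My track</name>
  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>
  <Placemark>
    <name></name>
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
    </ExtendedData>
  </Placemark>
</Document>
</kml>"""

# Compile the stylesheet and apply it to the parsed document.
transform = et.XSLT(et.fromstring(XSLT_TEXT))
result = transform(et.fromstring(KML_TEXT))

output = str(result)
print(output)
```

The predicate gives the empty template a higher default priority than the identity template, so it takes precedence for every ExtendedData element.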
