[英]How to remove all occurences of element in XML file?
I'd like to edit a KML file and remove all occurences of ExtendedData elements, wherever they are located in the file.我想编辑一个 KML 文件并删除所有出现的 ExtendedData 元素,无论它们位于文件中的什么位置。
Here's the input XML file:这是输入 XML 文件:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
<Style id="placemark-red">
<IconStyle>
<Icon>
<href>http://maps.me/placemarks/placemark-red.png</href>
</Icon>
</IconStyle>
</Style>
<name>My track</name>
<ExtendedData xmlns:mwm="https://maps.me">
<mwm:name>
<mwm:lang code="default">Blah</mwm:lang>
</mwm:name>
<mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
</ExtendedData>
<Placemark>
<name></name>
…
<ExtendedData xmlns:mwm="https://maps.me">
<mwm:localId>0</mwm:localId>
<mwm:visibility>1</mwm:visibility>
</ExtendedData>
</Placemark>
</Document>
</kml>
And here's the code that 1) only removes the outermost occurence, and 2) requires adding the namespace to find it:下面是 1) 仅删除最外层出现的代码,以及 2) 需要添加命名空间才能找到它:
from lxml import etree
from pykml import parser
from pykml.factory import KML_ElementMaker as KML
with open("input.xml") as f:
doc = parser.parse(f)
root = doc.getroot()
ns = "{http://earth.google.com/kml/2.2}"
for pm in root.Document.getchildren():
#No way to get rid of namespace, for easier search?
if pm.tag==f"{ns}ExtendedData":
root.Document.remove(pm)
#How to remove innermost occurence of ExtendedData?
print(etree.tostring(doc, pretty_print=True))
Is there a way to remove all occurences in one go, or should I parse the whole tree?有没有办法删除一个 go 中的所有出现,或者我应该解析整个树?
Thank you.谢谢你。
Edit: The BeautifulSoup solution below requires adding an option "BeautifulSoup(my_xml,features="lxml")" to avoid the warning "No parser was explicitly specified".编辑:下面的 BeautifulSoup 解决方案需要添加一个选项“BeautifulSoup(my_xml,features="lxml")”以避免警告“没有明确指定解析器”。
Here's a solution using BeautifulSoup:这是使用 BeautifulSoup 的解决方案:
soup = BeautifulSoup(my_xml) # this is your xml
while True:
elem = soup.find("extendeddata")
if not elem:
break
elem.decompose()
Here's the output for your data:这是您的数据的 output:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<kml xmlns="http://earth.google.com/kml/2.2">
<document>
<style id="placemark-red">
<IconStyle>
<Icon>
<href>http://maps.me/placemarks/placemark-red.png</href>
</Icon>
</IconStyle>
</style>
<name>
My track
</name>
<placemark>
<name>
</name>
</placemark>
</document>
</kml>
</body>
</html>
If you know the XML structure, try:如果你知道 XML 结构,试试:
xml_root = ElementTree.parse(filename_path).getroot()
elem = xml_root.find('./ExtendedData')
xml_root.remove(elem)
or或者
xml_root = ElementTree.parse(filename_path).getroot()
p_elem = xml_root.find('/Placemark')
c_elem = xml_root.find('/Placemark/ExtendedData')
p_elem.remove(c_elem)
play with this ideas:)玩这个想法:)
if you don't know the xml structure, I think you need to parse the whole tree.如果您不知道 xml 结构,我认为您需要解析整个树。
Simply run the empty template with Identity Transform using XSLT 1.0 which Python's lxml
can run.只需使用 Python 的
lxml
可以运行的 XSLT 1.0 运行带有Identity Transform的空模板。 No for
/ while
loops or if
logic needed.没有
for
/ while
循环或if
需要逻辑。 To handle the default namespace, define a prefix like doc
:要处理默认命名空间,请定义一个前缀,如
doc
:
XSLT (save a.xsl file, a special.xml file) XSLT (保存一个.xsl文件,一个特殊的.xml文件)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:doc="http://earth.google.com/kml/2.2">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- REMOVE ALL OCCURRENCES OF NODE -->
<xsl:template match="doc:ExtendedData"/>
</xsl:stylesheet>
Python Python
import lxml.etree as et
# LOAD XML AND XSL SOURCES
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')
# TRANSFORM INPUT
transform = et.XSLT(xsl)
result = transform(xml)
# PRINT TO SCREEN
print(result)
# SAVE TO FILE
with open('Output.kml', 'wb') as f:
f.write(result)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.