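How do I remove a parent node in an XML file based on text in a child element? Using Python 3.6
I am currently trying to use lxml in Python 3.6. I want to remove a "Program" if it contains hedge, and remove a "Request" if none of its Programs contain "keep". The XML is structured as follows: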
<Requests>
  <Request>
    <ProgramSelection>
      <Program> <![CDATA[hedge]]> </Program>
      <Program> <![CDATA[keep]]> </Program>
    </ProgramSelection>
  </Request>
</Requests>
import lxml.etree
file_name = r'C:filename.xml'
parser = lxml.etree.XMLParser(strip_cdata=False)
tree = lxml.etree.parse(file_name, parser)
root = tree.getroot()
for elem in tree.xpath("./Request[ProgramSelection/Program='hedge']"):
    root.remove(elem)
You are close. The following two XPath expressions select the elements that match your removal criteria:
import lxml.etree

file_name = r'test.xml'
parser = lxml.etree.XMLParser(strip_cdata=False)
tree = lxml.etree.parse(file_name, parser)
root = tree.getroot()

# remove each <Request> that has no <Program> containing "keep"
# (the literal must be quoted inside the XPath; "." tests the element's whole
# string value, so whitespace around the CDATA section does not matter)
for request in tree.xpath(
        "Request[not(ProgramSelection/Program[contains(., 'keep')])]"):
    request.getparent().remove(request)

# remove each <Program> containing "hedge"
for program in tree.xpath(
        "Request/ProgramSelection/Program[contains(., 'hedge')]"):
    program.getparent().remove(program)

print(lxml.etree.tostring(tree, pretty_print=True).decode())
You can also combine them into a single, less readable union ("or") expression:
import lxml.etree

file_name = r'test.xml'
parser = lxml.etree.XMLParser(strip_cdata=False)
tree = lxml.etree.parse(file_name, parser)
root = tree.getroot()

# remove each <Request> that has no <Program> containing "keep"
# remove each <Program> containing "hedge"
# (adjacent string literals are concatenated into one XPath union expression)
for elem in tree.xpath(
        "Request[not(ProgramSelection/Program[contains(., 'keep')])]"
        "|"
        "Request/ProgramSelection/Program[contains(., 'hedge')]"):
    elem.getparent().remove(elem)

print(lxml.etree.tostring(tree, pretty_print=True).decode())
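For comparison, here is a minimal sketch of the same two removal rules using only the standard library's xml.etree.ElementTree instead of lxml. This is a swapped-in technique, not what the answer above uses: ElementTree has no getparent() and only limited XPath support, so the parent is tracked by walking down from the root. The SAMPLE string and the prune() helper are hypothetical names made up for illustration, and the second Request is added only to exercise the "no keep" rule. Note that ElementTree turns CDATA sections into plain text, so it is not a drop-in replacement if the CDATA markers must survive in the output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's structure plus a second Request
# that has no "keep" Program and should therefore be dropped.
SAMPLE = """<Requests>
  <Request>
    <ProgramSelection>
      <Program> <![CDATA[hedge]]> </Program>
      <Program> <![CDATA[keep]]> </Program>
    </ProgramSelection>
  </Request>
  <Request>
    <ProgramSelection>
      <Program> <![CDATA[hedge]]> </Program>
    </ProgramSelection>
  </Request>
</Requests>"""


def prune(root):
    """Drop Requests with no 'keep' Program, then drop 'hedge' Programs."""
    # Iterate over copies (list(...)) because we remove while looping.
    for request in list(root.findall("Request")):
        texts = ["".join(p.itertext()) for p in request.iter("Program")]
        if not any("keep" in t for t in texts):
            root.remove(request)
            continue
        for selection in request.findall("ProgramSelection"):
            for program in list(selection.findall("Program")):
                if "hedge" in "".join(program.itertext()):
                    selection.remove(program)
    return root


root = prune(ET.fromstring(SAMPLE))
remaining = [p.text.strip() for p in root.iter("Program")]
print(remaining)
```

The substring tests mirror contains() in the XPath versions, so a Program whose text merely includes "hedge" (for example "hedged") would also be removed; compare the stripped text with == if an exact match is wanted.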
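Since you are using the lxml module, consider XSLT, a special-purpose language designed to transform XML files. With this approach, no for loops or if logic are needed. XSLT is also portable, so it can be run well beyond Python.
The script below runs the identity transform to copy the document as is, then applies two empty templates that remove the matching elements according to the required logic.
XSLT (save as .xsl file)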
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:value-of select='normalize-space()'/>
  </xsl:template>

  <xsl:template match="Program[contains(text(),'hedge')]"/>
  <xsl:template match="Request[not(contains(., 'keep'))]"/>
</xsl:stylesheet>
Python
import lxml.etree as et
doc = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')
transform = et.XSLT(xsl)
result = transform(doc)
# OUTPUT TO SCREEN
print(result)
# OUTPUT TO FILE
with open('Output.xml', 'wb') as f:
    f.write(result)
Output
<?xml version="1.0"?>
<Requests>
  <Request>
    <ProgramSelection>
      <Program>keep</Program>
    </ProgramSelection>
  </Request>
</Requests>
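To try the XSLT approach without creating Input.xml and XSLT_Script.xsl on disk, the following sketch keeps both documents as inline byte strings. The XML and XSL constants and the transform() helper are hypothetical names used only for this illustration, the stylesheet is a trimmed variant of the one above (identity transform plus the two empty templates, with contains(., ...) in both), and the second Request is added just to exercise the "no keep" rule.

```python
import lxml.etree as et

# Hypothetical inline input: one Request to keep, one to drop.
XML = b"""<Requests>
  <Request>
    <ProgramSelection>
      <Program><![CDATA[hedge]]></Program>
      <Program><![CDATA[keep]]></Program>
    </ProgramSelection>
  </Request>
  <Request>
    <ProgramSelection>
      <Program><![CDATA[hedge]]></Program>
    </ProgramSelection>
  </Request>
</Requests>"""

# Identity transform plus two empty templates that suppress matches.
XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="Program[contains(., 'hedge')]"/>
  <xsl:template match="Request[not(contains(., 'keep'))]"/>
</xsl:stylesheet>"""


def transform(xml_bytes, xsl_bytes):
    """Parse both documents from memory and apply the stylesheet."""
    doc = et.fromstring(xml_bytes)
    stylesheet = et.XSLT(et.fromstring(xsl_bytes))
    return stylesheet(doc)


result = transform(XML, XSL)
programs = [p.text for p in result.getroot().iter("Program")]
print(programs)
```

Keeping the stylesheet inline is convenient for experiments; a separate .xsl file, as in the answer above, is easier to reuse from other XSLT processors.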