Remove pattern from xml with python etree
I have an XML file as follows:
<?xml version="1.0" encoding="UTF-8"?>
<kw name="k1" library="k1">
  <kw name="k2" library="k2">
    <kw name="Keep This" library="Keep This">
      <c name="c4" library="c4">
      </c>
    </kw>
    <kw name="k3" library="k3">
      <c name="c4" library="c4">
      </c>
    </kw>
    <c name="c3" library="c3">
      <c name="c4" library="c4">
      </c>
    </c>
  </kw>
</kw>
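I want to remove elements, except those that satisfy the following rule: the tag is kw and one of its attribute values is "Keep This".
Any other kw element should be removed from the XML, unless it contains an element that has to be kept.
So the output should look like this: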
<?xml version="1.0" encoding="UTF-8"?>
<kw name="k1" library="k1">
  <kw name="k2" library="k2">
    <kw name="Keep This" library="Keep This">
      <c name="c4" library="c4">
      </c>
    </kw>
    <c name="c3" library="c3">
      <c name="c4" library="c4">
      </c>
    </c>
  </kw>
</kw>
Tracing a recursive function is really hard. Can anyone help me, or recommend another way to achieve what I need? Here is my attempt:
import xml.etree.ElementTree as ET

tree = ET.parse('a.xml')
root = tree.getroot()

def check(root):
    # If any child has the "kw" tag, recurse into the children
    if 'kw' in [child.tag for child in root]:
        for child in root:
            flag = check(child)
            # Remove the child if nothing under it should be kept
            if not flag:
                root.remove(child)
    # No child has the "kw" tag
    else:
        if root.tag == 'kw':
            # The tag is kw and an attribute value is "Keep This"
            if 'Keep This' in [root.attrib[child] for child in root.attrib]:
                return True
            # The tag is kw but there is no "Keep This" value: remove
            else:
                print('remove')
                return False
        else:
            return True

check(root)
ET.dump(root)
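You can use the following recursive function instead. Note the use of an exception as a way of notifying the parent to remove the child: a node has to be removed from its parent, and the Boolean return value only indicates whether a descendant with the kw tag and an attribute value of Keep This was found. This has the added benefit of notifying the caller when no "keep" node is found anywhere under the root node, which according to the rules should be removed but cannot be, because it is the root node: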
import xml.etree.ElementTree as ET

def check(node):
    # A kw node with a 'Keep This' attribute value is kept as-is
    if node.tag == 'kw' and any(value == 'Keep This' for value in node.attrib.values()):
        return True
    keep = False
    removals = []
    for child in node:
        try:
            if check(child):
                keep = True
        except RuntimeError:
            # The child asked to be removed; defer removal until after the loop
            removals.append(child)
    for child in removals:
        node.remove(child)
    # A kw node with no "keep" descendant tells its parent to remove it
    if node.tag == 'kw' and not keep:
        raise RuntimeError('No "keep" node found under this node')
    return keep

tree = ET.parse('a.xml')
root = tree.getroot()
check(root)
ET.dump(root)
With your sample input, this outputs:
<kw library="k1" name="k1">
  <kw library="k2" name="k2">
    <kw library="Keep This" name="Keep This">
      <c library="c4" name="c4">
      </c>
    </kw>
    <c library="c3" name="c3">
      <c library="c4" name="c4">
      </c>
    </c>
  </kw>
</kw>
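To illustrate the point about the root node, here is a minimal sketch of how a caller might handle the exception when nothing under the root is worth keeping. The check function is the same as the one above; the prune helper and the two inline sample strings are assumptions made up for this illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

KEEP_VALUE = 'Keep This'  # marker value taken from the question


def check(node):
    """Return True if node is, or contains, a kw element to keep.

    Raises RuntimeError when a kw element has nothing worth keeping,
    which tells the caller (the parent) to remove it.
    """
    if node.tag == 'kw' and KEEP_VALUE in node.attrib.values():
        return True
    keep = False
    removals = []
    for child in node:
        try:
            if check(child):
                keep = True
        except RuntimeError:
            removals.append(child)
    # Remove only after iterating, so the loop never skips a sibling
    for child in removals:
        node.remove(child)
    if node.tag == 'kw' and not keep:
        raise RuntimeError('No "keep" node found under this node')
    return keep


def prune(xml_text):
    """Hypothetical helper: parse xml_text, prune it, return the root or None."""
    root = ET.fromstring(xml_text)
    try:
        check(root)
    except RuntimeError:
        # The root itself qualifies for removal, but it has no parent
        return None
    return root


with_keep = '<kw name="k1"><kw name="Keep This"/><kw name="k3"/></kw>'
without_keep = '<kw name="k1"><kw name="k3"><c name="c4"/></kw></kw>'

kept = prune(with_keep)
print([child.get('name') for child in kept])  # ['Keep This']
print(prune(without_keep))  # None
```

Returning None is just one possible way to report the "nothing to keep" case; the caller could equally log the error or write out an empty document, depending on what the surrounding program needs.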