Error in parsing XML using Python due to namespace present
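I am using the script below to remove child nodes from the XML below based on the image type, but I get an error because of the xmlns header. I removed the header and tried again, but the script still removes only 3 of the 5 matching child nodes.
Could you take a look?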
<?xml version="1.0" encoding="UTF-8"?>
<!-- Copyright (c) All rights reserved. -->
<dummy_list xmlns="https://dummy_list_file"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="template.xsd">
<dummy_capability>
<dummy_type>1</dummy_type>
<dummy_type_string>dummy_3700E</dummy_type_string>
<dummy_image>c3700</dummy_image>
<dummy_string>dummy3702E,dummy3701E</dummy_string>
<dummy_capabilities>
<CSTREAMS>True</CSTREAMS>
<ABC_SUPPORTED>True</ABC_SUPPORTED>
<THRESHOLD_SUPPORTED>True</THRESHOLD_SUPPORTED>
<FABRIC_CABLE>True</FABRIC_CABLE>
</dummy_capabilities>
</dummy_capability>
<dummy_capability>
<dummy_type>2</dummy_type>
<dummy_type_string>dummy_2700E</dummy_type_string>
<dummy_image>c2700</dummy_image>
<dummy_string>dummy2702E,dummy2701E</dummy_string>
<dummy_capabilities>
<CSTREAMS>True</CSTREAMS>
<ABC_SUPPORTED>True</ABC_SUPPORTED>
<THRESHOLD_SUPPORTED>True</THRESHOLD_SUPPORTED>
<FABRIC_CABLE>True</FABRIC_CABLE>
</dummy_capabilities>
</dummy_capability>
<dummy_capability>
<dummy_type>3</dummy_type>
<dummy_type_string>dummy_1700E</dummy_type_string>
<dummy_image>c1700</dummy_image>
<dummy_string>dummy1702E,dummy1701E</dummy_string>
<dummy_capabilities>
<CSTREAMS>True</CSTREAMS>
<ABC_SUPPORTED>True</ABC_SUPPORTED>
<THRESHOLD_SUPPORTED>True</THRESHOLD_SUPPORTED>
<FABRIC_CABLE>True</FABRIC_CABLE>
</dummy_capabilities>
</dummy_capability>
<dummy_capability>
<dummy_type>4</dummy_type>
<dummy_type_string>dummy_4700E</dummy_type_string>
<dummy_image>c4700</dummy_image>
<dummy_string>dummy4702E,dummy4701E</dummy_string>
<dummy_capabilities>
<CSTREAMS>True</CSTREAMS>
<ABC_SUPPORTED>True</ABC_SUPPORTED>
<THRESHOLD_SUPPORTED>True</THRESHOLD_SUPPORTED>
<FABRIC_CABLE>True</FABRIC_CABLE>
</dummy_capabilities>
</dummy_capability>
<dummy_capability>
<dummy_type>4</dummy_type>
<dummy_type_string>dummy_4700E</dummy_type_string>
<dummy_image>c4700</dummy_image>
<dummy_string>dummy4702E,dummy4701E</dummy_string>
<dummy_capabilities>
<CSTREAMS>True</CSTREAMS>
<ABC_SUPPORTED>True</ABC_SUPPORTED>
<THRESHOLD_SUPPORTED>True</THRESHOLD_SUPPORTED>
<FABRIC_CABLE>True</FABRIC_CABLE>
</dummy_capabilities>
</dummy_capability>
<dummy_capability>
<dummy_type>4</dummy_type>
<dummy_type_string>dummy_4700E</dummy_type_string>
<dummy_image>c4700</dummy_image>
<dummy_string>dummy4702E,dummy4701E</dummy_string>
<dummy_capabilities>
<CSTREAMS>True</CSTREAMS>
<ABC_SUPPORTED>True</ABC_SUPPORTED>
<THRESHOLD_SUPPORTED>True</THRESHOLD_SUPPORTED>
<FABRIC_CABLE>True</FABRIC_CABLE>
</dummy_capabilities>
</dummy_capability>
</dummy_list>
#!/router/bin/python3-3.6.3
from xml.etree.ElementTree import ElementTree

tree = ElementTree()
tree.parse('dummy.xml')
root = tree.getroot()
for child in root:
    if (child.find('dummy_image').text == 'c3700'):
        print("Removing child: " + child.find('dummy_image').text)
        root.remove(child)
tree.write('out.xml')
xmlns="https://dummy_list_file"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="template.xsd"
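Both symptoms described above can probably be handled with the standard library alone. With the xmlns header present, every element lives in the `https://dummy_list_file` namespace, so `child.find('dummy_image')` returns `None` and `.text` fails; with the header removed, calling `root.remove(child)` inside `for child in root:` changes the sequence being iterated, so some matching children are skipped. The following is a minimal sketch, not a definitive fix: the helper name `remove_capabilities`, the prefix `d`, and the shortened inline sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The default namespace declared on <dummy_list> in the question's XML.
# The prefix "d" is an arbitrary local alias (an assumption for this sketch).
NS = {"d": "https://dummy_list_file"}


def remove_capabilities(root, image):
    """Remove every <dummy_capability> child whose <dummy_image> equals `image`.

    Returns the number of removed children.
    """
    removed = 0
    # findall() returns a new list, so removing from root while looping is safe.
    for child in root.findall("d:dummy_capability", NS):
        if child.findtext("d:dummy_image", namespaces=NS) == image:
            root.remove(child)
            removed += 1
    return removed


# Shortened inline sample in the same shape as the question's file.
SAMPLE = """<dummy_list xmlns="https://dummy_list_file">
  <dummy_capability><dummy_image>c3700</dummy_image></dummy_capability>
  <dummy_capability><dummy_image>c4700</dummy_image></dummy_capability>
  <dummy_capability><dummy_image>c4700</dummy_image></dummy_capability>
  <dummy_capability><dummy_image>c4700</dummy_image></dummy_capability>
</dummy_list>"""

# Keep the default namespace unprefixed when the tree is serialized.
ET.register_namespace("", NS["d"])

root = ET.fromstring(SAMPLE)
count = remove_capabilities(root, "c4700")
remaining = [e.text for e in root.iter("{https://dummy_list_file}dummy_image")]
print(count, remaining)  # 3 ['c3700']
```

To apply this to the real file, one would parse it with `tree = ET.parse('dummy.xml')`, call the helper on `tree.getroot()`, and save with `tree.write('out.xml', encoding='UTF-8', xml_declaration=True)`. Note that ElementTree does not preserve the comment at the top of the file by default.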
Another way.
from simplified_scrapy import SimplifiedDoc,utils
import json
xml = utils.getFileContent('dummy.xml')
doc = SimplifiedDoc(xml)
dummy_capabilitys = doc.selects('dummy_image').contains('c3700').parent
for dummy_capability in dummy_capabilitys:
    dummy_capability.repleaceSelf("")
utils.saveFile("out.xml",doc.html)
# Get attributes
root = doc.select('dummy_list')
print (root["xmlns"],root["xmlns:xsi"],root["xsi:schemaLocation"])
Result:
https://dummy_list_file http://www.w3.org/2001/XMLSchema-instance template.xsd
More examples here: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples