Merge specific child nodes in XML using Python
I want to merge certain sub-elements of an XML file together. The following is the format I have:
<?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='360' left='113' width='440' height='147'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9921.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
</images></dataset>
In the above XML, the box coordinates of image 9935.jpg are specified twice, and I want to merge them into one. I want to remove the <image> tag that appears repeatedly for the same image and merge all the box coordinates for each image into its own <image> tag. I have never worked with XML, so I am not sure whether the terms I use here are right. The desired output is:
<?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box>
<box top='360' left='113' width='440' height='147'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9921.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
</images></dataset>
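Before merging, it can help to see which files are listed more than once. This sketch counts the file attribute of every <image> with collections.Counter. It parses an inline, whitespace-trimmed copy of the sample above (with every <box> closed and the stylesheet line left out) instead of reading a file; that inline SAMPLE string is an assumption made so the snippet runs on its own.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Inline, trimmed copy of the question's sample (assumption: every <box> is closed)
SAMPLE = """<dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='329' left='510' width='385' height='534'><label>Pirelli</label></box></image>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='360' left='113' width='440' height='147'><label>Pirelli</label></box></image>
<image file='/home/user126043/Documents/testimages/9921.jpg'>
<box top='329' left='510' width='385' height='534'><label>Pirelli</label></box></image>
</images></dataset>"""

root = ET.fromstring(SAMPLE)

# Count how many <image> elements point at each file path
counts = Counter(image.get('file') for image in root.iter('image'))

# Keep only the paths that appear more than once
duplicates = {path: n for path, n in counts.items() if n > 1}
print(duplicates)
# {'/home/user126043/Documents/testimages/9935.jpg': 2}
```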
You can try the xml.etree.ElementTree module:
import xml.etree.ElementTree as ET

tree = ET.parse('dataset.xml')
root = tree.getroot()
images = root.find('images')

file_dict = dict()  # file path -> first <image> element seen for that path
# Loop over a copy of the children, because elements are removed inside the loop
for image in list(images.findall('image')):
    file_str = image.get('file')
    if file_str in file_dict:
        # append all boxes of the duplicate to the first <image> with the same file
        file_dict[file_str].extend(image.findall('box'))
        images.remove(image)  # remove the duplicate one
    else:
        file_dict[file_str] = image

print(ET.tostring(root, encoding='unicode'))
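To try the approach without a dataset.xml file, the loop can be wrapped in a function and run on an inline copy of the question's sample. The function name merge_duplicate_images and the SAMPLE string are hypothetical, and the sample has every <box> closed so that it parses. Iterating over list(...) matters here: removing children from an element while looping over that same element can skip the next sibling.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's sample (assumption: every <box> is closed)
SAMPLE = """<dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='329' left='510' width='385' height='534'><label>Pirelli</label></box></image>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='360' left='113' width='440' height='147'><label>Pirelli</label></box></image>
<image file='/home/user126043/Documents/testimages/9921.jpg'>
<box top='329' left='510' width='385' height='534'><label>Pirelli</label></box></image>
</images></dataset>"""


def merge_duplicate_images(root):
    """Hypothetical helper: move the <box> children of repeated <image>
    elements into the first <image> with the same 'file', then drop the repeats."""
    images = root.find('images')
    first_seen = {}
    for image in list(images.findall('image')):  # copy, since we remove inside the loop
        path = image.get('file')
        if path in first_seen:
            first_seen[path].extend(image.findall('box'))
            images.remove(image)
        else:
            first_seen[path] = image
    return root


root = merge_duplicate_images(ET.fromstring(SAMPLE))
for image in root.find('images'):
    print(image.get('file'), len(image.findall('box')))
# /home/user126043/Documents/testimages/9935.jpg 2
# /home/user126043/Documents/testimages/9921.jpg 1
```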
The new root will be (note that the file names and coordinates in this output differ from the question's sample):
<dataset><images>
<image file="/home/user126043/Documents/testimages/9941.jpg">
<box height="147" left="113" top="360" width="440">
<label>Pirelli
</label></box></image>
<image file="/home/user126043/Documents/testimages/99.jpg">
<box height="276" left="247" top="160" width="228">
<label>Pirelli
</label></box><box height="276" left="247" top="439" width="506">
<label>Pirelli
</label></box></image>
</images></dataset>
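Printing the root is fine for checking, but when saving the result you will probably want the original header back. ElementTree's parser does not keep processing instructions that sit outside the root element, so the <?xml-stylesheet?> line is gone after ET.parse(). A minimal sketch, assuming you want the same header as the input file, writes it by hand before the serialized root; write_with_stylesheet and the output file name are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Header copied from the question's input; ET.parse() drops the stylesheet line
HEADER = (
    "<?xml version='1.0' encoding='ISO-8859-1'?>\n"
    "<?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?>\n"
)


def write_with_stylesheet(root, path):
    """Hypothetical helper: write HEADER followed by the serialized root."""
    body = ET.tostring(root, encoding='unicode')  # no XML declaration for 'unicode'
    # Characters outside Latin-1 are written as &#...; character references
    with open(path, 'w', encoding='iso-8859-1', errors='xmlcharrefreplace') as f:
        f.write(HEADER)
        f.write(body)


# Demo on a tiny tree, written to a temporary directory
root = ET.fromstring("<dataset><images><image file='a.jpg'/></images></dataset>")
out_path = os.path.join(tempfile.mkdtemp(), 'dataset_merged.xml')
write_with_stylesheet(root, out_path)

# Read it back to check that the file is still well-formed XML
written = ET.parse(out_path).getroot()
print(written.find('images/image').get('file'))
```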