
Merge specific child nodes in XML using Python

I want to merge certain sub-elements of an XML file. This is the format I have:

 <?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='360' left='113' width='440' height='147'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9921.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
</images></dataset>

In the XML above, the box coordinates for image 9935.jpg are specified twice, in two separate <image> elements, and I want to merge them into one. I want to remove the <image> tag that appears repeatedly for the same image and collect all the box coordinates for each image inside a single <image> element. I have never worked with XML, so I am not sure whether the terminology I use here is correct. The desired output is:

<?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
    <image file='/home/user126043/Documents/testimages/9935.jpg'>
    <box top='329' left='510' width='385' height='534'>
    <label>Pirelli
    </label></box>
    <box top='360' left='113' width='440' height='147'>
    <label>Pirelli
    </label></box></image>
    <image file='/home/user126043/Documents/testimages/9921.jpg'>
    <box top='329' left='510' width='385' height='534'>
    <label>Pirelli
    </label></box></image>
    </images></dataset>

You can try the xml.etree.ElementTree module:

import xml.etree.ElementTree as ET

tree = ET.parse('dataset.xml')
root = tree.getroot()
images = root.find('images')
file_dict = dict()  # maps each file path to the first <image> element seen for it
# iterate over a copy of the list, because elements are removed inside the loop
for image in list(images.findall('image')):
    file_str = image.get('file')
    if file_str in file_dict:
        images.remove(image)  # remove the duplicate <image>
        # move all of its <box> children into the first <image> for this file
        file_dict[file_str].extend(image.findall('box'))
    else:
        file_dict[file_str] = image
print(ET.tostring(root, encoding='unicode'))

The new root will look like this (the file names and coordinates differ from the sample shown in the question):

<dataset><images>
<image file="/home/user126043/Documents/testimages/9941.jpg">
<box height="147" left="113" top="360" width="440">
<label>Pirelli
</label></box></image>
<image file="/home/user126043/Documents/testimages/99.jpg">
<box height="276" left="247" top="160" width="228">
<label>Pirelli
</label></box><box height="276" left="247" top="439" width="506">
<label>Pirelli
</label></box></image>
</images></dataset>
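
To check the approach without a file on disk, the same merge can be wrapped in a function and run on an inline string. This is only a minimal sketch: the function name merge_duplicate_images, the SAMPLE string, and the file names a.jpg and b.jpg are made up for illustration and are not part of the original question or answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, shortened from the format in the question.
SAMPLE = """<dataset><name>imglab dataset</name><images>
<image file='a.jpg'><box top='1' left='2' width='3' height='4'><label>Pirelli</label></box></image>
<image file='a.jpg'><box top='5' left='6' width='7' height='8'><label>Pirelli</label></box></image>
<image file='b.jpg'><box top='9' left='10' width='11' height='12'><label>Pirelli</label></box></image>
</images></dataset>"""


def merge_duplicate_images(root):
    """Merge <image> elements that share a 'file' attribute, in place."""
    images = root.find('images')
    first_seen = {}  # file path -> first <image> element with that path
    # Copy the child list so that removing elements does not disturb the loop.
    for image in list(images.findall('image')):
        path = image.get('file')
        if path in first_seen:
            # Move every <box> of the duplicate into the first element.
            first_seen[path].extend(image.findall('box'))
            images.remove(image)
        else:
            first_seen[path] = image
    return root


root = merge_duplicate_images(ET.fromstring(SAMPLE))
for image in root.iter('image'):
    # Prints "a.jpg 2" and then "b.jpg 1".
    print(image.get('file'), len(image.findall('box')))
```

To save the result, something like ET.ElementTree(root).write('merged.xml', encoding='ISO-8859-1', xml_declaration=True) should work, where merged.xml is a placeholder name. As far as I know, the standard parser skips processing instructions such as the <?xml-stylesheet?> line, so that line would probably have to be added back by hand.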
