Merge elements with the same attribute value in an XML file

I have created an XML file using ElementTree in Python. I am very new to Python, so please forgive me if I get some of the terminology wrong. I want to merge the contents of elements that have the same value for the file attribute.

<?xml version="1.0" ?>
<DefaultLines>
    <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
                <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
                <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
        </FileName>
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
                <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
        </FileName>
    </Files>
</DefaultLines>

For example, the first and second FileName elements have the same file attribute, "emem_fifo_1c.vhd  ". I want the elements inside FileName to be merged into a single FileName whenever "file" is the same.

My output XML should look like this:

<?xml version="1.0" ?>
<DefaultLines>
    <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
                <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
                <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
                <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
        </FileName>
    </Files>
</DefaultLines>

I have no idea how to do this using ElementTree in Python.

Update: I was able to solve this issue with the help of dabingsou. However, I am now facing another issue: duplicate content inside the nodes. I am trying to remove the duplicates while adding them to the XML, but it is not working.

<?xml version="1.0" ?>
<DefaultLines>
    <Files Date="2020-10-31" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd ">
            <DefLines>
                <Message>'108'<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>108<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>109<Child>Expression</Child>Item    1  ((R_EN and not(fifo_empty)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
            <DefLines>
                <Message>108<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   4:<Child>Expression</Child>fifo_full_1 not SRESET &amp;&amp; W_EN</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   6:<Child>Expression</Child>SRESET_1 (W_EN and not(fifo_full))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   4:<Child>Expression</Child>fifo_full_1           not SRESET &amp;&amp; W_EN</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   6:<Child>Expression</Child>SRESET_1              (W_EN and not(fifo_full))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>

'108', '109', 'Row 4' and 'Row 6' are getting appended multiple times. Is it possible to keep only the first occurrence and remove the rest?

Update: After using the method to remove duplicates, I am getting XML with incomplete nodes:

<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-11-01" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>
'108'
<Child>Expression</Child>
Item    1  ((W_EN and not(fifo_full)) and not(SRESET))
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'119'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'120'
<Child>Statement</Child>
w_addr &lt;= (others =&gt; '0');
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'135'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>

<DefLines>
<Message>
'136'
<Child>Statement</Child>
r_addr &lt;= (others =&gt; '0');
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'157'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'158'
<Child>Statement</Child>
fifo_empty &lt;= '1';
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'180'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>

<DefLines>
<Message>
'181', '182'
<Child>Statement</Child>
fifo_used     &lt;= (others =&gt; '0');
fifo_used_one &lt;= '0';
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'568', '569', '570', '571'
<Child>Statement</Child>
config_rd_fsm                     

    &lt;= '0';
axi4_lite_slave_rdata_ch_out &lt;= AXI4LITE_RDATA32_S2M_DEF;
config_rd_fsm                &lt;= IDLE;
</Message>
<Justification />
<Comment />
<Status />
**<

<DefLines>**
<Message>
161
<Child>Condition</Child>
Item    1  (((r_en_valid = '1') and (fifo_used_one = '1')) and (w_en_valid = '0'))
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
DefLines>

<DefLines>
<Message>
367
<Child>Branch</Child>
when others =&gt;
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
**<Child>Bran**

<DefLines>
<Message>
Row   5:    
<Child>Condition</Child>
(w_en_valid = '0')_0     ((r_en_valid = '1') and (fifo_used_one = '1'))
</Message>
<Justification />
<Comment />
<Status />
**</DefLines>sh</Child>**
All False Count
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
587
<Child>Branch</Child>
when others =&gt;
</Message>
**<Justi>**


</FileName>

I have tried to bold the areas where the tree comes out incomplete; because of them I get errors when generating and parsing the XML tree.

This is how it can be done with lxml; I'll try to explain as we go along.

The basic principle is that we arbitrarily pick the first FileName as the repository for the target information (the DefLines elements of the later FileName elements with the same file attribute), move that target information into it, and then delete the parent the target came from.

    from lxml import etree
    # Raw string: otherwise "\r" in the Windows path below is read as a carriage return.
    deflines = r"""<?xml version="1.0" ?>
    <DefaultLines>
        <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
            <FileName file="emem_fifo_1c.vhd  ">
                <DefLines>
                    <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
                </DefLines>
                <DefLines>
                    <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
                </DefLines>
            </FileName>
          <FileName file="some_other_name.text">
                <DefLines>
                    <Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
                </DefLines>
            </FileName>
            <FileName file="emem_fifo_1c.vhd  " id="mushi">
                <DefLines>
                    <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
                </DefLines>
            </FileName>       
        </Files>
    </DefaultLines>
    """
    # I added another FileName which doesn't meet the requirements, just to demonstrate how it works
    
    doc = etree.XML(deflines)
    # all FileName elements with this file attribute; the first one is the destination
    matches = doc.xpath('//FileName[@file="emem_fifo_1c.vhd  "]')
    destination = matches[0]
    for duplicate in matches[1:]:  # skip the destination element itself
        for dl in duplicate.findall('DefLines'):
            destination.append(dl)  # append() moves dl out of its old parent
        duplicate.getparent().remove(duplicate)  # the old parent is now empty, so remove it
    print(etree.tostring(doc, xml_declaration=True).decode())
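
Since the question asks about ElementTree specifically, here is a minimal sketch of the same idea using only the standard library's xml.etree.ElementTree. The helper name merge_filenames and the shortened sample are my own, for illustration only. It assumes the FileName elements are direct children of Files and that file values must match exactly, trailing spaces included.

```python
import xml.etree.ElementTree as ET

# Raw string so that "\r" in the Windows path is not read as a carriage return.
SAMPLE = r"""<DefaultLines>
    <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines><Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message></DefLines>
            <DefLines><Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message></DefLines>
        </FileName>
        <FileName file="some_other_name.text">
            <DefLines><Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message></DefLines>
        </FileName>
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines><Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message></DefLines>
        </FileName>
    </Files>
</DefaultLines>"""


def merge_filenames(root):
    """Merge FileName elements that share the same 'file' attribute (hypothetical helper)."""
    for files in root.iter("Files"):
        first_seen = {}  # file attribute value -> first FileName carrying it
        # list(...) takes a snapshot, because children are removed while looping
        for fname in list(files.findall("FileName")):
            key = fname.get("file")
            keeper = first_seen.get(key)
            if keeper is None:
                first_seen[key] = fname
                continue
            keeper.extend(list(fname))  # add all children to the first FileName
            files.remove(fname)         # drop the duplicate FileName
    return root


root = merge_filenames(ET.fromstring(SAMPLE))
for fname in root.iter("FileName"):
    print(repr(fname.get("file")), len(fname.findall("DefLines")))
```

To save the result, something like ET.ElementTree(root).write('merged.xml', encoding='utf-8', xml_declaration=True) should work; the file name merged.xml is only a placeholder.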

Another method, for your reference.

from simplified_scrapy import SimplifiedDoc, utils
# Raw string: otherwise "\r" in the Windows path below is read as a carriage return.
xml = r'''
<?xml version="1.0" ?>
<DefaultLines>
   <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
      <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
               <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
               <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
      </FileName>
      <FileName file="some_other_name.text">
            <DefLines>
               <Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
            </DefLines>
      </FileName>
      <FileName file="emem_fifo_1c.vhd  " id="mushi">
            <DefLines>
               <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
      </FileName>       
   </Files>
</DefaultLines>
'''

dic = {}
doc = SimplifiedDoc(xml)
nodes = doc.selects('Files>FileName')
for node in nodes:
   last = dic.get(node['file'])
   if last:
      last.appendChild(node.html)
      node.remove()
   else:
      dic[node['file']]=node
      
# print (doc.html)
# remove the duplicate items
nodes = doc.selects('Files>FileName')
for node in nodes:
    dic.clear()
    lst = node.selects('DefLines')
    if len(lst) <= 1:
        continue
    for n in lst:
        key = n.select('Message').firstText()
        exist = dic.get(key)
        if exist:
            n.remove()
        else:
            dic[key] = True
# Sort
nodes = doc.selects('Files>FileName')
for node in nodes:
    dic.clear()
    lst = node.selects('DefLines')
    if len(lst) <= 1:
        continue
    for n in lst:
        dic[n.select('Message').firstText()] = n.outerHtml # Cache, replace it below.

    i = 0
    for key in sorted(dic):
        lst[i].replaceSelf(dic[key]) # Replace after sorting
        i = i + 1
# Save
utils.saveFile('test.xml', doc.html)

Result:

<?xml version="1.0" ?>
<DefaultLines>
   <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
      <FileName file="emem_fifo_1c.vhd ">
            <DefLines>
               <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
            <DefLines>
               <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
      
            <DefLines>
               <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
      </FileName>
      <FileName file="some_other_name.text">
            <DefLines>
               <Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
            </DefLines>
      </FileName>       
   </Files>
</DefaultLines>
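
The duplicates from the question's first update can also be dropped on the parsed tree with plain ElementTree. This is only a sketch under my own assumptions: message_key and drop_duplicate_deflines are hypothetical helpers, and two DefLines count as duplicates when the Message label, the Child text and the trailing source text all match after collapsing whitespace. Quotes in the label are ignored so that '108' and 108 compare equal. Because elements are removed from the tree rather than from the serialized text, the output should stay well-formed.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<FileName file="emem_fifo_1c.vhd ">
    <DefLines><Message>'108'<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message></DefLines>
    <DefLines><Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message></DefLines>
    <DefLines><Message>108<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message></DefLines>
    <DefLines><Message>Row   4:<Child>Expression</Child>fifo_full_1 not SRESET &amp;&amp; W_EN</Message></DefLines>
    <DefLines><Message>Row   4:<Child>Expression</Child>fifo_full_1           not SRESET &amp;&amp; W_EN</Message></DefLines>
</FileName>"""


def normalise(text):
    """Collapse runs of whitespace; treat a missing text node as empty."""
    return " ".join((text or "").split())


def message_key(deflines):
    """Build a comparison key for one DefLines element (hypothetical helper)."""
    message = deflines.find("Message")
    if message is None:
        return None
    child = message.find("Child")
    # Assumption: '108' and 108 refer to the same line, so quotes are dropped.
    label = normalise(message.text).replace("'", "")
    kind = normalise(child.text) if child is not None else ""
    source = normalise(child.tail) if child is not None else ""
    return (label, kind, source)


def drop_duplicate_deflines(filename_elem):
    """Keep the first DefLines for each key and remove later repeats."""
    seen = set()
    # list(...) takes a snapshot, because children are removed while looping
    for deflines in list(filename_elem.findall("DefLines")):
        key = message_key(deflines)
        if key is None:
            continue
        if key in seen:
            filename_elem.remove(deflines)
        else:
            seen.add(key)
    return filename_elem


elem = drop_duplicate_deflines(ET.fromstring(SAMPLE))
print(len(elem.findall("DefLines")))
```

If the quoted and unquoted labels should be treated as different entries, removing the replace("'", "") call is enough.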
