
Comparing XML files using Python scripting

I have two XML files and need to compare them to verify whether their data is the same. The book ids are not in the same order in both files, but the script should still be able to compare the two files based on book id. Can someone help me with this?

Output: Data is same in both XML files.

test1.xml

<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Men's">
         <item_number>QWZ5671</item_number>
         <cool_number>QWZ5671</cool_number>
         <price>39.5</price>
         <size description="Medium">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
         </size>
      </catalog_item>
      <catalog_item gender="Women's">
         <item_number>RRX986</item_number>
         <price>42.50</price>
         <size description="Small">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="navy_cardigan.jpg">Nay</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burundy</color_swatch>
         </size>
      </catalog_item>
   </product>
</catalog>

test2.xml

<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Women's">
         <item_number>RRX9856</item_number>
         <price>42.50</price>
         <size description="Small">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="navy_cardigan.jpg">Navy</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
         </size>
      </catalog_item>
      <catalog_item gender="Men's">
         <item_number>QWZ5671</item_number>
         <price>39.95</price>
         <size description="Medium">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
         </size>
      </catalog_item>      
   </product>
</catalog>

Try this:

from lxml import etree


def collect_values(path):
    """Collect attribute values and non-empty text from every catalog_item."""
    values = []
    root = etree.parse(path).getroot()
    for node in root.findall('.//catalog_item'):
        for x in node.iter():
            # Keep every attribute value, not just the first one
            values.extend(x.attrib.values())
            # x.text is None for empty elements, so guard before strip()
            text = (x.text or '').strip()
            if text:
                values.append(text)
    return values


d1 = collect_values('test1.xml')
d2 = collect_values('test2.xml')

# Sets ignore order, so the items may appear in any sequence in either file
if set(d1) == set(d2):
    print('Data is same in both XML files')
else:
    print('Data is different in both XML files')
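One caveat with the approach above: flattening everything into a single set drops the link between a value and the item it belongs to, so two files can compare as equal even though their values are attached to different items. The sketch below illustrates this with two small inline samples (hypothetical data, not the files from the question) and uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra installs. The helper names flat_values and values_by_item are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline samples: same values overall, but the prices are swapped
XML_A = """<catalog>
  <catalog_item gender="Men's"><item_number>A1</item_number><price>10</price></catalog_item>
  <catalog_item gender="Women's"><item_number>B2</item_number><price>20</price></catalog_item>
</catalog>"""
XML_B = """<catalog>
  <catalog_item gender="Women's"><item_number>B2</item_number><price>10</price></catalog_item>
  <catalog_item gender="Men's"><item_number>A1</item_number><price>20</price></catalog_item>
</catalog>"""


def flat_values(xml_text):
    """Flatten every attribute value and text value into one set."""
    values = set()
    for item in ET.fromstring(xml_text).iter('catalog_item'):
        for el in item.iter():
            values.update(el.attrib.values())
            text = (el.text or '').strip()
            if text:
                values.add(text)
    return values


def values_by_item(xml_text):
    """Map each item_number to the set of (name, value) pairs found in that item."""
    result = {}
    for item in ET.fromstring(xml_text).iter('catalog_item'):
        pairs = set()
        for el in item.iter():
            pairs.update(el.attrib.items())
            text = (el.text or '').strip()
            if text:
                pairs.add((el.tag, text))
        result[item.findtext('item_number')] = frozenset(pairs)
    return result


flat_equal = flat_values(XML_A) == flat_values(XML_B)
keyed_equal = values_by_item(XML_A) == values_by_item(XML_B)
print(flat_equal)   # True: the swapped prices go unnoticed
print(keyed_equal)  # False: each item is compared against its own counterpart
```

If the only goal is a quick yes/no answer and values are unlikely to collide across items, the flat set may be enough; keying by item_number is the safer choice when the comparison has to be per item.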

Another method

This version stores the properties that differ in a dictionary, keyed by item number.

from lxml import etree
from collections import defaultdict


def load_items(path):
    """Return one dict per catalog_item, mapping tag/attribute names to value lists."""
    items = []
    root = etree.parse(path).getroot()
    for node in root.findall('.//catalog_item'):
        item = defaultdict(list)
        for x in node.iter():
            # Record every attribute under its own name
            for name, value in x.attrib.items():
                item[name].append(value)
            # x.text is None for empty elements, so guard before strip()
            text = (x.text or '').strip()
            if text:
                item[x.tag].append(text)
        items.append(dict(item))
    # Sort by item number so both files line up regardless of document order
    return sorted(items, key=lambda item: item['item_number'])


d1 = load_items('test1.xml')
d2 = load_items('test2.xml')

res_dict = defaultdict(list)
for x, y in zip(d1, d2):
    # Check the union of keys so a property present in only one file is reported
    for key in sorted(set(x) | set(y)):
        v1, v2 = x.get(key, []), y.get(key, [])
        if sorted(v1) != sorted(v2):
            res_dict[x['item_number'][0]].append({key: sorted(set(v1) ^ set(v2))})

if res_dict:
    print('Data is different in both XML files \n', dict(res_dict))
else:
    print('Data is same in both XML files')
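Sorting both lists and pairing them with zip matches items by position, so an item that exists in only one file, or whose item number differs, gets paired with an unrelated neighbour or silently dropped. A possible variation is to index the items by item_number and compare only the ids both files share, listing the rest separately. The following is a minimal sketch under that assumption; it uses the standard library's xml.etree.ElementTree and trimmed-down inline samples modelled on the question's data, and the names items_by_number and diff_catalogs are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def items_by_number(xml_text):
    """Map item_number -> {name: sorted list of values} for every catalog_item."""
    items = {}
    for node in ET.fromstring(xml_text).iter('catalog_item'):
        props = defaultdict(list)
        for el in node.iter():
            for name, value in el.attrib.items():
                props[name].append(value)
            text = (el.text or '').strip()
            if text:
                props[el.tag].append(text)
        items[node.findtext('item_number')] = {k: sorted(v) for k, v in props.items()}
    return items


def diff_catalogs(xml_a, xml_b):
    """Return ids present in only one file, plus per-id property differences."""
    a, b = items_by_number(xml_a), items_by_number(xml_b)
    report = {
        'only_in_first': sorted(set(a) - set(b)),
        'only_in_second': sorted(set(b) - set(a)),
        'changed': {},
    }
    for number in sorted(set(a) & set(b)):
        keys = set(a[number]) | set(b[number])
        # Keep (first, second) value lists for every property that differs
        changed = {k: (a[number].get(k, []), b[number].get(k, []))
                   for k in keys if a[number].get(k) != b[number].get(k)}
        if changed:
            report['changed'][number] = changed
    return report


# Hypothetical inline samples, shortened from the files in the question
FIRST = """<catalog>
  <catalog_item gender="Men's"><item_number>QWZ5671</item_number><price>39.5</price></catalog_item>
  <catalog_item gender="Women's"><item_number>RRX986</item_number><price>42.50</price></catalog_item>
</catalog>"""
SECOND = """<catalog>
  <catalog_item gender="Women's"><item_number>RRX9856</item_number><price>42.50</price></catalog_item>
  <catalog_item gender="Men's"><item_number>QWZ5671</item_number><price>39.95</price></catalog_item>
</catalog>"""

report = diff_catalogs(FIRST, SECOND)
print(report)
```

With these samples the report lists RRX986 as present only in the first file, RRX9856 as present only in the second, and a price change for QWZ5671. The files can be treated as holding the same data when all three parts of the report are empty.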
