How to extract data from multiple XML tags to a CSV file in Python

The XML is:

<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
  <Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
    <Stage description="" locale="n" name="SPINNERS"/>
  </File>
  <File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
    <Stage description="" locale="n" name="CAUTION"/>
  </File>
</Page>

The expected output, in table format, is:

Price,Id,item,stage_status,Number
0.0,1,1,True,05051401
0.0,2,2,True,05051402
0.0,3,3,True,05051404

I tried this code:

import csv
import xml.etree.ElementTree as ET

tree = ET.parse("status-1.1.xml")
root = tree.getroot()

with open('Data.csv', 'w') as f:
    w = csv.DictWriter(f, fieldnames=('Price', 'Id', 'item', 'stage_status', 'Number'))
    w.writerheader()
    w.writerows(e.attrib for e in root.findall('.//File'))

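An alternative is to use pandas (read_xml needs pandas 1.3 or later, and its dtype argument needs 1.5 or later). Select only the File elements through their namespace, and read every value as a string so that Number keeps its leading zero:

import pandas as pd

# The File elements live in the document's default namespace; "d" is an arbitrary prefix for it.
ns = {"d": "http://gigabyte.com/documoto/Statuslist/1.6"}

# dtype=str keeps values such as "05051401" from being parsed as numbers.
df = pd.read_xml("status-1.1.xml", xpath=".//d:File", namespaces=ns, dtype=str)

df2 = df[["Price", "Id", "item", "stage_status", "Number"]]
# index=False leaves the DataFrame's row index out of the CSV.
df2.to_csv("Data.csv", index=False)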

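The following should work with only the standard library. Three things stop the attempt in the question: csv.DictWriter has no writerheader method (the name is writeheader); the File elements belong to the document's default namespace, so findall('.//File') matches nothing; and each File also carries a ForPage attribute that is not in fieldnames, which DictWriter rejects by default. Putting the namespace URI in braces before the tag name fixes the lookup, and picking the attributes explicitly fixes the extra key:

import csv
import xml.etree.ElementTree as ET

tree = ET.parse("status-1.1.xml")
attr_list = ['Price', 'Id', 'item', 'stage_status', 'Number']
data = []
# Qualify the tag with the namespace URI declared by xmlns on <Page>.
for f in tree.findall('.//{http://gigabyte.com/documoto/Statuslist/1.6}File'):
    # Keep only the attributes that should become CSV columns.
    data.append({a: f.attrib[a] for a in attr_list})
# newline='' stops the csv module from writing blank lines between rows on Windows.
with open('Data.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=attr_list)
    w.writeheader()
    w.writerows(data)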

Output file:

Price,Id,item,stage_status,Number
0.0,1,1,true,05051401
0.0,2,2,true,05051402
0.0,3,3,true,05051404
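
The output above has stage_status as "true", while the question's expected output shows "True". As a minimal sketch of one way to close that gap, the variant below capitalizes the value before writing it, and it passes a prefix-to-URI mapping to findall instead of spelling out the braces. The prefix "d", the function name files_to_csv, and the trimmed inline XML are assumptions made for illustration; they are not part of the original answers.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed inline copy of the question's XML so the sketch runs on its own.
XML = """<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" pageFile="status-1.1.svg">
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401"/>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402"/>
</Page>"""

# "d" is an arbitrary prefix chosen here; only the URI has to match the document.
NS = {"d": "http://gigabyte.com/documoto/Statuslist/1.6"}
FIELDS = ["Price", "Id", "item", "stage_status", "Number"]


def files_to_csv(xml_text):
    """Return CSV text with one row per File element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for el in root.findall(".//d:File", NS):
        # Pick only the wanted attributes; a missing one becomes an empty cell.
        row = {name: el.get(name, "") for name in FIELDS}
        # The XML spells booleans in lowercase; capitalize to get "True"/"False".
        row["stage_status"] = row["stage_status"].capitalize()
        writer.writerow(row)
    return out.getvalue()


csv_text = files_to_csv(XML)
print(csv_text)
```

Writing to an io.StringIO keeps the sketch free of file paths. To produce Data.csv instead, the same writer can be attached to open('Data.csv', 'w', newline='').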
