
How to iterate over an XML document and save it into a DataFrame in Python?

I have an XML document and I'm trying to iterate over it and save just the tracking events part into a DataFrame.

This is the input XML:

<?xml version="1.0" encoding="UTF-8"?>
<trackingresponse>
   <trackingdetails>
      <trackingdetail>
         <trackingnumber>1550161004</trackingnumber>
         <trackingevents>
            <trackingevent>
               <date>2020-10-21T11:04:00+01:00</date>
               <code>17</code>
            </trackingevent>
            <trackingevent>
               <date>2020-10-21T08:41:00+01:00</date>
               <code>18</code>
            </trackingevent>
    </trackingdetail>
   </trackingdetails>
</trackingresponse>

I tried this code, but it produces an empty DataFrame:

    response =requests.post(endpoint_url, data=t, headers = headers).text
    # response is correct
    response_tree = ET.fromstring(response)
    data = []
    for el in response_tree.iter('./*'):
        for i in el.iter('*'):
            data.append(dict(i.items()))

    df = pd.DataFrame(data)
    print(df)

I also tried writing the text values into a temporary DataFrame, but this doesn't work either:

response_df = pd.read_csv('/home/test.csv')
response_df['date']= response_tree.find('.//date').text
response_df['code']= response_tree.find('.//code').text

I also tried this, but it gives me every element as a new row:

for child in tree.iter('trackingevent'): 
  for elem in child.iter():
       data = {str(elem.tag):[elem.text]}
       if str(elem.text)=='None':continue
       response_df = pd.DataFrame(data)
       consolidated_list.append(response_df)

I'm just trying to get the tracking events inside the XML into a DataFrame.

Expected DataFrame:


date                              code
2020-10-21T11:04:00+01:00         17
2020-10-21T08:41:00+01:00         18

You can parse the XML with ElementTree as in this example (note: your XML snippet is missing the closing </trackingevents> tag, probably a typo):

import pandas as pd
import xml.etree.ElementTree as et

tree = et.ElementTree(file='<your file.xml>')

data = []
for ev in tree.findall('.//trackingevent'):
    date = ev.find('date').text
    code = ev.find('code').text
    data.append({
        'date': date,
        'code': code
    })

df = pd.DataFrame(data)
print(df)

Prints:

                        date code
0  2020-10-21T11:04:00+01:00   17
1  2020-10-21T08:41:00+01:00   18
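
As for why the first attempt in the question comes back empty, two things seem to be at play: `Element.iter()` takes a tag name rather than a path expression, so `iter('./*')` matches nothing, and `Element.items()` returns XML attributes, which none of these elements have (the values are element text). Here is a minimal sketch of the difference. It assumes the response is already a string, as `requests` would return it, and uses a copy of the XML with the closing `</trackingevents>` added; the names `xml_text`, `no_matches`, `attributes` and `rows` are only for illustration:

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Copy of the XML from the question, with </trackingevents> added.
xml_text = """<trackingresponse>
  <trackingdetails>
    <trackingdetail>
      <trackingnumber>1550161004</trackingnumber>
      <trackingevents>
        <trackingevent>
          <date>2020-10-21T11:04:00+01:00</date>
          <code>17</code>
        </trackingevent>
        <trackingevent>
          <date>2020-10-21T08:41:00+01:00</date>
          <code>18</code>
        </trackingevent>
      </trackingevents>
    </trackingdetail>
  </trackingdetails>
</trackingresponse>"""

root = ET.fromstring(xml_text)

# iter() compares its argument against tag names, so a path-like
# string such as './*' matches no element at all.
no_matches = list(root.iter('./*'))

# items() returns attribute pairs; these elements carry text, not attributes.
first_event = root.find('.//trackingevent')
attributes = list(first_event.items())

# One dict per <trackingevent>, keyed by child tag, gives one row per event.
rows = [{child.tag: child.text for child in ev}
        for ev in root.iter('trackingevent')]
df = pd.DataFrame(rows)
print(df)
```

Building one dict per event, rather than one DataFrame per child element as in the third attempt, is what keeps `date` and `code` on the same row.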

The code below does the job:

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<trackingresponse>
   <trackingdetails>
      <trackingdetail>
         <trackingnumber>1550161004</trackingnumber>
         <trackingevents>
            <trackingevent>
               <date>2020-10-21T11:04:00+01:00</date>
               <code>17</code>
            </trackingevent>
            <trackingevent>
               <date>2020-10-21T08:41:00+01:00</date>
               <code>18</code>
            </trackingevent>
          </trackingevents>
      </trackingdetail>
   </trackingdetails>
</trackingresponse>'''

root = ET.fromstring(xml)
data = [{'date': e.find('date').text, 'code': e.find('code').text} for e in root.findall('.//trackingevent')]
df = pd.DataFrame(data)
print(df)

Output:

                        date code
0  2020-10-21T11:04:00+01:00   17
1  2020-10-21T08:41:00+01:00   18
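
If a recent pandas is available (1.3 or later), `pd.read_xml` may be able to do the same thing in one call. The following is only a sketch under a few assumptions: the XML is well-formed (closing `</trackingevents>` present), the standard-library `etree` parser is chosen so that `lxml` is not required, and the variable name `xml_text` is illustrative. The explicit conversions at the end are optional and just make the column types unambiguous:

```python
import io
import pandas as pd

# Well-formed copy of the XML from the question (illustrative variable name).
xml_text = """<trackingresponse>
  <trackingdetails>
    <trackingdetail>
      <trackingnumber>1550161004</trackingnumber>
      <trackingevents>
        <trackingevent>
          <date>2020-10-21T11:04:00+01:00</date>
          <code>17</code>
        </trackingevent>
        <trackingevent>
          <date>2020-10-21T08:41:00+01:00</date>
          <code>18</code>
        </trackingevent>
      </trackingevents>
    </trackingdetail>
  </trackingdetails>
</trackingresponse>"""

# Each node matched by the xpath becomes one row; its children become columns.
# Wrapping the string in StringIO avoids passing literal XML as a path.
df = pd.read_xml(io.StringIO(xml_text), xpath='.//trackingevent', parser='etree')

# Make the column types explicit: timezone-aware timestamps and integers.
df['date'] = pd.to_datetime(df['date'])
df['code'] = df['code'].astype(int)

print(df)
print(df.dtypes)
```

Whether this is preferable to the ElementTree loop depends on the pandas version in use and on how regular the real responses are; the loop gives more control when elements can be missing.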


 