
How to iterate over an xml and save it into a dataframe in Python?

I have an XML document that I am trying to iterate over and save (only the tracking events part) into a dataframe.

This is the input XML:

<?xml version="1.0" encoding="UTF-8"?>
<trackingresponse>
   <trackingdetails>
      <trackingdetail>
         <trackingnumber>1550161004</trackingnumber>
         <trackingevents>
            <trackingevent>
               <date>2020-10-21T11:04:00+01:00</date>
               <code>17</code>
            </trackingevent>
            <trackingevent>
               <date>2020-10-21T08:41:00+01:00</date>
               <code>18</code>
            </trackingevent>
    </trackingdetail>
   </trackingdetails>
</trackingresponse>

I tried this code, but it prints an empty dataframe:

    response =requests.post(endpoint_url, data=t, headers = headers).text
    # response is correct
    response_tree = ET.fromstring(response)
    data = []
    for el in response_tree.iter('./*'):
        for i in el.iter('*'):
            data.append(dict(i.items()))

    df = pd.DataFrame(data)
    print(df)

I also tried writing the text values into a temporary dataframe, but that does not work either:

response_df = pd.read_csv('/home/test.csv')
response_df['date']= response_tree.find('.//date').text
response_df['code']= response_tree.find('.//code').text

I also tried this, but it gives me every element as a new row:

for child in tree.iter('trackingevent'): 
  for elem in child.iter():
       data = {str(elem.tag):[elem.text]}
       if str(elem.text)=='None':continue
       response_df = pd.DataFrame(data)
       consolidated_list.append(response_df)

I just want to put the XML tracking events into a dataframe.

Expected dataframe:


date                              code
2020-10-21T11:04:00+01:00         17
2020-10-21T08:41:00+01:00         18

You can use this example to parse the XML with etree (note: the closing </trackingevents> tag is missing from your XML snippet, probably a typo):

import pandas as pd
import xml.etree.ElementTree as et

tree = et.ElementTree(file='<your file.xml>')

data = []
for ev in tree.findall('.//trackingevent'):
    date = ev.find('date').text
    code = ev.find('code').text
    data.append({
        'date': date,
        'code': code
    })

df = pd.DataFrame(data)
print(df)

Prints:

                        date code
0  2020-10-21T11:04:00+01:00   17
1  2020-10-21T08:41:00+01:00   18
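
For context on why the attempts in the question did not work, here is a minimal sketch using only the standard library. It assumes the XML from the question with the missing </trackingevents> tag restored. `iter()` takes a tag name rather than a path, `items()` returns XML attributes rather than text, and `find()` returns only the first match.

```python
import xml.etree.ElementTree as ET

# Same structure as the XML in the question, with </trackingevents> restored.
xml_text = """<trackingresponse>
  <trackingdetails>
    <trackingdetail>
      <trackingnumber>1550161004</trackingnumber>
      <trackingevents>
        <trackingevent>
          <date>2020-10-21T11:04:00+01:00</date>
          <code>17</code>
        </trackingevent>
        <trackingevent>
          <date>2020-10-21T08:41:00+01:00</date>
          <code>18</code>
        </trackingevent>
      </trackingevents>
    </trackingdetail>
  </trackingdetails>
</trackingresponse>"""

root = ET.fromstring(xml_text)

# iter() compares its argument against tag names, so a path-like
# string such as './*' matches no element at all.
matched_by_path = list(root.iter('./*'))

# items() lists XML attributes; <date> has none, its value is in .text.
first_date = root.find('.//date')
attributes = dict(first_date.items())

# find() stops at the first match, findall() returns every match.
all_dates = [d.text for d in root.findall('.//date')]

print(matched_by_path)
print(attributes)
print(first_date.text)
print(all_dates)
```

So the first attempt never enters its outer loop, and the second attempt assigns the single value from the first event to the whole column.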

The code below does the job:

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<trackingresponse>
   <trackingdetails>
      <trackingdetail>
         <trackingnumber>1550161004</trackingnumber>
         <trackingevents>
            <trackingevent>
               <date>2020-10-21T11:04:00+01:00</date>
               <code>17</code>
            </trackingevent>
            <trackingevent>
               <date>2020-10-21T08:41:00+01:00</date>
               <code>18</code>
            </trackingevent>
          </trackingevents>
      </trackingdetail>
   </trackingdetails>
</trackingresponse>'''

root = ET.fromstring(xml)
data = [{'date': e.find('date').text, 'code': e.find('code').text} for e in root.findall('.//trackingevent')]
df = pd.DataFrame(data)
print(df)

Output:

                        date code
0  2020-10-21T11:04:00+01:00   17
1  2020-10-21T08:41:00+01:00   18
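
Since the question gets the XML as response text rather than from a file, the same idea could be wrapped in a small helper. This is only a sketch: the function name `events_to_frame` and the sample string are made up for illustration, and converting the columns to datetime and numeric types is an optional extra that the question did not ask for. `findtext()` is used so that an event with a missing child produces an empty cell instead of raising an `AttributeError`.

```python
import xml.etree.ElementTree as ET
import pandas as pd


def events_to_frame(xml_text):
    """Hypothetical helper: one row per <trackingevent> in the XML string."""
    root = ET.fromstring(xml_text)
    rows = [
        # findtext() returns None when the child element is absent.
        {'date': ev.findtext('date'), 'code': ev.findtext('code')}
        for ev in root.iter('trackingevent')
    ]
    df = pd.DataFrame(rows, columns=['date', 'code'])
    # Optional: turn the text values into typed columns.
    df['date'] = pd.to_datetime(df['date'])
    df['code'] = pd.to_numeric(df['code'])
    return df


# Shortened sample with the same element names as the question's XML.
sample_xml = """<trackingresponse>
  <trackingevents>
    <trackingevent>
      <date>2020-10-21T11:04:00+01:00</date>
      <code>17</code>
    </trackingevent>
    <trackingevent>
      <date>2020-10-21T08:41:00+01:00</date>
      <code>18</code>
    </trackingevent>
  </trackingevents>
</trackingresponse>"""

df = events_to_frame(sample_xml)
print(df)
```

In the question's code this would be called as `events_to_frame(response)`, where `response` is the text returned by `requests.post(...)`.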

