How to iterate over an XML document and save it into a DataFrame in Python?
I have an XML document that I am trying to iterate over, and I want to save only its tracking events part into a DataFrame.
This is the input XML:
<?xml version="1.0" encoding="UTF-8"?>
<trackingresponse>
<trackingdetails>
<trackingdetail>
<trackingnumber>1550161004</trackingnumber>
<trackingevents>
<trackingevent>
<date>2020-10-21T11:04:00+01:00</date>
<code>17</code>
</trackingevent>
<trackingevent>
<date>2020-10-21T08:41:00+01:00</date>
<code>18</code>
</trackingevent>
</trackingdetail>
</trackingdetails>
</trackingresponse>
I tried this code, but it produces an empty DataFrame:
response = requests.post(endpoint_url, data=t, headers=headers).text
# response is correct
response_tree = ET.fromstring(response)
data = []
for el in response_tree.iter('./*'):
    for i in el.iter('*'):
        data.append(dict(i.items()))
df = pd.DataFrame(data)
print(df)
I also tried writing the text values into a temporary DataFrame, but that does not work either:
response_df = pd.read_csv('/home/test.csv')
response_df['date']= response_tree.find('.//date').text
response_df['code']= response_tree.find('.//code').text
I also tried this, but it gives me every element as a new row:
for child in tree.iter('trackingevent'):
    for elem in child.iter():
        data = {str(elem.tag): [elem.text]}
        if str(elem.text) == 'None': continue
        response_df = pd.DataFrame(data)
        consolidated_list.append(response_df)
I just want to put the tracking events from the XML into a DataFrame.
Expected DataFrame:
date code
2020-10-21T11:04:00+01:00 17
2020-10-21T08:41:00+01:00 18
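You can use this example to parse the XML with ElementTree (note: the XML snippet in your question is missing the closing </trackingevents> tag, which is probably a typo):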
import pandas as pd
import xml.etree.ElementTree as et

tree = et.ElementTree(file='<your file.xml>')
data = []
for ev in tree.findall('.//trackingevent'):
    date = ev.find('date').text
    code = ev.find('code').text
    data.append({
        'date': date,
        'code': code
    })
df = pd.DataFrame(data)
print(df)
Prints:
date code
0 2020-10-21T11:04:00+01:00 17
1 2020-10-21T08:41:00+01:00 18
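Since the question already holds the response as a string rather than a file, a variant that parses the string directly may be closer to what is needed. The following is only a minimal sketch, not part of the answer above: the helper name `events_from_xml` is hypothetical, and returning `None` for a missing `<date>` or `<code>` child (via `findtext`) is an assumption about how incomplete events should be handled.

```python
import xml.etree.ElementTree as ET


def events_from_xml(xml_text):
    """Return one dict per <trackingevent> found anywhere in xml_text."""
    root = ET.fromstring(xml_text)
    rows = []
    for ev in root.iter("trackingevent"):
        rows.append({
            # findtext returns the default instead of raising
            # AttributeError when the child element is absent.
            "date": ev.findtext("date", default=None),
            "code": ev.findtext("code", default=None),
        })
    return rows


# Sample data copied from the question, with the missing closing tag restored.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<trackingresponse>
<trackingdetails>
<trackingdetail>
<trackingnumber>1550161004</trackingnumber>
<trackingevents>
<trackingevent>
<date>2020-10-21T11:04:00+01:00</date>
<code>17</code>
</trackingevent>
<trackingevent>
<date>2020-10-21T08:41:00+01:00</date>
<code>18</code>
</trackingevent>
</trackingevents>
</trackingdetail>
</trackingdetails>
</trackingresponse>"""

rows = events_from_xml(sample)
for row in rows:
    print(row["date"], row["code"])
```

Passing `rows` to `pd.DataFrame(rows)` should then give the same two-column frame as shown above.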
The code below does the job:
import xml.etree.ElementTree as ET
import pandas as pd
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<trackingresponse>
<trackingdetails>
<trackingdetail>
<trackingnumber>1550161004</trackingnumber>
<trackingevents>
<trackingevent>
<date>2020-10-21T11:04:00+01:00</date>
<code>17</code>
</trackingevent>
<trackingevent>
<date>2020-10-21T08:41:00+01:00</date>
<code>18</code>
</trackingevent>
</trackingevents>
</trackingdetail>
</trackingdetails>
</trackingresponse>'''
root = ET.fromstring(xml)
data = [
    {'date': e.find('date').text, 'code': e.find('code').text}
    for e in root.findall('.//trackingevent')
]
df = pd.DataFrame(data)
print(df)
Output:
date code
0 2020-10-21T11:04:00+01:00 17
1 2020-10-21T08:41:00+01:00 18
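If a response can contain more than one `<trackingdetail>`, it may help to keep the tracking number next to each event. A rough sketch under that assumption follows; the function name `events_with_number` and the decision to convert `code` to `int` and `date` to a timezone-aware `datetime` are illustrative choices, not something the answers above prescribe. It also assumes every event has both a `<date>` and a `<code>` child.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def events_with_number(xml_text):
    """Return one dict per event, tagged with its parent's tracking number."""
    root = ET.fromstring(xml_text)
    rows = []
    for detail in root.iter("trackingdetail"):
        number = detail.findtext("trackingnumber")
        for ev in detail.iter("trackingevent"):
            rows.append({
                "trackingnumber": number,
                # Assumption: dates are ISO 8601 with a UTC offset,
                # as in the sample from the question.
                "date": datetime.fromisoformat(ev.findtext("date")),
                "code": int(ev.findtext("code")),
            })
    return rows


# Sample data taken from the question (XML declaration omitted for brevity).
sample = """<trackingresponse><trackingdetails><trackingdetail>
<trackingnumber>1550161004</trackingnumber>
<trackingevents>
<trackingevent><date>2020-10-21T11:04:00+01:00</date><code>17</code></trackingevent>
<trackingevent><date>2020-10-21T08:41:00+01:00</date><code>18</code></trackingevent>
</trackingevents>
</trackingdetail></trackingdetails></trackingresponse>"""

rows = events_with_number(sample)
for row in rows:
    print(row["trackingnumber"], row["date"].isoformat(), row["code"])
```

As before, `pd.DataFrame(rows)` would turn the list into a frame, here with an extra `trackingnumber` column.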