
Extract multiple xml attributes to pandas dataframe

I have a basic xml file named foods.xml that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<meals name="Sample Text">
    <meal id="1" name="Poached Eggs" type="breakfast"/>
    <meal id="2" name="Club Sandwich" type="lunch"/>
    <meal id="3" name="Steak" type="dinner"/>
    <meal id="4" name="Steak" type="dinner"/>
</meals>

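I'd like to extract both the "id" and "name" attributes into a dataframe. I can extract one attribute when I specify a single column and a single attribute (e.g., just name), but I can't figure out the syntax for getting multiple attributes in the for loop. Here's what I've tried, adding id to "df_cols" and to the "attrib.get" call: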

import xml.etree.ElementTree as ET
import pandas as pd
root = ET.parse('foods.xml').getroot()

df_cols = ["id", "name"]
rows = []

for node in root:
    value = node.attrib.get('id', 'name')
    rows.append(value)
 
df = pd.DataFrame(rows, columns = df_cols)   
df

Can anyone suggest how to do this?
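A note on the attempt above: `node.attrib` is a plain dict, and the second argument to `dict.get` is a default value, not a second key. So `attrib.get('id', 'name')` returns only the id (or the literal string 'name' when id is missing), and each row ends up holding a single string rather than two values. A minimal sketch of that behavior and of one way to keep the same loop, assuming the XML is inlined as a string (the variable names `xml` and `attempt` are illustrative, not from the question):

```python
import xml.etree.ElementTree as ET

# Same data as foods.xml, inlined so the sketch runs on its own
xml = '''<meals name="Sample Text">
    <meal id="1" name="Poached Eggs" type="breakfast"/>
    <meal id="2" name="Club Sandwich" type="lunch"/>
    <meal id="3" name="Steak" type="dinner"/>
    <meal id="4" name="Steak" type="dinner"/>
</meals>'''
root = ET.fromstring(xml)

# dict.get(key, default): 'name' is only the fallback used when 'id' is absent
attempt = [node.attrib.get('id', 'name') for node in root]
print(attempt)

# One lookup per column gives one list of values per row
df_cols = ["id", "name"]
rows = []
for node in root:
    rows.append([node.attrib.get(col) for col in df_cols])
print(rows)
```

Passing `rows` to `pd.DataFrame(rows, columns=df_cols)`, as in the original attempt, should then fill both columns.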

The following may work for you:

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<meals name="Sample Text">
    <meal id="1" name="Poached Eggs" type="breakfast"/>
    <meal id="2" name="Club Sandwich" type="lunch"/>
    <meal id="3" name="Steak" type="dinner"/>
    <meal id="4" name="Steak" type="dinner"/>
</meals>'''
root = ET.fromstring(xml)
data = [{'id': m.attrib['id'], 'name': m.attrib['name']} for m in root.findall('.//meal')]
df = pd.DataFrame(data)
print(df)

Output:

  id           name
0  1   Poached Eggs
1  2  Club Sandwich
2  3          Steak
3  4          Steak
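
As a possible alternative, pandas 1.3 and later ship `pd.read_xml`, which can read element attributes into columns directly. The sketch below is not part of the answer above; it assumes a recent pandas, wraps the string in `StringIO`, and picks `parser="etree"` so that lxml is not needed. Note that `read_xml` infers dtypes, so the id column will likely come back as integers rather than strings.

```python
from io import StringIO
import pandas as pd

# Same data as foods.xml, inlined so the sketch runs on its own
xml = '''<meals name="Sample Text">
    <meal id="1" name="Poached Eggs" type="breakfast"/>
    <meal id="2" name="Club Sandwich" type="lunch"/>
    <meal id="3" name="Steak" type="dinner"/>
    <meal id="4" name="Steak" type="dinner"/>
</meals>'''

# xpath selects every <meal> element; each attribute becomes a column
df = pd.read_xml(StringIO(xml), xpath=".//meal", parser="etree")

# Keep only the two attributes of interest
df = df[["id", "name"]]
print(df)
```

To read from the file instead, the path 'foods.xml' can be passed in place of the `StringIO` object.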
