
Convert entries of xml file into pandas dataframe

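I have the following xml file. I want to extract all rows that start with ItemDescription and create a dataframe from them, such that one column contains the color, another the ID, another the letter, and so on. How can I achieve this?

I tried the xml.etree.ElementTree package, but I could not produce a dataframe because I could not access the elements in the desired rows. I don't want to use pandas_read_xml, since I believe it is only available through pip. pd.read_xml does not work for me even after updating pandas. Is there a solid way to do this with xml.etree.ElementTree or another not-too-fancy package?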

<?xml version="1.0" ?>
<OrderList>
   <ItemDescriptions>
      <ItemDescription Color="rosybrown" ID="0" Letter="a" Type="Letter" Weight="1.67"/>
      <ItemDescription Color="lightcoral" ID="1" Letter="a" Type="Letter" Weight="0.91"/>
      <ItemDescription Color="indiaread" ID="2" Letter="a" Type="Letter" Weight="0.62"/>
      <ItemDescription Color="brown" ID="3" Letter="a" Type="Letter" Weight="2.92"/>
      <ItemDescription Color="firedbrick" ID="4" Letter="a" Type="Letter" Weight="2.34"/>
      <ItemDescription Color="maroon" ID="5" Letter="a" Type="Letter" Weight="0.53"/>
      <ItemDescription Color="darkred" ID="6" Letter="a" Type="Letter" Weight="2.72"/>
   </ItemDescriptions>
   <ItemBundles/>
   <Orders>
      <Order TimeStamp="">
         <Positions>
            <Position Count="1" ItemDescriptionID="9"/>
            <Position Count="1" ItemDescriptionID="18"/>
         </Positions>
      </Order>
      <Order TimeStamp="">
         <Positions>
            <Position Count="2" ItemDescriptionID="9"/>
            <Position Count="1" ItemDescriptionID="12"/>
            <Position Count="2" ItemDescriptionID="14"/>
            <Position Count="1" ItemDescriptionID="18"/>
            <Position Count="1" ItemDescriptionID="16"/>
         </Positions>
      </Order>
   </Orders>
</OrderList>

Use read_xml with xpath

>>> pd.read_xml('data.xml', xpath='./ItemDescriptions/ItemDescription')

        Color  ID Letter    Type  Weight
0   rosybrown   0      a  Letter    1.67
1  lightcoral   1      a  Letter    0.91
2   indiaread   2      a  Letter    0.62
3       brown   3      a  Letter    2.92
4  firedbrick   4      a  Letter    2.34
5      maroon   5      a  Letter    0.53
6     darkred   6      a  Letter    2.72
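
If `pd.read_xml` fails, two things are worth checking: the function only exists from pandas 1.3 onward, and its default `parser='lxml'` needs the lxml package installed. The sketch below passes `parser='etree'` so that only the standard-library parser is used. The trimmed `xml_sample` string is an assumption made to keep the example self-contained; it stands in for `data.xml`.

```python
from io import StringIO

import pandas as pd

# Trimmed stand-in for data.xml (illustrative, two rows only).
xml_sample = """<?xml version="1.0" ?>
<OrderList>
   <ItemDescriptions>
      <ItemDescription Color="rosybrown" ID="0" Letter="a" Type="Letter" Weight="1.67"/>
      <ItemDescription Color="lightcoral" ID="1" Letter="a" Type="Letter" Weight="0.91"/>
   </ItemDescriptions>
   <ItemBundles/>
</OrderList>"""

# parser="etree" avoids the lxml dependency; this simple path is within
# the limited XPath subset that the standard-library parser supports.
df = pd.read_xml(
    StringIO(xml_sample),
    xpath="./ItemDescriptions/ItemDescription",
    parser="etree",
)
print(df.dtypes)
```

With `read_xml`, numeric-looking attributes such as `ID` and `Weight` are inferred as numbers rather than kept as strings.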

Alternative with lxml

import pandas as pd
from lxml import etree

tree = etree.parse('data.xml')

# Each element's attributes become one row of the dataframe
df = pd.DataFrame([dict(elmt.items()) 
                       for elmt in tree.xpath('.//ItemDescription')])

Using ElementTree (no external XML library required)

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0" ?>
<OrderList>
   <ItemDescriptions>
      <ItemDescription Color="rosybrown" ID="0" Letter="a" Type="Letter" Weight="1.67"/>
      <ItemDescription Color="lightcoral" ID="1" Letter="a" Type="Letter" Weight="0.91"/>
      <ItemDescription Color="indiaread" ID="2" Letter="a" Type="Letter" Weight="0.62"/>
      <ItemDescription Color="brown" ID="3" Letter="a" Type="Letter" Weight="2.92"/>
      <ItemDescription Color="firedbrick" ID="4" Letter="a" Type="Letter" Weight="2.34"/>
      <ItemDescription Color="maroon" ID="5" Letter="a" Type="Letter" Weight="0.53"/>
      <ItemDescription Color="darkred" ID="6" Letter="a" Type="Letter" Weight="2.72"/>
   </ItemDescriptions>
   <ItemBundles/>
   <Orders>
      <Order TimeStamp="">
         <Positions>
            <Position Count="1" ItemDescriptionID="9"/>
            <Position Count="1" ItemDescriptionID="18"/>
         </Positions>
      </Order>
      <Order TimeStamp="">
         <Positions>
            <Position Count="2" ItemDescriptionID="9"/>
            <Position Count="1" ItemDescriptionID="12"/>
            <Position Count="2" ItemDescriptionID="14"/>
            <Position Count="1" ItemDescriptionID="18"/>
            <Position Count="1" ItemDescriptionID="16"/>
         </Positions>
      </Order>
   </Orders>
</OrderList>'''

root = ET.fromstring(xml)
# .attrib is the dict of XML attributes for each ItemDescription element
data = [desc.attrib for desc in root.findall('.//ItemDescription')]
df = pd.DataFrame(data)
print(df)

Output

        Color ID Letter    Type Weight
0   rosybrown  0      a  Letter   1.67
1  lightcoral  1      a  Letter   0.91
2   indiaread  2      a  Letter   0.62
3       brown  3      a  Letter   2.92
4  firedbrick  4      a  Letter   2.34
5      maroon  5      a  Letter   0.53
6     darkred  6      a  Letter   2.72
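
One caveat with this approach: ElementTree returns every attribute value as a string, so `ID` and `Weight` end up as text columns. A minimal sketch of converting them afterwards, assuming the attribute names from the question and using a trimmed, illustrative `xml_sample` in place of the full file:

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Trimmed stand-in for the XML in the question (illustrative only).
xml_sample = """<OrderList>
   <ItemDescriptions>
      <ItemDescription Color="rosybrown" ID="0" Letter="a" Type="Letter" Weight="1.67"/>
      <ItemDescription Color="brown" ID="3" Letter="a" Type="Letter" Weight="2.92"/>
   </ItemDescriptions>
</OrderList>"""

root = ET.fromstring(xml_sample)

# Attribute values come back as str, so every column starts out as text.
df = pd.DataFrame([desc.attrib for desc in root.iter("ItemDescription")])

# Cast the numeric columns explicitly; the other columns stay as text.
df = df.astype({"ID": "int64", "Weight": "float64"})
print(df.dtypes)
```

After the cast, numeric operations such as `df["Weight"].sum()` or sorting by `ID` behave as expected.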
