
How to convert XML to a DataFrame with pandas

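I'm new to pandas and have only just started learning to code, so any help would be great. I have a simple XML file like this one, and I want to convert it into a DataFrame with pandas: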

    <products_availability date="2020-01-24 06:32">
        <region id="122">
            <products count="45453242">
                <product id="1000001">0</product>
                <product id="1000002">5</product>
                <product id="1000003">3</product>
            </products>
        </region>
    </products_availability>

I tried the following code, but it doesn't give me what I need:

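    import pandas as pd
    import xml.etree.ElementTree as et

    xtree = et.parse("file.xml")
    xroot = xtree.getroot()
    df_cols = ["product"]
    rows = []
    # Iterating over xroot only visits its direct children (the <region> elements).
    for node in xroot:
        # "product" is a child element, not an attribute, so this returns None.
        s_product = node.attrib.get("product")
        # The key "name" does not match the "product" column used below.
        rows.append({"name": s_product})
    out_df = pd.DataFrame(rows, columns=df_cols)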
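To see why the loop above produces no useful data, here is a small standard-library check that inlines the sample XML (so it runs without file.xml) and prints what the loop actually visits. The variable names and the "value" key are illustrative only and are not part of the original question.

```python
import xml.etree.ElementTree as et

# Sample from the question, inlined so the check runs without file.xml.
xml_text = """<products_availability date="2020-01-24 06:32">
    <region id="122">
        <products count="45453242">
            <product id="1000001">0</product>
            <product id="1000002">5</product>
            <product id="1000003">3</product>
        </products>
    </region>
</products_availability>"""

xroot = et.fromstring(xml_text)

# Iterating over an element yields only its direct children.
direct_children = [node.tag for node in xroot]
print(direct_children)  # ['region']

# <product> is a descendant element, not an attribute of <region>.
attr_lookups = [node.attrib.get("product") for node in xroot]
print(attr_lookups)  # [None]

# Element.iter(tag) walks the whole subtree and keeps only matching tags.
# "value" is a hypothetical key name for the element's text.
rows = [{"product": p.get("id"), "value": p.text} for p in xroot.iter("product")]
print(rows)
```

Element.iter("product") searches all descendants, which is the same idea the answers rely on with iterfind(".//product").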


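If you don't need to group the products by region, you can use the findall or iterfind method to find all matching elements anywhere in the tree. For example, with this file.xml, which contains two regions: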

    <products_availability date="2020-01-24 06:32">
        <region id="122">
            <products count="45453242">
                <product id="1000001">0</product>
                <product id="1000002">5</product>
                <product id="1000003">3</product>
            </products>
        </region>
        <region id="133">
            <products count="45453242">
                <product id="1000004">7</product>
                <product id="1000005">3</product>
                <product id="1000006">1</product>
            </products>
        </region>
    </products_availability>

    import pandas as pd
    import xml.etree.ElementTree as et

    columns = ["product", "products_availability"]
    xtree = et.parse("file.xml")
    # ".//product" matches every <product> element at any depth.
    products = ((p.get("id"), p.text) for p in xtree.iterfind(".//product"))
    out_df = pd.DataFrame(products, columns=columns)

    >>> out_df
       product products_availability
    0  1000001                     0
    1  1000002                     5
    2  1000003                     3
    3  1000004                     7
    4  1000005                     3
    5  1000006                     1
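As an alternative that the answer above does not use: pandas 1.3 and later include pandas.read_xml, which turns every element matched by an XPath expression into one row. A minimal sketch, assuming pandas 1.3 or newer; parser="etree" is chosen so lxml is not required (with that parser only ElementTree's limited XPath subset is available). The answer's two-region sample is inlined through io.StringIO instead of reading file.xml.

```python
import io

import pandas as pd

# The answer's two-region sample, inlined instead of reading file.xml.
xml_text = """<products_availability date="2020-01-24 06:32">
    <region id="122">
        <products count="45453242">
            <product id="1000001">0</product>
            <product id="1000002">5</product>
            <product id="1000003">3</product>
        </products>
    </region>
    <region id="133">
        <products count="45453242">
            <product id="1000004">7</product>
            <product id="1000005">3</product>
            <product id="1000006">1</product>
        </products>
    </region>
</products_availability>"""

# Each matched <product> becomes one row: its attributes become columns,
# and its text goes into a column named after the tag ("product").
df = pd.read_xml(io.StringIO(xml_text), xpath=".//product", parser="etree")
print(df)
```

read_xml only looks at the matched element and its direct children, so the enclosing region id is not included. When you need it, the explicit loops in the answers remain the simpler route.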

If you also need the region, iterate over the regions first and then over the products inside each one:

    import pandas as pd
    import xml.etree.ElementTree as et

    columns = ["product", "products_availability", "region"]
    xtree = et.parse("file.xml")
    # Walk each <region>, then the <product> elements inside it,
    # so every row can carry its region id.
    prds = ((p.get("id"), p.text, r.get("id"))
            for r in xtree.iterfind(".//region")
            for p in r.iterfind(".//product"))

    out_df = pd.DataFrame(prds, columns=columns)

    >>> out_df
       product products_availability region
    0  1000001                     0    122
    1  1000002                     5    122
    2  1000003                     3    122
    3  1000004                     7    133
    4  1000005                     3    133
    5  1000006                     1    133
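One detail worth knowing about the snippets above: ElementTree returns attribute values and element text as Python strings, so every column of out_df holds strings rather than numbers. Below is a minimal sketch that converts them with pd.to_numeric. It assumes all values are plain integers; pd.to_numeric raises on anything it cannot parse, which also serves as a sanity check. The XML is inlined through io.BytesIO so the sketch runs without file.xml.

```python
import io
import xml.etree.ElementTree as et

import pandas as pd

xml_text = """<products_availability date="2020-01-24 06:32">
    <region id="122">
        <products count="45453242">
            <product id="1000001">0</product>
            <product id="1000002">5</product>
            <product id="1000003">3</product>
        </products>
    </region>
    <region id="133">
        <products count="45453242">
            <product id="1000004">7</product>
            <product id="1000005">3</product>
            <product id="1000006">1</product>
        </products>
    </region>
</products_availability>"""

columns = ["product", "products_availability", "region"]
# et.parse accepts a file object, so a BytesIO stands in for file.xml.
xtree = et.parse(io.BytesIO(xml_text.encode("utf-8")))
prds = ((p.get("id"), p.text, r.get("id"))
        for r in xtree.iterfind(".//region")
        for p in r.iterfind(".//product"))
out_df = pd.DataFrame(prds, columns=columns)

print(out_df.dtypes)  # string columns, not numeric

# Convert every column to a numeric dtype; raises ValueError on non-numeric text.
out_df = out_df.apply(pd.to_numeric)
print(out_df.dtypes)
```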

The following code flattens the nested XML (region, products, product) into one record per product.

    import xml.etree.ElementTree as ET

    import pandas as pd

    xml = '''<products_availability date="2020-01-24 06:32" >
        <region id="122">
            <products count="45453242">
                <product id="1000001">0</product>
                <product id="1000002">5</product>
                <product id="1000003">3</product>
            </products>
        </region>
        <region id="133">
            <products count="45453277">
                <product id="1000004">7</product>
                <product id="1000005">3</product>
                <product id="1000006">1</product>
            </products>
        </region>
    </products_availability>'''

    data = []
    root = ET.fromstring(xml)
    regions = root.findall('.//region')
    for region in regions:
        region_id = region.attrib['id']
        products_count = region.find('./products').attrib['count']
        for product in region.findall('.//product'):
            entry = {'region_id': region_id, 'products_count': products_count,
                     'product_id': product.attrib['id'], 'number': product.text}
            data.append(entry)
    df = pd.DataFrame(data)
    print(df)

Output:

      region_id products_count product_id number
    0       122       45453242    1000001      0
    1       122       45453242    1000002      5
    2       122       45453242    1000003      3
    3       133       45453277    1000004      7
    4       133       45453277    1000005      3
    5       133       45453277    1000006      1
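If the real file is too large to build the whole tree in memory, the same flattening can be done incrementally with xml.etree.ElementTree.iterparse. This is a sketch under two assumptions: the structure is exactly region, products, product as in the sample, and stream_product_rows is a hypothetical helper name. Attributes are read on "start" events, product text on "end" events, and each finished region is cleared so its children can be freed.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def stream_product_rows(source):
    """Yield one dict per <product>, shaped like the answer's `entry` dicts.

    Hypothetical helper; `source` is a filename or a binary file object.
    """
    region_id = None
    products_count = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "region":
            # Attributes are already available on the "start" event.
            region_id = elem.get("id")
        elif event == "start" and elem.tag == "products":
            products_count = elem.get("count")
        elif event == "end" and elem.tag == "product":
            # The element's text is only guaranteed complete at its "end" event.
            yield {"region_id": region_id, "products_count": products_count,
                   "product_id": elem.get("id"), "number": elem.text}
        elif event == "end" and elem.tag == "region":
            # Clear the finished region so its <product> children can be freed;
            # the (now empty) region element itself stays attached to the root.
            elem.clear()


sample = b"""<products_availability date="2020-01-24 06:32">
    <region id="122">
        <products count="45453242">
            <product id="1000001">0</product>
            <product id="1000002">5</product>
            <product id="1000003">3</product>
        </products>
    </region>
    <region id="133">
        <products count="45453277">
            <product id="1000004">7</product>
            <product id="1000005">3</product>
            <product id="1000006">1</product>
        </products>
    </region>
</products_availability>"""

df = pd.DataFrame(stream_product_rows(io.BytesIO(sample)))
print(df)
```

With a real file you would pass the path (for example "file.xml") instead of the BytesIO object. The resulting DataFrame has the same four columns as the answer's version.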


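Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.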

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM