
How to create pandas DataFrame from nested XML

I am trying to create a pandas DataFrame from XML. The XML looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Products>
    <Info>
        <Msg>Shop items.</Msg>
    </Info>
    <shop shopNr="01">
        <ItemNr>1001</ItemNr>
        <ItemNr>1002</ItemNr>
        <ItemNr>1003</ItemNr>
        <ItemNr>1004</ItemNr>
        <ItemNr>1010</ItemNr>
    </shop>
    <shop shopNr="02">
        <ItemNr>1002</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
    </shop>
    <shop shopNr="03">
        <ItemNr>1009</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
        <ItemNr>1002</ItemNr>
    </shop>
</Products>

I have tried using xml.etree.ElementTree, as in the code below. I have two problems.

First, I cannot get the values of the ItemNr elements that are children of each shop. Instead of getting the value, i.e. 1001, I get

<Element 'ItemNr' at 0x000001E2D6C41B38>.

The second problem appears when I create a DataFrame from the lists. I end up with a list of lists in items. The inner lists are empty for now because I could not get the values above, but I want to end up with one flat list of item numbers per shop.

import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()

shops = []
items = []
for node in root.iter('shop'):
    shops.append(node.attrib.get('shopNr'))
    items.append(list(node))

d = {'shops': shops, 'items': items}
df = pd.DataFrame(d)

The DataFrame produced:

 shops                 items
0    01  [[], [], [], [], []]
1    02          [[], [], []]
2    03      [[], [], [], []]

The desired output is:


 shops                 items
0    01  [1001, 1002, 1003, 1004, 1010]
1    02          [1002, 1006, 1005]
2    03      [1009, 1006, 1005, 1002]

You want to append the text values of the ItemNr elements under each shop element to the items list, not the XML Element objects themselves, which is what your code was doing.

The following code worked for me. It iterates over node directly, because Element.getchildren() was removed in Python 3.9:

items.append([item_nr_element.text for item_nr_element in node])
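To make the difference between an Element object and its text concrete, here is a minimal sketch that runs without a file. It assumes the question's XML is held in a string (the name xml_text is hypothetical, and the XML declaration is left out for brevity); it is one way to apply the fix, not the only one.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's XML so the sketch is self-contained.
# xml_text is a hypothetical name, not part of the original code.
xml_text = """<Products>
    <Info>
        <Msg>Shop items.</Msg>
    </Info>
    <shop shopNr="01">
        <ItemNr>1001</ItemNr>
        <ItemNr>1002</ItemNr>
        <ItemNr>1003</ItemNr>
        <ItemNr>1004</ItemNr>
        <ItemNr>1010</ItemNr>
    </shop>
    <shop shopNr="02">
        <ItemNr>1002</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
    </shop>
    <shop shopNr="03">
        <ItemNr>1009</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
        <ItemNr>1002</ItemNr>
    </shop>
</Products>"""

root = ET.fromstring(xml_text)

shops = []
items = []
for node in root.iter('shop'):
    shops.append(node.get('shopNr'))
    # child.text is the string inside <ItemNr>...</ItemNr>;
    # appending child itself would store the Element object instead.
    items.append([child.text for child in node.findall('ItemNr')])

print(shops)
print(items)
```

The resulting shops and items lists can be passed to pd.DataFrame({'shops': shops, 'items': items}) exactly as in the question. Note that the item numbers are strings at this point; convert them with int() if numeric values are needed.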

I hope this gives the expected output:

import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()
all_shops_items = []
for ashop in root.iter('shop'):
    items = []
    shop_Nr = ashop.attrib.get('shopNr')
    # .text is the string inside each <ItemNr>, not the Element object
    for anitem in ashop.iter('ItemNr'):
        items.append(anitem.text)
    # one row per shop: [shop number, list of item numbers]
    all_shops_items.append([shop_Nr, items])
df = pd.DataFrame(all_shops_items, columns=['SHOP_NUMBER', 'ITEM_NUMBER'])
print(df)

Output:

  SHOP_NUMBER                     ITEM_NUMBER
0          01  [1001, 1002, 1003, 1004, 1010]
1          02              [1002, 1006, 1005]
2          03        [1009, 1006, 1005, 1002]

If you want one row per shop and item instead:

import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()
all_shops_items = []
for ashop in root.iter('shop'):
    shop_Nr = ashop.attrib.get('shopNr')
    for anitem in ashop.iter('ItemNr'):
        item_Nr = anitem.text
        # one row per (shop, item) pair
        all_shops_items.append([shop_Nr, item_Nr])
df = pd.DataFrame(all_shops_items, columns=['SHOP_NUMBER', 'ITEM_NUMBER'])
print(df)

Output:

   SHOP_NUMBER ITEM_NUMBER
0           01        1001
1           01        1002
2           01        1003
3           01        1004
4           01        1010
5           02        1002
6           02        1006
7           02        1005
8           03        1009
9           03        1006
10          03        1005
11          03        1002
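If the list-per-shop DataFrame has already been built, the same long format can probably be reached without parsing the XML a second time. The sketch below assumes pandas 0.25 or later, where DataFrame.explode is available, and rebuilds the first output by hand (the names wide and long_df are hypothetical) so that it runs on its own.

```python
import pandas as pd

# One row per shop, with the item numbers held in a list,
# mirroring the first output above (values typed in by hand).
wide = pd.DataFrame({
    'SHOP_NUMBER': ['01', '02', '03'],
    'ITEM_NUMBER': [
        ['1001', '1002', '1003', '1004', '1010'],
        ['1002', '1006', '1005'],
        ['1009', '1006', '1005', '1002'],
    ],
})

# explode() emits one row per list element and repeats SHOP_NUMBER;
# reset_index(drop=True) renumbers the rows from 0.
long_df = wide.explode('ITEM_NUMBER').reset_index(drop=True)
print(long_df)
```

Going the other way, from the long format back to lists, would typically use groupby('SHOP_NUMBER') followed by an aggregation with list.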
