
python xml - multiple child nodes to dataframe

Hi guys, I have the following XML:

<Batch>
  <Date></Date>
  <Customer>
    <CustType>1</CustType>
    <CustomerId>123</CustomerId>
    <Address>1 abc st</Address>
    <Letters>
      <Letter>
        <LetterId>123456</LetterId>
        <LetterDate>1/1/2000</LetterDate>
      </Letter>
      <LetterId>98765</LetterId>
      <Letter>
        <LetterId>5675</LetterId>
        <LetterDate>1/1/2010</LetterDate>
      </Letter>
    </Letters>
  </Customer>
</Batch>

As you can see, each customer has multiple letters, and I need to get all of this into a data frame like the one below.

CustomerID | LetterId
123        | 123456
123        | 5675

I have tried Python's ElementTree library as follows.

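import xml.etree.ElementTree as ETree

# filename is the path to the XML file shown above
doc = ETree.parse(filename)

list2 = []
for item in doc.iterfind('Customer'):
    dict1 = {}
    # Tag names are case-sensitive: the element is <Address>, not <address>
    dict1['address'] = item.findtext('Address')
    list2.append(dict1)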

But when I try to get each of the letters, I can't join each letter back to its parent node's data.

I can get either the customer data or all the letters, but not both.

I need a duplicate record of each customer, one per letter.

Thanks.

I have actually worked this out.

from lxml import etree

# filename is the path to the XML file shown in the question
doc = etree.parse(filename)
root = doc.getroot()

list1 = []
# '//' matches at any depth; an absolute '/Customer/...' path finds nothing
# here because the root element is <Batch>.
for record in root.xpath('//Customer/Letters/Letter'):
    # Walk up from each letter to the customer that owns it
    for ancestor in record.iterancestors('Customer'):
        dict1 = {}
        dict1['LetterID'] = record.findtext('LetterId')
        dict1['CustomerID'] = ancestor.findtext('CustomerId')
        list1.append(dict1)
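
For comparison, here is a minimal sketch of the same idea using only the standard library's xml.etree.ElementTree. It loops from each customer down to its letters rather than from each letter up to its ancestor, so the customer id is already in hand when the letter is read. The helper name letters_to_rows and the inline XML string are assumptions for illustration; the XML is copied from the question.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question (hypothetical inline string
# so the sketch runs without a file on disk).
XML = """
<Batch>
  <Date></Date>
  <Customer>
    <CustType>1</CustType>
    <CustomerId>123</CustomerId>
    <Address>1 abc st</Address>
    <Letters>
      <Letter>
        <LetterId>123456</LetterId>
        <LetterDate>1/1/2000</LetterDate>
      </Letter>
      <LetterId>98765</LetterId>
      <Letter>
        <LetterId>5675</LetterId>
        <LetterDate>1/1/2010</LetterDate>
      </Letter>
    </Letters>
  </Customer>
</Batch>
"""


def letters_to_rows(root):
    """Return one dict per <Letter>, repeating the owning customer's id."""
    rows = []
    for customer in root.iter('Customer'):
        customer_id = customer.findtext('CustomerId')
        # Only <Letter> children of <Letters> are matched, so the stray
        # <LetterId> that sits directly under <Letters> is skipped.
        for letter in customer.findall('Letters/Letter'):
            rows.append({
                'CustomerID': customer_id,
                'LetterId': letter.findtext('LetterId'),
            })
    return rows


rows = letters_to_rows(ET.fromstring(XML))
print(rows)
```

If pandas is installed, passing the list of dicts to pandas.DataFrame(rows) should give the two-column frame asked for in the question, with one row per letter.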

Hopefully this helps somebody else.
