python xml - multiple child nodes to dataframe
Hi guys, I have the following XML:
<Batch>
  <Date></Date>
  <Customer>
    <CustType>1</CustType>
    <CustomerId>123</CustomerId>
    <Address>1 abc st</Address>
    <Letters>
      <Letter>
        <LetterId>123456</LetterId>
        <LetterDate>1/1/2000</LetterDate>
      </Letter>
      <LetterId>98765</LetterId>
      <Letter>
        <LetterId>5675</LetterId>
        <LetterDate>1/1/2010</LetterDate>
      </Letter>
    </Letters>
  </Customer>
</Batch>
As you can see, each customer has multiple letters, and I need to get all of this into a DataFrame like the one below:
CustomerID | LetterId
123        | 123456
123        | 5675
I have tried Python's ElementTree library as follows:
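import xml.etree.ElementTree as ETree

list2 = []
doc = ETree.parse(filename)  # filename: path to the XML file above
for item in doc.iterfind('Customer'):
    dict1 = {}
    # Tag names are case-sensitive: the element is <Address>, not <address>
    dict1['address'] = item.findtext('Address')
    list2.append(dict1)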
But when I try to get each of the letters, I can't join each letter back to its parent node's data.
I can get either the customer data or all the letters, but not both.
I need one record per letter, with the customer's data duplicated for each of that customer's letters.
Thanks.
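One way to keep the parent's data available is to nest the loops: iterate over each <Customer>, read its fields once, then iterate over that customer's <Letter> elements and emit one row per letter. Below is a minimal sketch using only the standard library, assuming the structure of the sample above; the function name `customer_letter_rows` is hypothetical, and the XML is embedded as a string only to keep the example self-contained (for a file, use `ET.parse(filename).getroot()` instead).

```python
import xml.etree.ElementTree as ET

# Sample document from the question, embedded so the sketch runs on its own.
XML = """<Batch>
  <Date></Date>
  <Customer>
    <CustType>1</CustType>
    <CustomerId>123</CustomerId>
    <Address>1 abc st</Address>
    <Letters>
      <Letter>
        <LetterId>123456</LetterId>
        <LetterDate>1/1/2000</LetterDate>
      </Letter>
      <LetterId>98765</LetterId>
      <Letter>
        <LetterId>5675</LetterId>
        <LetterDate>1/1/2010</LetterDate>
      </Letter>
    </Letters>
  </Customer>
</Batch>"""


def customer_letter_rows(root):
    """Return one dict per <Letter>, repeating the parent customer's fields."""
    rows = []
    for customer in root.iterfind('Customer'):
        # Read the parent-level data once per customer...
        customer_id = customer.findtext('CustomerId')
        # ...then repeat it for every letter belonging to this customer.
        # 'Letters/Letter' only matches <Letter> elements, so the stray
        # <LetterId>98765</LetterId> directly under <Letters> is skipped.
        for letter in customer.iterfind('Letters/Letter'):
            rows.append({
                'CustomerID': customer_id,
                'LetterId': letter.findtext('LetterId'),
            })
    return rows


rows = customer_letter_rows(ET.fromstring(XML))
print(rows)
# → [{'CustomerID': '123', 'LetterId': '123456'}, {'CustomerID': '123', 'LetterId': '5675'}]
```

Note that all values come back as strings. Passing the list to pandas, as in `pd.DataFrame(rows)`, produces the CustomerID and LetterId columns with one row per letter.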
I have actually worked this out:
from lxml import etree

doc = etree.parse(filename)  # filename: path to the XML file above
root = doc.getroot()  # the <Batch> element
list1 = []
# Path relative to <Batch>; an absolute '/Customer/...' would match nothing
for record in root.xpath('Customer/Letters/Letter'):
    dict1 = {}
    # Walk up from each <Letter> to its enclosing <Customer>
    for ancestor in record.iterancestors('Customer'):
        dict1['LetterID'] = record.findtext('LetterId')  # tag is LetterId
        dict1['CustomerID'] = ancestor.findtext('CustomerId')
    list1.append(dict1)
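This letter-first approach works because lxml elements know their parents, so `iterancestors()` can walk up the tree. The standard-library `xml.etree.ElementTree` keeps no parent pointers, so if lxml isn't available, a minimal sketch of the same idea builds a child-to-parent dictionary first. The `parent_of` map and the `enclosing()` helper are hypothetical names for illustration, not part of either library, and the XML below is a trimmed version of the question's sample.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the question's sample (same nesting, fewer fields).
XML = """<Batch>
  <Customer>
    <CustomerId>123</CustomerId>
    <Letters>
      <Letter><LetterId>123456</LetterId></Letter>
      <Letter><LetterId>5675</LetterId></Letter>
    </Letters>
  </Customer>
</Batch>"""

root = ET.fromstring(XML)

# ElementTree elements have no parent pointer, so build a
# child -> parent lookup table in one pass over the tree.
parent_of = {child: parent for parent in root.iter() for child in parent}


def enclosing(elem, tag):
    """Climb via parent_of until an ancestor with the given tag is found."""
    node = parent_of.get(elem)
    while node is not None and node.tag != tag:
        node = parent_of.get(node)
    return node  # None if no such ancestor exists


list1 = []
# Start from each <Letter>, then climb to its <Customer>.
for record in root.iterfind('Customer/Letters/Letter'):
    customer = enclosing(record, 'Customer')
    list1.append({
        'LetterID': record.findtext('LetterId'),
        'CustomerID': customer.findtext('CustomerId'),
    })

print(list1)
# → [{'LetterID': '123456', 'CustomerID': '123'}, {'LetterID': '5675', 'CustomerID': '123'}]
```

Building the map costs one extra pass over the document, but each lookup afterwards is a dictionary access.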
Hopefully this helps somebody else.