Parse multiple nested XML to a pandas DataFrame table in Python
<?xml version='1.0' encoding='UTF-8' ?>
<DOC>
<INFO1
A = "1"
B = "2"
C = "3"
>
<INFO12
D = "a"
>
</INFO12>
</INFO1>
<INFO2
E = "4"
F = "5"
G = "6"
>
<INFO21
H = "b"
>
</INFO21>
</INFO2>
</DOC>
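# xml.etree.cElementTree was removed in Python 3.9; ElementTree is the equivalent
import xml.etree.ElementTree as etree
import pandas as pd

TestFile = "test.xml"
ttree = etree.parse(TestFile)
troot = ttree.getroot()
df_cols = ["Col1", "Col2", "Col3", "Col4", "Col5", "Col6",
           "Col7", "Col8"]
rows = []
for i in troot:
    # One row per direct child of <DOC>; i.get() only sees that element's own
    # attributes, so D and H (on the nested elements) come back as None
    rows.append([i.get('A'), i.get('B'), i.get('C'), i.get('D'),
                 i.get('E'), i.get('F'), i.get('G'), i.get('H')])
# DataFrame.append was removed in pandas 2.0, so build the frame from the row list
df = pd.DataFrame(rows, columns=df_cols)
df.head()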
https://i.stack.imgur.com/pvBnm.png
Question: I'm trying to parse XML into a data frame in Python using the xml.etree.cElementTree library. How can I get the result on a single row that also includes a and b, i.e. '1, 2, 3, a, 4, 5, 6, b'? Thank you!
Can you use other libraries? If so, here is one way with simplified_scrapy:
from simplified_scrapy import SimplifiedDoc
html = '''
<?xml version='1.0' encoding='UTF-8' ?>
<DOC>
<INFO1
A = "1"
B = "2"
C = "3"
>
<INFO12
D = "a"
>
</INFO12>
</INFO1>
<INFO2
E = "4"
F = "5"
G = "6"
>
<INFO21
H = "b"
>
</INFO21>
</INFO2>
</DOC>
'''
doc = SimplifiedDoc(html)
infos = doc.DOC.children
row = [infos[0].A, infos[0].B, infos[0].C, infos[0].child.D,
       infos[1].E, infos[1].F, infos[1].G, infos[1].child.H]
print(row)
Result:
['1', '2', '3', 'a', '4', '5', '6', 'b']
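If you would rather stay with the standard library and pandas, the same row can probably be built by walking every element in document order and collecting its attribute values. This is only a minimal sketch: the helper name `flatten_attributes` and the `XML_TEXT` variable are made up for illustration, the XML is the question's sample inlined as a string without the XML declaration, and the approach assumes you want every attribute of every element, in the order they appear.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# The sample document from the question, inlined so the sketch is self-contained
# (XML declaration omitted; use ET.parse("test.xml").getroot() for a file instead).
XML_TEXT = """<DOC>
<INFO1
A = "1"
B = "2"
C = "3"
>
<INFO12
D = "a"
>
</INFO12>
</INFO1>
<INFO2
E = "4"
F = "5"
G = "6"
>
<INFO21
H = "b"
>
</INFO21>
</INFO2>
</DOC>"""


def flatten_attributes(root):
    """Hypothetical helper: attribute values of root and all descendants, in document order."""
    values = []
    # Element.iter() yields the element itself, then its descendants depth-first
    for elem in root.iter():
        values.extend(elem.attrib.values())
    return values


root = ET.fromstring(XML_TEXT)
values = flatten_attributes(root)  # ['1', '2', '3', 'a', '4', '5', '6', 'b']

# Wrap the flat list in another list so pandas treats it as a single row
df_cols = [f"Col{n}" for n in range(1, len(values) + 1)]
df = pd.DataFrame([values], columns=df_cols)
print(df)
```

The difference from the loop in the question is that `root.iter()` descends into INFO12 and INFO21, whereas iterating over the root directly only visits INFO1 and INFO2. If the real file contains attributes you do not want, you would need to filter by tag or attribute name inside the loop.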