Parse multiple nested XML into a pandas DataFrame table in Python

<?xml version='1.0' encoding='UTF-8' ?>
  <DOC>
    <INFO1
      A = "1"
      B = "2"
      C = "3"
    >
      <INFO12
        D = "a"
      >
      </INFO12>
    </INFO1>
    <INFO2
      E = "4"
      F = "5"
      G = "6"
    >
      <INFO21
        H = "b"
      >
      </INFO21>
    </INFO2>
 </DOC>
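import xml.etree.ElementTree as etree
import pandas as pd

TestFile = "test.xml"
ttree = etree.parse(TestFile)
troot = ttree.getroot()
df_cols = ["Col1", "Col2", "Col3", "Col4", "Col5", "Col6",
           "Col7", "Col8"]

# Each direct child of <DOC> becomes its own row. Attributes that are not set
# on that element (including D and H, which sit on nested elements) are None.
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
rows = []
for i in troot:
    rows.append([i.get('A'), i.get('B'), i.get('C'), i.get('D'),
                 i.get('E'), i.get('F'), i.get('G'), i.get('H')])

df = pd.DataFrame(rows, columns=df_cols)
df.head()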

https://i.stack.imgur.com/pvBnm.png

Question: I'm trying to parse XML into a data frame in Python using the xml.etree.cElementTree library. How can I get the result on a single row that also includes a and b, so that it reads '1, 2, 3, a, 4, 5, 6, b'? Thank you!

You can use another library, for example simplified_scrapy:

from simplified_scrapy import SimplifiedDoc

html = '''
<?xml version='1.0' encoding='UTF-8' ?>
  <DOC>
    <INFO1
      A = "1"
      B = "2"
      C = "3"
    >
      <INFO12
        D = "a"
      >
      </INFO12>
    </INFO1>
    <INFO2
      E = "4"
      F = "5"
      G = "6"
    >
      <INFO21
        H = "b"
      >
      </INFO21>
    </INFO2>
 </DOC>
'''
doc = SimplifiedDoc(html)
infos = doc.DOC.children
row = [infos[0].A, infos[0].B, infos[0].C, infos[0].child.D,
       infos[1].E, infos[1].F, infos[1].G, infos[1].child.H]
print(row)

Result:

['1', '2', '3', 'a', '4', '5', '6', 'b']
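
The same single-row result can probably be reached with the standard library alone. The sketch below is not part of the original answer. It assumes the XML is available as a string starting at `<DOC>` (the XML declaration is left out for brevity) and that attribute names are unique across elements; if two elements share an attribute name, the later one overwrites the earlier. `flatten_attributes` is a hypothetical helper name. It walks the tree with `Element.iter()`, which visits the root and every descendant in document order, and merges each element's attributes into one dict.

```python
import xml.etree.ElementTree as ET

# Same structure as the XML in the question, without the XML declaration.
XML_TEXT = """<DOC>
  <INFO1 A="1" B="2" C="3">
    <INFO12 D="a"></INFO12>
  </INFO1>
  <INFO2 E="4" F="5" G="6">
    <INFO21 H="b"></INFO21>
  </INFO2>
</DOC>"""


def flatten_attributes(root):
    """Merge the attributes of root and all its descendants into one dict.

    Hypothetical helper: assumes attribute names do not repeat across
    elements, otherwise later values overwrite earlier ones.
    """
    row = {}
    for element in root.iter():  # document order: parent before children
        row.update(element.attrib)
    return row


root = ET.fromstring(XML_TEXT)
row = flatten_attributes(root)
print(list(row.values()))
```

Passing `[row]` to `pd.DataFrame` should then give a one-row frame whose columns are the attribute names A through H. To read from a file instead, `ET.parse("test.xml").getroot()` can replace the `ET.fromstring` call.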
