Python: parse XML with variable nested elements into CSV

I desperately need help. I am a beginner at Python and have tried for days (and nights) to do this without success. I have a large XML file whose elements (i.e. accounts) have subelements (i.e. attributes) with a variable number of sub-sub-elements (i.e. attributeValue). Since the sub-sub-elements vary, I don't know how to drill down as far as needed to pick up everything and write it to a .csv. So there could be many records per account. I want a row with the account id, followed by the attribute name, then the attribute value. One account could have many rows if it has many attributes.

Any help you can provide is much appreciated! :)

<?xml version="1.0" encoding="UTF-8"?>
<rbacx>
  <namespace namespaceName="ABC RSS : xxxxxxx" namespaceShortName="RSS" />
  <attributeValues />
  <accounts>
    <account id="AAGALY2">
      <name>AAGALY2</name>
      <endPoint>ABCD</endPoint>
      <domain>ABCD</domain>
      <comments />
      <attributes>  <!-- one account can have many attribute records -->
        <attribute name="appUserName">
          <attributeValues>
            <attributeValue>
              <value><![CDATA[A, Agglya]]></value>
            </attributeValue>
          </attributeValues>
        </attribute>
        <attribute name="costCentre">
          <attributeValues>
            <attributeValue>
              <value><![CDATA[6734]]></value>
            </attributeValue>
          </attributeValues>
        </attribute>
        <attribute name="App ID">
          <attributeValues>
            <attributeValue>
              <value><![CDATA[AAGALY2]]></value>
            </attributeValue>
          </attributeValues>
        </attribute>
        <attribute name="Last Access Date">
          <attributeValues>
            <attributeValue>
              <value><![CDATA[00000000]]></value>

etc......

I would like the CSV to look like this:

AcctName   Endpoint     Domain     AttribName     AttribValue
AAGALY2     ABCD        ABCD       appUserName    A, Agalya
AAGALY2     ABCD        ABCD       CostCentre     333333
AAGALY2     ABCD        ABCD       App ID         AAGALY2
AAGALY2     ABCD        ABCD       Jobtemplate    A12-can read
JSMITH1     EFG         ABCD       appUserName    J, Smith
JSMITH1     ABCD        ABCD       CostCentre     12345
JSMITH1     ABCD        ABCD       Jobtemplate    A22-perm to write
ZZMITH3     EFG         GHI        appUserName    Z, Zmith
ZZMITH3     EFG         GHI        CostCentre     3456

I have found xmltodict (a third-party package) to be a really simple way to get through XML parsing if xml.etree isn't helping.

So your code might look like this:

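import csv
import xmltodict

# 'yourxml.xml' and 'yourcsv.csv' are placeholder file names.
# force_list keeps these tags as lists even when only one child is present.
with open('yourxml.xml', 'rb') as xml_file:
    xmldict = xmltodict.parse(xml_file, force_list=('account', 'attribute', 'attributeValue'))

with open('yourcsv.csv', 'w', newline='') as csv_file:
    f = csv.writer(csv_file)
    # write the field names you outlined in your output
    f.writerow(['AcctName', 'Endpoint', 'Domain', 'AttribName', 'AttribValue'])

    # write the contents: one row per account, attribute and attribute value
    # ("or {}" guards against empty elements, which xmltodict parses as None)
    for account in (xmldict['rbacx'].get('accounts') or {}).get('account', []):
        for attr in (account.get('attributes') or {}).get('attribute', []):
            for val in (attr.get('attributeValues') or {}).get('attributeValue', []):
                # xmltodict prefixes XML attributes with '@'
                f.writerow([account.get('name'), account.get('endPoint'), account.get('domain'),
                            attr.get('@name'), (val or {}).get('value')])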

You will have to adapt the mapping to the nesting of your XML and access the keys you want, but it should be quite straightforward once you are working with the dict structure. Hope this helps!
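
For comparison, the same flattening can be sketched with the standard library's xml.etree.ElementTree, which avoids the extra dependency. This is only a rough sketch: the function name rows_from_xml and the shortened inline sample are made up for illustration, and the tag names are assumed to match the XML in the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened, well-formed sample modelled on the XML in the question
# (illustrative only; the real file is larger).
SAMPLE = """<rbacx>
  <accounts>
    <account id="AAGALY2">
      <name>AAGALY2</name>
      <endPoint>ABCD</endPoint>
      <domain>ABCD</domain>
      <comments />
      <attributes>
        <attribute name="appUserName">
          <attributeValues>
            <attributeValue><value><![CDATA[A, Agglya]]></value></attributeValue>
          </attributeValues>
        </attribute>
        <attribute name="costCentre">
          <attributeValues>
            <attributeValue><value><![CDATA[6734]]></value></attributeValue>
          </attributeValues>
        </attribute>
      </attributes>
    </account>
  </accounts>
</rbacx>
"""


def rows_from_xml(root):
    """Yield one [name, endPoint, domain, attribute name, value] row per value."""
    for account in root.iter("account"):
        # Fields shared by every row of this account (None if a tag is missing).
        base = [
            account.findtext("name"),
            account.findtext("endPoint"),
            account.findtext("domain"),
        ]
        # An account may have any number of attributes, each with any number of values.
        for attribute in account.iterfind("attributes/attribute"):
            for value in attribute.iterfind("attributeValues/attributeValue/value"):
                yield base + [attribute.get("name"), value.text]


root = ET.fromstring(SAMPLE)
rows = list(rows_from_xml(root))

# Write to an in-memory buffer here; a real script would open a file
# with open(path, "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["AcctName", "Endpoint", "Domain", "AttribName", "AttribValue"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring. If the file is too large to load comfortably, ET.iterparse may be worth a look. Values that contain a comma, such as the user name above, are quoted automatically by csv.writer.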
