Python - parse xml with variable nested elements into csv
Desperately need help. I am a beginner to Python and have tried for days (and nights) to do this with no success. I have a large xml file which has elements (i.e. accounts) that have subelements (i.e. attributes) with a variable number of sub-sub-elements (i.e. attributeValue). Since the sub-sub-element is variable, I don't know how to get it to drill down as far as it needs to in order to pick up everything and put it into a .csv. So per account, there could be many records. I want a row with the account id, followed by the attribute name, then the attribute value. One account could have many rows if it has many attributes.
Any help you can provide is much appreciated! :)
<?xml version="1.0" encoding="UTF-8"?>
<rbacx>
<namespace namespaceName="ABC RSS : xxxxxxx" namespaceShortName="RSS" />
<attributeValues />
<accounts>
<account id="AAGALY2">
<name>AAGALY2</name>
<endPoint>ABCD</endPoint>
<domain>ABCD</domain>
<comments />
<attributes> ### one account can have many attribute records
<attribute name="appUserName">
<attributeValues>
<attributeValue>
<value><![CDATA[A, Agglya]]></value>
</attributeValue>
</attributeValues>
</attribute>
<attribute name="costCentre">
<attributeValues>
<attributeValue>
<value><![CDATA[6734]]></value>
</attributeValue>
</attributeValues>
</attribute>
<attribute name="App ID">
<attributeValues>
<attributeValue>
<value><![CDATA[AAGALY2]]></value>
</attributeValue>
</attributeValues>
</attribute>
<attribute name="Last Access Date">
<attributeValues>
<attributeValue>
<value><![CDATA[00000000]]></value>
etc......
Would like csv to look like this:
AcctName Endpoint Domain AttribName AttribValue
AAGALY2 ABCD ABCD appUserName A, Agalya
AAGALY2 ABCD ABCD CostCentre 333333
AAGALY2 ABCD ABCD App ID AAGALY2
AAGALY2 ABCD ABCD Jobtemplate A12-can read
JSMITH1 EFG ABCD appUserName J, Smith
JSMITH1 ABCD ABCD CostCentre 12345
JSMITH1 ABCD ABCD Jobtemplate A22-perm to write
ZZMITH3 EFG GHI appUserName Z, Zmith
ZZMITH3 EFG GHI CostCentre 3456
I have found xmltodict to be a really simple way to get through xml parsing if xml etree isn't helping.
So your code may look something like this:
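import csv
import xmltodict
# yourxml holds the XML document as a string
# force_list keeps these elements as lists even when only one is present
xmldict = xmltodict.parse(yourxml, force_list=('account', 'attribute', 'attributeValue'))
with open('yourcsv.csv', 'w', newline='') as out:
    f = csv.writer(out)
    # write the field names you outlined in your output
    f.writerow(['AcctName', 'Endpoint', 'Domain', 'AttribName', 'AttribValue'])
    # write the contents: one row per account, attribute and value
    for acct in xmldict['rbacx']['accounts']['account']:
        for attr in (acct.get('attributes') or {}).get('attribute', []):
            for val in (attr.get('attributeValues') or {}).get('attributeValue', []):
                f.writerow([acct['name'], acct['endPoint'], acct['domain'], attr['@name'], val['value']])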
You will obviously have to map based on the nesting of your xml and access the keys you want (xmltodict stores XML attributes such as name under keys prefixed with '@'), but it should be quite straightforward through the dict structure. Hope this helps!
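For comparison, here is a minimal sketch of the same flattening using only the standard library's xml.etree.ElementTree instead of xmltodict. It assumes the element names shown in the question (account, name, endPoint, domain, attribute, value); SAMPLE_XML, account_rows and xml_to_csv are hypothetical names made up for this illustration, and the sample is a trimmed copy of the question's XML. Because iter() searches descendants at any depth, each attribute can carry any number of values.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample, trimmed from the XML in the question.
SAMPLE_XML = """<rbacx>
  <accounts>
    <account id="AAGALY2">
      <name>AAGALY2</name>
      <endPoint>ABCD</endPoint>
      <domain>ABCD</domain>
      <comments />
      <attributes>
        <attribute name="appUserName">
          <attributeValues>
            <attributeValue><value><![CDATA[A, Agglya]]></value></attributeValue>
          </attributeValues>
        </attribute>
        <attribute name="costCentre">
          <attributeValues>
            <attributeValue><value><![CDATA[6734]]></value></attributeValue>
          </attributeValues>
        </attribute>
      </attributes>
    </account>
  </accounts>
</rbacx>
"""

HEADER = ["AcctName", "Endpoint", "Domain", "AttribName", "AttribValue"]


def account_rows(root):
    """Yield one row per (account, attribute, value) combination."""
    for account in root.iter("account"):
        base = [
            account.findtext("name", default=""),
            account.findtext("endPoint", default=""),
            account.findtext("domain", default=""),
        ]
        for attribute in account.iter("attribute"):
            attr_name = attribute.get("name", "")
            # iter() walks all descendants, so every <value> is picked up
            for value in attribute.iter("value"):
                yield base + [attr_name, value.text or ""]


def xml_to_csv(xml_text, out):
    """Parse xml_text and write the header plus flattened rows to out."""
    root = ET.fromstring(xml_text)
    writer = csv.writer(out)
    writer.writerow(HEADER)
    writer.writerows(account_rows(root))


if __name__ == "__main__":
    buffer = io.StringIO()
    xml_to_csv(SAMPLE_XML, buffer)
    print(buffer.getvalue())
```

To write a real file, pass a handle opened with open('yourcsv.csv', 'w', newline='') as the out argument. For a very large file, ET.iterparse may be worth a look so the whole tree is not held in memory, but that is beyond this sketch.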