XML to CSV using Python
I have an XML file that I want to convert to CSV using Python. I need the contents of the TestitemName tag as the CSV headers and the contents of the Testvalue tag as the CSV values. Can someone help me with this?
Sample XML file (input)
<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>1</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hi</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field3</sample:TestitemName>
<sample:Testvalue>1234</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>3</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hello</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field3</sample:TestitemName>
<sample:Testvalue>999</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
</sample:batch>
Desired CSV file (output)
Field1,Field2,Field3 (Header field names)
1,Hi,1234 (1st record)
3,Hello,999 (2nd record)
BeautifulSoup can be used to parse XML data. With well-organized data, you just need to loop over the nested tag types and collect the data as you go.
Code:
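# bs4 is the current BeautifulSoup package; the original import targeted BeautifulSoup 3.
from bs4 import BeautifulSoup as Soup

def parse_xml(file_like):
    data = []   # one dict per TestData record
    names = []  # header names, in first-seen order
    # html.parser lowercases tag names, hence the lowercase lookups below.
    soup = Soup(file_like, 'html.parser')
    for batch in soup.find_all('sample:batch'):
        for test_data in batch.find_all('sample:testdata'):
            item = {}
            for test_item in test_data.find_all('sample:testitem'):
                name = test_item.find('sample:testitemname').text
                value = test_item.find('sample:testvalue').text
                item[name] = value
                if name not in names:
                    names.append(name)
            data.append(item)
    # Header row first, then one row per record; missing fields become ''.
    return [names] + [[datum.get(name, '') for name in names] for datum in data]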
Test Code:
data = parse_xml(xml_data)
for datum in data:
    print(','.join(datum))
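Joining on commas is fine for the sample values, but it produces a broken row as soon as a value itself contains a comma or a quote. A minimal sketch using the standard csv module instead, assuming rows shaped like the list that parse_xml returns; the helper name rows_to_csv and the sample rows are made up for illustration:

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (header row first) to CSV text."""
    buffer = io.StringIO()
    # lineterminator='\n' replaces the csv module's default '\r\n' row endings.
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows in the shape parse_xml returns: header first, then records.
rows = [
    ['Field1', 'Field2', 'Field3'],
    ['1', 'Hi, there', '1234'],
]
csv_text = rows_to_csv(rows)
print(csv_text)
```

To write a file rather than a string, pass csv.writer a file object opened with newline=''.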
Test Data:
from io import StringIO
xml_data = StringIO(u"""
<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>1</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hi</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field3</sample:TestitemName>
<sample:Testvalue>1234</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>3</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hello</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field3</sample:TestitemName>
<sample:Testvalue>999</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
</sample:batch>
""")
Results:
Field1,Field2,Field3
1,Hi,1234
3,Hello,999
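If the input is strictly well-formed, the same nested loop can also be sketched with the standard library's xml.etree.ElementTree, which avoids the extra dependency. This is an alternative to the BeautifulSoup approach, not the method used in this answer. Unlike BeautifulSoup, ElementTree is case-sensitive and raises ParseError on missing or mismatched closing tags. The function name parse_xml_etree and the shortened two-field sample are assumptions for illustration; the namespace URI is the one from the question.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the sample document; the 'sample' prefix here is arbitrary.
NS = {'sample': 'http://sample.com/schema/sampleimport'}

# Shortened, well-formed version of the sample input (two fields per record).
SAMPLE = """\
<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
  <sample:TestData>
    <sample:Testitem>
      <sample:TestitemName>Field1</sample:TestitemName>
      <sample:Testvalue>1</sample:Testvalue>
    </sample:Testitem>
    <sample:Testitem>
      <sample:TestitemName>Field2</sample:TestitemName>
      <sample:Testvalue>Hi</sample:Testvalue>
    </sample:Testitem>
  </sample:TestData>
  <sample:TestData>
    <sample:Testitem>
      <sample:TestitemName>Field1</sample:TestitemName>
      <sample:Testvalue>3</sample:Testvalue>
    </sample:Testitem>
    <sample:Testitem>
      <sample:TestitemName>Field2</sample:TestitemName>
      <sample:Testvalue>Hello</sample:Testvalue>
    </sample:Testitem>
  </sample:TestData>
</sample:batch>
"""


def parse_xml_etree(xml_text):
    """Return [header_row, record_row, ...] from well-formed XML text."""
    root = ET.fromstring(xml_text)  # root is the sample:batch element
    names = []    # header names, in first-seen order
    records = []  # one dict per TestData element
    for test_data in root.findall('sample:TestData', NS):
        record = {}
        for test_item in test_data.findall('sample:Testitem', NS):
            name = test_item.findtext('sample:TestitemName', namespaces=NS)
            value = test_item.findtext('sample:Testvalue', default='', namespaces=NS)
            record[name] = value
            if name not in names:
                names.append(name)
        records.append(record)
    # Fields missing from a record are emitted as empty strings.
    return [names] + [[rec.get(name, '') for name in names] for rec in records]


rows = parse_xml_etree(SAMPLE)
for row in rows:
    print(','.join(row))
```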
Use pyxmlparser
It is a command-line utility that does the same thing.
https://pypi.org/project/pyxmlparser/
Disclaimer: I am the author of the library. Since it is new, I would be happy to hear whether it worked for you.