
XML to CSV using Python

I have an XML file that I want to convert to CSV using Python. I need the contents of the TestitemName tags as the CSV headers and the contents of the Testvalue tags as the CSV values. Can someone help me with this?

Sample XML file (input)

<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
    <sample:TestData>
        <sample:Testitem>
            <sample:TestitemName>Field1</sample:TestitemName>
            <sample:Testvalue>1</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field2</sample:TestitemName>
            <sample:Testvalue>Hi</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field3</sample:TestitemName>
            <sample:Testvalue>1234</sample:Testvalue>
        </sample:Testitem>
    </sample:TestData>
    <sample:TestData>
        <sample:Testitem>
            <sample:TestitemName>Field1</sample:TestitemName>
            <sample:Testvalue>3</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field2</sample:TestitemName>
            <sample:Testvalue>Hello</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field3</sample:TestitemName>
            <sample:Testvalue>999</sample:Testvalue>
        </sample:Testitem>
    </sample:TestData>
</sample:batch>

Desired CSV file (output)

Field1,Field2,Field3 (Header field names)
1,Hi,1234 (1st record)
3,Hello,999 (2nd record)

BeautifulSoup can be used to parse XML data. With well-organized data, you just need to loop over the nested tag types and collect the data as you go.

Code:

from bs4 import BeautifulSoup as Soup  # pip install beautifulsoup4

def parse_xml(file_like):
    data = []   # one dict per TestData record
    names = []  # header names, in first-seen order
    # html.parser lowercases tag names, hence the lowercase lookups below
    soup = Soup(file_like, 'html.parser')
    for batch in soup.find_all('sample:batch'):
        for test_data in batch.find_all('sample:testdata'):
            item = {}
            for test_item in test_data.find_all('sample:testitem'):
                name = test_item.find('sample:testitemname').text
                value = test_item.find('sample:testvalue').text
                item[name] = value
                if name not in names:
                    names.append(name)
            data.append(item)

    # header row first, then one row per record (missing fields become '')
    return [names] + [[datum.get(name, '') for name in names] for datum in data]

Test Code:

data = parse_xml(xml_data)
for datum in data:
    print(','.join(datum))
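
Joining with ',' is fine for the sample values, but it produces broken CSV as soon as a value contains a comma or a quote. A minimal sketch of writing the same row lists with the standard csv module follows; the helper name rows_to_csv and the sample rows are made up for illustration and are not part of the original answer.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize a list of rows (header row first) to CSV text."""
    buffer = io.StringIO(newline="")
    # csv.writer quotes fields that contain commas or quotes,
    # and writes None as an empty field.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical rows in the shape parse_xml returns: header, then records.
rows = [
    ["Field1", "Field2", "Field3"],
    ["1", "Hi, there", "1234"],
    ["3", "Hello", None],
]
print(rows_to_csv(rows), end="")
```

To write a file instead of a string, the same writer can be pointed at a handle opened with open(path, "w", newline="").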

Test Data:

from io import StringIO
xml_data = StringIO(u"""
    <sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
        <sample:TestData>
            <sample:Testitem>
                <sample:TestitemName>Field1</sample:TestitemName>
                <sample:Testvalue>1</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field2</sample:TestitemName>
                <sample:Testvalue>Hi</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field3</sample:TestitemName>
                <sample:Testvalue>1234</sample:Testvalue>
            </sample:Testitem>
        </sample:TestData>
        <sample:TestData>
            <sample:Testitem>
                <sample:TestitemName>Field1</sample:TestitemName>
                <sample:Testvalue>3</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field2</sample:TestitemName>
                <sample:Testvalue>Hello</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field3</sample:TestitemName>
                <sample:Testvalue>999</sample:Testvalue>
            </sample:Testitem>
        </sample:TestData>
    </sample:batch>
""")

Results:

Field1,Field2,Field3
1,Hi,1234
3,Hello,999
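
If the input is well-formed XML (every Testitem closed and the batch element terminated), the same loop can also be sketched with only the standard library. This is an alternative under that assumption, not the approach the answer above uses; the function name xml_to_csv, the NS mapping, and the compacted XML_TEXT sample are introduced here for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping taken from the xmlns:sample declaration in the input.
NS = {"sample": "http://sample.com/schema/sampleimport"}

# Well-formed version of the sample input, compacted to one Testitem per line.
XML_TEXT = """<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
  <sample:TestData>
    <sample:Testitem><sample:TestitemName>Field1</sample:TestitemName><sample:Testvalue>1</sample:Testvalue></sample:Testitem>
    <sample:Testitem><sample:TestitemName>Field2</sample:TestitemName><sample:Testvalue>Hi</sample:Testvalue></sample:Testitem>
    <sample:Testitem><sample:TestitemName>Field3</sample:TestitemName><sample:Testvalue>1234</sample:Testvalue></sample:Testitem>
  </sample:TestData>
  <sample:TestData>
    <sample:Testitem><sample:TestitemName>Field1</sample:TestitemName><sample:Testvalue>3</sample:Testvalue></sample:Testitem>
    <sample:Testitem><sample:TestitemName>Field2</sample:TestitemName><sample:Testvalue>Hello</sample:Testvalue></sample:Testitem>
    <sample:Testitem><sample:TestitemName>Field3</sample:TestitemName><sample:Testvalue>999</sample:Testvalue></sample:Testitem>
  </sample:TestData>
</sample:batch>"""

def xml_to_csv(xml_text):
    """Return CSV text with one row per TestData element."""
    root = ET.fromstring(xml_text)  # root is the sample:batch element
    records, names = [], []
    for test_data in root.findall("sample:TestData", NS):
        record = {}
        for item in test_data.findall("sample:Testitem", NS):
            name = item.findtext("sample:TestitemName", namespaces=NS)
            if name is None:
                continue  # skip items that have no name element
            record[name] = item.findtext("sample:Testvalue", default="", namespaces=NS)
            if name not in names:
                names.append(name)  # keep headers in first-seen order
        records.append(record)

    out = io.StringIO(newline="")
    # restval fills in fields that a record does not define.
    writer = csv.DictWriter(out, fieldnames=names, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

print(xml_to_csv(XML_TEXT), end="")
```

Unlike the lenient HTML parsing above, ElementTree raises a ParseError on unclosed tags and keeps tag names case-sensitive, so the lookups use the original capitalization.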

Use pyxmlparser

It is a command-line utility that does the same thing.

https://pypi.org/project/pyxmlparser/

Disclaimer: I am the author of the library. Since it is new, I would be glad to hear whether it worked for you.
