
Transform CSV to XML in Python

I have a scenario where data is extracted from Oracle as a CSV file and must then be transformed into the desired XML format.

Input CSV File:

Id,SubID,Rank,Size
1,123,1,0.1
1,234,2,0.2
2,456,1,0.1
2,123,2,0.2

Expected XML output:

<AA_ITEMS>
<Id ID="1">
    <SubId ID="123">
        <Rank>1</Rank>
        <Size>0.1</Size>
    </SubId>
    <SubId ID="234">
        <Rank>2</Rank>
        <Size>0.2</Size>
    </SubId>
</Id>
<Id ID="2">
    <SubId ID="456">
        <Rank>1</Rank>
        <Size>0.1</Size>
    </SubId>
    <SubId ID="123">
        <Rank>2</Rank>
        <Size>0.2</Size>
    </SubId>
</Id>
</AA_ITEMS>

Note: The CSV file is a daily load and contains around 150K to 200K records. Please assist. Thanks in advance.

There are a couple of ways to approach this. Although some people dislike building XML from a string template, I believe it works best here:

from itertools import groupby
from lxml import etree

csv_string = """[your csv above]
"""

# first deal with the csv:
# split it into lines and discard the header row
lines = csv_string.splitlines()[1:]
# group the lines by the whole Id field (the first column), not just its
# first character, so multi-digit Ids such as 10 and 11 stay separate;
# groupby only merges adjacent lines, so the csv must be sorted by Id
grpfunc = lambda x: x.split(',')[0]
grps = [(key, list(group)) for key, group in groupby(lines, grpfunc)]

# now convert the whole thing into xml:
xml_string = """
<AA_ITEMS>
"""
for key, grp in grps:
    elem = f'  <Id ID="{key}">'
    for g in grp:
        entry = g.split(',')
        # create an entry template:
        id_tmpl = f"""
        <SubId ID="{entry[1]}">
            <Rank>{entry[2]}</Rank>
            <Size>{entry[3]}</Size>
        </SubId>
"""
        elem += id_tmpl
    # close elem
    elem += """</Id>
    """
    xml_string += elem
# close the xml string
xml_string += """</AA_ITEMS>"""

# finally, parse the string to show that the output is well-formed xml:
print(etree.tostring(etree.fromstring(xml_string)).decode())

The output should match your expected XML, apart from indentation.
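
One caveat with the template approach is that field values are inserted verbatim, so a value containing `&` or `<` would produce malformed XML. As a minimal sketch of an alternative, assuming the file is sorted by Id and has the header row shown in the question, the same grouping can be done with only the standard library (`csv`, `itertools.groupby` and `xml.etree.ElementTree`), which escapes values automatically. The function name `csv_to_xml` and the `sample` string are illustrative and not part of the original answer.

```python
import csv
import io
import xml.etree.ElementTree as ET
from itertools import groupby
from operator import itemgetter


def csv_to_xml(csv_file):
    """Build an AA_ITEMS tree from an open CSV file (or any iterable of lines)."""
    reader = csv.DictReader(csv_file)
    root = ET.Element("AA_ITEMS")
    # groupby only merges adjacent rows, so the input is assumed sorted by Id
    for id_value, rows in groupby(reader, key=itemgetter("Id")):
        id_elem = ET.SubElement(root, "Id", ID=id_value)
        for row in rows:
            sub = ET.SubElement(id_elem, "SubId", ID=row["SubID"])
            ET.SubElement(sub, "Rank").text = row["Rank"]
            ET.SubElement(sub, "Size").text = row["Size"]
    return root


# Sample data copied from the question
sample = """Id,SubID,Rank,Size
1,123,1,0.1
1,234,2,0.2
2,456,1,0.1
2,123,2,0.2
"""

root = csv_to_xml(io.StringIO(sample))
if hasattr(ET, "indent"):  # ET.indent is available from Python 3.9
    ET.indent(root, space="    ")
print(ET.tostring(root, encoding="unicode"))
```

For the daily load, the `io.StringIO(sample)` argument could be replaced by a file opened with `open(path, newline="")`. A tree of 150K to 200K rows should normally fit in memory, though that depends on the machine.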

