I have a scenario where data is extracted from Oracle as a CSV file and then has to be transformed into the desired XML format.
Id,SubID,Rank,Size
1,123,1,0.1
1,234,2,0.2
2,456,1,0.1
2,123,2,0.2
<AA_ITEMS>
<Id ID="1">
<SubId ID="123">
<Rank>1</Rank>
<Size>0.1</Size>
</SubId>
<SubId ID="234">
<Rank>2</Rank>
<Size>0.2</Size>
</SubId>
</Id>
<Id ID="2">
<SubId ID="456">
<Rank>1</Rank>
<Size>0.1</Size>
</SubId>
<SubId ID="123">
<Rank>2</Rank>
<Size>0.2</Size>
</SubId>
</Id>
</AA_ITEMS>
Note: the CSV file is a daily load and contains around 150K to 200K records. Please assist. Thanks in advance.
There are a couple of ways to approach this. Although some people dislike building XML from a string template, I believe it works best here:
from itertools import groupby
from lxml import etree

csv_string = """[your csv above]
"""

# First deal with the CSV:
# split it into lines, discard the header row and skip any blank lines
lines = [line for line in csv_string.splitlines()[1:] if line.strip()]

# Group consecutive lines by the Id column (the text before the first comma).
# Using the whole field rather than the first character keeps multi-digit Ids intact;
# groupby only merges adjacent rows, so the CSV must already be sorted by Id.
grpfunc = lambda x: x.split(',')[0]
grps = [list(group) for key, group in groupby(lines, grpfunc)]

# Now convert the whole thing into XML:
xml_string = """
<AA_ITEMS>
"""
for grp in grps:
    group_id = grp[0].split(',')[0]
    elem = f' <Id ID="{group_id}">'
    for g in grp:
        entry = g.split(',')
        # Create an entry template:
        id_tmpl = f"""
  <SubId ID="{entry[1]}">
    <Rank>{entry[2]}</Rank>
    <Size>{entry[3]}</Size>
  </SubId>
"""
        elem += id_tmpl
    # Close elem
    elem += """ </Id>
"""
    xml_string += elem

# Close the XML string
xml_string += """</AA_ITEMS>"""

# Finally, show that the output is well-formed XML:
print(etree.tostring(etree.fromstring(xml_string)).decode())
The output should be your expected XML.
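For comparison, here is a minimal sketch of one of the other ways to approach it: building the tree with the standard library's csv and xml.etree.ElementTree modules instead of a string template. This is not the method used above, just an alternative under a few assumptions: the column names match the sample header (Id, SubID, Rank, Size), the rows are already sorted by Id, and the helper name csv_to_xml is made up for illustration. One possible advantage is that ElementTree escapes special characters in values for you.

```python
import csv
import io
import xml.etree.ElementTree as ET
from itertools import groupby
from operator import itemgetter


def csv_to_xml(csv_text):
    """Build an AA_ITEMS tree from CSV text with Id,SubID,Rank,Size columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("AA_ITEMS")
    # groupby only merges consecutive rows, so the input is assumed sorted by Id
    for id_value, rows in groupby(reader, key=itemgetter("Id")):
        id_elem = ET.SubElement(root, "Id", ID=id_value)
        for row in rows:
            sub = ET.SubElement(id_elem, "SubId", ID=row["SubID"])
            ET.SubElement(sub, "Rank").text = row["Rank"]
            ET.SubElement(sub, "Size").text = row["Size"]
    return root


# Sample data copied from the question
sample = """Id,SubID,Rank,Size
1,123,1,0.1
1,234,2,0.2
2,456,1,0.1
2,123,2,0.2
"""

root = csv_to_xml(sample)
ET.indent(root)  # pretty-print; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

For the daily file, the same function could be fed the file contents, and the result written out with ET.ElementTree(root).write("out.xml", encoding="utf-8", xml_declaration=True), where out.xml is a placeholder name. A tree for 150K to 200K rows should normally fit in memory, though that depends on your environment.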