简体   繁体   中英

Python Conditional XML Writing

I am using Python to convert CSV files to XML format. The CSV files have a varying amount of rows ranging anywhere from 2 (including headers) to infinity. (realistically 10-15 but unless there's some major performance issue, I'd like to cover my bases) In order to convert the files I have the following code:

for row in csvData:
    if rowNum == 0:
        xmlData.write('    <'+csvFile[:-4]+'-1>' + "\n")
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    if rowNum == 1: 
        for i in range(len(tags)):
            xmlData.write('        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-1>' + "\n" + '    <' +csvFile[:-4]+'-2>' + "\n")
    if rowNum == 2:
        for i in range(len(tags)):
            xmlData.write('        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-2>' + "\n")
    if rowNum == 3:
        for i in range(len(tags)):
            xmlData.write('<'+csvFile[:-4]+'-3>' + "\n" + '        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-3>' + "\n")

    rowNum +=1
xmlData.write('</csv_data>' + "\n")
xmlData.close()

As you can see, I have the upper-level tags set to be created manually if the row exists. Is there a more efficient way to achieve my goal of creating the <csvFile-*></csvFile-*> tags rather than repeating my code 15+ times? Thanks!

I would use xml.etree.ElementTree or lxml.etree to write the XML. xml.etree.ElementTree is in the standard library, but does not have built-in pretty-printing. (You could use the indent function from here , however).

lxml.etree is a third-party module, but it has built-in pretty-printing in its tostring method.

Using lxml.etree, you could do something like this:

import lxml.etree as ET

csvData = [['foo bar', 'baz quux'],['bing bang', 'bim bop', 'bip burp'],]
csvFile = 'rowboat'
name = csvFile[:-4]
root = ET.Element('csv_data')
for num, tags in enumerate(csvData):
    row = ET.SubElement(root, '{f}-{n}'.format(f = name, n = num))
    for text in tags:
        text = text.replace(' ', '_')
        tag = ET.SubElement(row, text)
        tag.text = text

print(ET.tostring(root, pretty_print = True))

yields

<csv_data>
  <row-0>
    <foo_bar>foo_bar</foo_bar>
    <baz_quux>baz_quux</baz_quux>
  </row-0>
  <row-1>
    <bing_bang>bing_bang</bing_bang>
    <bim_bop>bim_bop</bim_bop>
    <bip_burp>bip_burp</bip_burp>
  </row-1>
</csv_data>

Some suggestions:

  • In Python, almost never do you need to say

     for i in range(len(tags)): # do stuff with tags[i] 

    Instead say

     for tag in tags: 

    to loop over all the items in tags .

  • Also instead of manually counting the times through a loop with

     num = 0 for tags in csvData: num += 1 

    instead use the enumerate function:

     for num, tags in enumerate(csvData): 
  • Strings like

     ' ' + '<' + tags[i] + '>' \\ + row[i] + '</' + tags[i] + '>' + "\\n" 

    are incredibly difficult to read. It mixes together logic of indentation, with the XML syntax of tags, with the minutia of end of line characters. That's where xml.etree.ElementTree or lxml.etree will help you. It will take care of the serialization of the XML for you; all you need to provide is the relationship between the XML elements. The code will be much more readable and easier to maintain.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM