Python Conditional XML Writing

Question

I am using Python to convert CSV files to XML format. The CSV files have a varying amount of rows ranging anywhere from 2 (including headers) to infinity. (realistically 10-15 but unless there's some major performance issue, I'd like to cover my bases) In order to convert the files I have the following code:

for row in csvData:
    if rowNum == 0:
        xmlData.write('    <'+csvFile[:-4]+'-1>' + "\n")
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    if rowNum == 1: 
        for i in range(len(tags)):
            xmlData.write('        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-1>' + "\n" + '    <' +csvFile[:-4]+'-2>' + "\n")
    if rowNum == 2:
        for i in range(len(tags)):
            xmlData.write('        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-2>' + "\n")
    if rowNum == 3:
        for i in range(len(tags)):
            xmlData.write('<'+csvFile[:-4]+'-3>' + "\n" + '        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-3>' + "\n")

    rowNum +=1
xmlData.write('</csv_data>' + "\n")
xmlData.close()

As you can see, I have the upper-level tags set to be created manually if the row exists. Is there a more efficient way to achieve my goal of creating the <csvFile-*></csvFile-*> tags rather than repeating my code 15+ times? Thanks!

Answer 1

I would use xml.etree.ElementTree or lxml.etree to write the XML. xml.etree.ElementTree is in the standard library, but does not have built-in pretty-printing. (You could use the indent function from here , however).

lxml.etree is a third-party module, but it has built-in pretty-printing in its tostring method.

Using lxml.etree, you could do something like this:

import lxml.etree as ET

csvData = [['foo bar', 'baz quux'],['bing bang', 'bim bop', 'bip burp'],]
csvFile = 'rowboat'
name = csvFile[:-4]
root = ET.Element('csv_data')
for num, tags in enumerate(csvData):
    row = ET.SubElement(root, '{f}-{n}'.format(f = name, n = num))
    for text in tags:
        text = text.replace(' ', '_')
        tag = ET.SubElement(row, text)
        tag.text = text

print(ET.tostring(root, pretty_print = True))

yields

<csv_data>
  <row-0>
    <foo_bar>foo_bar</foo_bar>
    <baz_quux>baz_quux</baz_quux>
  </row-0>
  <row-1>
    <bing_bang>bing_bang</bing_bang>
    <bim_bop>bim_bop</bim_bop>
    <bip_burp>bip_burp</bip_burp>
  </row-1>
</csv_data>

Some suggestions:

In Python, almost never do you need to say
```
 for i in range(len(tags)): # do stuff with tags[i] 
```
Instead say
```
 for tag in tags: 
```
to loop over all the items in tags .
Also instead of manually counting the times through a loop with
```
 num = 0 for tags in csvData: num += 1 
```
instead use the enumerate function:
```
 for num, tags in enumerate(csvData): 
```
Strings like
```
 ' ' + '<' + tags[i] + '>' \\ + row[i] + '</' + tags[i] + '>' + "\\n" 
```
are incredibly difficult to read. It mixes together logic of indentation, with the XML syntax of tags, with the minutia of end of line characters. That's where xml.etree.ElementTree or lxml.etree will help you. It will take care of the serialization of the XML for you; all you need to provide is the relationship between the XML elements. The code will be much more readable and easier to maintain.

Python Conditional XML Writing

Question

1 answers

solution1
4 ACCPTED 2012-10-25 16:38:06

Python Conditional XML Writing

Question

1 answers

solution1 4 ACCPTED 2012-10-25 16:38:06

solution1
4 ACCPTED 2012-10-25 16:38:06