Python - Sort XML derived from CSV by original CSV field names

Question

I've derived XML from a CSV file, and I'm having trouble ensuring the XML elements appear in the same order as the CSV header. DictReader does not preserve the column order, but if I iterate over fieldnames instead, I get the error 'str' object has no attribute 'items'.

My CSV file has the following content:

    FieldA,FieldB,FieldC,FieldD
    1,asdf,2,ghjk
    3,qwer,4,yuio
    5,slslkd,,aldkjslkj

And my Python script is as follows:

    import gzip
    import csv
    from xml.etree.ElementTree import Element, SubElement, tostring

    csv_file = 'Workbook1.csv.gz'

    class GZipCSVReader:
        def __init__(self, filename):
            self.gzfile = gzip.open(filename)
            self.reader = csv.DictReader(self.gzfile)
            self.fieldnames = self.reader.fieldnames

        def next(self):
            return self.reader.next()

        def close(self):
            self.gzfile.close()

        def __iter__(self):
            return self.reader.__iter__()


    def to_xml(r):
        for row in r.fieldnames:
            element = Element('event') # parent element is required
            children = [] # reset the list with each new row

            # Iterate through key:value pairs for each row and create a sub-element
            for (k, v) in row.items():
                if v:
                    sub = SubElement(element, k) # adds the column header as the sub
                    sub.text = v # adds row value as sub-element text

            # Create a list of sub-elements, minus the parent.
            for child in list(element):
                children.append(tostring(child))
            event_data = ''.join(children) # this creates a string of data to be passed to the server
            print (event_data + '\n')
        r.close()

    if __name__ == '__main__':
        r = GZipCSVReader(csv_file)
        to_xml(r)

The above code is meant to print each CSV row as XML SubElements. When I iterate over the DictReader rows, the order of the SubElements differs from the CSV header, and when I iterate over fieldnames, as shown, I get the error 'str' object has no attribute 'items'. Is there a way around this so that the resulting XML is in the same order as the CSV header?

Thanks.

Answer 1

The problem is the loop for row in r.fieldnames. There, row is a field name (a string), not a row of data, which is why calling row.items() fails with 'str' object has no attribute 'items'.
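To see the difference, here is a minimal sketch in Python 3. It assumes an in-memory string standing in for the gzipped file from the question, and only shows what each kind of iteration yields:

```python
import csv
import io

# In-memory stand-in (an assumption) for the CSV file from the question.
data = io.StringIO("FieldA,FieldB,FieldC,FieldD\n1,asdf,2,ghjk\n")
reader = csv.DictReader(data)

# fieldnames is a list of header strings, in file order.
print(reader.fieldnames)

for name in reader.fieldnames:
    # Each item is a str, so calling name.items() would raise AttributeError.
    print(type(name).__name__, hasattr(name, 'items'))

# Iterating over the reader itself yields one mapping per data row.
first_row = next(reader)
print(first_row['FieldB'])
```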

Instead, iterate over r. For each row, loop over r.fieldnames, use the field name as the key and row[fieldname] as the value, and create the sub-element from that pair.

Example function:

    def to_xml(r):
        for row in r:
            element = Element('event') # parent element is required
            children = [] # reset the list with each new row

            # Walk the field names in header order and create a sub-element for each
            for k in r.fieldnames:
                v = row[k]
                if v:
                    sub = SubElement(element, k) # adds the column header as the sub
                    sub.text = v # adds row value as sub-element text

            # Create a list of sub-elements, minus the parent.
            for child in list(element):
                children.append(tostring(child))
            event_data = ''.join(children) # this creates a string of data to be passed to the server
            print (event_data + '\n')
        r.close()
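The code in the question is written for Python 2 (it calls reader.next()). For reference, the same idea might look like this in Python 3. This is only a sketch: rows_to_xml and read_gzipped_csv are hypothetical names, not part of the original script, and the demonstration uses in-memory data rather than Workbook1.csv.gz.

```python
import csv
import gzip
import io
from xml.etree.ElementTree import Element, SubElement, tostring


def rows_to_xml(reader):
    """Yield one XML string per CSV row, with elements in header order."""
    for row in reader:
        element = Element('event')  # parent element, dropped from the output
        for name in reader.fieldnames:  # header order, not dict order
            value = row[name]
            if value:  # skip empty cells
                SubElement(element, name).text = value
        # encoding='unicode' makes tostring return str instead of bytes
        yield ''.join(tostring(child, encoding='unicode') for child in element)


def read_gzipped_csv(filename):
    """Hypothetical helper: print the XML for each row of a gzipped CSV file."""
    # Mode 'rt' opens the gzip file as text, which csv.DictReader needs in Python 3.
    with gzip.open(filename, 'rt', newline='') as gzfile:
        for event_data in rows_to_xml(csv.DictReader(gzfile)):
            print(event_data)


# Demonstration with in-memory data instead of a real .csv.gz file.
sample = io.StringIO(
    "FieldA,FieldB,FieldC,FieldD\n"
    "1,asdf,2,ghjk\n"
    "5,slslkd,,aldkjslkj\n"
)
results = list(rows_to_xml(csv.DictReader(sample)))
for line in results:
    print(line)
```

Looping over reader.fieldnames keeps the output order explicit, so it does not depend on how a given Python version orders the keys of each row.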

0 ACCEPTED 2015-06-25 14:49:08