
Python Special Characters Encoding

I have a Python script that reads a CSV file and writes an XML file. I have been hitting a wall trying to work out how to read special characters such as ç, á, é and í. The script runs perfectly well when the data contains no special characters. This is the script header:

# coding=utf-8

'''
@modified by: Julierme Pinheiro
'''
import os
import sys
import unittest
from unittest import skip
import csv
import uuid
import xml
import xml.dom.minidom as minidom
import owslib
from owslib.iso import *
import pyproj
from decimal import *
import logging

The way I retrieve information from the CSV file is shown below:

# add the title
                title = data[1]
                titleElement = identificationInfo[0].getElementsByTagName('gmd:title')[0]
                titleNode = record.createTextNode(title)
                titleElement.childNodes[1].appendChild(titleNode)
                print "Title:" + title

Note: If data[1], the second column of the CSV file, contains a special character, as in "Navegação", the script fails (it does not write anything to the XML file).

The way a new XML file is created from a blank template XML is shown below:

 # write out the gemini record
                filename = '../output/%s.xml' % fileId
                with open(filename,'w') as test_xml:
                    test_xml.write(record.toprettyxml(newl="", encoding="utf-8"))
            except:
                e = sys.exc_info()[1]
                logging.debug("Import failed for entry %s" % data[0])
                logging.debug("Specific error: %s" % e)

    @skip('')
    def testOWSMetadataImport(self):
        raw_data = []
        with open('../input/metadata_cartapapel.csv') as csvfile:
            reader = csv.reader(csvfile, dialect='excel')
            for columns in reader:
                raw_data.append(columns)   

        md = MD_Metadata(etree.parse('gemini-template.xml'))
        md.identification.topiccategory = ['farming','environment']
        print md.identification.topiccategory
        outfile = open('mdtest.xml','w')
        # crap, can't update the model and write back out - this is badly needed!!
        outfile.write(md.xml) 


if __name__ == "__main__":
    unittest.main()

Could someone help me solve this issue, please?

Thank you in advance for your time.

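Those are non-ASCII (Unicode) characters. In Python 2.7 the csv module does not support Unicode: it returns every field as a byte string, which you have to decode yourself. In Python 3.x you can pass encoding='utf-8' when opening the file.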
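For the Python 3 route, a minimal sketch might look like the following. It assumes the CSV is saved as UTF-8; the file name, the sample row and the helper read_rows are made up for illustration and are not part of the original script.

```python
import csv
import io
import os
import tempfile

# Build a tiny UTF-8 CSV so the sketch is self-contained (hypothetical data).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "metadata_sample.csv")
with io.open(csv_path, "w", encoding="utf-8", newline="") as f:
    f.write(u"id,title\n1,Navegação\n")


def read_rows(path, encoding="utf-8"):
    """Return every CSV row as a list of text strings (Python 3 only)."""
    # newline="" is what the csv documentation recommends for files passed to csv.reader.
    with open(path, encoding=encoding, newline="") as csvfile:
        return list(csv.reader(csvfile, dialect="excel"))


rows = read_rows(csv_path)
title = rows[1][1]  # second column of the first data row, already decoded
```

Because the file is decoded as it is read, every field arrives as text and no per-field .decode() call is needed.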

In Python 2 you can decode data[1] from UTF-8 into a unicode string like this:

title = data[1].decode('utf-8')
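To show where the decoded title ends up, here is a small sketch of the minidom side. It assumes the failure comes from handing a non-ASCII byte string to createTextNode, which seems likely but is not confirmed by the question. The one-element template and the output file name are stand-ins, not the real Gemini template.

```python
import os
import tempfile
import xml.dom.minidom as minidom

# Stand-in for the blank template (hypothetical, far smaller than a real record).
record = minidom.parseString(u"<record><title/></record>".encode("utf-8"))

# Bytes as Python 2's csv module would hand them over, then decoded to text.
raw_title = u"Navegação".encode("utf-8")
title = raw_title.decode("utf-8")

# Text nodes should receive text (unicode), not encoded bytes.
title_element = record.getElementsByTagName("title")[0]
title_element.appendChild(record.createTextNode(title))

# With an explicit encoding, toprettyxml returns bytes, so write in binary mode.
xml_bytes = record.toprettyxml(newl="", encoding="utf-8")
out_path = os.path.join(tempfile.mkdtemp(), "sample_record.xml")
with open(out_path, "wb") as out_file:
    out_file.write(xml_bytes)
```

Opening the output file in binary mode keeps the bytes exactly as minidom encoded them, so no second encoding step can interfere.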

Files produced by some legacy Windows components on English-language systems may be encoded as 'cp1252' instead. If the decoding above fails, try this:

title = data[1].decode('cp1252')
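The two attempts can be combined into one helper, sketched below. The name decode_with_fallback and the default encoding order are assumptions for illustration. UTF-8 is tried first because cp1252 accepts almost any byte sequence and would otherwise hide a UTF-8 file behind mojibake.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode a byte string by trying each encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first encoding and mark the bad bytes.
    return raw.decode(encodings[0], "replace")


# The same title as it would appear in a UTF-8 file and in a cp1252 file.
utf8_title = u"Navegação".encode("utf-8")
cp1252_title = u"Navegação".encode("cp1252")

title_from_utf8 = decode_with_fallback(utf8_title)
title_from_cp1252 = decode_with_fallback(cp1252_title)
```

In the original script this would be called as title = decode_with_fallback(data[1]). Guessing the encoding this way is only a heuristic; if the source of the CSV is known, naming its encoding explicitly is more reliable.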


 