python, UnicodeEncodeError, converting unicode to ascii

Question

First, I am pretty new to Python, so forgive me for the n00b questions. The application logic goes like this:

I send an SQL SELECT to the database, and it returns an array of data.
I need to take this data and use it in another SQL INSERT statement.

The problem is that the SQL query returns Unicode strings. The output of the SELECT is something like this:

(u'Abc', u'Lololo', u'Fjordk\xe6r')

So first I tried to convert each value to a string, but it fails because the third element contains the letter 'æ' (U+00E6):

str_data = []
for x in data[0]:
    str_data.append(str(x))  # fails on u'Fjordk\xe6r'

I am getting: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 6: ordinal not in range(128)

I can't pass the Unicode strings directly to the INSERT either, because a TypeError occurs: TypeError: coercing to Unicode: need string or buffer, NoneType found

Any ideas?

Answer 1

In my experience, Unicode handling in Python is a frequent source of problems.

Generally speaking, if you have a Unicode string, you can convert it to a normal (byte) string by encoding it:

normal_string = unicode_string.encode('utf-8')

And convert a normal (byte) string to a Unicode string by decoding it:

unicode_string = normal_string.decode('utf-8')
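A minimal round-trip sketch of the two calls above, written for Python 3 (where text is `str` and encoded data is `bytes`; in Python 2 the same roles belong to `unicode` and `str`). It shows that 'æ' becomes two bytes in UTF-8, that decoding with the same codec restores the original text, and that decoding with a different codec (latin-1, picked only for illustration) silently produces garbled text instead of raising.

```python
# Python 3 sketch: text is `str`, encoded data is `bytes`.
text = u'Fjordk\xe6r'

# Encode text -> bytes; U+00E6 ('æ') becomes the two UTF-8 bytes 0xC3 0xA6.
encoded = text.encode('utf-8')
print(repr(encoded))  # b'Fjordk\xc3\xa6r'

# Decode bytes -> text with the same codec to get the original back.
decoded = encoded.decode('utf-8')
print(decoded == text)  # True

# Decoding with the wrong codec does not raise here; it silently yields mojibake.
wrong = encoded.decode('latin-1')
print(wrong == text)  # False
```

The rule of thumb is to always decode with the codec that was used to encode.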

Answer 2

The issue here is that str() tries to encode the Unicode string with the ASCII codec, and ASCII has no mapping for u'\xe6' (æ), since it only covers code points 0 to 127.

Therefore you need to encode it with a codec that supports the character. Nowadays the most common choice is UTF-8.

>>> x = (u'Abc', u'Lololo', u'Fjordk\xe6r')
>>> print x[2].encode("utf8")
Fjordkær
>>> x[2].encode("utf-8")
'Fjordk\xc3\xa6r'
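To see exactly which characters trip up the ASCII codec, a small helper can list every character whose code point is above 127. This is a Python 3 sketch; the name `non_ascii_chars` is made up for illustration. For the example value it reports index 6, the same position as in the error message from the question, and the `UnicodeEncodeError` object itself carries that position in its `start` attribute.

```python
def non_ascii_chars(text):
    """Return (index, character) pairs for characters ASCII cannot encode.

    ASCII only covers code points 0-127, which is what the message
    "ordinal not in range(128)" refers to.
    """
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


value = u'Fjordk\xe6r'
print(non_ascii_chars(value))

# The exception object carries the same position information.
try:
    value.encode('ascii')
except UnicodeEncodeError as exc:
    print(exc.start, repr(exc.object[exc.start]))
```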

Alternatively, you can encode x[2] as cp1252 (the Western Latin code page), which also supports the character:

>>> x[2].encode("cp1252")
'Fjordk\xe6r'

The Eastern European code page cp1250, however, does not support it:

>>> x[2].encode("cp1250")
...
UnicodeEncodeError: 'charmap' codec can't encode character u'\xe6' in position 6: character maps to <undefined>
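The trial and error with cp1252 and cp1250 can be automated. Below is a hedged Python 3 sketch; the helper name `codecs_that_can_encode` is hypothetical. It tries each candidate codec and keeps those that do not raise `UnicodeEncodeError`. For the example value only cp1252 and UTF-8 succeed, which matches the sessions above. If a lossy ASCII result is acceptable, `encode()` also accepts an `errors` argument; with `'replace'`, characters ASCII cannot represent become `?`.

```python
def codecs_that_can_encode(text, candidates):
    """Return the names in `candidates` whose codec can encode all of `text`."""
    supported = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character has no mapping in this codec
        supported.append(name)
    return supported


value = u'Fjordk\xe6r'
print(codecs_that_can_encode(value, ['ascii', 'cp1250', 'cp1252', 'utf-8']))
# ['cp1252', 'utf-8']

# Lossy fallback: 'replace' substitutes '?' for unencodable characters.
print(value.encode('ascii', 'replace'))  # b'Fjordk?r'
```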

Unicode problems in Python are very common, so I would suggest the following:

understand what Unicode is
understand what UTF-8 is (it is one encoding of Unicode, not Unicode itself)
understand ASCII and other code pages
recommended conversion workflow: input (any code page) -> decode to Unicode -> (process) -> encode output as UTF-8
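The recommended workflow can be sketched for the question's SELECT-then-INSERT case. This is a minimal Python 3 example under assumptions: it uses the standard-library `sqlite3` module with an in-memory database as a stand-in for whatever database the asker actually uses, and the `source`/`target` tables are hypothetical. The idea it illustrates is that the Unicode values from the SELECT can be passed straight to a parameterized INSERT, so no `str()` conversion is needed; encoding to UTF-8 happens only when producing output bytes.

```python
import sqlite3

# Stand-in database: in-memory SQLite with hypothetical tables.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE source (a TEXT, b TEXT, c TEXT)')
conn.execute('CREATE TABLE target (a TEXT, b TEXT, c TEXT)')
conn.execute('INSERT INTO source VALUES (?, ?, ?)',
             (u'Abc', u'Lololo', u'Fjordk\xe6r'))

# Input: the driver already returns Unicode text, so nothing needs decoding.
rows = conn.execute('SELECT a, b, c FROM source').fetchall()

# Process: pass the Unicode values directly as query parameters;
# no str() call, so there is no ASCII encoding step that could fail.
conn.executemany('INSERT INTO target VALUES (?, ?, ?)', rows)
conn.commit()

# Output: encode to UTF-8 only at the boundary, e.g. when producing bytes
# for a file or socket.
encoded_lines = [
    (u'\t'.join(row) + u'\n').encode('utf-8')
    for row in conn.execute('SELECT a, b, c FROM target')
]
print(encoded_lines)
```

With other DB-API drivers the placeholder style may differ (`?` for sqlite3, `%s` for some others); the driver module's `paramstyle` attribute tells you which one it expects.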

2 answers

solution1
7 ACCEPTED 2013-05-22 17:27:02

solution2
4 2013-05-22 17:56:33
