Python / Mako : How to get unicode strings/characters parsed correctly?

Question

I'm trying to get Mako to render a string containing Unicode characters:

tempLook=TemplateLookup(..., default_filters=[], input_encoding='utf8',output_encoding='utf-8', encoding_errors='replace')
...
print sys.stdout.encoding
uname=cherrypy.session['userName']
print uname
kwargs['_toshow']=uname
...
return tempLook.get_template(page).render(**kwargs)

The related template file :

...${_toshow}...

And the output is :

UTF-8
Deşghfkskhü
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)

I don't think there's any problem with the string itself since I can print it just fine.

Although I've played (a lot) with the input_encoding, output_encoding and default_filters parameters, it always complains about being unable to decode/encode with the ascii codec.

So I decided to try out the example found in the documentation, and the following works the "best":

input_encoding='utf-8', output_encoding='utf-8'
# (note: it still raised an error without output_encoding, although the tutorial doesn't imply that it is required)

With

${u"voix m’a réveillé."}

And the result being

voix mâ�a rÃ©veillÃ©

I simply don't get why this doesn't work. "Magic encoding comment"s don't work either. All the files are encoded with UTF-8.

I've spent hours on this to no avail. Am I missing something?

~~Update :~~

~~I have a simpler question now :~~

~~Now that all the variables are unicode, how can I get Mako to render unicode strings without applying anything ?~~ ~~Passing a blank filter / render_unicode() doesn't help.~~

Answer 1

Yes, UTF-8 != Unicode.

UTF-8 is a specific string encoding, as are ASCII and ISO 8859-1. Try this:

For any input string, call inputstring.decode('utf-8') (or whatever encoding the input actually uses). For any output string, call outputstring.encode('utf-8') (or whatever output encoding you want). For everything in between, work with unicode strings ( 'this is a normal string'.decode('utf-8') == u'this is a normal string' ).
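
The decode-on-input, encode-on-output pattern described above can be sketched roughly as follows. This is only an illustration, written for Python 3, where `bytes` plays the role of Python 2's `str` and `str` the role of `unicode`. The helper names `to_text` and `to_bytes` are hypothetical, and the sample bytes are assumed to be the UTF-8 form of the name printed in the question.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
# to_text / to_bytes are hypothetical helper names, not part of Mako or CherryPy.

def to_text(raw, encoding="utf-8"):
    """Decode incoming bytes once, at the input boundary; pass text through."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def to_bytes(text, encoding="utf-8"):
    """Encode text once, just before it leaves the program."""
    return text.encode(encoding)


# Assumed input: the UTF-8 bytes of the name shown in the question.
raw_name = b"De\xc5\x9fghfkskh\xc3\xbc"

name = to_text(raw_name)        # text from here on
greeting = "Hello, " + name     # internal work happens on text only
payload = to_bytes(greeting)    # bytes again at the output boundary
```

In the question's setup this would presumably mean decoding the session value before putting it into kwargs, so that the template only ever sees unicode.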

'foo' is a byte string; u'foo' is a unicode string, which doesn't "have" an encoding (so there is nothing to decode). So any time Python wants to change the encoding of a normal string, it first tries to "decode" it, then to "encode" it. And the default codec is "ascii", which fails more often than not :-)
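
To make the failure concrete, here is a small sketch of both symptoms from the question. It is written for Python 3, which never decodes implicitly, so the ASCII decode that Python 2 performs behind the scenes is triggered explicitly here. The sample bytes are assumed to be the UTF-8 form of the name in the question. The last part shows one plausible source of the garbled "rÃ©veillÃ©" output (UTF-8 bytes read back as Latin-1); that is an assumption, not something the question confirms.

```python
# Assumed input: UTF-8 bytes of the name from the question.
raw = b"De\xc5\x9fghfkskh\xc3\xbc"

# Python 2 does this decode implicitly when a byte string meets a unicode
# string; done explicitly, it fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# Naming the real encoding works.
text = raw.decode("utf-8")

# Possible origin of the mojibake: text encoded as UTF-8, then the bytes
# interpreted as Latin-1 somewhere along the way.
mojibake = "r\u00e9veill\u00e9".encode("utf-8").decode("latin-1")
```

If the rendered page shows this kind of doubled character, the text was most likely correct internally and was mis-decoded at a later boundary.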

1 answers

solution1
3 ACCEPTED 2010-09-19 01:52:12
