
Python 2.7: printing a unicode string still raises UnicodeEncodeError: 'ascii' codec can't encode character … ordinal not in range(128)

A simple print function:

def TODO(message):
    print(type(message))
    print(u'\n~*~ TODO ~*~ \n %s\n     ~*~\n' % message)

called like this:

TODO(u'api servisleri için input check decorator gerekiyor')

results in this error:

<type 'unicode'>                                                                                 
Traceback (most recent call last):                                                               
  File "/srv/www/proj/__init__.py", line 38, in <module>                                      
    TODO(u'api servisleri için input check decorator gerekiyor')                                 
  File "/srv/www/proj/helpers/utils.py", line 33, in TODO                                     
    print(u'\n~*~ TODO ~*~ \n %s\n     ~*~\n' % message)                                         
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 32: ordinal not in range(128)

But it works in the IPython console:

In [10]: TODO(u'api servisleri için input check decorator gerekiyor')
<type 'unicode'>

~*~ TODO ~*~ 
 api servisleri için input check decorator gerekiyor
     ~*~

This works with Python 2.7.12 but somehow fails with 2.7.9.

What am I doing wrong here?

Edit: the function fails when called in a Flask application, but works in the Python console.

Different terminals (and GUIs) support different encodings. I don't have a recent IPython handy, but it is apparently able to handle the non-ASCII character 0xe7 ('ç') in your string. Your normal console, however, is using the 'ascii' encoding (named in the exception), which can't represent any character above 0x7f.
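A minimal sketch of that mechanism, assuming Python 2's rule that printing a unicode object encodes it with the output stream's declared encoding and falls back to the default codec ('ascii') when the stream declares none. The helper `encode_for_stream` and the constant `FALLBACK_ENCODING` are hypothetical names for illustration, not a real API; the code runs under both Python 2.7 and 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# Assumption: approximates Python 2, where printing a unicode object to a
# stream whose .encoding is None falls back to the default codec ('ascii').
FALLBACK_ENCODING = 'ascii'


def encode_for_stream(text, stream_encoding):
    """Return the bytes an ASCII-defaulting print would try to write."""
    encoding = stream_encoding or FALLBACK_ENCODING
    # 'strict' error handling, so unencodable characters raise, as print does
    return text.encode(encoding)


message = 'api servisleri için input check decorator gerekiyor'

# A UTF-8 terminal (e.g. the IPython console) accepts the text.
print(repr(encode_for_stream(message, 'utf-8')))

# A stream with no declared encoding falls back to ASCII and fails on 'ç'.
try:
    encode_for_stream(message, None)
except UnicodeEncodeError as exc:
    print('UnicodeEncodeError:', exc)
```

The same text succeeds or fails depending only on the encoding the stream reports, which matches the console-versus-Flask difference in the question.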

If you want to print non-ASCII strings to an ASCII console, you'll have to decide what to do with the characters it can't display. The str.encode method offers several options:

str.encode([encoding[, errors]])

errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Codec Base Classes.

Here's an example that uses each of those four alternative error handlers on your string (without the extra decoration added by TODO):

#!/usr/bin/env python2
# -*- coding: utf-8 -*-

from __future__ import print_function

uni = u'api servisleri için input check decorator gerekiyor'
handlers = ['ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace']
for handler in handlers:
    print(handler + ':')
    print(uni.encode('ascii', handler))
    print()

The output:

ignore:
api servisleri iin input check decorator gerekiyor

replace:
api servisleri i?in input check decorator gerekiyor

xmlcharrefreplace:
api servisleri i&#231;in input check decorator gerekiyor

backslashreplace:
api servisleri i\xe7in input check decorator gerekiyor

Which one of those outputs comes closest to what you want is for you to decide.
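The documentation quoted above also accepts any name registered via codecs.register_error(). As a sketch of that option, the handler below folds accented letters to their ASCII base letter ('ç' becomes 'c') using Unicode NFKD decomposition, and falls back to '?' for characters with no ASCII base. The handler `_ascii_fold` and the name 'ascii-fold' are made up for illustration; only codecs.register_error and unicodedata.normalize are standard library calls.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import codecs
import unicodedata


def _ascii_fold(exc):
    """Error handler (hypothetical): replace unencodable chars with base letters.

    NFKD splits e.g. 'ç' into 'c' + a combining cedilla; only the ASCII part is
    kept. Characters with no ASCII base (e.g. the euro sign) become '?'.
    """
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    folded = []
    for char in exc.object[exc.start:exc.end]:
        base = unicodedata.normalize('NFKD', char).encode('ascii', 'ignore')
        folded.append(base.decode('ascii') or '?')
    # Return the replacement text and the position where encoding resumes.
    return ''.join(folded), exc.end


# 'ascii-fold' is an arbitrary, previously unused handler name.
codecs.register_error('ascii-fold', _ascii_fold)

uni = 'api servisleri için input check decorator gerekiyor'
print(uni.encode('ascii', 'ascii-fold'))  # 'ç' is folded to 'c'
```

Folding keeps the text readable for Turkish-style accents, but it is lossy: the original characters cannot be recovered from the output.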

For more information, see the Python 2 "Unicode HOWTO" and Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?", also available as a 36-minute video from PyCon US 2012.

Edit: ...or, as you seem to have discovered, your terminal can display Unicode just fine, but your default encoding is nevertheless set to 'ascii', which is more restrictive than it needs to be.
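To check which setting applies inside the Flask process versus the console, a small diagnostic like the following can be run in both places. It is a sketch: `describe_output_encoding` is a hypothetical helper, and the comments describe typical Python 2 behaviour (sys.stdout.encoding is None when output is redirected to a pipe or a log file rather than a terminal). Setting the PYTHONIOENCODING environment variable (for example to utf-8) is one way to give a redirected stdout an explicit encoding.

```python
from __future__ import print_function

import locale
import sys


def describe_output_encoding(stream=None):
    """Collect the settings that decide how print() encodes unicode text."""
    if stream is None:
        stream = sys.stdout
    return {
        # Typically None on Python 2 when stdout is not a terminal.
        'stream_encoding': getattr(stream, 'encoding', None),
        'isatty': bool(stream.isatty()) if hasattr(stream, 'isatty') else False,
        # 'ascii' on Python 2, 'utf-8' on Python 3.
        'default_encoding': sys.getdefaultencoding(),
        # Encoding suggested by the current locale settings.
        'preferred_encoding': locale.getpreferredencoding(False),
    }


if __name__ == '__main__':
    for key, value in sorted(describe_output_encoding().items()):
        print(key, '=', value)
```

Comparing the output from a terminal session with the output logged by the Flask app should show whether stream_encoding differs between them.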

\xe7 is the Unicode code point U+00E7, a lowercase 'ç' (in UTF-8 it is encoded as the two bytes 0xC3 0xA7). Python 2.7.9 is probably encoding the output with ASCII. You can run the code below in any version of Python to reproduce Python 2.7.9's behaviour.

# -*- coding: utf-8 -*-
import sys

def TODO(message):
    print(type(message))
    print(u'\n~*~ TODO ~*~ \n %s\n     ~*~\n' % message)

message = u'api servisleri için input check decorator gerekiyor'
encodedMessage = message.encode('ascii')

print(sys.stdout.encoding)
TODO(encodedMessage)

It raises this exception:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    encodedMessage = message.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xe7' in position 16: ordinal not in range(128)

So the issue is related to the encoding the interpreter uses. You can encode the string yourself, or ignore the characters that can't be encoded.
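To make the code point versus byte distinction concrete, here is a short sketch (it runs under Python 2.7 and 3): the same character U+00E7 becomes two bytes in UTF-8, one byte in Latin-1, and cannot be encoded in ASCII at all.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

char = '\xe7'  # the code point U+00E7 from the error message, i.e. 'ç'
print('code point:', hex(ord(char)))  # 0xe7

utf8 = char.encode('utf-8')      # UTF-8 needs two bytes for U+00E7
latin1 = char.encode('latin-1')  # Latin-1 stores it as a single byte
print('UTF-8 bytes:', len(utf8))
print('Latin-1 bytes:', len(latin1))

try:
    char.encode('ascii')  # ASCII only covers 0x00-0x7f
except UnicodeEncodeError as exc:
    print('ASCII:', exc.reason)
```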

Hope this helps.

Apparently, the print function is a bit different from the print statement.

https://docs.python.org/2.7/library/functions.html#print

All non-keyword arguments are converted to strings like 
str() does and written to the stream, separated by sep 
and followed by end. 

Simply encoding the unicode string solved it:

msg = u'\n~*~ TODO ~*~ \n %s\n     ~*~\n' % message
print(msg.encode("utf-8"))

Still, I'm not sure why it works with 2.7.12. Maybe a locale thing?
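A sketch generalizing the fix above: instead of letting print pick an encoding, the banner is encoded to UTF-8 once and written as bytes, so the result no longer depends on sys.stdout.encoding. The `stream` parameter is a hypothetical addition so output can be redirected (here into io.BytesIO); calling `todo(msg)` with no stream writes to stdout. Hardcoding UTF-8 is an assumption about what the terminal or log on the other end expects.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io
import sys


def todo(message, stream=None):
    """Write the TODO banner as UTF-8 bytes, whatever sys.stdout.encoding is.

    Assumes the consumer of the output (terminal, Flask/WSGI log) expects UTF-8.
    """
    text = '\n~*~ TODO ~*~ \n %s\n     ~*~\n' % message
    data = text.encode('utf-8')
    if stream is None:
        sys.stdout.flush()  # keep ordering with text already printed
        # Python 3 text streams expose their binary layer as .buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        stream = getattr(sys.stdout, 'buffer', sys.stdout)
    stream.write(data)
    stream.flush()
    return data


# Usage: capture into an in-memory byte buffer instead of stdout.
buf = io.BytesIO()
todo('api servisleri için input check decorator gerekiyor', stream=buf)
print(repr(buf.getvalue()))
```

Writing bytes sidesteps the implicit encode step entirely, which is why the same code behaves identically under 2.7.9 and 2.7.12 as long as the reader of the output handles UTF-8.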
