I have a simple print function:
def TODO(message):
    print(type(message))
    print(u'\n~*~ TODO ~*~ \n %s\n ~*~\n' % message)
When it is called like this:
TODO(u'api servisleri için input check decorator gerekiyor')
it results in this error:
<type 'unicode'>
Traceback (most recent call last):
  File "/srv/www/proj/__init__.py", line 38, in <module>
    TODO(u'api servisleri için input check decorator gerekiyor')
  File "/srv/www/proj/helpers/utils.py", line 33, in TODO
    print(u'\n~*~ TODO ~*~ \n %s\n ~*~\n' % message)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 32: ordinal not in range(128)
But it works in the IPython console:
In [10]: TODO(u'api servisleri için input check decorator gerekiyor')
<type 'unicode'>
~*~ TODO ~*~
api servisleri için input check decorator gerekiyor
~*~
This works with Python 2.7.12 but somehow fails with 2.7.9.
What am I doing wrong here?
Edit: the function fails when called in a Flask application, but works in the Python console.
Different terminals (and GUIs) allow different encodings. I don't have a recent IPython handy, but it is apparently able to handle the non-ASCII character 0xe7 ('ç') in your string. Your normal console, however, is using the 'ascii' encoding (mentioned by name in the exception), which can't represent any character above 0x7f.
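To see which encoding Python has picked for the current output stream, a small diagnostic along these lines can help. This is only a sketch: the helper name report_encodings is made up for illustration, and the values it reports depend entirely on the environment (terminal, locale, or a web server process where stdout is not a terminal).

```python
from __future__ import print_function

import locale
import sys


def report_encodings(stream=None):
    """Collect the encodings relevant to printing text (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    return {
        # Encoding attached to the stream; may be None, for example when
        # output is piped or the stream is not a terminal under Python 2.
        'stream': getattr(stream, 'encoding', None),
        # Encoding Python 2 uses for implicit unicode -> str conversion.
        'default': sys.getdefaultencoding(),
        # Encoding suggested by the user's locale settings.
        'locale': locale.getpreferredencoding(),
    }


if __name__ == '__main__':
    for name, value in sorted(report_encodings().items()):
        print('%s: %s' % (name, value))
```

Running this once in the console where printing works and once inside the failing process should show whether the two environments really disagree about the stream encoding.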
If you want to print non-ASCII strings to an ASCII console, you'll have to decide what to do with the characters it can't display. The str.encode method offers several options:
str.encode([encoding[, errors]])
errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Codec Base Classes.
Here's an example that uses each of those four alternative error-handlers on your string (without the extra decoration added by TODO):
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from __future__ import print_function
uni = u'api servisleri için input check decorator gerekiyor'
handlers = ['ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace']
for handler in handlers:
    print(handler + ':')
    print(uni.encode('ascii', handler))
    print()
The output:
ignore:
api servisleri iin input check decorator gerekiyor
replace:
api servisleri i?in input check decorator gerekiyor
xmlcharrefreplace:
api servisleri i&#231;in input check decorator gerekiyor
backslashreplace:
api servisleri i\xe7in input check decorator gerekiyor
Which one of those outputs comes closest to what you want is for you to decide.
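If you settle on one of these handlers, the choice can be wrapped in a small helper. The following is a sketch only: encode_for_stream and AsciiStream are hypothetical names, and falling back to 'ascii' when the stream reports no encoding is an assumption that mirrors the behaviour seen in the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import sys


def encode_for_stream(text, stream=None, errors='backslashreplace'):
    """Encode text for a stream, using `errors` for unencodable characters.

    Hypothetical helper: assumes 'ascii' when the stream has no encoding.
    """
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    return text.encode(encoding, errors)


class AsciiStream(object):
    """Stand-in for an ASCII-only console (illustration only)."""
    encoding = 'ascii'


uni = u'api servisleri için input check decorator gerekiyor'
encoded = encode_for_stream(uni, AsciiStream())
print(repr(encoded))
```

The default of 'backslashreplace' is a judgment call: it keeps the output pure ASCII while still showing which character was there, which tends to be more useful in logs than silently dropping it.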
For more information, see the Python 2 "Unicode HOWTO" and Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?", also available as a 36-minute video from PyCon US 2012.
Edit: ...or, as you seem to have discovered, your terminal can display Unicode just fine, but your default encoding is nevertheless set to 'ascii', which is more restrictive than it needs to be.
\xe7 is the Unicode code point of the lowercase letter 'ç' (in UTF-8 it is encoded as the two bytes \xc3\xa7). Python 2.7.9 is probably encoding the output as ASCII. You can run the code below under any Python version to reproduce that behaviour.
# -*- coding: utf-8 -*-
import sys

def TODO(message):
    print(type(message))
    print(u'\n~*~ TODO ~*~ \n %s\n ~*~\n' % message)

message = u'api servisleri için input check decorator gerekiyor'
encodedMessage = message.encode('ascii')  # raises UnicodeEncodeError on 'ç'
print(sys.stdout.encoding)
TODO(encodedMessage)
It will throw this exception:
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    encodedMessage = message.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xe7' in position 16: ordinal not in range(128)
So the issue is related to the interpreter's encoding rules. You can either encode the string yourself or ignore the characters that can't be encoded.
Hope this helps.
Apparently, the print function is a bit different from the print statement.
https://docs.python.org/2.7/library/functions.html#print
All non-keyword arguments are converted to strings like str() does and written to the stream, separated by sep and followed by end.
Simply encoding the unicode string solved it:
msg = u'\n~*~ TODO ~*~ \n %s\n ~*~\n' % message
print(msg.encode("utf-8"))
Still, I'm not sure why it works with 2.7.12; maybe it's a locale thing?
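An alternative to calling .encode() at every print site is to wrap the output stream once so that it encodes unicode text itself. The sketch below demonstrates the idea on an in-memory byte buffer so it can be run anywhere; under Python 2 the same kind of wrapper is commonly applied to sys.stdout. Choosing UTF-8 here is an assumption about what the terminal or log consuming the output expects. Setting the PYTHONIOENCODING environment variable before starting the interpreter is another documented way to influence the encoding of the standard streams, though whether it takes effect depends on how the Flask process is launched.

```python
# -*- coding: utf-8 -*-
import codecs
import io

# In-memory byte stream standing in for a stdout that only accepts bytes.
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it around a
# byte stream yields an object whose write() encodes unicode text.
writer = codecs.getwriter('utf-8')(raw)

message = u'api servisleri için input check decorator gerekiyor'
writer.write(u'\n~*~ TODO ~*~ \n %s\n ~*~\n' % message)

# The buffer now holds UTF-8 bytes; 'ç' became the two bytes 0xC3 0xA7.
data = raw.getvalue()
```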