I have some problems converting from unicode to str in Python. To give some context:
$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "αά".decode('utf-8')
u'\u03b1\u03ac'
>>> u"αά".encode('utf-8')
'\xce\xb1\xce\xac'
Now, for some strange reason, I have a library function which, given αά, returns the string u'\\xce\\xb1\\xce\\xac', and I need to get the string u'\α\ά'. Nothing I try works. If I try decode, I get an error:
>>> u'\xce\xb1\xce\xac'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
So I need a way to turn u'\\xce\\xb1\\xce\\xac' into '\\xce\\xb1\\xce\\xac'. It does not work with str:
>>> str(u'\xce\xb1\xce\xac')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
Any ideas on how to do it are welcome.
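It appears your input is double-encoded: the UTF-8 bytes ended up stored as individual Unicode code points. So you should turn it back into bytes first and then decode those bytes as UTF-8: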
>>> u'\xce\xb1\xce\xac'.encode('raw_unicode_escape').decode('utf8')
u'\u03b1\u03ac'
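The same round trip can be sketched as a pair of small helpers. This is only an illustration under some assumptions: the names fix_mojibake and fix_escaped are made up for this example, and 'latin-1' is swapped in for 'raw_unicode_escape' (both map code points below 256 to the byte of the same value). The second helper assumes the library really returns literal backslashes, as in u'\\xce\\xb1\\xce\\xac', which the question leaves ambiguous.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def fix_mojibake(text):
    """Recover text whose UTF-8 bytes were stored as code points below 256."""
    # latin-1 maps every code point 0..255 to the byte with the same value,
    # so this gives back the original UTF-8 byte sequence, which we then decode.
    return text.encode('latin-1').decode('utf-8')


def fix_escaped(text):
    """Same recovery, for input that holds literal backslash escapes."""
    # First turn each 4-character sequence such as \xce into one code point.
    # Assumes the input is pure ASCII (backslashes, 'x', hex digits).
    unescaped = text.encode('ascii').decode('unicode_escape')
    return fix_mojibake(unescaped)


expected = u'\u03b1\u03ac'  # the two Greek letters from the question
print(fix_mojibake(u'\xce\xb1\xce\xac') == expected)
print(fix_escaped(u'\\xce\\xb1\\xce\\xac') == expected)
```

Both comparisons should print True. If the input can contain characters above U+00FF, 'latin-1' will raise UnicodeEncodeError, which is arguably a useful signal that the string was not double-encoded in the first place.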
At first I thought it was an issue with your terminal encoding, which could not print 'αά'.decode('utf8')
...
See the related post:
Sorry for my mistakes.