
unicode to str in Python 2.7.3

I have some problems converting from unicode to str in Python. To give some context:

$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "αά".decode('utf-8')
u'\u03b1\u03ac'
>>> u"αά".encode('utf-8')
'\xce\xb1\xce\xac'

Now, for some strange reason, I have a library function which, in the case of αά, returns the string u'\\xce\\xb1\\xce\\xac', and I need to get the string u'\α\ά'. Nothing I try works. If I try decode, I get an error:

>>> u'\xce\xb1\xce\xac'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

So I need a way to turn u'\\xce\\xb1\\xce\\xac' into '\\xce\\xb1\\xce\\xac'. It does not work with str:

>>> str(u'\xce\xb1\xce\xac')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

Any ideas on how to do it are welcome.

Edited

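It appears your input is double-encoded (the UTF-8 bytes ended up as individual unicode characters), so you should encode it back to bytes and then decode those bytes as UTF-8: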

>>> u'\xce\xb1\xce\xac'.encode('raw_unicode_escape').decode('utf8')
u'\u03b1\u03ac'
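
As far as I can tell, this works because every character in the string has a code point below 256, so encoding maps each one back to the single byte it came from, and those bytes happen to be valid UTF-8. Here is a minimal sketch of the same round trip, using 'latin-1' in place of 'raw_unicode_escape' (my substitution; the two only agree while all code points are below 256). The name fix_mojibake is made up for illustration:

```python
def fix_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded one byte per character.

    Hypothetical helper, not part of any library. Raises UnicodeEncodeError
    if text contains a code point above 255, and UnicodeDecodeError if the
    recovered bytes are not valid UTF-8.
    """
    raw = text.encode('latin-1')   # each char below 256 becomes the byte with the same value
    return raw.decode('utf-8')     # reinterpret those bytes as UTF-8


broken = u'\xce\xb1\xce\xac'
fixed = fix_mojibake(broken)
print(repr(fixed))
```

This also explains the traceback in the question: in Python 2, calling .decode() or str() on a unicode object first encodes it implicitly with the ascii codec, which is why a decode call ends in a UnicodeEncodeError.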

At first I thought it was an issue with your terminal encoding, which would not print 'αά'.decode('utf8') ...
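
To take the terminal out of the picture, it may help to look at the code points rather than the printed glyphs. The question shows the library output both as u'\xce\xb1\xce\xac' and with doubled backslashes, so this sketch covers both readings. Which one applies to the real library is an assumption I cannot verify, and describe and unescape_then_fix are hypothetical names:

```python
def describe(text):
    """Return the code points of text as hex strings, independent of the terminal."""
    return [hex(ord(ch)) for ch in text]


def unescape_then_fix(text):
    """Handle the case where text holds literal backslash escapes such as \\xce.

    Assumes text is pure ASCII (backslash, 'x', hex digits).
    """
    mojibake = text.encode('ascii').decode('unicode_escape')  # literal escapes -> characters below 256
    return mojibake.encode('latin-1').decode('utf-8')         # those characters -> bytes -> real text


four_chars = u'\xce\xb1\xce\xac'       # 4 characters, each below 256
escaped = u'\\xce\\xb1\\xce\\xac'      # 16 characters, starting with a backslash

print(describe(four_chars))
print(len(escaped))
print(repr(unescape_then_fix(escaped)))
```

If len() reports 4, the one-step encode/decode is enough; if it reports 16, the string contains real backslashes and needs the extra unicode_escape step first.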

See the related post:

Sorry for my mistakes.
