简体   繁体   中英

Convert str to unicode str

I need to convert a str to text in Python 2.7

a = u'"\u0274\u1d1c\u0274\u1d04\u1d00 \u1d00\u028f\u1d1c\u1d05\u1d07s \u1d00 \u1d1c\u0274 \u0274\u026a\xf1\u1d0f \u1d0f \u1d1c\u0274\u1d00 \u0274\u026a\xf1\u1d00 \u1d04\u1d0f\u0274 \u1d1c\u0274\u1d00 \u1d1b\u1d00\u0280\u1d07\u1d00 \u1d07\u0274 \u029f\u1d00 \u01eb\u1d1c\u1d07 s\u026a\u1d07\u0274\u1d1b\u1d07 \u01eb\u1d1c\u1d07 \u1d18\u1d1c\u1d07\u1d05\u1d07 \u1d1b\u1d07\u0274\u1d07\u0280 \u1d07x\u026a\u1d1b\u1d0f"'

I try with a.decode('utf8') but the truth is I don't know what kind of code is the str a

The output I need is:

"ɴᴜɴᴄᴀ ᴀʏᴜᴅᴇs ᴀ ᴜɴ ɴɪñᴏ ᴏ ᴜɴᴀ ɴɪñᴀ ᴄᴏɴ ᴜɴᴀ ᴛᴀʀᴇᴀ ᴇɴ ʟᴀ ǫᴜᴇ sɪᴇɴᴛᴇ ǫᴜᴇ ᴘᴜᴇᴅᴇ ᴛᴇɴᴇʀ ᴇxɪᴛᴏ"

ERROR:

>>> print(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:\WinPython-64bit-2.7.13.1Zero\python-2.7.13.amd64\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-5: character maps to <undefined>

Since you are on Python2, you have to encode the string contents - which are already text, to your terminal encoding.

So, if you are on windows, print(a.encode("cp-850")) , if you are on Linux, Mac-OS, or other OS: print(a.encode("utf-8"))

On Python3 the encoding should be done automatically. Also, it is important to understand that characters codified like \\uNNNN in Python correspond to Unicode codepoints - and not to specific character encodings like "utf-8", "latin1" or "utf-16". In Python 3 most readable characters encoding like this will be shown even with the string internal representation, which is displayed by default in a Python interactive session (otherwise use the built-in repr call to see it). By using the built-in "str" or a call to print , you see the rendered string, and all \\uXXXX , \\UXXXXXXXX , \\xNN and \\N{unicode character name} tokens are rendered as the actual characters. (In Python2 you need to manually encode this representation to the character encoding used in your device)

In other words, if you are using Python 3, this is as simple as:


In [15]: a = u'"\u0274\u1d1c\u0274\u1d04\u1d00 \u1d00\u028f\u1d1c\u1d05\u1d07s \u1d00 \u1d1c\u0274 \u0274\u026a\xf1\u1d0f \u1d0f \u1d1c\u0274\u1d00 \u0274\u026a\xf1\u1d00 \u1d04\u1d0f\u0274 \u1d1c\u0274\u1d00 \u1d1b\u1d00\u0280\u1d07\u1d00 \u1d07\u0274 \u029f\u1d00 \u01eb\u1d1c\u1d07 s\u026a\u1d07\u0274\u1d1b\u1d07 \u01eb\u1d1c\u1d07 \u1d18\u1d1c\u1d07\u1d05\u1d07 \u1d1b\u1d07\u0274\u1d07\u0280 \u1d07x\u026a\u1d1b\u1d0f"' 
    ...:                                                                                                                                                            

In [16]: a                                                                                                                                                          
Out[16]: '"ɴᴜɴᴄᴀ ᴀʏᴜᴅᴇs ᴀ ᴜɴ ɴɪñᴏ ᴏ ᴜɴᴀ ɴɪñᴀ ᴄᴏɴ ᴜɴᴀ ᴛᴀʀᴇᴀ ᴇɴ ʟᴀ ǫᴜᴇ sɪᴇɴᴛᴇ ǫᴜᴇ ᴘᴜᴇᴅᴇ ᴛᴇɴᴇʀ ᴇxɪᴛᴏ"'

Or:
In [17]: print(a)                                                                                                                                                   
"ɴᴜɴᴄᴀ ᴀʏᴜᴅᴇs ᴀ ᴜɴ ɴɪñᴏ ᴏ ᴜɴᴀ ɴɪñᴀ ᴄᴏɴ ᴜɴᴀ ᴛᴀʀᴇᴀ ᴇɴ ʟᴀ ǫᴜᴇ sɪᴇɴᴛᴇ ǫᴜᴇ ᴘᴜᴇᴅᴇ ᴛᴇɴᴇʀ ᴇxɪᴛᴏ"

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM