简体   繁体   中英

how can i decode unicode string in python?

Wikipedia API encodes string into unicode format

"Golden Globe Award for Best Motion Picture \u2013 Drama"

how can i convert it back to

"Golden Globe Award for Best Motion Picture – Drama"

The Wikipedia API returns JSON data, use the json module to decode:

json.loads(inputstring)

Demo:

>>> import json
>>> print json.loads('"Golden Globe Award for Best Motion Picture \u2013 Drama"')
Golden Globe Award for Best Motion Picture – Drama

If you instead have a string that starts with u'' , you already have a Python unicode value and are looking at the representation of that string:

>>> json.loads('"Golden Globe Award for Best Motion Picture \u2013 Drama"')
u'Golden Globe Award for Best Motion Picture \u2013 Drama'

Just print that value to have Python encode it to your terminal codec and represent that em-dash character in a format your terminal will understand.

You may want to read up about Python and Unicode and encodings before you continue, if you do not understand what the difference is between a unicode value and byte strings:

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM