Converting Unicode characters from returned JSON in Python
I am trying to retrieve JSON from the MusicBrainz API service. The returned data for some songs contains Unicode escape sequences, and I am having trouble converting them to the regular symbols they represent. Kindly let me know what I should be doing here.
JSON:
{
"status": "ok",
"results": [{
"recordings": [{
"duration": 402,
"tracks": [{
"duration": 402,
"position": 6,
"medium": {
"release": {
"id": "dde6ecee-8e9b-4b46-8c28-0f8d659f83ac",
"title": "Tecno Fes, Volume 2"
},
"position": 1,
"track_count": 11
},
"artists": [{
"id": "57c1e5ea-e08f-413a-bcb1-f4e4b675bead",
"name": "Gigi D\u2019Agostino"
}],
"title": "You Spin Me Round"
}],
"id": "2e0a7bce-9e44-4a63-a789-e8c4d2a12af9"
}, ....
Failing code (example):
string = '\u0420\u043e\u0441\u0441\u0438\u044f'
print string.encode('utf-8')
I am running this code from a command-line terminal on a Windows 7 machine with Python 2.7. The output I get is below:
C:\Python27>python junk.py
Gigi DGÇÖAgostino
Gigi D?Agostino
Gigi D\u2019Agostino
I am expecting the output to be:
Gigi D' Agostino
Unicode escapes are only interpreted in unicode strings. To convert a regular (byte) string that contains escape sequences to unicode, use str.decode('unicode-escape'):
In [1]: s='\u0420\u043e\u0441\u0441\u0438\u044f'
In [2]: s
Out[2]: '\\u0420\\u043e\\u0441\\u0441\\u0438\\u044f'
In [3]: s.decode('unicode-escape')
Out[3]: u'\u0420\u043e\u0441\u0441\u0438\u044f'
In [4]: print s.decode('unicode-escape')
Россия
In [5]: s2="Gigi D\u2019Agostino"
In [6]: s2
Out[6]: 'Gigi D\\u2019Agostino'
In [7]: print s2.decode('unicode-escape')
Gigi D’Agostino
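If a plain ASCII apostrophe is wanted, as the expected output in the question suggests, one option is to replace the typographic one after decoding. A minimal sketch, assuming the input is a byte string that still contains the literal backslash escape; the variable names are only for illustration, and the snippet is written so that it also runs on Python 3:

```python
# Bytes holding a literal backslash-u escape, as in the failing example.
raw = b"Gigi D\\u2019Agostino"

# Interpret the escape: the result is a Unicode string containing U+2019.
decoded = raw.decode("unicode-escape")

# Swap the right single quotation mark (U+2019) for an ASCII apostrophe.
plain = decoded.replace(u"\u2019", u"'")

# The result is pure ASCII now, so encoding it to ASCII cannot fail.
ascii_bytes = plain.encode("ascii")
```

Whether to replace the character at all is a judgment call: U+2019 is the character MusicBrainz stores, and replacing it loses that information.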
You should use a JSON parser; it returns Unicode strings, as any valid JSON parser does. Your failing example shows a bytestring, i.e., you haven't used a JSON parser.
For example, to parse JSON data:
obj = json.load(urllib2.urlopen(request))
To pretty-print obj without using Unicode escapes:
print json.dumps(obj, indent=4, ensure_ascii=False)
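To see the effect without a network call, here is a small sketch that parses a hard-coded JSON string (a hypothetical, shortened stand-in for the MusicBrainz response) and compares the two json.dumps modes. The variable names are made up for illustration:

```python
import json

# Hypothetical stand-in for the text a web service would return.
# The doubled backslash keeps \u2019 as a JSON escape inside the string.
raw = '{"artists": [{"name": "Gigi D\\u2019Agostino"}]}'

# The parser decodes the \u2019 escape into a single Unicode character.
obj = json.loads(raw)
name = obj["artists"][0]["name"]

# ensure_ascii=False keeps non-ASCII characters as they are.
pretty = json.dumps(obj, indent=4, ensure_ascii=False)

# The default (ensure_ascii=True) re-escapes them as \uXXXX.
escaped = json.dumps(obj, indent=4)
```

Printing pretty still depends on the console being able to display U+2019; the parsing step itself does not.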
It is also useful to understand the difference between:
print unicode_string
And:
print repr(unicode_string)
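A short sketch of that difference, using the artist name from the question as an assumed value. On Python 2.7, repr() returns an ASCII-only literal form, which is why \u2019 shows up in the output even though the string holds a single character there; Python 3's repr() leaves printable non-ASCII characters as they are:

```python
unicode_string = u"Gigi D\u2019Agostino"

# repr() gives a Python-literal form meant for debugging: it adds quotes
# and, on Python 2.7, writes U+2019 as the six characters \u2019.
literal_form = repr(unicode_string)

# The string itself holds exactly one character at that position.
apostrophe = unicode_string[6]
length = len(unicode_string)
```

So seeing \u2019 in a repr() does not mean the data is wrong; it is only the debugging representation.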
Are you using cmd on Windows? In that case, getting Unicode to display correctly at all can take a bit of a hack. You might want to think about using another "terminal" to test your scripts. MSYS provides a nice terminal/shell, and IDLE, which is included in the Windows Python distribution, has a Python Shell (right click, open in IDLE, F5).
If you really want to make it work in cmd:
You have to set Lucida Console as the font in cmd. Then:
> chcp
Active code page: 850
> chcp 65001
Then you should have Unicode output in cmd. Your "Active code page" might be different. Note it down somewhere, because you might want to change it back afterwards:
> chcp 850
Otherwise you will run into other problems (starting .bat files doesn't work). (See also batch-file-encoding)
In your script you also need this:
import codecs

def cp65001(name):
    """This might be buggy, but better than just a LookupError
    """
    if name.lower() == "cp65001":
        return codecs.lookup("utf-8")

codecs.register(cp65001)
Otherwise Python will crash. (See windows-cmd-encoding-change-causes-python-crash)
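To check that such a search function is picked up, the same pattern can be tried under a made-up codec name. A sketch, assuming the hypothetical name cp65001demo (chosen so the result does not depend on whether the interpreter already knows cp65001); the function name is also just for illustration:

```python
import codecs

def alias_to_utf8(name):
    """Search function: map the hypothetical name 'cp65001demo' to UTF-8."""
    if name.lower() == "cp65001demo":
        return codecs.lookup("utf-8")
    return None  # let other search functions handle every other name

codecs.register(alias_to_utf8)

# The made-up name now resolves to the UTF-8 codec.
info = codecs.lookup("cp65001demo")

# Encoding through the alias produces ordinary UTF-8 bytes.
encoded = u"D\u2019Agostino".encode("cp65001demo")
```

Returning None for unknown names matters: it tells Python to keep looking instead of treating every lookup as UTF-8.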
I had a similar bug report for my script.
You might also consider using a library to access the MusicBrainz Web Service. Python-musicbrainzngs works with the current ws/2.