JSON decoder is inconsistent when Unicode data is present

Question

(This question is related to this one.)

Consider the following session:

Python 2.7.3 (default, Jan  2 2013, 16:53:07) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import simplejson as json
>>> 
>>> my_json = '''[
...   {
...     "id" : "normal",
...     "txt" : "This is a normal entry"
...   },
...   {
...     "id" : "αβγδ",
...     "txt" : "This is a unicode entry"
...   }
... ]'''
>>> 
>>> cache = json.loads(my_json, encoding='utf-8')
>>> 
>>> cache
[{'txt': 'This is a normal entry', 'id': 'normal'}, {'txt': 'This is a unicode entry', 'id': u'\u03b1\u03b2\u03b3\u03b4'}]

Why does the JSON decoder sometimes produce unicode and sometimes a plain str? Shouldn't it always produce unicode?

Answer 1

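This appears to be an optimization in simplejson. From the simplejson docs:

If s is a str, then decoded JSON strings that contain only ASCII characters may be parsed as str for performance and memory reasons. If your code expects only unicode, the appropriate solution is to decode s to unicode prior to calling decode.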
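The fix suggested by the docs can be sketched roughly as below. This is only an illustration, not the asker's code: it uses the standard-library `json` module so that it runs without extra packages (on Python 2 with simplejson, `import simplejson as json` would be the swap), and the names `raw`, `text`, `ids`, and `all_text` are made up for the example.

```python
import json  # assumption: stdlib json; with simplejson use "import simplejson as json"

# UTF-8 encoded JSON bytes, similar to the document in the question.
# The second id is the Greek letters alpha, beta, gamma, delta.
raw = b'[{"id": "normal"}, {"id": "\xce\xb1\xce\xb2\xce\xb3\xce\xb4"}]'

# Decode the bytes to unicode text *before* handing them to the JSON decoder.
text = raw.decode('utf-8')
cache = json.loads(text)

# type(u'') is unicode on Python 2 and str on Python 3.
text_type = type(u'')
ids = [entry['id'] for entry in cache]
all_text = all(isinstance(value, text_type) for value in ids)
```

With the input already decoded, every string the decoder returns should be a unicode object, so code that expects only unicode no longer has to special-case ASCII-only values.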

Note: every ASCII character is encoded identically in UTF-8 and in ASCII, so ASCII is a subset of UTF-8.
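A quick way to check the note above, as a small sketch with illustrative variable names: ASCII-only text encodes to the same bytes under both codecs, while the Greek id from the question can only be encoded as UTF-8.

```python
ascii_text = u'normal'
greek = u'\u03b1\u03b2\u03b3\u03b4'  # the non-ASCII id from the question

# ASCII-only text yields identical bytes under both encodings.
same_bytes = ascii_text.encode('ascii') == ascii_text.encode('utf-8')

# Each of these Greek letters takes two bytes in UTF-8.
greek_utf8 = greek.encode('utf-8')

# The Greek text cannot be represented in ASCII at all.
try:
    greek.encode('ascii')
    greek_is_ascii = True
except UnicodeEncodeError:
    greek_is_ascii = False
```

This is why an ASCII-only str can be returned unchanged by the decoder without losing information, whereas the Greek id has to become a unicode object.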

Solution 1
4 · Accepted · 2013-10-31 09:06:25