
In Python, have json not escape a string

I am caching some JSON data, and in storage it is represented as a JSON-encoded string. The server performs no work on the JSON before sending it to the client, other than collating multiple cached objects, like this:

def get_cached_items():
  item1 = cache.get(1)
  item2 = cache.get(2)
  return json.dumps(dict(item1=item1, item2=item2, msg="123"))

There may be other items included with the return value, in this case represented by msg="123" .

The issue is that the cached items are double-escaped. It would behoove the library to allow a pass-through of the string without escaping it.

I have looked at the documentation for the default argument of json.dumps, as it seems to be the place where one would address this, and searched on Google/SO but found no useful results.

It would be unfortunate, from a performance perspective, if I had to decode the JSON of each cached item just to send it to the browser. It would be unfortunate, from a complexity perspective, to not be able to use json.dumps.

My inclination is to write a class that stores the cached string, so that when the default handler encounters an instance of this class it uses the string without escaping it. I have yet to figure out how to achieve this, though, and I would be grateful for thoughts and assistance.

EDIT: For clarity, here is an example of the proposed default technique:

class RawJSON(object):
   def __init__(self, str):
       self.str = str

class JSONEncoderWithRaw(json.JSONEncoder):
   def default(self, o):
       if isinstance(o, RawJSON): 
          return o.str # but avoid call to `encode_basestring` (or ASCII equiv.)
       return super(JSONEncoderWithRaw, self).default(o)

Here is a degenerate example of the above:

>>> class M():
       str = ''
>>> m = M()
>>> m.str = json.dumps(dict(x=123))
>>> json.dumps(dict(a=m), default=lambda o: o.str)
'{"a": "{\\"x\\": 123}"}'

The desired output would include the unescaped string m.str, that is:

'{"a": {"x": 123}}'

It would be good if the json module did not encode/escape the return value of the default function, or if that escaping could be switched off. In the absence of a method via the default parameter, one may have to achieve the objective here by overriding the encode and iterencode methods of JSONEncoder, which brings challenges in terms of complexity, interoperability, and performance.
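As a rough illustration of what overriding encode could look like, here is a minimal sketch. It is not something the json module offers out of the box: the placeholder scheme, the text attribute, and the _raw dictionary are assumptions made for this example. The default hook hands back a unique token for each RawJSON instance, and encode swaps the quoted token for the raw text afterwards.

```python
import json
import uuid


class RawJSON(object):
    """Wraps a string that already contains valid JSON text."""

    def __init__(self, text):
        self.text = text


class JSONEncoderWithRaw(json.JSONEncoder):
    """Illustrative encoder (an assumption, not a json API) that emits RawJSON verbatim."""

    def __init__(self, *args, **kwargs):
        super(JSONEncoderWithRaw, self).__init__(*args, **kwargs)
        self._raw = {}  # placeholder token -> raw JSON text

    def default(self, o):
        if isinstance(o, RawJSON):
            # Return a unique token; the base encoder quotes it like any string.
            token = "@@raw-%s@@" % uuid.uuid4().hex
            self._raw[token] = o.text
            return token
        return super(JSONEncoderWithRaw, self).default(o)

    def encode(self, o):
        self._raw = {}
        result = super(JSONEncoderWithRaw, self).encode(o)
        # Swap each quoted token for the raw JSON text it stands for.
        for token, text in self._raw.items():
            result = result.replace('"%s"' % token, text)
        return result


cached = json.dumps(dict(x=123))
output = json.dumps(dict(a=RawJSON(cached), msg="123"), cls=JSONEncoderWithRaw)
print(output)
# -> {"a": {"x": 123}, "msg": "123"}
```

This sketch only covers encode, which is what json.dumps calls; json.dump goes through iterencode and would still emit the quoted tokens. It also trusts that the wrapped text is valid JSON and does not validate it.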

A quick-n-dirty way is to patch json.encoder.encode_basestring*() functions:

import json

class RawJson(str):  # on Python 2, subclass unicode instead of str
    pass

# patch json.encoder module
for name in ['encode_basestring', 'encode_basestring_ascii']:
    def encode(o, _encode=getattr(json.encoder, name)):
        return o if isinstance(o, RawJson) else _encode(o)
    setattr(json.encoder, name, encode)


print(json.dumps([1, RawJson(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]

If you are caching JSON strings, you need to first decode them to Python structures; there is no way for json.dumps() to distinguish between normal strings and strings that are really JSON-encoded structures:

return json.dumps({'item1': json.loads(item1), 'item2': json.loads(item2), 'msg': "123"})
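For reference, a self-contained version of that line might look like the following. The dictionary standing in for the cache and its sample contents are assumptions for the example; the real cache object from the question is not shown.

```python
import json

# Hypothetical stand-in for the cache: values are JSON-encoded strings.
cache = {1: '{"x": 123}', 2: '["abc", 2]'}


def get_cached_items():
    # Decode each cached string so json.dumps sees real Python structures.
    item1 = json.loads(cache.get(1))
    item2 = json.loads(cache.get(2))
    return json.dumps({'item1': item1, 'item2': item2, 'msg': "123"})


result = get_cached_items()
print(result)
# -> {"item1": {"x": 123}, "item2": ["abc", 2], "msg": "123"}
```

The cost is one decode and one re-encode per cached item on every request, which is the overhead the question hoped to avoid.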

Unfortunately, there is no option to include already-converted JSON data in this; the default function is expected to return Python values. You extract data from whatever object is passed in and return a value that can be converted to JSON, not a value that is already JSON itself.

The only other approach I can see is to insert "template" values, then use string replacement techniques to manipulate the JSON output to replace the templates with your actual cached data:

json_data = json.dumps({'item1': '==item1==', 'item2': '==item2==', 'msg': "123"})
return json_data.replace('"==item1=="', item1).replace('"==item2=="', item2)
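Wrapped in a function so it can be run on its own, the replacement technique could look like this sketch. The function name collate and the sample inputs are made up for illustration.

```python
import json


def collate(item1, item2):
    # item1 and item2 are strings that already hold valid JSON text.
    json_data = json.dumps({'item1': '==item1==', 'item2': '==item2==', 'msg': "123"})
    # The placeholders were encoded as quoted strings, so replace them quotes and all.
    return json_data.replace('"==item1=="', item1).replace('"==item2=="', item2)


result = collate('{"x": 123}', '["abc", 2]')
print(result)
# -> {"item1": {"x": 123}, "item2": ["abc", 2], "msg": "123"}
```

Note that the replacements run one after another, so if the cached text of item1 happened to contain the literal "==item2==" it would be rewritten as well. Placeholders that cannot occur in the data (random tokens, for instance) would reduce that risk.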

A third option is to cache item1 and item2 in non-serialized form, as a Python structure instead of a JSON string.
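A small sketch of that third option, assuming an in-process cache that can hold arbitrary Python objects (the plain dictionary here is only a placeholder for whatever cache is really in use):

```python
import json

# Hypothetical in-process cache holding Python structures, not JSON strings.
cache = {}
cache[1] = {"x": 123}
cache[2] = ["abc", 2]


def get_cached_items():
    # Everything is serialized exactly once, so nothing gets double-escaped.
    return json.dumps({'item1': cache.get(1), 'item2': cache.get(2), 'msg': "123"})


result = get_cached_items()
print(result)
# -> {"item1": {"x": 123}, "item2": ["abc", 2], "msg": "123"}
```

Whether this is practical depends on the cache backend: a store that only accepts strings or bytes would still need some serialization step.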

You can use the better maintained simplejson in place of the standard json module; it provides this functionality through its RawJSON wrapper.

import simplejson as json
from simplejson.encoder import RawJSON

print(json.dumps([1, RawJSON(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]

You get simplicity of code, plus all the C optimisations of simplejson .
