Unable to load json containing escape sequences

I'm being passed some JSON and am having trouble parsing it.

The object is currently simple, with a single key/value pair. The key works fine, but the value \\d causes issues.

This is coming from an HTML form, via JavaScript. All of the below are literals.

  • HTML: \\d
  • JavaScript: {'Key': '\\d'}
  • JSON: {"Key": "\\\\d"}

json.loads() doesn't seem to like JSON in this format. A quick sanity check, to confirm I'm not doing anything silly, works fine:

>>> import json
>>> json.loads('{"key":"value"}')
{'key': 'value'}

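Since I'm declaring this string in Python, the four backslashes in the source literal should collapse to two in the actual string (va\\lue), which, when parsed as JSON, should become a single backslash (va\lue). The interpreter echoes that single backslash doubled, because it prints the repr of the result.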

>>> json.loads('{"key":"va\\\\lue"}')
{'key': 'va\\lue'}

In case Python wasn't escaping the string on the way in, I thought I'd check without the doubling...

>>> json.loads('{"key":"va\\lue"}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python33\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Python33\lib\json\decoder.py", line 352, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python33\lib\json\decoder.py", line 368, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid \escape: line 1 column 11 (char 10)

but it fails, as expected.
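
One way to take Python's own escaping out of the picture is a raw string literal, so that what is typed is exactly what json.loads() receives. The following is only a minimal sketch, assuming Python 3; the variable names are illustrative and not part of the original code.

```python
import json

# Raw string: Python leaves the backslashes alone, so this is the JSON text
# {"key":"va\\lue"} exactly as typed (two backslashes).
json_text = r'{"key":"va\\lue"}'

# The same text written as a normal literal needs every backslash doubled.
assert json_text == '{"key":"va\\\\lue"}'

parsed = json.loads(json_text)

# JSON turns the escaped pair into one real backslash.
print(parsed["key"])       # va\lue
print(len(parsed["key"]))  # 6
```

Checking len() or using print() rather than the interactive echo shows the characters actually stored, without repr adding its own escaping.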

I can't see any way to parse a JSON field that should contain a single backslash after all the unescaping has taken place.

How can I get Python to deserialize this string literal {"a":"val\\\\ue"} (which is valid JSON) into the appropriate Python representation: {'a': 'val\\ue'} ?

As an aside, it doesn't help that PyDev is inconsistent with what representation of a string it uses. The watch window shows double backslashes, the tooltip of the variable shows quadruple backslashes. I assume that's the "If you were to type the string, this is what you'd have to use for it to escape to the original" representation, but it's by no means clear.
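
The difference between those displays can be reproduced in plain Python by comparing str() and repr() of the same value. This is a small sketch, assuming Python 3; that a tooltip showing four backslashes is applying repr twice is only a guess, not something confirmed for PyDev.

```python
import json

# Raw string so the JSON text contains exactly two backslashes.
value = json.loads(r'{"a":"val\\ue"}')["a"]

shown_by_str = str(value)    # what print() writes
shown_by_repr = repr(value)  # what the REPL echo and many debuggers display
shown_twice = repr(shown_by_repr)  # repr applied again (assumed tooltip behaviour)

print(shown_by_str)    # val\ue
print(shown_by_repr)   # 'val\\ue'

# Count the backslashes in each form of the same value.
print(value.count("\\"), shown_by_repr.count("\\"), shown_twice.count("\\"))  # 1 2 4
```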

Edit to follow on from @twalberg's answer...

>>> input={'a':'val\ue'}
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 3-5: truncated \uXXXX escape
>>> input={'a':'val\\ue'}
>>> input
{'a': 'val\\ue'}
>>> json.dumps(input)
'{"a": "val\\\\ue"}'
>>> json.loads(json.dumps(input))
{'a': 'val\\ue'}
>>> json.loads(json.dumps(input))['a']
'val\\ue'

Using json.dumps() to see how json would represent your target string:

>>> orig = { 'a' : 'val\ue' }
>>> jstring = json.dumps(orig)
>>> print jstring
{"a": "val\\ue"}
>>> extracted = json.loads(jstring)
>>> print extracted
{u'a': u'val\\ue'}
>>> print extracted['a']
val\ue
>>> 

This was in Python 2.7.3, though, so it may be only partially relevant to your Python 3.x environment. Still, I don't think JSON has changed that much...
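
For a Python 3.x environment, the same round trip can be sketched roughly as follows. The assumed changes are print() as a function and a doubled backslash in the source literal, since 'val\ue' is a SyntaxError in Python 3.

```python
import json

orig = {"a": "val\\ue"}   # the value holds one real backslash

jstring = json.dumps(orig)
print(jstring)            # {"a": "val\\ue"}

extracted = json.loads(jstring)
print(extracted["a"])     # val\ue

# The round trip gives back exactly what went in.
assert extracted == orig
```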
