
Converting string with leading-zero integer to json

I convert a string to a JSON object using the json library:

a = '{"index":1}'
import json
json.loads(a)
{'index': 1}

However, if I change the string a to contain a leading 0, it breaks:

a = '{"index":01}'
import json
json.loads(a)
>>> JSONDecodeError: Expecting ',' delimiter

I believe this is because an integer that begins with a leading zero is invalid JSON, as described in this thread.

Is there a way to remedy this? If not, then I guess the best way is to first remove any leading zeroes from the string with a regex, then convert to JSON?

First, using regex on JSON is evil, almost as bad as killing a kitten.

If you want to represent 01 as a valid JSON value, then consider using this structure:

a = '{"index" : "01"}'
import json
json.loads(a)

If you need the string literal 01 to behave like a number, then consider just casting it to an integer in your Python script.
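A minimal sketch of that cast, assuming the value arrives as a quoted string of decimal digits (the variable names are illustrative only):

```python
import json

a = '{"index": "01"}'
data = json.loads(a)

# The JSON value is a string, so the leading zero survives parsing.
raw_index = data["index"]

# int() parses decimal digit strings and ignores leading zeros.
index = int(raw_index)

print(repr(raw_index), index)
```

This keeps the original zero-padded text available (for display or round-tripping) while the integer is used for arithmetic.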

Please see the post "How to convert string int JSON into real int with json.loads". You need to use your own version of the decoder.

More information can be found on GitHub: https://github.com/simplejson/simplejson/blob/master/index.rst
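As a rough sketch of what a customised decoding step could look like with the standard json module, an object_hook can convert digit-only string values to integers after parsing. The function name digits_to_int is hypothetical, and this assumes the producer quotes the zero-padded numbers as strings; it does not make an unquoted 01 parse.

```python
import json


def digits_to_int(obj):
    """object_hook: convert string values made only of ASCII digits to int."""
    return {
        key: int(value)
        if isinstance(value, str) and value.isascii() and value.isdigit()
        else value
        for key, value in obj.items()
    }


text = '{"index": "01", "name": "x7", "nested": {"n": "0002"}}'
# object_hook is called for every decoded JSON object, innermost first.
data = json.loads(text, object_hook=digits_to_int)
print(data)
```

Note that the hook only sees objects, so digit strings sitting directly inside arrays are left untouched in this sketch.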

import json
c = '{"value": 02}'
value = json.loads(json.dumps(c))
print(value)

This seems to work, which is strange.

>>> c = '{"value": 02}'
>>> import json
>>> value = json.loads(json.dumps(c))
>>> print(value)
{"value": 02}
>>> c = '{"value": 0002}'
>>> value = json.loads(json.dumps(c))
>>> print(value)
{"value": 0002}

As @Dunes pointed out, loads returns a string here, so this is not a valid solution.

However, demjson seems to decode it properly: https://pypi.org/project/demjson/ (an alternative way)

>>> c = '{"value": 02}'
>>> import demjson
>>> demjson.decode(c)
{'value': 2}
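To see why the json.dumps round trip earlier in this answer is not a fix: json.dumps(c) encodes the Python string c as a JSON string literal, and json.loads simply decodes that literal back, so the braces inside it are never parsed as an object. A quick check using only the standard library:

```python
import json

c = '{"value": 02}'

encoded = json.dumps(c)      # a JSON string literal with the inner quotes escaped
value = json.loads(encoded)  # decodes back to the original Python str

print(encoded)
print(type(value).__name__)
print(value == c)
```

The result is the same text that went in, not a dict, so the leading zero was never actually parsed.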

A leading 0 in a JSON number literal is invalid unless the literal (after an optional minus sign) is just the character 0 or starts with "0." as in 0.5. The Python json module is strict and will not accept such number literals, in part because a leading 0 is sometimes used to denote octal rather than decimal notation, so deserialising such numbers could lead to unintended programming errors. That is, should 010 be parsed as the number 8 (octal notation) or as 10 (decimal notation)?
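The rule can be checked directly against the standard json module. The helper name is_valid_json below is hypothetical; the last line illustrates the octal/decimal ambiguity using int() with an explicit base.

```python
import json


def is_valid_json(text):
    """Return True if json.loads accepts the text, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Literals without a redundant leading zero are accepted; the rest are rejected.
for literal in ["0", "0.5", "-0", "10", "01", "-01", "00.5"]:
    print(f"{literal!r:8} valid JSON: {is_valid_json(literal)}")

# The same digits give different values depending on the assumed base.
print(int("010", 8), int("010", 10))
```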

You can create a decoder that will do what you want, but you will need to heavily hack the json module or rewrite much of its internals. Either way, you will see a performance slowdown, as you will no longer be using the C implementation of the module.

Below is an implementation that can decode JSON containing numbers with any number of leading zeros.

import json
import re
import threading

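# a more lenient number regex (modified from json.scanner.NUMBER_RE).
# At least one digit is required so the pattern never matches an empty string,
# which would hide the scanner's NaN/Infinity handling and its error reporting.
NUMBER_RE = re.compile(
    r'(-?\d+)(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))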


# we are going to be messing with the internals of `json.scanner`. As such we
# want to return it to its initial state when we're done with it, but we need to
# do so in a thread safe way.
_LOCK = threading.Lock()
def thread_safe_py_make_scanner(context, *, number_re=json.scanner.NUMBER_RE):
    with _LOCK:
        original_number_re = json.scanner.NUMBER_RE
        try:
            json.scanner.NUMBER_RE = number_re
            return json.scanner._original_py_make_scanner(context)
        finally:
            json.scanner.NUMBER_RE = original_number_re

json.scanner._original_py_make_scanner = json.scanner.py_make_scanner
json.scanner.py_make_scanner = thread_safe_py_make_scanner


class MyJsonDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # overwrite the stricter scan_once implementation
        self.scan_once = json.scanner.py_make_scanner(self, number_re=NUMBER_RE)


d = MyJsonDecoder()
n = d.decode('010')
assert n == 10

# check that the normal route still raises an error
try:
    json.loads('010')
except json.JSONDecodeError as exc:
    print('json.loads still rejects it:', exc)

I would stress that you shouldn't rely on this as a proper solution. Rather, it's a quick hack to help you decode malformed JSON that is nearly, but not quite, valid. It's useful if recreating the JSON in a valid form is not possible for some reason.
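If patching the json module's internals is not an option, a pre-processing pass over the text is another stopgap in the same spirit. The sketch below is a different technique from the decoder above: a regex that matches whole JSON string literals first (and leaves them alone) and otherwise strips redundant leading zeros from bare numbers. The name strip_leading_zeros is hypothetical, and the pattern assumes the input is otherwise well-formed JSON.

```python
import json
import re

_LEADING_ZERO_RE = re.compile(
    r'("(?:\\.|[^"\\])*")'            # group 1: a complete JSON string literal
    r'|(?<![\d.eE+\-])(-?)0+(?=\d)'   # or: optional minus (group 2), then redundant zeros
)


def strip_leading_zeros(text):
    """Remove redundant leading zeros from numbers outside string literals."""
    def fix(match):
        if match.group(1) is not None:
            return match.group(1)   # string literal: return it unchanged
        return match.group(2)       # number: keep only the sign, drop the zeros
    return _LEADING_ZERO_RE.sub(fix, text)


raw = '{"a": "007", "b": 007, "c": -00.5, "d": 0.5, "e": 100, "f": 000}'
cleaned = strip_leading_zeros(raw)
print(cleaned)
print(json.loads(cleaned))
```

Like the decoder hack, this silently picks the decimal interpretation of values such as 010, so it is only appropriate when that is known to be what the producer meant.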
