
Parsing an HTTP array response with Python

I'm trying to parse an HTTP response as JSON, but it fails with a character error. When I loop over the response with a for loop instead, it splits everything into single characters. Is there a better way to parse this response?

Code:

    _url = self.MAIN_URL
    try:
        _request = self.__webSession.get(_url, cookies=self.__cookies)
        if _request.status_code != 200:
            self.log("Request failed with code: {}. URL: {}".format(_request.status_code, _url))
            return
    except Exception as err:
        self.log("[e4] Web-request error: {}. URL: {}".format(err, _url))
        return

    _text = _request.json()

json.loads() raises the following error:

 Expecting value: line 1 column 110 (char 109)

The HTTP response that needs to be parsed:

[
  [
    9266939,
    'Value1',
    'Value2',
    'Value3',
            ,
    'Value4',
        [
            [
                'number',
                'number2',
                    [
                        'value',
                               ,
                        'value2'
                    ]
            ]
        ]
  ],
  [
    5987798,
    'Value1',
    'Value2',
            ,
    'Value3',
    'Value4',
        [
            [
                'number',
                'number2',
                    [
                        'value',
                        'value2'
                    ]
            ]
        ]
  ]
]

The line and column numbers in the error message are confusing, but in any case the JSON format does not accept single quotes for strings, so the given HTTP response is not valid JSON. You have to use double quotes for strings.
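
To see what the parser is actually objecting to, it can help to print the text around the reported offset. The following is a minimal sketch using the attributes of json.JSONDecodeError from the standard library; the helper name describe_json_error and the short sample strings are made up for illustration. The reported position refers to the raw text as received, which may be laid out differently from the pretty-printed response shown in the question.

```python
import json

def describe_json_error(text, context=15):
    """Return a short report on where json.loads gives up, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the 0-based offset into the whole text;
        # err.lineno and err.colno are 1-based.
        start = max(err.pos - context, 0)
        snippet = text[start:err.pos + context]
        return "{} at line {}, column {} (char {}): {!r}".format(
            err.msg, err.lineno, err.colno, err.pos, snippet)
    return None

# Single-quoted string: rejected at the opening quote.
print(describe_json_error("[9266939, 'Value1']"))
# Double-quoted string: parses, so there is nothing to report.
print(describe_json_error('[9266939, "Value1"]'))
```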

So the input has to look like this instead (if you are in control of it):

[
  [
    9266939,
    "Value1",
    "Value2",
    "Value3",
    "Value4",
    [
        [
        "number",
        "number2",
            [
            "value",
            "value2"
            ]
        ]
...
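
Once the text is valid JSON, parsing it gives nested Python lists, and a for loop then walks the rows instead of single characters. A small sketch with a shortened, made-up version of the response (the variable names are illustrative only):

```python
import json

valid_text = '[[9266939, "Value1", "Value2", [["number", "number2", ["value", "value2"]]]]]'

# Iterating over the raw text walks the string one character at a time,
# which is the behaviour described in the question.
first_chars = [ch for ch in valid_text][:3]

# Iterating over the parsed result walks the outer list one row at a time.
rows = json.loads(valid_text)
for row in rows:
    record_id, first_value = row[0], row[1]
    print(record_id, first_value)
```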

If you are not in control of the HTTP response you are parsing, you could replace all single quotes with double quotes before parsing:

import json

http_response_string = _request.text  # raw response body; assumes _request from the question's code
adjusted_http_response_string = http_response_string.replace("'", '"')
data = json.loads(adjusted_http_response_string)

That of course comes with the risk of replacing single quotes (apostrophes) that are not meant to be string delimiters. It may still solve the problem well enough, as long as the values themselves contain no apostrophes.
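
The risk is easy to reproduce. The sketch below uses a made-up two-element input and also shows one possible alternative that is not part of the answer itself: ast.literal_eval from the standard library, which understands Python-style quoting. It only applies if the response happens to be a valid Python literal, so it rejects empty elements between commas just like json.loads does, and it does not know the JSON literals true, false, and null.

```python
import ast
import json

text = """['Value1', "it's fine"]"""

# Blind replacement also rewrites the apostrophe inside the second string,
# so the result is no longer valid JSON.
try:
    json.loads(text.replace("'", '"'))
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# ast.literal_eval treats the quotes as Python string delimiters
# and leaves the apostrophe inside the string alone.
data = ast.literal_eval(text)

print(naive_ok)
print(data)
```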

EDIT:

Further cleanup as requested in the comments. This also handles the empty elements (", ,") in the response, which JSON does not allow either:

import json
import re

http_response_string = _request.text  # raw response body; assumes _request from the question's code

# More advanced replacement of ' with ", expecting
# strings to always come after at least four spaces,
# and always end in either comma, colon, or newline.
adjusted_http_response_string = re.sub(
    r"(    )'", r'\1"',
    re.sub(r"'([,:\n])", r'"\1', http_response_string))

# Replacing faulty ",  ," with ",".
adjusted_http_response_string = re.sub(
    r",(\s*,)*", ",", adjusted_http_response_string)

data = json.loads(adjusted_http_response_string)
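
The cleanup can be wrapped in a function and checked against a shortened version of the response from the question. This is a sketch under the same formatting assumptions as the snippet above; the function name parse_loose_array and the keep_empty_slots option are hypothetical additions. Dropping an empty element shifts every later element one position to the left, so the option fills the gap with null instead, for cases where the index of a value matters.

```python
import json
import re

def parse_loose_array(text, keep_empty_slots=False):
    """Parse the single-quoted array format with empty elements shown in the question."""
    # Same quote heuristics as above: a closing quote comes before a comma,
    # colon, or newline; an opening quote comes after four spaces.
    text = re.sub(r"'([,:\n])", r'"\1', text)
    text = re.sub(r"(    )'", r'\1"', text)
    if keep_empty_slots:
        # Fill each empty slot with null so later elements keep their index.
        text = re.sub(r",(?=\s*,)", ", null", text)
    else:
        # Drop empty slots, as in the snippet above.
        text = re.sub(r",(\s*,)*", ",", text)
    return json.loads(text)

sample = """[
  [
    9266939,
    'Value1',
    'Value3',
            ,
    'Value4'
  ]
]"""

print(parse_loose_array(sample))
print(parse_loose_array(sample, keep_empty_slots=True))
```

Both variants remain heuristics: a value that itself contains an apostrophe followed by a comma, or a run of commas, would still be rewritten.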
