简体   繁体   中英

Parsing HTTP array response with python

I'm trying to parse HTTP response via json, but it gives me character error, but when I'm trying to loop through this response via for loop, it splits everything in single characters. Is there better way to parse this response?

Code:

    _url = self.MAIN_URL
    try:
        _request = self.__webSession.get(_url, cookies=self.__cookies)
        if _request.status_code != 200:
            self.log("Request failed with code: {}. URL: {}".format(_request.status_code, _url))
            return
    except Exception as err:
        self.log("[e4] Web-request error: {}. URL: {}".format(err, _url))
        return

    _text = _request.json()

json.loads() returns following

 Expecting value: line 1 column 110 (char 109)

HTTP Response needed to be parsed:

[
  [
    9266939,
    'Value1',
    'Value2',
    'Value3',
            ,
    'Value4',
        [
            [
                'number',
                'number2',
                    [
                        'value',
                               ,
                        'value2'
                    ]
            ]
        ]
  ],
  [
    5987798,
    'Value1',
    'Value2',
            ,
    'Value3',
    'Value4',
        [
            [
                'number',
                'number2',
                    [
                        'value',
                        'value2'
                    ]
            ]
        ]
  ]
]

While the error message is confusing because of the line and column numbers, the JSON format in any case does not accept single quotes for strings, so the given HTTP response is not in JSON format. You have to use double quotes for strings.

So you have to make the input like this instead (if you are in control of it):

[
  [
    9266939,
    "Value1",
    "Value2",
    "Value3",
    "Value4",
    [
        [
        "number",
        "number2",
            [
            "value",
            "value2"
            ]
        ]
...

If you are not in control of the HTTP response you are parsing, you could replace all single quotes with double quotes before parsing:

http_response_string = (get the HTTP response)
adjusted_http_response_string = http_response_string.replace("'", '"')
data = json.loads(adjusted_http_response_string)

But that of course comes with a potential risk of replacing single quotes (or apostrophes) that aren't meant to be string delimiters. It might solve the problem sufficiently, though, working most of the time.

EDIT:

Further cleanup as requested in the comments:

http_response_string = (get the HTTP response)

# More advanced replacement of ' with ", expecting
# strings to always come after at least four spaces,
# and always end in either comma, colon, or newline.
adjusted_http_response_string = \
    re.sub("(    )'", r'\1"',
    re.sub("'([,:\n])", r'"\1',
    http_response_string))

# Replacing faulty ",  ," with ",".
adjusted_http_response_string = \
    re.sub(",(\s*,)*", ",", 
    adjusted_http_response_string)

data = json.loads(adjusted_http_response_string)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM