
Downloading JSON data from a website

I am trying to write a Python script that takes a list of URLs, fetches the response content of each one in turn, and stores it as a JSON file.

Here is what I wrote initially to get the response for one URL:

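import requests

def download_json(link):
    # link is the base URL of the endpoint (it was not defined in the original snippet)
    # The id is quoted: a bare 00163E0BD0C1FA89 is not a valid Python literal
    params = {'id': '00163E0BD0C1FA89',
              'list': '141',
              'queue': 'gen',
              'type': 'abc_stat'
             }

    # requests appends params to link as the query string
    req_obj = requests.get(link, params=params)
    print(req_obj.url)
    print(req_obj.status_code)

    return req_obj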

It builds the right URL: when I paste that URL directly into the browser, the output is shown in JSON format. Here is one row of the JSON output I see in the browser:

{
  "DATA" : [
    {
      "SCHEMA" : "abc_4_QAATu2.",
      "ID" : "QAATu2",
      "IM_ID" : "22faba86_c9e0_4dbc",
      "S_NUMBER" : "502379284",
      "CONFIG_TYPE" : "las_home_type",
      "CONFIG_KEY" : "las_home_key",
      "CONFIG_LONG_V" : "1",
      "CONFIG_STRING_V" : "https://abc-deg/development",
      "MODIFIED_DATE" : "Unknown"
    },

So this does show that the data is returned in JSON format when I enter the URL in the browser directly.

However my requests object has this for headers:

Out[26]:

{'content-length': '15457', 'expires': '0', 'content-encoding': 'gzip', 'cache-control': 'no-cache, no-store, private', 'set-cookie': 'login-XSRF_RZA=2018051-axJnifQUpOnrS8WCFI; path=/abc/deo/cpo; secure; HttpOnly, usercontext=client=002; path=/', 'content-type': 'text/html; charset=utf-8', 'pragma': 'no-cache, no-store, private'}

Now when I call req_obj.json() to get the data as a Python object, I get the following error:

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-28-4cfc1a694fcf> in <module>()
----> 1 req_obj.json()

/Users/anaconda/envs/dl/lib/python3.5/site-packages/requests/models.py in json(self, **kwargs)
    890                     # used.
    891                     pass
--> 892         return complexjson.loads(self.text, **kwargs)
    893 
    894     @property

/Users/anaconda/envs/dl/lib/python3.5/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    317             parse_int is None and parse_float is None and
    318             parse_constant is None and object_pairs_hook is None and not kw):
--> 319         return _default_decoder.decode(s)
    320     if cls is None:
    321         cls = JSONDecoder

/Users/anaconda/envs/dl/lib/python3.5/json/decoder.py in decode(self, s, _w)
    337 
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

/Users/anaconda/envs/dl/lib/python3.5/json/decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 2 column 1 (char 2)

EDIT:

The content-type in the headers above is shown as html, even though the browser shows JSON as output.

However, when I evaluate

req_obj.json 

I get

<bound method Response.json of <Response [200]>>

But calling req_obj.json() gives the error above.

Any idea why it cannot convert the data to JSON when the output is actually in JSON format, as shown above? Thanks

According to the documentation:

In case the JSON decoding fails, r.json() raises an exception. For example, if the response gets a 204 (No Content), or if the response contains invalid JSON, attempting r.json() raises ValueError: No JSON object could be decoded.

Although it's not throwing the same error message (JSONDecodeError is a subclass of ValueError), the cause appears to be the same: you're probably not getting JSON back, which would explain why the JSON decoder throws an exception.

You should be able to confirm this by printing req_obj.text instead of calling req_obj.json().
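A minimal sketch of that check, assuming only that the response object exposes status_code, headers and text the way a requests Response does. The helper name summarize_response and the preview length are made up for illustration. The error message "line 2 column 1 (char 2)" suggests the body starts with whitespace and a line break followed by something that is not JSON, which would be consistent with an HTML page, but that is a guess until the text is actually printed.

```python
def summarize_response(resp, preview_chars=200):
    """Collect the fields that show whether a response body is really JSON.

    `resp` is anything with .status_code, .headers and .text,
    for example the object returned by requests.get().
    """
    content_type = resp.headers.get("content-type", "")
    body = resp.text
    # Heuristic only: an HTML page usually starts with a doctype or <html> tag
    looks_like_html = body.lstrip().lower().startswith(("<!doctype", "<html"))
    return {
        "status_code": resp.status_code,
        "content_type": content_type,
        "looks_like_html": looks_like_html,
        "preview": body[:preview_chars],
    }
```

Calling summarize_response(req_obj) and printing the result shows the first characters of the body next to the declared content type, which is usually enough to tell a JSON payload from an HTML login or error page.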

As for how to fix it, I suspect that there must be something different between the request you're making using the browser and the one you're making using Python (such as different parameters).
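One difference worth ruling out is that the browser sends headers and cookies that the script does not. The response headers in the question include a set-cookie that starts with login-XSRF, which might mean the server answers unauthenticated requests with an HTML login page; that is an assumption, not something the post confirms. The sketch below only builds the extra keyword arguments for requests.get. The helper name browser_like_kwargs is hypothetical, and the cookie names and values would have to be copied from the browser's developer tools.

```python
def browser_like_kwargs(cookies=None, user_agent=None):
    """Build extra keyword arguments for requests.get() that mimic a browser.

    cookies: dict of cookie name -> value copied from a logged-in browser
             session (assumption: the server needs them).
    user_agent: optional User-Agent string copied from the browser.
    """
    # Ask explicitly for JSON; some servers pick the format from this header
    headers = {"Accept": "application/json"}
    if user_agent:
        headers["User-Agent"] = user_agent

    kwargs = {"headers": headers}
    if cookies:
        kwargs["cookies"] = dict(cookies)
    return kwargs

# Intended use (needs network access, so it is left as a comment):
# req_obj = requests.get(link, params=params, **browser_like_kwargs(cookies={...}))
```

If the response is still HTML with these additions, comparing the full request shown in the browser's network tab against req_obj.request.headers is the next thing to try.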

I suggest you read this to further investigate the source of the problem.

According to this document: http://docs.python-requests.org/en/master/

You could check req_obj.status_code and req_obj.headers['content-type']. If the status code is 200 and the content type is 'application/json; charset=utf8', then you can try calling req_obj.json().
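A rough sketch of those checks, combined with the original goal of storing each response as a JSON file. The function names declares_json, parse_json_or_none and save_json are made up for illustration, and the response is assumed to expose status_code, headers and text like a requests Response. Because the server in the question labels its output text/html, the content type is treated here as a hint only, and the body is parsed inside a try/except rather than trusted.

```python
import json

def declares_json(resp):
    """True if the Content-Type header says the body is JSON."""
    return "application/json" in resp.headers.get("content-type", "").lower()

def parse_json_or_none(resp):
    """Return the parsed body, or None if the response is not usable JSON."""
    if resp.status_code != 200:
        return None
    try:
        return json.loads(resp.text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

def save_json(resp, path):
    """Write the parsed body to `path`; return False if nothing was written."""
    data = parse_json_or_none(resp)
    if data is None:
        return False
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    return True
```

In a loop over URLs, a False return from save_json(req_obj, path) marks the responses whose text should be inspected by hand instead of stopping the whole run with an exception.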
