Getting website data from a URL using requests

I'm trying to use requests to retrieve website data and convert it to JSON. It works for the one URL shown below, but not for other kinds of URLs. I suspect it's just a bad URL, but I think my syntax may be off too.

import requests

r = requests.get('https://icanhazdadjoke.com/', headers={'Accept': 'application/json'})
data = r.json()
data

output: {'id': 'Z8UDIRuXLmb',
 'joke': 'Who did the wizard marry? His ghoul-friend',
 'status': 200}

But when I try:

r = requests.get('https://icanhazdadjoke.com/', headers={'Accept': 'application/json'})
data = r.json()
print(data)

or

data = requests.get('http://web.archive.org/web/20180326124748/https://www.theguardian.com/side-hustle', headers={'Accept': 'application/json'}).json()
print(data)

I get this full traceback error:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-50-92f4c7755f72> in <module>()
----> 1 data = requests.get('http://web.archive.org/web/20180326124748/https://www.theguardian.com/side-hustle', headers={'Accept': 'application/json'}).json()
      2 data

3 frames
/usr/local/lib/python3.6/dist-packages/requests/models.py in json(self, **kwargs)
    896                     # used.
    897                     pass
--> 898         return complexjson.loads(self.text, **kwargs)
    899 
    900     @property

/usr/lib/python3.6/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

/usr/lib/python3.6/json/decoder.py in decode(self, s, _w)
    337 
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

/usr/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 2 column 1 (char 1)

Is it just a bad url? Thanks!

The problem is not in your Python code.

You can check this with the wget command-line program on Linux.

With

wget https://icanhazdadjoke.com

you get the HTML code of that webpage, but with

wget --header="Accept:application/json" https://icanhazdadjoke.com

you get JSON data containing a joke similar to the one you have shown.

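However, the website that produced the error returns HTML rather than JSON, even with --header="Accept:application/json". Decoding that response as JSON therefore raises an error, because the body is not JSON. The message "Expecting value: line 2 column 1 (char 1)" fits this: the parser skipped a leading newline and then hit a character that cannot start a JSON value.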
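If you want the Python side to cope with both kinds of server, one option is to look at the Content-Type response header before decoding. The following is only a minimal sketch: the helper names parse_json_response and fetch_json are made up for illustration, and it assumes the server labels JSON replies with a JSON media type, which not every server does.

```python
import json


def parse_json_response(content_type, body):
    """Return the decoded JSON payload, or None if the reply is not JSON.

    content_type: value of the Content-Type response header (may be empty).
    body: the response body as text.
    Hypothetical helper, not part of requests.
    """
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";")[0].strip().lower()
    # Assumption: JSON replies are labelled application/json or "...+json";
    # an HTML page is normally labelled text/html.
    if media_type != "application/json" and not media_type.endswith("+json"):
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # The header claimed JSON but the body did not parse.
        return None


def fetch_json(url):
    """Request url asking for JSON; return the parsed data or None."""
    # Imported here so the parsing helper above works without requests installed.
    import requests

    r = requests.get(url, headers={"Accept": "application/json"}, timeout=10)
    return parse_json_response(r.headers.get("Content-Type", ""), r.text)
```

Calling fetch_json on the icanhazdadjoke URL would be expected to return a dict like the one in the question, while the web.archive.org URL would be expected to return None instead of raising JSONDecodeError, as long as both servers keep responding the way described above.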
