Difficulties with JSON using Python's requests library

Question

When attempting to scrape Product Hunt with the following code:

import requests

headers = {
    'authority': 'www.producthunt.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'upgrade-insecure-requests': '1',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-US,en;q=0.9',
}

response = requests.get('https://www.producthunt.com/', headers=headers)

I found that the response body is not a valid string to convert to JSON. I tried replacing the quote characters with response.text.replace() and parsing the result with json.loads(re.sub(r'^jsonp\d+(|)\s+$', '', response.text)), but I still get the same error.

Error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Thoughts?

Answer 1

The problem has nothing to do with JSON.

You're requesting a web page, not a JSON API, so the response body is HTML:

$ curl -sH 'Accept: application/json' https://www.producthunt.com/ | head -c 200
<!DOCTYPE html><html lang="en"><head><title>Product Hunt – The best new products in tech.</title><link rel="canonical" href="https://www.producthunt.com/"/><meta name="description" content="Product %
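One way to catch this earlier is to look at the Content-Type header before decoding. The following is a minimal sketch, not a definitive fix: parse_json_response and fetch_json are hypothetical helper names, and the 10-second timeout is an arbitrary assumption.

```python
import json


def parse_json_response(content_type: str, body: str):
    """Decode body as JSON only if the declared media type is JSON.

    Hypothetical helper: raises ValueError with a preview of the body
    when the server sent something else (for example an HTML page).
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json" and not media_type.endswith("+json"):
        raise ValueError(
            f"expected JSON but got {media_type or 'unknown type'}: {body[:60]!r}"
        )
    return json.loads(body)


def fetch_json(url: str):
    """Fetch url with requests and decode it via parse_json_response."""
    import requests  # third-party; imported here so the helper above stays stdlib-only

    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()
    return parse_json_response(
        response.headers.get("Content-Type", ""), response.text
    )
```

Called on a page like the one above, fetch_json should raise a ValueError that names text/html and shows the start of the body, which is easier to diagnose than "Expecting value: line 1 column 1 (char 0)".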

To get data out of the page, use BeautifulSoup or Selenium WebDriver to extract the HTML content instead, then convert what you extract to JSON, depending on your needs.
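The extract-then-convert idea can be sketched as follows. To stay dependency-free, this sketch swaps in the standard library's html.parser rather than BeautifulSoup or Selenium. HeadExtractor, page_summary, and SAMPLE_HTML are hypothetical names, and the sample markup is a placeholder, not the real Product Hunt page.

```python
import json
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collect the <title> text and named <meta> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name"):
            # Valueless attributes come through as None, so fall back to "".
            self.meta[attributes["name"]] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_summary(html: str) -> dict:
    """Return a JSON-serialisable summary of the page's head section."""
    parser = HeadExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}


# Placeholder markup standing in for response.text.
SAMPLE_HTML = (
    '<!DOCTYPE html><html lang="en"><head><title>Example page</title>'
    '<meta name="description" content="A placeholder description"/>'
    "</head><body></body></html>"
)

summary_json = json.dumps(page_summary(SAMPLE_HTML))
```

With requests, the same function would be called as page_summary(response.text). For anything beyond a few head tags, BeautifulSoup's selectors are likely to be more convenient than a hand-written parser.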

In fact, the site itself uses GraphQL, at https://www.producthunt.com/frontend/graphql
