Trouble parsing JSON with Python's requests library

Question

I am trying to scrape producthunt with the following code:

import requests

headers = {
    'authority': 'www.producthunt.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'upgrade-insecure-requests': '1',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-US,en;q=0.9',
}

response = requests.get('https://www.producthunt.com/', headers=headers)

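The response that comes back does not contain a valid string to convert to JSON. I tried swapping the quote style with response.text.replace() and stripping a JSONP wrapper with json.loads(re.sub(r'^jsonp\d+(|)\s+$', '', response.text)), but I still get the same error.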

The error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Any ideas?

Answer 1

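The problem has nothing to do with JSON.

You are requesting a web page, not a JSON API. Even with an Accept: application/json header, the server returns HTML, so the first character is "<" and json.loads fails at line 1 column 1: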

$ curl -sH 'Accept: application/json' https://www.producthunt.com/ | head -c 200
<!DOCTYPE html><html lang="en"><head><title>Product Hunt – The best new products in tech.</title><link rel="canonical" href="https://www.producthunt.com/"/><meta name="description" content="Product %
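
To catch this earlier in Python, one option is to look at the Content-Type header before parsing. The following is only a sketch: parse_json_body is a hypothetical helper name, not part of requests, and the rule for what counts as JSON (application/json or a +json suffix) is an assumption.

```python
import json


def parse_json_body(content_type: str, body: str):
    """Parse body as JSON only if the Content-Type header says it is JSON.

    parse_json_body is a hypothetical helper, not part of requests.
    """
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";")[0].strip().lower()
    is_json = media_type == "application/json" or media_type.endswith("+json")
    if not is_json:
        # Fail with a readable message instead of a bare JSONDecodeError.
        raise ValueError(
            f"expected JSON but got {media_type or 'unknown'}; "
            f"body starts with {body[:40]!r}"
        )
    return json.loads(body)


# Usage with the request from the question (needs network access):
# response = requests.get('https://www.producthunt.com/', headers=headers)
# data = parse_json_body(response.headers.get('Content-Type', ''), response.text)
```

For the page in the question this should raise a ValueError that names the HTML media type, which points at the real cause faster than the decoder error does.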

You should use beautifulsoup or selenium-webdriver to extract the content from the HTML, and then convert the extracted data to whatever format you need, such as JSON.
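
Here is a rough sketch of the "extract from HTML, then convert to JSON" idea. To keep it runnable without extra installs it uses the standard library's html.parser instead of beautifulsoup, so treat it as a stand-in for that approach. The class name PageSummaryParser, the function html_to_json, and the choice of fields (title and canonical link) are assumptions for illustration. The sample string is adapted from the curl output above.

```python
import json
from html.parser import HTMLParser


class PageSummaryParser(HTMLParser):
    """Collect the <title> text and the canonical link of a page.

    Hypothetical helper for illustration; beautifulsoup would do the same
    job with less code.
    """

    def __init__(self):
        super().__init__()
        self.summary = {"title": "", "canonical": None}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and attributes.get("rel") == "canonical":
            self.summary["canonical"] = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.summary["title"] += data


def html_to_json(html: str) -> str:
    """Extract a few fields from an HTML document and serialise them as JSON."""
    parser = PageSummaryParser()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.summary, ensure_ascii=False)


# Sample adapted from the curl output; a real page would come from response.text.
sample = (
    '<!DOCTYPE html><html lang="en"><head>'
    '<title>Product Hunt – The best new products in tech.</title>'
    '<link rel="canonical" href="https://www.producthunt.com/"/>'
    '</head><body></body></html>'
)
print(html_to_json(sample))
```

If the data you want is rendered by JavaScript after the page loads, a static parser like this will not see it, which is where selenium-webdriver comes in.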

In fact, the site uses GraphQL at https://www.producthunt.com/frontend/graphql

Solution 1
0 2020-07-30 19:16:08