Unable to scrape table content from a webpage using the requests module

Question

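I am trying to scrape table content from a webpage using the requests module. The page's content is highly dynamic, but according to dev tools it is available through an API. I tried to mimic the same POST request with the appropriate parameters, but I always get status 403.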

import requests
from pprint import pprint

start_url = 'https://opensea.io/rankings'
link = 'https://api.opensea.io/graphql/'
payload = {"id":"rankingsQuery","query":"query rankingsQuery(\n  $chain: [ChainScalar!]\n  $count: Int!\n  $cursor: String\n  $sortBy: CollectionSort\n  $parents: [CollectionSlug!]\n  $createdAfter: DateTime\n) {\n  ...rankings_collections\n}\n\nfragment rankings_collections on Query {\n  collections(after: $cursor, chains: $chain, first: $count, sortBy: $sortBy, parents: $parents, createdAfter: $createdAfter, sortAscending: false, includeHidden: true, excludeZeroVolume: true) {\n    edges {\n      node {\n        createdDate\n        name\n        slug\n        logo\n        stats {\n          floorPrice\n          marketCap\n          numOwners\n          totalSupply\n          sevenDayChange\n          sevenDayVolume\n          oneDayChange\n          oneDayVolume\n          thirtyDayChange\n          thirtyDayVolume\n          totalVolume\n          id\n        }\n        id\n        __typename\n      }\n      cursor\n    }\n    pageInfo {\n      endCursor\n      hasNextPage\n    }\n  }\n}\n","variables":{"chain":None,"count":100,"cursor":"YXJyYXljb25uZWN0aW9uOjk5","sortBy":"SEVEN_DAY_VOLUME","parents":None,"createdAfter":None}}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
    s.headers['x-api-key'] = '2f6f419a083c46de9d83ce3dbe7db601'
    s.headers['x-build-id'] = 'cplNDIqD8Uy8MvANX90r9'
    s.headers['referer'] = 'https://opensea.io/'
    res = s.post(link,json=payload)
    pprint(res.status_code)
    print(res.json())

How can I scrape the table content from that webpage using the requests module?

Answer 1

You can pull it out of the script tag with a regex and then rebuild the table. There is still some column formatting to do.

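import json
import re

import pandas as pd
import requests

# The rankings data is embedded as JSON in a script tag (window.__wired__=...).
r = requests.get('https://opensea.io/rankings')
data = json.loads(re.search(r'window\.__wired__=([^<]*)', r.text).group(1))
# Keep only the collection records and their stats records.
items = [v for v in data['records'].values() if v['__typename'] in ['CollectionType', 'CollectionStatsType']]
# Pair each collection (even positions) with the stats record that follows it.
d = {i['name']: j for i, j in zip(items[::2], items[1::2])}
df = pd.DataFrame.from_dict(d, orient='index')
print(df)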
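
Since the live page may change or refuse the request, here is a minimal offline sketch of the same regex-then-rebuild idea. The `SAMPLE_HTML` string and the `extract_rows` helper are made up for illustration; only the `window.__wired__=` marker, the `records` key, and the two `__typename` values come from the answer. Like the answer, it assumes each collection record is immediately followed by its stats record.

```python
import json
import re

# Made-up HTML standing in for the rankings page; the real payload is much larger.
SAMPLE_HTML = (
    '<html><body><script>window.__wired__='
    '{"records": {'
    '"a1": {"__typename": "CollectionType", "name": "Alpha", "slug": "alpha"}, '
    '"a2": {"__typename": "CollectionStatsType", "floorPrice": 1.5, "numOwners": 10}, '
    '"x": {"__typename": "OtherType"}, '
    '"b1": {"__typename": "CollectionType", "name": "Beta", "slug": "beta"}, '
    '"b2": {"__typename": "CollectionStatsType", "floorPrice": 0.2, "numOwners": 3}'
    '}}'
    '</script></body></html>'
)

WANTED = ("CollectionType", "CollectionStatsType")


def extract_rows(html):
    """Return {collection name: stats dict} from the embedded JSON (hypothetical helper)."""
    match = re.search(r'window\.__wired__=([^<]*)', html)
    if match is None:
        # The marker is missing, e.g. the page layout changed or a block page was served.
        raise ValueError("embedded JSON not found")
    data = json.loads(match.group(1))
    # Keep only collection and stats records, in their original order.
    items = [v for v in data["records"].values() if v.get("__typename") in WANTED]
    # Assumption: the records alternate collection, stats, collection, stats, ...
    return {c["name"]: s for c, s in zip(items[::2], items[1::2])}


rows = extract_rows(SAMPLE_HTML)
```

The resulting `rows` dictionary can be passed to `pd.DataFrame.from_dict(rows, orient='index')` exactly as in the answer's code.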

Answer 2

I don't think the GraphQL query is what you want. There is a GET request there that returns the data.

Try this:

res = s.get('https://api.opensea.io/tokens/?limit=100')

Answer 3

I think OpenSea uses Cloudflare to protect its API. Try sending your request through ScrapeNinja or Puppeteer; that approach seems to work.
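
One rough way to check whether a 403 comes from Cloudflare rather than from the API itself is to look at the response headers. The sketch below is a heuristic only: the helper name `looks_like_cloudflare_block` is hypothetical, and it assumes that a Cloudflare-served response carries a `Server: cloudflare` header or a `CF-RAY` header. A match suggests, but does not prove, that the request was blocked at the edge.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic check: is this a 403 that appears to be served by Cloudflare?"""
    # Header names are case-insensitive, so normalise them before looking them up.
    lowered = {key.lower(): value for key, value in headers.items()}
    served_by_cloudflare = (
        lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    )
    return status_code == 403 and served_by_cloudflare


# Example with made-up header values.
blocked = looks_like_cloudflare_block(403, {"Server": "cloudflare", "CF-RAY": "abc123"})
not_blocked = looks_like_cloudflare_block(200, {"Server": "cloudflare"})
```

With the question's code, it could be called as `looks_like_cloudflare_block(res.status_code, res.headers)` before attempting `res.json()`.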


Answer 1: score 1, accepted, 2021-08-15 21:52:59
Answer 2: score 0, 2021-08-15 20:06:50
Answer 3: score -1, 2022-01-25 12:25:08
