
403 Error when scraping despite setting User-Agent in header

I want to scrape a website (player statistics for a football match), but I get a 403 error. This is my first attempt at scraping.

import requests

url = 'https://www.whoscored.com/Matches/1375928/LiveStatistics/England-Premier-League-2019-2020-West-Ham-Manchester-City'

# Headers sent with the request, including a browser-like User-Agent
headers = {'Sec-Fetch-Mode': 'no-cors',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}

result = requests.get(url, headers=headers)

print(result.status_code)

Edit: I can open the page in my browser (Chrome).

Edit 2: If I run

print(result.status_code)
print(result.headers)
print(result.content)

then I get the following output:

403
{'Content-Type': 'text/html', 'Cache-Control': 'no-cache', 'Connection': 'close', 'Content-Length': '736', 'X-Iinfo': '9-168604272-0 0NNN RT(1566297863307 56) q(0 -1 -1 -1) r(0 -1) B15(4,200,0) U18', 'X-Iejgwucgyu': '1', 'Set-Cookie': 'visid_incap_774904=wSb3+5UxQeC+slK3rAhjswfPW10AAAAAQUIPAAAAAADmqJS6Gs0uzOV2Z5XomjoU; expires=Wed, 19 Aug 2020 06:56:00 GMT; path=/; Domain=.whoscored.com, incap_ses_198_774904=2GHrGcAd9C8niMLwwnK/AgfPW10AAAAAttp7+XadyowHY5iqiWs/Yg==; path=/; Domain=.whoscored.com'}
b'<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe id="main-iframe" src="/_Incapsula_Resource?CWUDNSAI=21&xinfo=9-168604272-0%200NNN%20RT%281566297863307%2056%29%20q%280%20-1%20-1%20-1%29%20r%280%20-1%29%20B15%284%2c200%2c0%29%20U18&incident_id=198003090216026722-548063901729035097&edet=15&cinfo=04000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 198003090216026722-548063901729035097</iframe></body></html>'
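The response above has several marks of an Incapsula block page rather than an ordinary 403: an `X-Iinfo` header, `incap_ses_*` cookies, and a body that loads `/_Incapsula_Resource` and reports an incident ID. As a rough sketch, a check like the following could tell the two cases apart. The function name `looks_like_incapsula_block` is hypothetical, and the markers are simply the ones visible in this particular response, so other sites or versions may differ.

```python
def looks_like_incapsula_block(status_code, headers, body):
    """Heuristic check: does a response resemble the block page shown above?

    The markers are taken from the output in the question; they are an
    assumption about what the block page looks like, not a documented contract.
    """
    if status_code != 403:
        return False
    # Accept either bytes (result.content) or str (result.text).
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    has_body_marker = "_Incapsula_Resource" in text or "Incapsula incident ID" in text
    has_header_marker = any(name.lower() == "x-iinfo" for name in headers)
    return has_body_marker or has_header_marker
```

With the code from the question, it would be called as `looks_like_incapsula_block(result.status_code, result.headers, result.content)`.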

You need to add cookies to the session. This works for me; I copied the cookies from my browser.

import requests

session = requests.Session()

session.headers.update({'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0"})

session.cookies["visid_incap_774904"]="SRvZ2F36RzuA5U8jaUC8yq3fXF0AAAAAQUIPAAAAAAC/7mBuVWtbzccGROHlxPzv"
session.cookies["incap_ses_964_774904"]="hJHbakasVSAoo8+/rNFgDa7fXF0AAAAA0e9groglmml+odd4mLW2zg=="
session.cookies["_cmpQcif3pcsupported"]="0"
session.cookies["googlepersonalization"]="OloL0IOloL0IgA"
session.cookies["eupubconsent"]="BOloL0IOloL0IAKAYAENAAAA6AAAAA"
session.cookies["euconsent"]="BOloL0IOloL0IAKAYBENCh-AAAAp57v______9______9uz_Ov_v_f__33e8__9v_l_7_-___u_-3zd4u_1vf99yfm1-7etr3tp_87ues2_Xur__79__3z3_9phP78k89r7337Ew-v83oA"

resp = session.get("https://www.whoscored.com/Matches/1375928/LiveStatistics/England-Premier-League-2019-2020-West-Ham-Manchester-City")

print(resp.status_code)

print(resp.text)
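Setting each cookie by hand gets tedious, and the `incap_ses_*` value looks like a per-session cookie, so it will probably need to be refreshed from the browser from time to time. A minimal sketch, assuming you copy the whole `Cookie` request header from the browser's developer tools as one string: the helper `parse_cookie_header` below is hypothetical (it is not part of requests) and just splits that string into a dict.

```python
def parse_cookie_header(cookie_header):
    """Split a raw 'Cookie' header string into a {name: value} dict.

    Only the first '=' separates name from value, so base64-style values
    ending in '==' are preserved.
    """
    cookies = {}
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


# Placeholder values for illustration; paste the real header from your browser.
raw_header = "visid_incap_774904=abc123; incap_ses_964_774904=xyz+/==; euconsent=BOloL0I"
browser_cookies = parse_cookie_header(raw_header)
print(sorted(browser_cookies))
```

The resulting dict can then be loaded into the session from the answer with `session.cookies.update(browser_cookies)` before calling `session.get(...)`.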
