Scraping yell with Python requests gives a 403 error

I have this code:

from requests.sessions import Session
url = "https://www.yell.com/s/launderettes-birmingham.html"

s = Session()
headers = {
    'user-agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
}
r = s.get(url,headers=headers)
print(r.status_code)

but I get a 403 status code instead of 200.

I can scrape this data with selenium, but is there a way to scrape it with requests?

If you modify your code to print the response body as well:

print(r.text)
print(r.status_code)

you will see that the reason you are getting a 403 status code is that yell uses the Cloudflare browser check.
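To make that inspection repeatable, here is a minimal sketch of a helper that guesses whether a response is a Cloudflare challenge page. The function name and the body markers are assumptions chosen for illustration; Cloudflare can change its challenge page at any time, so treat this as a heuristic rather than a reliable detector.

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristically guess whether a response is a Cloudflare challenge page.

    The markers below are assumptions based on commonly seen challenge
    responses; they are not a stable or official contract.
    """
    # Normalise header names and values so the checks are case-insensitive.
    lowered = {k.lower(): str(v).lower() for k, v in headers.items()}

    # Cloudflare marks challenged responses with this header.
    if lowered.get("cf-mitigated") == "challenge":
        return True

    served_by_cloudflare = (
        "cloudflare" in lowered.get("server", "") or "cf-ray" in lowered
    )
    # Text fragments often present in the challenge HTML (assumed markers).
    body_markers = ("just a moment", "checking your browser", "challenge-platform")
    has_marker = any(marker in body.lower() for marker in body_markers)

    return status_code in (403, 503) and served_by_cloudflare and has_marker
```

With the session from the question, it could be called as `looks_like_cloudflare_challenge(r.status_code, r.headers, r.text)`.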

Because the check relies on JavaScript, which the requests module does not execute, there is no reliable way to pass it with requests alone.

Since you mentioned you are going to use selenium, make sure to use an undetected driver package (for example, undetected-chromedriver) so the browser is less likely to be flagged as automated. Also, be sure to rotate your IP address to avoid getting it blocked.
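A rough sketch of that approach, assuming the third-party `undetected-chromedriver` package is the driver meant here and that you have your own list of proxy addresses. The helper names `proxy_rotation` and `fetch_page_source` are hypothetical, and nothing guarantees that the check will be passed.

```python
from itertools import cycle


def proxy_rotation(proxies):
    """Return an endless round-robin iterator over the given proxy URLs."""
    return cycle(proxies)


def fetch_page_source(url, proxy=None):
    """Load a page in an undetected Chrome instance and return its HTML.

    Assumes `pip install undetected-chromedriver` and a local Chrome install.
    """
    # Imported lazily so this module still loads without the package.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    if proxy:
        # Route the browser's traffic through the chosen proxy.
        options.add_argument(f"--proxy-server={proxy}")

    driver = uc.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

A caller might create `proxies = proxy_rotation([...])` once and pass `next(proxies)` as the `proxy` argument on each request, so consecutive requests leave from different addresses.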
