
Scraping Yell with Python requests gives a 403 error

I have this code:

from requests.sessions import Session

url = "https://www.yell.com/s/launderettes-birmingham.html"

s = Session()
# Send a desktop Chrome user agent instead of the default python-requests one
headers = {
    'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
}
r = s.get(url, headers=headers)
print(r.status_code)  # prints 403, expected 200

but the output is 403 instead of 200.

I can scrape this data with Selenium, but is there a way to do it with requests?

If you modify your code like so:

print(r.text)
print(r.status_code)

you will see that the reason you are getting a 403 status code is that Yell uses Cloudflare's browser check: the HTML you receive is Cloudflare's challenge page, not the search results.
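To tell a Cloudflare challenge apart from an ordinary 403 without reading the HTML by eye, a small heuristic check can help. This is a minimal sketch: the marker strings ("Just a moment...", "challenge-platform", "cf-chl") and the reliance on the `Server: cloudflare` and `CF-RAY` headers are assumptions based on what challenge pages commonly look like, not an official or complete list, and `looks_like_cloudflare_challenge` / `fetch_and_check` are hypothetical helper names.

```python
# Heuristic markers often found in Cloudflare challenge pages (assumption,
# not an official or exhaustive list).
CHALLENGE_MARKERS = ("Just a moment...", "challenge-platform", "cf-chl")


def looks_like_cloudflare_challenge(status_code, headers, body):
    """Return True if a response looks like a Cloudflare browser check."""
    # Normalise header names so plain dicts and requests' headers both work
    lowered = {k.lower(): v for k, v in headers.items()}
    # Cloudflare normally answers with "Server: cloudflare" and a CF-RAY header
    served_by_cloudflare = (
        lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    )
    blocked = status_code in (403, 503)
    has_marker = any(marker in body for marker in CHALLENGE_MARKERS)
    return served_by_cloudflare and blocked and has_marker


def fetch_and_check(url, user_agent):
    """Fetch url with requests and report whether it hit a challenge page."""
    import requests  # imported lazily so the heuristic above has no dependency

    r = requests.get(url, headers={"user-agent": user_agent}, timeout=30)
    return r.status_code, looks_like_cloudflare_challenge(
        r.status_code, r.headers, r.text
    )
```

If the explanation above holds, calling `fetch_and_check` on the Yell URL with the question's user agent should report a 403 together with `True`. Note that detecting the challenge does not get you past it; it only confirms that requests is seeing Cloudflare's page rather than Yell's.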

Because the check relies on JavaScript, which requests cannot execute, there is no reliable way to get past it with the requests module alone.

Since you mentioned you are going to use Selenium, make sure to use the undetected-chromedriver package. Also, be sure to rotate your IP address to avoid getting it blocked.
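Combining both suggestions, here is a rough sketch of driving undetected-chromedriver through a rotating list of proxies. Assumptions: the proxy addresses are placeholders from the 203.0.113.0/24 documentation range, the proxies need no authentication (Chrome's `--proxy-server` flag does not take credentials in the URL), and `proxy_rotation` / `fetch_with_undetected_chrome` are hypothetical helper names. The package is imported inside the function so the rotation helper works even where undetected-chromedriver is not installed.

```python
import itertools

# Hypothetical proxy list: replace with proxies you actually control or rent.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
]


def proxy_rotation(proxies):
    """Yield Chrome --proxy-server arguments, cycling through the list forever."""
    for proxy in itertools.cycle(proxies):
        yield f"--proxy-server={proxy}"


def fetch_with_undetected_chrome(url, proxy_arg):
    """Load url in a patched Chrome via undetected-chromedriver and return the HTML."""
    import undetected_chromedriver as uc  # lazy import: only needed for real fetches

    options = uc.ChromeOptions()
    options.add_argument(proxy_arg)  # route this browser session through one proxy
    driver = uc.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the page load fails
```

A typical loop would create `rotation = proxy_rotation(PROXIES)` once and call `fetch_with_undetected_chrome(url, next(rotation))` for each page, so consecutive requests leave from different addresses. Depending on how long the browser check takes, you may also need to wait (for example with Selenium's `WebDriverWait`) before reading `page_source`, otherwise you can still capture the challenge page.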

