
Webscraping with Python Requests and getting Access Denied even after updating headers

This web scraper worked for a while, but the website must have been updated, because it no longer works. Every request now returns an Access Denied error. I have tried adding headers but still get the same error. This is what the code prints:

<html><head>
<title>Access Denied</title>
</head><body>
<h1>Access Denied</h1>

You don't have permission to access "http://www.jdsports.co.uk/product/white-nike-air-force-1-shadow-womens/15984107/" on this server.<p>
Reference #18.4d4c1002.1616968601.6e2013c
</p></body>
</html>

Here's the part of the code that gets the HTML:

import requests
from bs4 import BeautifulSoup

scraper = requests.Session()

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
}

# info[0] (the product URL) and proxy_test (the proxies dict) are defined elsewhere in the script
html = scraper.get(info[0], proxies=proxy_test, headers=headers).text
soup = BeautifulSoup(html, 'html.parser')

print(soup)
stock = soup.find_all("button", {"class": "btn btn-default"})

What else can I try to fix it? The website I want to scrape is https://www.jdsports.co.uk/

I'm not sure where you are, but here in the US your code works for me. I only had to use a different product, because the one in the URL above no longer exists. I was able to get a list of buttons, and it didn't require headers either.

import requests
from bs4 import BeautifulSoup

url = 'https://www.jdsports.co.uk/product/black-nike-air-force-1-react-lv8-all-stars/16080098/'
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
soup.find_all("button", {"class": "btn btn-default"})  # returns the list of matching buttons
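
Since the same code behaves differently depending on where it runs, it may help to detect the block page explicitly instead of parsing it as if it were the product page. Below is a minimal sketch of that idea, not a known fix for this site. The names `BROWSER_HEADERS`, `is_access_denied` and `fetch_html` are hypothetical helpers, the extra `Accept` and `Accept-Language` values are illustrative assumptions, and treating HTTP 403 as a block is also an assumption about how the server responds.

```python
import re

# Browser-like headers. The User-Agent is the one from the question; the other
# two values are illustrative assumptions, not a set known to be accepted.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-GB,en;q=0.9",
}

# Captures the text of the first <title> element, ignoring case and newlines.
TITLE_RE = re.compile(r"<title[^>]*>\s*(.*?)\s*</title>", re.IGNORECASE | re.DOTALL)


def is_access_denied(status_code, html):
    """Return True if the response looks like the block page from the question."""
    match = TITLE_RE.search(html)
    title = match.group(1) if match else ""
    # Assumption: a block shows up as HTTP 403 or as an "Access Denied" title.
    return status_code == 403 or title.lower() == "access denied"


def fetch_html(session, url, proxies=None):
    """Fetch url with a requests.Session-like object; raise if the request was blocked."""
    response = session.get(url, headers=BROWSER_HEADERS, proxies=proxies, timeout=10)
    if is_access_denied(response.status_code, response.text):
        raise PermissionError(f"Blocked (HTTP {response.status_code}) for {url}")
    return response.text
```

With Requests installed, one way to use it is to call `fetch_html(requests.Session(), url)` and then `fetch_html(requests.Session(), url, proxies=proxy_test)` for the same product URL. If only the proxied call raises `PermissionError`, the proxy's IP address is a likely cause of the block. If both raise, the block probably depends on something else about the request.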
