
Scrapy Access Denied crawling the head of a website

Original 2020-07-14 09:37:08 1 1 python/ web-scraping/ scrapy/ web-crawler

I want to crawl a website, but I get the following error:

'<head>\n<title>Access Denied</title>\n</head>'

I am just trying this in the console:

scrapy shell https://www.zara.com/es/en/
response.css("head").get()

What am I doing wrong? Is it related to the User-Agent? Does the website have an anti-crawling mechanism? How can I crawl this website?

1 answer

Set USER_AGENT = 'zara (+http://www.yourdomain.com)' in settings.py; that solves the issue. You can also use your own user agent string if you prefer.
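As a minimal sketch of what the answer describes, the setting could look like this in the project's settings.py. USER_AGENT is a standard Scrapy setting; the value is the one from the answer, and the domain in it is a placeholder you would replace with your own. Whether a given site accepts this user agent is up to the site, so treat it as a starting point rather than a guaranteed fix.

```python
# settings.py (Scrapy project settings)

# User agent string sent with every request made by this project.
# The value is copied from the answer; "www.yourdomain.com" is a placeholder.
USER_AGENT = "zara (+http://www.yourdomain.com)"
```

For a one-off check in the console, without editing settings.py, the same setting can be passed on the command line with Scrapy's -s option: scrapy shell -s USER_AGENT="zara (+http://www.yourdomain.com)" https://www.zara.com/es/en/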

Scrapy crawling not working on ASPX website

Scrapy Splash Crawling Javascript Website

Crawling all comments on a website with scrapy

Scrapy - issues with crawling deeper into the website

How to keep Scrapy from crawling “denied” pages

Scrapy - How to scrape a website when access is denied [Lowes]

Scrapy not crawling

How to find element iTunes Connect website for scrapy(crawling)?

Prevent the scrapy spider from crawling one part of the website too long

Scrapy unable to follow internal links while crawling website

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

