I need help solving a problem. I am getting this error:
"[scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.islam.gov.my/robots.txt> (referer: None)"
when I run `scrapy crawl my_scraper -o ehadith.csv`.
That's not an error. It's a DEBUG-level log message telling you that your spider successfully downloaded the domain's robots.txt file (HTTP status 200).
The real issue you are having is the 403 responses. Try enabling the AutoThrottle extension to reduce request concurrency, so the server is less likely to reject you for crawling too aggressively.
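A minimal sketch of what that looks like in your project's `settings.py` (the numeric values below are illustrative assumptions; tune them for the target site):

```python
# settings.py

# Enable the AutoThrottle extension so Scrapy adapts the crawl rate
# to the server's measured response latency.
AUTOTHROTTLE_ENABLED = True

# Initial download delay (seconds) used before AutoThrottle has
# collected any latency data.
AUTOTHROTTLE_START_DELAY = 5

# Upper bound on the delay AutoThrottle may impose under high latency.
AUTOTHROTTLE_MAX_DELAY = 60

# Average number of requests Scrapy should send in parallel to each
# remote server; 1.0 is a conservative, polite setting.
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Optional: log every throttling decision while you tune the values.
AUTOTHROTTLE_DEBUG = True
```

With `AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0`, Scrapy effectively sends one request at a time to the site, which often makes 403 rate-limit responses go away.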