I have a fully functioning scrapy script to extract data from a website. During setup, the target site banned me based on my USER_AGENT information. I subsequently added a RotateUserAgentMiddleware to rotate the USER_AGENT randomly. This works great.
However, now when I try to use the scrapy shell to test XPath and CSS selectors, I get a 403 error. I'm fairly sure this is because the USER_AGENT of the scrapy shell defaults to a value the target site has blacklisted.
Question: is it possible to fetch a URL in the scrapy shell with a different USER_AGENT than the default?
fetch('http://www.test') [add something ?? to change USER_AGENT]
Thx
scrapy shell -s USER_AGENT='custom user agent' 'http://www.example.com'
Inside the scrapy shell, you can set the User-Agent in the request header:
url = 'http://www.example.com'
request = scrapy.Request(url, headers={'User-Agent': 'Mybot'})
fetch(request)
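If you want the shell to mimic the random rotation your spider already does, something like the following might work. It is only a rough sketch: the USER_AGENTS list and the random_headers helper are hypothetical names made up for illustration, not part of Scrapy, and the User-Agent strings are placeholders you would swap for your own.

```python
import random

# Hypothetical pool of User-Agent strings (placeholders, not from the question).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def random_headers(user_agents=USER_AGENTS):
    """Return a headers dict with one randomly chosen User-Agent."""
    return {"User-Agent": random.choice(user_agents)}


# Inside the scrapy shell, where `scrapy` and `fetch` are already available:
# request = scrapy.Request('http://www.example.com', headers=random_headers())
# fetch(request)
```

You can paste the helper into the shell session and call it each time you build a request, so every fetch goes out with a different User-Agent.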