Is there a way to process a scrapy.Request object in the shell?
In the terminal, I ran
scrapy startproject tutorial
I created the following spider in the spiders folder:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/page/1/']
In the terminal, I ran
scrapy shell 'http://quotes.toscrape.com/page/1/'
This all works fine: in the Python shell that opens up, I get
>>> response
<200 http://quotes.toscrape.com/page/1/>
Now, I ran
>>> next_page = response.css('li.next a::attr(href)').extract_first()
>>> next_page
'/page/2/'
>>> response.follow(next_page)
<GET http://quotes.toscrape.com/page/2/>
>>> type(response.follow(next_page))
<class 'scrapy.http.request.Request'>
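As the transcript shows, response.follow only constructs a Request object; it resolves the relative href against the current page URL (the same resolution urllib.parse.urljoin performs) but does not download anything. A minimal sketch of that URL resolution:

```python
from urllib.parse import urljoin

# response.follow resolves the relative link against the page URL,
# much like urljoin does; building the Request does not fetch it.
print(urljoin('http://quotes.toscrape.com/page/1/', '/page/2/'))
# -> http://quotes.toscrape.com/page/2/
```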
I would like to get a new Response object in the shell, based on the link to next_page. Is this possible at all? Any help very much appreciated.
I tried the below already, but couldn't fix the error.
>>> scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware.process_request(response.follow(next_page), "quotes")
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: process_request() missing 1 required positional argument: 'spider'
Use fetch(): in the Scrapy shell, fetch() downloads the given request and rebinds the shell's response object to the result:
>>> fetch(response.follow(next_page))