How to use a new context for each request with scrapy-playwright?
This is how I do it, but I'm not sure whether it actually creates and uses a new context for each new request:
class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = [...]
    cnt = 0

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url=url,
                                 meta={'playwright': True,
                                       'playwright_context': f'{self.cnt}'})

    def parse(self, response):
        self.cnt += 1
        for res in response.xpath('//div[@id="contenu"]'):
            url = res.xpath('.//h2/a/@href').get()
            yield scrapy.Request(url=url,
                                 callback=self.get_content,
                                 meta={'playwright': True,
                                       'playwright_context': f'{self.cnt}'})
Is this code doing what I want, or is it wrong?

The `self.cnt += 1` should go inside the for loop, incremented before each request is yielded, so that every request carries a new context name with an incremented number:
class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = [...]
    cnt = 0

    def start_requests(self):
        for url in self.start_urls:
            self.cnt += 1  # <------ increment the count here
            yield scrapy.Request(url=url,
                                 meta={'playwright': True,
                                       'playwright_context': f'{self.cnt}'})

    def parse(self, response):
        for res in response.xpath('//div[@id="contenu"]'):
            url = res.xpath('.//h2/a/@href').get()
            self.cnt += 1  # <------ increment the count here
            yield scrapy.Request(url=url,
                                 callback=self.get_content,
                                 meta={'playwright': True,
                                       'playwright_context': f'{self.cnt}'})
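As a side note, instead of a shared mutable counter you can derive a unique context name per request, for example from `uuid4`. This avoids any ambiguity about when the counter is incremented across callbacks. A minimal sketch (the helper name `playwright_meta` is made up here, not part of scrapy-playwright):

```python
import uuid


def playwright_meta():
    """Build a request meta dict that asks scrapy-playwright to use a
    fresh, uniquely named browser context for this request.

    Each call generates a new uuid4-based context name, so no two
    requests share a context regardless of callback ordering."""
    return {
        'playwright': True,
        'playwright_context': uuid.uuid4().hex,  # unique per call
    }
```

Then each request would be yielded as `yield scrapy.Request(url=url, meta=playwright_meta())`. Note that scrapy-playwright keeps contexts open until they are explicitly closed, so with many one-off contexts you may want to cap them via the `PLAYWRIGHT_MAX_CONTEXTS` setting or close them in your callbacks.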