Scrapy: How do I prevent a yield request with a conditional item value?
I'm parsing a list of URLs, and I want to avoid saving the resulting item for some URLs, depending on one of the item's values. My code is something like this:
start_urls = [www.rootpage.com]

def parse(self, response):
    item = CreatedItem()
    url_list = response.xpath('somepath').extract()
    for url in url_list:
        request = scrapy.Request(url, callback=self.parse_article)
        request.meta['item'] = item
        yield request

def parse_article(self, response):
    item = response.meta['item']
    item['parameterA'] = response.xpath('somepath').extract()
    yield item
Now, when item['parameterA'] meets a certain condition, I want to skip the "yield request" (so that nothing is saved for that URL). I tried adding a conditional like:
if item['parameterA'] == 0:
    continue
else:
    yield item
but as expected it does not work, because Scrapy continues the loop even before the request is performed.
From what I understand, you should make the decision inside the parse_article method:
def parse_article(self, response):
    item = response.meta['item']
    item['parameterA'] = response.xpath('somepath').extract_first()
    if item['parameterA'] != "0":
        yield item
Note the use of extract_first() and the quotes around 0.
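The reason both changes matter can be sketched in plain Python (the sample value below is assumed): .extract() returns a list of strings, while .extract_first() returns the first string, or None when nothing matched, so only the latter can be meaningfully compared against the string "0".

```python
# Minimal sketch (assumed sample data) of the extract() vs extract_first()
# distinction that the answer relies on.
extracted = ["0"]                            # what .extract() might return
first = extracted[0] if extracted else None  # what .extract_first() returns

print(extracted != "0")   # a list never equals a string, so this condition
                          # would never drop the item
print(first != "0")       # string-to-string comparison works as intended
```

This is also why the answer compares against "0" rather than the integer 0: XPath extraction yields text, so the value arrives as a string.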