Scrapy: How do I prevent a yield request with a conditional item value?
I'm parsing a list of URLs, and I want to avoid saving the resulting item for some URLs, depending on one of the item's values. My code is something like this:
start_urls = [www.rootpage.com]

def parse(self, response):
    item = CreatedItem()
    url_list = response.xpath('somepath').extract()
    for url in url_list:
        request = scrapy.Request(url, callback=self.parse_article)
        request.meta['item'] = item
        yield request

def parse_article(self, response):
    item = response.meta['item']
    item['parameterA'] = response.xpath('somepath').extract()
    yield item
Now, when item['parameterA'] meets a certain condition, I want to skip the "yield request" (so that nothing is saved for that URL). I tried adding a conditional like:
if item['parameterA'] == 0:
    continue
else:
    yield item
but as expected it does not work, because Scrapy continues the loop even before the request is performed.
From what I understand, you should make the decision inside the parse_article method:
def parse_article(self, response):
    item = response.meta['item']
    item['parameterA'] = response.xpath('somepath').extract_first()
    if item['parameterA'] != "0":
        yield item
Note the use of extract_first() and the quotes around 0.
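The reason both changes matter can be sketched in plain Python (the sample value below is assumed): .extract() returns a list of strings, while .extract_first() returns the first string, or None when nothing matched, so only the latter can be meaningfully compared against the string "0".

```python
# Minimal sketch (assumed sample data) of the extract() vs extract_first()
# distinction that the answer relies on.
extracted = ["0"]                            # what .extract() might return
first = extracted[0] if extracted else None  # what .extract_first() returns

print(extracted != "0")   # a list never equals a string, so this condition
                          # would never drop the item
print(first != "0")       # string-to-string comparison works as intended
```

This is also why the answer compares against "0" rather than the integer 0: XPath extraction yields text, so the value arrives as a string.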