
Scrapy: How do I prevent a yield request with a conditional item value?

I'm parsing a list of URLs, and I want to avoid saving the item scraped from a URL when one of its values meets a condition. My code is something like this:

start_urls = ['http://www.rootpage.com']

def parse(self, response):
    url_list = response.xpath('somepath').extract()
    for url in url_list:
        # Create a fresh item per URL so that requests do not share state
        item = CreatedItem()
        item['url'] = url
        request = scrapy.Request(url, callback=self.parse_article)
        request.meta['item'] = item
        yield request

def parse_article(self, response):
    item = response.meta['item']
    item['parameterA'] = response.xpath('somepath').extract()
    yield item

Now, when item['parameterA'] meets a certain condition, I want to skip the item so that nothing is saved for that URL. I tried adding a conditional like:

    if item['parameterA'] == 0:
        yield item

but, as expected, it does not work, because Scrapy continues the loop before the request has been performed.
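The ordering problem described above can be sketched without Scrapy. This is only a simplified model: the deque stands in for Scrapy's scheduler, the tuples stand in for scrapy.Request objects, and the page names and the rule that sets parameterA are hypothetical. Real Scrapy interleaves requests and callbacks asynchronously, but the key point is the same: when a request is yielded, the value that the callback will extract does not exist yet.

```python
from collections import deque

log = []

def parse(urls):
    # Mimics the spider's parse(): it only schedules work, it does not fetch anything.
    for url in urls:
        item = {"url": url}
        log.append(f"scheduled {url}, parameterA known: {'parameterA' in item}")
        yield (url, item)  # stand-in for yielding a scrapy.Request

def parse_article(url, item):
    # Mimics the callback: only here does parameterA get a value (hypothetical rule).
    item["parameterA"] = "0" if url.endswith("a") else "1"
    log.append(f"callback for {url}, parameterA = {item['parameterA']}")
    return item

# In this simplified model, every request is scheduled before any callback runs.
pending = deque(parse(["page-a", "page-b"]))
while pending:
    url, item = pending.popleft()
    parse_article(url, item)

print("\n".join(log))
```

Any check on item['parameterA'] placed inside the loop in parse() therefore runs before the field has been filled in.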

From what I understand, you should make the decision inside the parse_article method:

def parse_article(self,response):
    item = response.meta['item']
    item['parameterA'] = response.xpath('somepath').extract_first()

    if item['parameterA'] != "0":
        yield item

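Note the use of extract_first() and the quotes around "0": extract() returns a list of strings, whereas extract_first() returns a single string (or None when nothing matches), so the value has to be compared with the string "0" rather than the integer 0.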
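To make both points concrete, here is a minimal sketch that runs without Scrapy. The fake_response helper, the value attribute and the example.com URLs are assumptions made for illustration only; in a real spider the value would come from response.xpath('somepath').extract_first().

```python
from types import SimpleNamespace

# Shapes of the two return values (plain literals, not real Scrapy calls):
as_list = ["0"]   # what .extract() returns: a list of strings
as_first = "0"    # what .extract_first() returns: one string (or None)
print(as_list == 0, as_first == 0, as_first == "0")  # False False True

def parse_article(response):
    # Generator callback: yields the item only when the condition holds.
    item = dict(response.meta["item"])
    item["parameterA"] = response.value  # stand-in for .extract_first()
    if item["parameterA"] != "0":
        yield item
    # When the condition fails, nothing is yielded, so nothing is saved.

def fake_response(url, value):
    # Hypothetical stand-in for a Scrapy response carrying meta and one value.
    return SimpleNamespace(meta={"item": {"url": url}}, value=value)

kept = []
for resp in (fake_response("http://example.com/a", "0"),
             fake_response("http://example.com/b", "7")):
    kept.extend(parse_article(resp))

print(kept)  # [{'url': 'http://example.com/b', 'parameterA': '7'}]
```

Because the callback is a generator, simply not yielding is enough to drop an item; the request itself has already been made by that point.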

