
Scrapy/Python: Replace empty string

So here is my Scrapy crawler code. I am trying to extract metadata values from a website. No meta tag appears more than once on a page.

import scrapy
from scrapy import Request


class MySpider(scrapy.Spider):
    name = "courses"
    allowed_domains = ["example.com"]
    start_urls = ['http://www.example.com/listing']

    def parse(self, response):
        # Each meta tag appears at most once per page, so yield one item per page.
        # The '//meta[...]' paths below are absolute, so looping over every <meta>
        # element would only yield the same item again and again.
        yield {
            'ScoreA': response.xpath('//meta[@name="atarbur"]/@content').extract_first(),
            'ScoreB': response.xpath('//meta[@name="atywater"]/@content').extract_first(),
            'ScoreC': response.xpath('//meta[@name="atarsater"]/@content').extract_first(),
            'ScoreD': response.xpath('//meta[@name="clearlywaur"]/@content').extract_first(),
        }
        # Follow the listing links and parse those pages the same way.
        for url in response.xpath('//ul[@class="scrapy"]/li/a/@href').extract():
            yield Request(response.urljoin(url), callback=self.parse)

What I am trying to achieve is this: if any of the Score values is an empty string (''), I want to replace it with 0 (zero). I am not sure how to add conditional logic inside the `yield` block.

Any help is much appreciated.

Thanks

The `extract_first()` method takes an optional `default` parameter, but it is only used when the XPath matches nothing. In your case you can simply use an `or` expression:

foo = response.xpath('//foo').extract_first('').strip() or 0
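The difference between the `default` argument and the `or` expression can be checked without running a crawl. The sketch below uses a hypothetical stand-in function, `extract_first`, that assumes the documented behaviour of Scrapy's `SelectorList.extract_first()` (the first match, or `default` when nothing matched); it is not the Scrapy method itself.

```python
def extract_first(matches, default=None):
    """Hypothetical stand-in for SelectorList.extract_first():
    return the first match, or `default` if nothing matched."""
    return matches[0] if matches else default


# Meta tag missing entirely: the default kicks in.
missing = extract_first([], default='0')
# Meta tag present with content="": there is a match, so the default is NOT used.
empty = extract_first([''], default='0')

# The `or` expression covers both cases and leaves real values alone.
missing_or = extract_first([], '').strip() or 0
empty_or = extract_first([''], '').strip() or 0
present_or = extract_first(['85'], '').strip() or 0

print(missing, repr(empty), missing_or, empty_or, present_or)
```

This is why a `default` alone would not catch a tag whose `content` attribute is an empty string.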

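Here, if `extract_first()` returns a string with no text in it, `.strip()` turns it into `''`, which evaluates to `False`, so the last operand of the `or` expression (`0`) is used instead. Passing `''` as the default also covers a missing meta tag: without it, `extract_first()` would return `None` and `.strip()` would raise an `AttributeError`.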
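A small helper makes the truthiness rule easy to check. `score_or_zero` is a hypothetical name, and it assumes its argument is already a string (for example, the result of `extract_first('')`). Note that the result's type depends on which branch is taken: a non-empty string stays a `str`, while the fallback is the `int` 0. Also, `'0'` is a non-empty string and therefore truthy, so it is returned unchanged.

```python
def score_or_zero(raw):
    # Empty and whitespace-only strings strip down to '', which is falsy,
    # so `or` falls through to its last operand, 0.
    return raw.strip() or 0


for raw in ['', '   ', ' 85 ', '0']:
    print(repr(raw), '->', repr(score_or_zero(raw)))
```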

To get an integer instead of a string, wrap the whole expression in `int()` (this raises a `ValueError` if the content is not a whole number):

foo = int(response.xpath('//foo').extract_first('').strip() or 0)
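Applied to the spider in the question, the same idea can be wrapped in one helper so the `yield` block needs no conditionals of its own. This is a sketch: `build_item` and `SCORE_FIELDS` are hypothetical names, the field and meta names are copied from the question, and it assumes every score is a whole number (otherwise `int()` raises `ValueError`).

```python
# Item field -> meta tag name, copied from the question's spider.
SCORE_FIELDS = {
    'ScoreA': 'atarbur',
    'ScoreB': 'atywater',
    'ScoreC': 'atarsater',
    'ScoreD': 'clearlywaur',
}


def build_item(meta_contents):
    """meta_contents maps a meta name to its content string,
    or to None / no entry if the tag was not found on the page."""
    item = {}
    for field, meta_name in SCORE_FIELDS.items():
        raw = meta_contents.get(meta_name) or ''  # treat missing as empty
        item[field] = int(raw.strip() or 0)       # empty/whitespace -> 0
    return item


# Hypothetical use inside parse():
#     contents = {name: response.xpath('//meta[@name="%s"]/@content' % name).extract_first('')
#                 for name in SCORE_FIELDS.values()}
#     yield build_item(contents)

print(build_item({'atarbur': '85', 'atywater': '', 'atarsater': '  '}))
```

Keeping the meta names in one mapping also means a new score needs only one extra entry rather than another line in the `yield` dict.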

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 