How to set a default value when a Scrapy selector with extract() returns None?

Question

I am trying to yield the value of a tag that isn't always present in the pages I scrape with Scrapy. I am using the extract() function rather than extract_first(), so I cannot seem to set a default value as suggested in this SO post.

This doesn't work:

def parse(self, response):
    yield {
        'comments': response.css('[itemprop=commentCount]::attr(content)').extract(default=None)
    }

How can I set None as the default when I want to use extract() rather than extract_first()?

Thanks very much in advance!

Answer 1

Try this syntax:

{'comments': response.css('[itemprop=commentCount]::attr(content)').extract() or None}

If the list returned by extract() is empty, None is assigned as the value of the comments key. Otherwise, the extracted list is assigned.
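The idiom works because an empty list is falsy in Python, so `or` falls through to the right-hand operand. A minimal sketch, assuming plain lists as stand-ins for what extract() returns; the helper name `extract_or_default` is hypothetical and not part of Scrapy:

```python
def extract_or_default(values, default=None):
    """Return the extracted list, or `default` when the list is empty."""
    # An empty list is falsy, so `or` falls through to the default.
    return values or default


# Stand-ins for the result of .extract(): a list of strings, possibly empty.
found = ["12"]
missing = []

item_found = {"comments": extract_or_default(found)}
item_missing = {"comments": extract_or_default(missing)}
```

In a spider, the same helper could wrap the selector call, for example `extract_or_default(response.css(...).extract())`.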

Answer 2

.extract() returns the output as a list, while .extract_first() returns a string.

response.xpath('xpath_of_the_component').extract_first(default="default_value").split()

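This converts the string returned by extract_first() back into a list, falling back to the default value when no match is found. Note that extract_first() keeps only the first match and split() breaks the string on whitespace, so the result is not always identical to what extract() would return.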
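If the value may contain spaces, wrapping the string in a list avoids the whitespace splitting that split() performs. A rough sketch, again using plain lists in place of a real selector; `first_or_default_as_list` is a hypothetical helper that only mimics taking the first match with a default:

```python
def first_or_default_as_list(values, default="default_value"):
    """Take the first extracted string (or the default) and wrap it in a list."""
    first = values[0] if values else default
    # Wrapping keeps the string intact; str.split() would break it on whitespace.
    return [first]


with_space = ["1 234"]

wrapped = first_or_default_as_list(with_space)
split_up = with_space[0].split()
fallback = first_or_default_as_list([])
```

Here `wrapped` keeps "1 234" as a single element, while `split_up` contains two separate elements.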

Answer 1
4 2018-11-10 11:07:08

Answer 2
1 2020-06-23 17:56:32
