Python Scrapy - scrapy.Request not working

My scraper crawls 0 pages and I think the problem resides in the last line of code in the parse method:

def parse(self, response):
    all_companies = response.xpath('//header[@class = "card-header"]')

    for company in all_companies:
        company_url = company.xpath('./a[@class = "card-header-scorecard"]/@href').extract_first()
        yield scrapy.Request(url=company_url, callback=self.parse_company)

I tested the retrieval of company_url with the Scrapy shell and they are all returned correctly. The scraper is supposed to access each of those URLs and scrape the items using the parse_company method.
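
For reference, that check in the Scrapy shell would look roughly like this (the listing URL below is only a placeholder for the actual page):

scrapy shell "https://example.com/companies"
>>> response.xpath('//header[@class = "card-header"]/a[@class = "card-header-scorecard"]/@href').extract()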

Before using yield I was using the Rule feature, and it worked perfectly together with parse_company, so I know that method works; however, I had to change my approach out of necessity.

rules = (
    Rule(LinkExtractor(restrict_css=".card-header > a"), callback="parse_company"),
)

You are using CrawlSpider, and in the latest versions of Scrapy, CrawlSpider's default callback is _parse instead of parse. If you want to override the default callback, use _parse, or you can use scrapy.Spider instead of scrapy.CrawlSpider.
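
A minimal sketch of the second suggestion, switching the base class to scrapy.Spider so the custom parse method really is used as the callback for the start URLs. The spider name, start URL, and the body of parse_company are placeholders; only the link-following logic is taken from the question:

import scrapy


class CompanySpider(scrapy.Spider):  # plain Spider instead of CrawlSpider
    name = "companies"                               # hypothetical spider name
    start_urls = ["https://example.com/companies"]   # placeholder listing URL

    def parse(self, response):
        # Same link-following logic as in the question; with scrapy.Spider,
        # parse is the default callback, so it is actually invoked.
        for company in response.xpath('//header[@class = "card-header"]'):
            company_url = company.xpath('./a[@class = "card-header-scorecard"]/@href').extract_first()
            if company_url:
                # urljoin handles relative hrefs as well as absolute ones
                yield scrapy.Request(url=response.urljoin(company_url), callback=self.parse_company)

    def parse_company(self, response):
        # Placeholder extraction; the real item fields depend on the target pages.
        yield {"url": response.url}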
