
Scrapy stops directly after I don't yield a request in parse

I'm trying to make a spider that goes through a certain number of start URLs, and if the resulting page is the right one, I yield another request. The problem is that whenever I don't yield a second request, the spider stops directly. There are no problems if I do yield the second request.

Here is the relevant code:

    def start_requests(self):
        urls = ['https://www.hltv.org' + player for player in self.hashPlayers]
        print(len(urls))
        for url in urls:
            return [scrapy.Request(url=url, callback=self.parse)]

    def parse(self, response):
        result = response.xpath("//div[@class = 'playerTeam']//a/@href").get()
        if result is None:
            result = response.xpath("//span[contains(concat(' ',normalize-space(@class),' '),' profile-player-stat-value bold ')]//a/@href").get()

        if result is not None:
            yield scrapy.Request(
                url = "https://www.hltv.org" + result,
                callback = self.parseTeam
            )

So I want a way to make the spider continue after the parse function is called and doesn't yield a request.

def start_requests(self):
    urls = ['https://www.hltv.org' + player for player in self.hashPlayers]
    print(len(urls))
    for url in urls:
        return [scrapy.Request(url=url, callback=self.parse)]

If you use return, the function is terminated, the loop won't iterate to the next value, and only a single request will be sent to the Scrapy engine. Replace it with yield so the method returns a generator.
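For reference, a minimal sketch of the corrected start_requests using yield, assuming self.hashPlayers holds the player URL paths as in the question (the spider name here is just a placeholder):

import scrapy


class PlayerSpider(scrapy.Spider):
    name = "players"  # hypothetical name, not from the original post

    def start_requests(self):
        # Build one URL per player path stored on the spider.
        urls = ['https://www.hltv.org' + player for player in self.hashPlayers]
        print(len(urls))
        for url in urls:
            # yield keeps the loop running, so every URL is scheduled;
            # return would exit the method after the first request.
            yield scrapy.Request(url=url, callback=self.parse)

Because a method containing yield is a generator, Scrapy keeps pulling requests from it until the loop is exhausted, so every start URL gets scheduled instead of only the first one.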
