How to reverse list order in Python and stop yield upon return of None?
I am generating pagination links for pages which I suspect exist (Python 3.x):
start_urls = [
    'https://...',
    'https://...'  # list full of URLs
]

def start_requests(self):
    for url in self.start_urls:
        yield scrapy.Request(
            url=url,
            meta={'handle_httpstatus_list': [301]},
            callback=self.parse,
        )

def parse(self, response):
    for i in range(1, 6):
        url = response.url + '&pn=' + str(i)
        yield scrapy.Request(url, self.parse_item)

def parse_item(self, response):
    # check if no results page
    if response.xpath('//*[@id="searchList"]/div[1]').extract_first():
        self.logger.info('No results found on %s', response.url)
        return None
    ...
Those URLs will be processed by Scrapy in parse_item. Now there are two problems:

1. The order is reversed and I do not understand why. It will request page numbers 5, 4, 3, 2, 1 instead of 1, 2, 3, 4, 5.
2. If no results are found on page 1, the entire series could be stopped. parse_item already returns None, but I guess I need to adapt the parse method to exit the for loop and continue. How?
The scrapy.Request objects you generate run in parallel. In other words, there is no guarantee about the order in which you get the responses, as it depends on the server. (Scrapy's default scheduler also pops pending requests last-in-first-out, which is why the pages tend to be requested as 5, 4, 3, 2, 1 in the first place.)
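If you only need the pages to be fetched in roughly ascending order (strict sequencing needs the chained-callback approach below), one option is to use the priority argument of scrapy.Request, where higher values are dequeued first. A minimal sketch against the asker's parse method; note the responses can still arrive out of order:

def parse(self, response):
    for i in range(1, 6):
        url = response.url + '&pn=' + str(i)
        # priority=-i gives page 1 the highest priority, counteracting
        # the LIFO default; it biases the request order towards 1..5
        # but does not guarantee the order of the responses
        yield scrapy.Request(url, self.parse_item, priority=-i)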
If some requests depend on the response of another request, you should yield those requests from its parse callback.
For example:
def parse(self, response):
    url = response.url + '&pn=' + str(1)
    yield scrapy.Request(url, self.parse_item,
                         cb_kwargs=dict(page=1, base_url=response.url))

def parse_item(self, response, page, base_url):
    # check if no results page; if so, stop the chain by not
    # requesting any further pages
    if response.xpath('//*[@id="searchList"]/div[1]').extract_first():
        self.logger.info('No results found on %s', response.url)
        return
    # results found: queue the next page (pages 1 through 5,
    # matching the original range(1, 6))
    if page < 5:
        url = base_url + '&pn=' + str(page + 1)
        yield scrapy.Request(url, self.parse_item,
                             cb_kwargs=dict(base_url=base_url, page=page + 1))
    # your code to extract items from this page
    yield ...
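As a side note, cb_kwargs requires Scrapy 1.7 or newer; on older versions the same values can be passed through request.meta instead. A minimal sketch of that fallback, using the same assumed names as above:

def parse(self, response):
    url = response.url + '&pn=' + str(1)
    yield scrapy.Request(url, self.parse_item,
                         meta={'page': 1, 'base_url': response.url})

def parse_item(self, response):
    # read the values back from response.meta instead of cb_kwargs
    page = response.meta['page']
    base_url = response.meta['base_url']
    ...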