I am trying to scrape multiple pages into one item:
A
|-- a
|-- b
|-- c
B
|-- a
...
By scraping page A and its subpages (a, b, c) I'll get one item. My code is large, but here is a shortened version:
import scrapy

class MySpider(scrapy.Spider):
    def parse(self, response):
        for li in response.xpath('//li'):
            item = MyItem()
            ...
            # carry the partially-filled item through the chain of requests
            meta = {
                'item': item,
                'href': href,
            }
            url = response.urljoin(href + '?a')
            yield scrapy.Request(url, callback=self.parse_a, meta=meta)

    def parse_a(self, response):
        ...
        url = response.urljoin(href + '?b')
        yield scrapy.Request(url, callback=self.parse_b, meta=meta)

    def parse_b(self, response):
        ...
        url = response.urljoin(href + '?c')
        yield scrapy.Request(url, callback=self.parse_c, meta=meta)

    def parse_c(self, response):
        ...
        yield item
The script works fine, but here is the problem: the crawler scrapes pages in the following order: A, B, C, Aa, Ba, Ca, Ab, Bb, ... Since there are too many pages, nothing is saved until all of them are scraped. And when I change yield to return in the parse method, it scrapes in the order I want (A, Aa, Ab, Ac), but it doesn't scrape B, C, ...
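One commonly suggested approach (not from the original post) is to tune Scrapy's scheduler rather than changing yield to return. Scrapy pops requests LIFO by default, which is roughly depth-first, but concurrency interleaves pages; limiting concurrency makes the A, Aa, Ab, Ac order far more likely. The snippet below is a hedged settings.py sketch using documented Scrapy settings:

```python
# settings.py -- a sketch, assuming a default Scrapy project layout.
# Scrapy's scheduler is LIFO by default (roughly depth-first), but
# concurrent requests interleave branches; serializing requests makes
# the crawl follow one branch (A, Aa, Ab, Ac) before starting the next:
CONCURRENT_REQUESTS = 1

# Conversely, for an explicit breadth-first crawl, the Scrapy docs
# describe switching to FIFO queues:
# DEPTH_PRIORITY = 1
# SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
# SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"
```

Note that CONCURRENT_REQUESTS = 1 trades away throughput for ordering, so it only makes sense if the output order genuinely matters.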
If you want to force this kind of order, the only way I can think of right now is to specify the order in the Item Pipeline, so that you get the items back as Ac, Bc, Cc, ...
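As a rough illustration of that idea, the hypothetical pipeline below (the class name, the 'href' sort key, and returning the buffer from close_spider are all my assumptions, not from the answer) buffers every item and emits them in a fixed order only once the spider finishes:

```python
# A minimal sketch of an ordering pipeline, assuming each item
# carries an 'href' field that defines the desired output order.
class OrderedExportPipeline:
    def __init__(self):
        self.buffer = []  # items arrive here in crawl order

    def process_item(self, item, spider):
        # collect items instead of exporting them immediately
        self.buffer.append(item)
        return item

    def close_spider(self, spider):
        # once the crawl ends, sort and hand back the items in order;
        # a real pipeline would write them to a feed or file here
        self.buffer.sort(key=lambda i: i['href'])
        return self.buffer
```

The trade-off is the same one described in the question: nothing is written until the whole crawl finishes, so this fixes the order but not the "nothing is saved until the end" behavior.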