
How to construct the URLs in the start_urls list in the Scrapy framework (Python)

I am new to Scrapy and Python.
In my case:

Page A:

http://www.example.com/search?keyword=city&style=1&page=1  
http://www.example.com/search?keyword=city&style=1&page=2  
http://www.example.com/search?keyword=city&style=1&page=3 

The URL pattern is:

    for i in range(1, 51):
        url = "http://www.example.com/search?keyword=city&style=1&page=%s" % i

Page B:

http://www.example.com/city_detail_0001.html  
http://www.example.com/city_detail_0100.html  
http://www.example.com/city_detail_0053.html  

There is no URL pattern here, because the Page B links are the search results that match the keyword.

So if I want to grab some information from Page B, I must first use Page A to sift out the links to Page B.
In the past I usually did this in two steps:
1. Create spider A and save the Page B links to a txt file.
2. In spider B, read the txt file into "start_urls".

Now, can you please guide me on how to construct "start_urls" in one spider?

The start_requests method is what you need. After that, keep yielding requests and parse the response bodies in the callback methods.

from scrapy import Spider, Request

class MySpider(Spider):
    name = 'example'

    def start_requests(self):
        # generate the Page A search requests
        for i in range(50):
            yield Request('myurl%s' % i, callback=self.parse)

    def parse(self, response):
        # extract the Page B links from the Page A response,
        # then request each of them
        yield Request('pageB', callback=self.parse_my_item)

    def parse_my_item(self, response):
        item = {}
        # real parsing logic for the items
        yield item
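Applied to the URLs in the question, the 50 search URLs and the Page B links can be built with the standard library alone. This is a minimal sketch: the `href` regex below is an assumption about how the `city_detail_*.html` links appear in the search-results HTML, so adjust it (or use Scrapy's `response.css()`/`LinkExtractor`) to match the real markup.

```python
import re
from urllib.parse import urlencode, urljoin

BASE = "http://www.example.com"

def build_search_urls(keyword="city", style=1, pages=50):
    # One Page A search URL per page number; urlencode handles
    # escaping of the query-string parameters.
    return [
        "%s/search?%s" % (BASE, urlencode({"keyword": keyword,
                                           "style": style,
                                           "page": page}))
        for page in range(1, pages + 1)
    ]

# Hypothetical pattern for the Page B links found on a Page A result.
DETAIL_RE = re.compile(r'href="(/?city_detail_\d+\.html)"')

def extract_detail_links(html):
    # Pull the city_detail_*.html hrefs out of a search-results page
    # and resolve them against the site root.
    return [urljoin(BASE + "/", href) for href in DETAIL_RE.findall(html)]
```

In the spider, `start_requests` would yield one `Request` per URL from `build_search_urls()`, and `parse` would yield one `Request` per URL from `extract_detail_links(response.text)` with `callback=self.parse_my_item`, so both steps live in a single spider.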
