
How to scrape data from multiple pages using Scrapy?

I'm trying to scrape data from multiple pages using Scrapy with the code below. What am I doing wrong?

import scrapy 
class CollegeSpider(scrapy.Spider):

    name = 'college'
    allowed_domains = ['https://engineering.careers360.com/colleges/list-of-engineering-colleges-in-India?sort_filter=alpha']
    start_urls = ['https://engineering.careers360.com/colleges/list-of-engineering-colleges-in-India?sort_filter=alpha/']
    def parse(self,response):
        for college in response.css('div.title'):
            if college.css('a::text').extract_first():
                yield {'college_name':college.css('a::text').extract_first()}
    next_page_url=response.css('li.page-next>a::attr(href)').extract_first()
    next_page_url=response.urljoin(next_page_url)
    yield scrapy.Request(url=next_page_url,callback=self.praise)

Why do you think you are doing something wrong? Does it show an error? If so, you should have included the output in the question in the first place. If it's just not doing what you expect, you should tell us that, too.

Anyway, looking at the code, there are at least two possible errors:

  • allowed_domains should contain just domain names, not full URLs, as documented.
  • When you yield a new Request for the next page, you should pass callback=self.parse instead of self.praise, so that the response is processed the same way as the first URL.


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM