How to scrape data from multiple pages using Scrapy?

Question

I'm trying to scrape data from multiple pages using Scrapy. I'm using the code below; what am I doing wrong?

import scrapy 
class CollegeSpider(scrapy.Spider):

    name = 'college'
    allowed_domains = ['https://engineering.careers360.com/colleges/list-of-engineering-colleges-in-India?sort_filter=alpha']
    start_urls = ['https://engineering.careers360.com/colleges/list-of-engineering-colleges-in-India?sort_filter=alpha/']
    def parse(self, response):
        for college in response.css('div.title'):
            if college.css('a::text').extract_first():
                yield {'college_name': college.css('a::text').extract_first()}
        next_page_url = response.css('li.page-next>a::attr(href)').extract_first()
        next_page_url = response.urljoin(next_page_url)
        yield scrapy.Request(url=next_page_url, callback=self.praise)

Answer 1

Why do you think you are doing something wrong? Does it show an error? If so, the error output should have been included in the question. If it runs but doesn't do what you expected, you should tell us what you expected and what happened instead.

Anyway, looking at the code, there are at least two possible errors:

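allowed_domains should contain only domain names (here, engineering.careers360.com), not full URLs, as documented.
When you yield a new Request for the next page, pass callback=self.parse instead of self.praise, so the response is processed the same way as the first URL. The spider defines no praise method, so self.praise fails with an AttributeError.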

solution1
0 2018-02-19 07:09:39