Can't go to next page with Scrapy

I want to move on to the next page after scraping an Eventbrite page, but it isn't working, even after using Scrapy's CrawlSpider.

Here's the code to traverse the pages:

allowed_domains = ["eventbrite.com"]
start_urls = [
    "https://www.eventbrite.com/d/nigeria--lagos/events/?crt=regular&end_date=01%2F31%2F2018&page=1&sort=best&start_date=12%2F01%2F2017",
]

def parse(self, response):
    events = Selector(response).xpath('//div[@class="list-card-v2 l-mar-top-2 js-d-poster"]')

    for event in events:
        name = event.xpath('a/div[@class="list-card__body"]/div[@class="list-card__title"]/text()').extract()
        venue = event.xpath('a/div[@class="list-card__body"]/div[@class="list-card__venue"]/text()').extract()
        date = event.xpath('a/div[@class="list-card__body"]/time[@class="list-card__date"]/text()').extract()
        event_type = event.xpath('a/div[@class="list-card__header"]/span/text()').extract()
        category = event.xpath('div/div[@class="list-card__tags"]/a/text()').extract()
        image= event.xpath('a/div[@class="list-card__header"]/div/img[@class="js-poster-image"]').extract()
        image_url= event.xpath('a/div[@class="list-card__header"]/div/img[@class="js-poster-image"]/@src').extract()

        name = ''.join(name).replace('\n', '').strip()
        date = ''.join(date).replace('\n', '').strip()
        venue = ''.join(venue).replace('\n', '').strip()


        yield EventsItem(name=name, venue=venue, date=date,
                         event_type=event_type, category=category,
                         image_urls=image_url, images=image)

        next_page = response.xpath('//a[@data-automation="next-page"]/@href').extract_first()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)

Here's an image of the element. I don't know if it's because the href attribute is empty or because the XPath is wrong.

[Image: the next-page HTML element]

Any help is welcomed, thank you.

In place of the last line, which is:

yield scrapy.Request(next_page, callback=self.parse)

Try this one:

yield scrapy.Request(next_page, callback=self.parse, dont_filter=True)
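If the href on the next-page link does turn out to be empty, as the question suspects, one possible fallback is to build the next URL from the page query parameter that is already present in the start URL. The following is only a minimal sketch using the standard library; next_page_url is a hypothetical helper, not part of Scrapy, and it assumes the site keeps using an integer page parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(url, param="page"):
    """Return `url` with its integer `param` query value incremented by one."""
    parts = urlsplit(url)
    bumped = []
    found = False
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == param:
            value = str(int(value) + 1)
            found = True
        bumped.append((key, value))
    if not found:
        # Assumption: a URL without the parameter is page 1, so the next is 2.
        bumped.append((param, "2"))
    return urlunsplit(parts._replace(query=urlencode(bumped)))
```

Inside parse this could be used as scrapy.Request(next_page_url(response.url), callback=self.parse). A spider built this way needs its own stop condition, for example not yielding the request when the page contains no events. Whichever way the URL is obtained, the request is best yielded once per page, after the loop over events: with dont_filter=True, a request yielded inside the loop is repeated for every event on the page.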

Note: Be careful with allowed_domains. Its entries should be bare domain names and should not include a scheme such as http:// or https://. For example, use google.com instead of https://www.google.com .
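To see why a scheme in the allowed list causes trouble, here is a rough approximation of an offsite check that compares a request's host against the allowed domains. is_allowed is a hypothetical helper written for illustration, not Scrapy's actual implementation, but the idea is the same: the comparison is made against the host name only, so an entry that looks like a full URL can never match.

```python
from urllib.parse import urlsplit


def is_allowed(url, allowed_domains):
    """Approximate an offsite check: the URL's host must equal an allowed
    domain or be a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in allowed_domains
    )


url = "https://www.eventbrite.com/d/nigeria--lagos/events/?page=2"
print(is_allowed(url, ["eventbrite.com"]))              # True
print(is_allowed(url, ["https://www.eventbrite.com"]))  # False
```

The allowed_domains value in the question, "eventbrite.com", is already in the bare form, so this note matters mainly if that list is changed later.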

