When I run this code, the spider only crawls 2 pages and then stops. It doesn't go on to the next page.
# -*- coding: utf-8 -*-
import scrapy


class P1Spider(scrapy.Spider):
    name = 'p1'
    allowed_domains = ['www.visit.ferienmesse.ch']
    start_urls = ['https://www.visit.ferienmesse.ch/de/aussteller']

    def parse(self, response):
        # One <li> per exhibitor; every XPath below is relative to that <li>.
        for data in response.xpath('//ul[@class="ngn-search-list ngn-mobile-filter"]/li'):
            yield {
                'Link': response.urljoin(data.xpath('.//h2[@class="ngn-content-box-title"]/a/@href').get()),
                # './/' keeps the lookup inside the current <li>; a bare '//' searches the whole page.
                'Title': data.xpath('.//h2[@class="ngn-content-box-title"]/a/bdi/text()').get(),
                'Address': data.xpath('.//span[@class="ngn-hallname"]/text()').get(),
                'Code': data.xpath('.//span[@class="ngn-stand"]/text()').get()
            }

        # Follow the pagination link, if there is one.
        next_page = response.xpath('//li[@class="arrow "]/a/@href').get()
        if next_page:
            yield scrapy.Request(url=response.urljoin(next_page), callback=self.parse)
Change the next page selector to this and see if it works:
next_page = response.css('.pagination li.arrow a[rel="next"]::attr(href)').get()
Reason
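From the second page on, there are two li elements with the class arrow. .get() returns only the first match, which is presumably the previous-page link rather than the next-page one. The spider then requests a page it has already visited, Scrapy's duplicate filter drops that request, and the crawl ends.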
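To see the effect without running a crawl, here is a minimal sketch. It uses the standard library's ElementTree in place of Scrapy's selectors, and the pagination markup and hrefs are made up for illustration (they are an assumption, not copied from the real site). It only shows that a "first match" lookup on li.arrow picks the previous link on page 2, while restricting to rel="next" picks the right one.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup for page 2 (an assumption, not the real site's HTML).
PAGE_2 = """
<ul class="pagination">
  <li class="arrow "><a rel="prev" href="/de/aussteller?page=1">prev</a></li>
  <li class="current"><a href="/de/aussteller?page=2">2</a></li>
  <li class="arrow "><a rel="next" href="/de/aussteller?page=3">next</a></li>
</ul>
""".strip()

root = ET.fromstring(PAGE_2)

# Same idea as the original XPath with .get(): the first li.arrow link wins.
first_arrow = root.find(".//li[@class='arrow ']/a")

# Same idea as the suggested selector: only the link marked rel="next".
next_link = root.find(".//li[@class='arrow ']/a[@rel='next']")

original_href = first_arrow.get("href")
fixed_href = next_link.get("href")

print(original_href)  # /de/aussteller?page=1
print(fixed_href)     # /de/aussteller?page=3
```

In the spider itself you would keep using response.css or response.xpath; the extra rel="next" condition is what matters.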
You can read more about selectors here: https://docs.scrapy.org/en/latest/topics/selectors.html