I am trying to select a "next" navigation link and cannot find the right selector in Scrapy.
This is the web URL: search page on boat listing site
The link I'm trying to select is this tag:
<a rel="nofollow" class="icon-chevron-right " href="/boats-for-sale/condition-used/type-power/class-power-sport-fishing/?year=2006-2014&length=40-65&page=2"><span class="aria-fixes">2</span></a>
I've tried many combinations of response.xpath and response.css selectors but can't seem to find the right combination.
Using the Google Chrome inspector, I get this XPath: //*[@id="root"]/div[2]/div[2]/div[2]/div/div[3]/a[9]
Ultimately, I'm trying to get the href attribute of the tag which contains the URL I want to follow.
Am I running into problems with the rel='nofollow' attribute and a Scrapy setting?
EDIT: this code used to work, but now I get an error on the CSS selector:
    def parse(self, response):
        listing_objs = response.xpath("//div[@class = 'listings-container']/a")
        for listing in listing_objs:
            yield response.follow(listing.attrib['href'], callback=self.parse_detail)
        next_page = response.css("a.icon-chevron-right").attrib['href']
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
In this case you can reach any page of the results by adding &page=N (where N is the page number) to the end of the URL, so you can request each results page directly instead of following the "next" link.
For instance, you can do something like this:
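    def start_requests(self):
        # Scrapy calls start_requests() (plural) to get the spider's initial requests.
        pages = 10  # placeholder: set this to the number of result pages to crawl
        main_url = ("https://www.yachtworld.com/boats-for-sale/condition-used/type-power"
                    "/class-power-sport-fishing/?year=2006-2014&length=40-65&page=%(page)s")
        # Page numbers start at 1, so iterate over 1..pages inclusive.
        for i in range(1, pages + 1):
            yield scrapy.Request(main_url % {'page': i}, callback=self.parse)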
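If you would rather not hard-code the query string, the page URLs can also be built with the standard library. This is only a sketch: page_urls is a hypothetical helper name, and it assumes the site keeps using a page query parameter that starts at 1, as the link in the question suggests.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(search_url, pages):
    """Yield search_url once per page, with the page parameter set to 1..pages.

    page_urls is a hypothetical helper, not part of Scrapy.
    """
    parts = urlsplit(search_url)
    # Keep the existing filters (year, length, ...) and drop any old page value.
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != "page"]
    for page in range(1, pages + 1):
        new_query = urlencode(query + [("page", page)])
        yield urlunsplit(parts._replace(query=new_query))
```

Each yielded URL could then be passed to scrapy.Request inside start_requests, in place of the % formatting shown above.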
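@Piron's answer above is probably the easiest way to iterate over pages, but should you still want to go the XPath route:
    response.xpath("//div[@class='search-page-nav']/a[contains(@class, 'icon-chevron-right')]/@href").get()
Here search-page-nav is the class of the parent div that holds the page links, and icon-chevron-right is the class of the a tag you're looking for. contains() is used because the class attribute in the page has a trailing space ("icon-chevron-right "), so an exact @class comparison would not match. @href selects the href attribute itself; an attribute has no text() child, so no /text() step is needed, and .get() returns the value as a string, or None when there is no next link.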
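To make the matching logic concrete without depending on Scrapy, here is a rough sketch that uses Python's standard-library html.parser as a stand-in for Scrapy's selectors. NextLinkFinder and next_page_url are hypothetical names, and the code only illustrates two points: matching a class token even when the attribute has stray whitespace, and returning None when the page has no "next" link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> whose class list includes a given token."""

    def __init__(self, class_token):
        super().__init__()
        self.class_token = class_token
        self.href = None  # stays None when no matching link exists

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attr_map = dict(attrs)
        # Split on whitespace so class="icon-chevron-right " still matches.
        classes = (attr_map.get("class") or "").split()
        if self.class_token in classes:
            self.href = attr_map.get("href")


def next_page_url(html, base_url, class_token="icon-chevron-right"):
    """Return the absolute URL of the next page, or None on the last page."""
    finder = NextLinkFinder(class_token)
    finder.feed(html)
    if finder.href is None:
        return None
    # The href in the page is relative, so resolve it against the page URL.
    return urljoin(base_url, finder.href)
```

In a spider, response.css("a.icon-chevron-right::attr(href)").get() gives the same guard: it returns None when nothing matches, whereas indexing .attrib['href'] on an empty result raises a KeyError.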