
XPath or CSS selector - Scrapy

I'm trying to select a "next" navigation link and can't seem to find the right selector combination in Scrapy.

This is the URL: a search page on a boat-listing site

the link I'm trying to select is this tag:

<a rel="nofollow" class="icon-chevron-right " href="/boats-for-sale/condition-used/type-power/class-power-sport-fishing/?year=2006-2014&amp;length=40-65&amp;page=2"><span class="aria-fixes">2</span></a>

I've tried many combinations of response.xpath and response.css selectors but can't seem to find the right one.

Using the Google Chrome inspector, I get this XPath: //*[@id="root"]/div[2]/div[2]/div[2]/div/div[3]/a[9]

Ultimately, I'm trying to get the href attribute of the tag which contains the URL I want to follow.

Am I running into problems with the rel='nofollow' attribute and a scrapy setting?

EDIT: this code used to work, but now I get an error on the CSS selector:

def parse(self, response):
    listing_objs = response.xpath("//div[@class='listings-container']/a")
    for listing in listing_objs:
        yield response.follow(listing.attrib['href'], callback=self.parse_detail)

    # If no <a class="icon-chevron-right"> is in the response, .attrib is {}, so
    # ['href'] raises KeyError before the `is not None` check below is reached.
    next_page = response.css("a.icon-chevron-right").attrib['href']

    if next_page is not None:
        yield response.follow(next_page, callback=self.parse)

In this case, you can access any page of the results by adding &page=# to the end of the URL. This lets you request the next page's content directly, without having to parse the "next" link from the page that was just crawled.
For instance, you can do something like this:

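import scrapy

def start_requests(self):  # Scrapy looks for start_requests() (plural)
    main_url = ("https://www.yachtworld.com/boats-for-sale/condition-used/type-power"
                "/class-power-sport-fishing/?year=2006-2014&length=40-65&page=%(page)s")
    # `pages` (how many result pages to crawl) is not defined here; set it yourself.
    for i in range(1, pages + 1):  # result pages are numbered from 1
        yield scrapy.Request(main_url % {'page': i}, callback=self.parse)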
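Instead of formatting the page number into a fixed URL string, a small standard-library helper can set the page query parameter on any listing URL, replacing an existing page value rather than appending a second one. The name with_page is hypothetical, and this assumes the site keeps using a plain page query parameter as in the link from the question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Hypothetical helper: return `url` with its `page` query parameter set.

    Any existing `page` value is dropped first, so the result never
    contains two `page` parameters. Other parameters keep their order.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


base_url = (
    "https://www.yachtworld.com/boats-for-sale/condition-used/type-power"
    "/class-power-sport-fishing/?year=2006-2014&length=40-65"
)
print(with_page(base_url, 3))
```

In start_requests the loop body would then become scrapy.Request(with_page(base_url, i), callback=self.parse).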

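@Piron's answer above is probably the easiest way to iterate over pages, but should you still want to go the XPath route:

response.xpath(".//div[@class='search-page-nav']/a[contains(@class, 'icon-chevron-right')]/@href").get()

Where search-page-nav is the class of the parent div that holds the other page links, and icon-chevron-right is the class of the a tag you're looking for. It is matched with contains() because the actual attribute value is "icon-chevron-right " (with a trailing space), so an exact @class='icon-chevron-right' test fails. @href selects the value of that tag's href attribute, and .get() returns it as a string (or None if nothing matches). Do not append /text() after @href: attributes have no text-node children, so that path selects nothing.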

