How to scrape all the pages of this link
I want to scrape all the pages of this link: http://www.jobisjob.co.uk/search?directUserSearch=true&whatInSearchBox=&whereInSearchBox=london
I have tried different approaches, but none of them worked.
Below is my code:
import scrapy
# JobgoItem is assumed to be defined in the project's items.py
from ..items import JobgoItem


class jobisjobSpider(scrapy.Spider):
    name = 'jobisjob'
    allowed_domains = ['jobisjob.co.uk']
    start_urls = ['http://www.jobisjob.co.uk/search?directUserSearch=true&whatInSearchBox=&whereInSearchBox=london']

    def parse(self, response):
        for sel in response.xpath('//div[@id="ajax-results"]/div[@class="offer_list "]/div[@class="box_offer"]/div[@class="offer"]'):
            item = JobgoItem()
            item['title'] = sel.xpath('strong[@class="title"]/a/text()').extract()
            item['description'] = sel.xpath('p[@class="description"]/text()').extract()
            item['company'] = sel.xpath('p[@class="company"]/span[@itemprop="hiringOrganization"]/a[@itemprop="name"]/text()').extract()
            item['location'] = sel.xpath('p[@class="company"]/span/span[@class="location"]/span/text()').extract()
            yield item

        next_page = response.css("div.wrap paginator results > ul > li > a[rel='nofollow']::attr('href')")
        if next_page:
            url = response.urljoin(next_page[0].extract())
            print("next page: " + str(url))
            yield scrapy.Request(url)
Can anyone help me solve this problem? I am new to Python.
You have a mistake in your next-page selector. Your current selector, "div.wrap paginator results", searches for tags named "paginator" and "results" inside a div with the class "wrap" — but "wrap", "paginator", and "results" are all classes on the same div, so nothing matches. The correct selector is:

div.wrap.paginator.results > ul > li > a:last-child[rel='nofollow']::attr('href')