How to scrape all the pages of this link
I want to scrape all the pages of this link: http://www.jobisjob.co.uk/search?directUserSearch=true&whatInSearchBox=&whereInSearchBox=london
I have tried different approaches, but none of them worked.
Below is my code:
import scrapy
# JobgoItem is assumed to be defined in the project's items.py
from ..items import JobgoItem


class jobisjobSpider(scrapy.Spider):
    name = 'jobisjob'
    allowed_domains = ['jobisjob.co.uk']
    start_urls = ['http://www.jobisjob.co.uk/search?directUserSearch=true&whatInSearchBox=&whereInSearchBox=london']

    def parse(self, response):
        for sel in response.xpath('//div[@id="ajax-results"]/div[@class="offer_list "]/div[@class="box_offer"]/div[@class="offer"]'):
            item = JobgoItem()
            item['title'] = sel.xpath('strong[@class="title"]/a/text()').extract()
            item['description'] = sel.xpath('p[@class="description"]/text()').extract()
            item['company'] = sel.xpath('p[@class="company"]/span[@itemprop="hiringOrganization"]/a[@itemprop="name"]/text()').extract()
            item['location'] = sel.xpath('p[@class="company"]/span/span[@class="location"]/span/text()').extract()
            yield item

        next_page = response.css("div.wrap paginator results > ul > li > a[rel='nofollow']::attr('href')")
        if next_page:
            url = response.urljoin(next_page[0].extract())
            print("next page: " + str(url))
            yield scrapy.Request(url)
Can anyone help me solve this problem? I am new to Python.
You have a mistake in your next-page selector. Your current selector, "div.wrap paginator results", searches for tags named "paginator" and "results" inside a div with the class "wrap" — but "wrap", "paginator", and "results" are all classes on the same div, so nothing matches. The correct selector is:

div.wrap.paginator.results > ul > li > a:last-child[rel='nofollow']::attr('href')