I want to scrape the following website:
https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab
I want to extract the title, company and location of the job postings.
I tried a few XPaths for the location, company and title, and none of them worked. I also tried writing the results to a CSV file, but the location, company and title all come out blank. I think my XPath is not correct.
```python
import scrapy


class JobItem(scrapy.Item):
    # Data structure to store the title, company name and location of the job
    title = scrapy.Field()
    company = scrapy.Field()
    location = scrapy.Field()


class stackoverflow(scrapy.Spider):
    name = 'stack_bot'
    start_urls = ['https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab']

    def parse(self, response):
        for a_el in response.xpath('//div[@class="listResults"]'):
            section = JobItem()
            section['title'] = None     # ? which XPath goes here
            section['company'] = None   # ?
            section['location'] = None  # ?
            yield section
```
Can anyone help me with the XPath for the title, company and location? Also, is `xpath('//div[@class="listResults"]')` correct?
I am not sure that `xpath('//div[@class="listResults"]')` is correct. It matches only one element (presumably the container wrapping all the listings), so your loop body runs just once. Here is my version of the code:
```python
def parse(self, response):
    # Loop over each job card instead of the single results container
    for a_el in response.xpath('//div[contains(@class, "-job-summary")]'):
        section = JobItem()
        section['title'] = a_el.css('h2 a::text').get()
        # Company name and location are the 1st and 2nd <span> in the "-company" div
        section['company'] = a_el.xpath('.//div[contains(@class, "-company")]/span[1]/text()').get()
        section['location'] = a_el.xpath('.//div[contains(@class, "-company")]/span[2]/text()').get()
        yield section
```
Consider using the RSS feed as the source, as it will be more robust over time:
https://stackoverflow.com/jobs/feed
Then you can use the following CSS selectors to generate lists that you can `list(zip())` together:
- titles: `item title`
- companies: `a10\\:author`
- locations: `location`
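A minimal sketch of the RSS approach, using only the standard library's `xml.etree.ElementTree` in place of the CSS selectors listed above. Stack Overflow Jobs has since been shut down, so the feed below is a made-up sample: the `item`, `title`, `a10:author` and `location` names follow the selectors above, while the `a10:name` child inside the author and the Atom namespace URI bound to `a10` are assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed. Element names follow the selectors above;
# the <a10:name> child and the namespace URI are assumptions.
FEED = """<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
  <channel>
    <title>Jobs</title>
    <item>
      <title>Backend Developer</title>
      <a10:author><a10:name>Acme Corp</a10:name></a10:author>
      <location>Berlin, Germany</location>
    </item>
    <item>
      <title>Data Engineer</title>
      <a10:author><a10:name>Globex</a10:name></a10:author>
      <location>Remote</location>
    </item>
  </channel>
</rss>"""


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit('}', 1)[-1]


def first_text(parent, name):
    """Stripped text of the first descendant whose local name is `name`, or None."""
    for el in parent.iter():
        if el is not parent and local_name(el.tag) == name:
            return (el.text or '').strip()
    return None


root = ET.fromstring(FEED)
items = list(root.iter('item'))  # plain RSS <item> elements (no namespace)

# One list per field, as the answer suggests, then zip them together.
titles = [first_text(item, 'title') for item in items]
companies = [first_text(item, 'name') for item in items]  # <a10:author><a10:name>
locations = [first_text(item, 'location') for item in items]

jobs = list(zip(titles, companies, locations))
print(jobs)
```

Matching on local names keeps the sketch independent of the exact namespace URIs. Because each field is looked up per `<item>`, an item that lacks a field gets `None` in that slot, whereas zipping three independently selected lists would silently shift the remaining values out of alignment.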