Trying to extract data using python/scrapy and not able to find the correct xpath

Question

I want to scrape this website:

https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab

I want to extract the following fields from each job posting:

Title
Location
Company

I tried a few XPaths for the location, company and title, but none of them worked. I also tried writing the results to a CSV file, and the location, company and title columns all come out blank. I think my XPaths are wrong.
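One plausible reason for blank columns rather than an error: in Scrapy, `.get()` on a selector that matches nothing returns `None`, and Python's `csv` module writes `None` as an empty cell. The sketch below reproduces that with the standard library only. The item values are made up for illustration and do not come from the real site.

```python
import csv
import io

# Made-up items for illustration: the first one mimics what a spider yields
# when every XPath misses and .get() returns None for each field.
items = [
    {"title": None, "company": None, "location": None},
    {"title": "Python Developer", "company": "Example Corp", "location": "Berlin"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "company", "location"])
writer.writeheader()
for item in items:
    writer.writerow(item)  # None values are written as empty strings

output = buffer.getvalue()
print(output)
```

The first data row comes out as `,,`: the row exists, but every field is empty, which matches the symptom described above.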

import scrapy


class JobItem(scrapy.Item):
    # Data structure to store the title, company name and location of the job
    title = scrapy.Field()
    company = scrapy.Field()
    location = scrapy.Field()

class stackoverflow(scrapy.Spider):
    name = 'stack_bot'
    start_urls = ['https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab']

    def parse(self, response):
        for a_el in response.xpath('//div[@class="listResults"]'):
            section = JobItem()
            section['title']   = None  # correct XPath needed here
            section['company'] = None  # correct XPath needed here
            section['location'] = None  # correct XPath needed here
            yield section

Can anyone help me with the XPaths for the title, company and location? Also, is xpath('//div[@class="listResults"]') correct?

Answer 1

I'm not sure that xpath('//div[@class="listResults"]') is correct: it matches only one element, so the for-loop runs once instead of once per job. Here is my version of the code:

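def parse(self, response):
    # Loop over each job's summary block, not the single results container
    for a_el in response.xpath('//div[contains(@class, "-job-summary")]'):
        section = JobItem()
        # The leading "." keeps each search inside the current job;
        # a bare "//" would search the whole page on every iteration
        section['title']   = a_el.css('h2 a::text').get()
        section['company'] = a_el.xpath('.//div[contains(@class, "-company")]/span[1]/text()').get()
        section['location'] = a_el.xpath('.//div[contains(@class, "-company")]/span[2]/text()').get()
        yield section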

Answer 2

Consider using the RSS feed as the source instead, as it is likely to be more robust over time than the HTML page:

https://stackoverflow.com/jobs/feed

Then you can use the following CSS selectors to build one list per field and combine them with list(zip(...)):

titles selector: item title
companies selector: a10\\:author
locations selector: location
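A minimal sketch of the list-then-`zip` idea. It swaps in the standard library's `xml.etree.ElementTree` for the CSS selectors mentioned above, and the feed content is hypothetical. Two assumptions should be checked against the real feed's markup: that the `a10` prefix maps to the Atom namespace, and that the company name sits in an `a10:name` child of `a10:author`. (The doubled backslash in `a10\\:author` is presumably Python string escaping for the CSS escape `\:`, which stops the colon being read as a pseudo-class.)

```python
import xml.etree.ElementTree as ET

# Hypothetical feed excerpt. ASSUMPTION: "a10" maps to the Atom namespace and
# the company is in <a10:name>; verify both against the real feed.
FEED = """<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <title>Python Developer</title>
      <a10:author><a10:name>Example Corp</a10:name></a10:author>
      <location>Berlin, Germany</location>
    </item>
    <item>
      <title>Data Engineer</title>
      <a10:author><a10:name>Sample Ltd</a10:name></a10:author>
      <location>Remote</location>
    </item>
  </channel>
</rss>"""

NS = {"a10": "http://www.w3.org/2005/Atom"}

root = ET.fromstring(FEED)
items = root.findall("./channel/item")

# One list per field, as the answer suggests; findtext returns None if absent
titles = [item.findtext("title") for item in items]
companies = [item.findtext("a10:author/a10:name", namespaces=NS) for item in items]
locations = [item.findtext("location") for item in items]

# Combine into (title, company, location) rows
rows = list(zip(titles, companies, locations))
print(rows)
```

Because each list is built per `<item>`, a missing field shows up as `None` instead of shifting later rows out of line. Selecting all titles, companies and locations across the whole document and zipping them has no such protection, and `zip` also silently stops at the shortest list.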


2 answers

solution1: 0, ACCEPTED, 2019-04-22 06:11:27
solution2: 0, 2019-04-22 06:46:00
