Storing data in CSV using Scrapy

Question

Below is the parse method of my Scrapy spider. My expected output in the CSV file is three columns with their corresponding values. In the terminal output I get all three columns (it even reports 84 items stored in output.csv, which is correct), but the actual output file contains only the first column, "Title". Any help is appreciated.

EDIT: In JSON, all the data is there.

    def parse(self, response):
        for titl in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
            title = titl.strip()
            yield {"Title":title}
        for dep in response.xpath('//span[@class="jv-job-list-category"]/text()').extract():
            department = dep.strip()
            yield{"Department":department}
        for countr in response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract():
            country = countr.strip()
            yield{"Country":country}

Command used to run the spider:

    scrapy crawl task -o output.csv

Complete code:

import scrapy

class TaskUs(scrapy.Spider):
    name = 'task'
    start_urls = ["https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0"]

    # def start_requests(self):
    #     for URL in self.start_urls:
    #         yield scrapy.Request(url=URL, meta={'proxy': 'http://103.241.227.108:6666'}, callback=self.parse)

    def parse(self, response):
        # for titl in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
        #     title = titl.strip()
        #     yield {"Title":title}
        # for dep in response.xpath('//span[@class="jv-job-list-category"]/text()').extract():
        #     department = dep.strip()
        #     yield{"Department":department}
        # for countr in response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract():
        #     country = countr.strip()
        #     yield{"Country":country}
        ti = response.xpath('//span[@class="jv-job-list-title"]/text()').extract()
        de = response.xpath('//span[@class="jv-job-list-category"]/text()').extract()
        co = response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract()
        yield{'titl':ti, 'Depa': de, "Cou": co}

Answer 1

Here is the solution:

CODE:

import scrapy
class TaskUs(scrapy.Spider):
    name = 'task'
    start_urls = ["https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0"]

    def parse(self, response):
        tables = response.xpath('//*[@class="jv-job-list jv-search-list"]/tbody/tr')
        for table in tables:
            yield {
                'Title':table.xpath('.//*[@class="jv-job-list-name"]/span[1]/text()').get()
            }

OUTPUT (for the Title-only code in this answer):

{'Title': 'Real Time Analyst'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'VP of Workforce Analytics'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Management Positions'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.engine] INFO: Closing spider (finished)
2021-08-15 21:51:14 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 1,
 'downloader/request_bytes': 686,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 15293,
 'downloader/response_count': 1,
 'downloader/response_status_count/200'
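
As for why the question's spider produced a single column: unless it is told which fields to export (for example through the FEED_EXPORT_FIELDS setting), Scrapy's CSV exporter takes its columns from the first item it receives. The question's spider yields {"Title": ...} first, so the later "Department" and "Country" items have no column to land in, while the JSON export keeps every key. The snippet below is a simplified standard-library model of that behavior, not Scrapy's actual exporter; `export_like_first_item` and the sample values are made up for illustration.

```python
import csv
import io


def export_like_first_item(items):
    """Write items as CSV, fixing the columns to the first item's keys."""
    buf = io.StringIO()
    fieldnames = list(items[0].keys())
    writer = csv.DictWriter(
        buf,
        fieldnames=fieldnames,
        restval="",             # missing keys become empty cells
        extrasaction="ignore",  # keys outside the header are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


# One key per item, as in the question: only "Title" becomes a column.
separate = [
    {"Title": "Workforce Planner"},
    {"Department": "Workforce Management"},
    {"Country": "Philippines"},
]

# All keys in one item per row: three columns.
combined = [
    {"Title": "Workforce Planner",
     "Department": "Workforce Management",
     "Country": "Philippines"},
]

print(export_like_first_item(separate))
print(export_like_first_item(combined))
```

In this model the first call produces a header with only Title and drops the department and country values, which matches the symptom described in the question. The second call produces all three columns.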

Answer 1 (0, accepted), 2021-08-15 15:54:17
