简体   繁体   English

使用 scrapy 将数据存储在 csv 中

[英]Issue storing data in csv using scrapy

Below is my parse method of scrapy spider.下面是我对scrapy蜘蛛的解析方法。 My expected output in csv is three columns with corresponding values.我在 csv 中预期的 output 是具有相应值的三列。 Although in terminal output I get all the three columns (even it shows 84 items stored in output.csv, which correct).虽然在终端 output 中我得到了所有三列(即使它显示了 84 个项目存储在 output.csv 中,这是正确的)。 but in actual output file I only 1st column "Title. help appreciated但在实际的 output 文件中,我只有第一列“标题。帮助表示赞赏

EDIT:In JSON all the data is there编辑:在 JSON 中,所有数据都在那里

    def parse(self, response):
        for titl in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
            title = titl.strip()
            yield {"Title":title}
        for dep in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
            department = dep.strip()
            yield{"Department":department}
        for countr in response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract():
            country = countr.strip()
            yield{"Country":country}
scrapy crawl task -o output.csv

Complete code:完整代码:

class TaskUs(scrapy.Spider):
    name = 'task'
    start_urls = ["https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0"]

    # def start_requests(self):
    #     for URL in self.start_urls:
    #         yield scrapy.Request(url=URL, meta={'proxy': 'http://103.241.227.108:6666'}, callback=self.parse)

    def parse(self, response):
        # for titl in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
        #     title = titl.strip()
        #     yield {"Title":title}
        # for dep in response.xpath('//span[@class="jv-job-list-category"]/text()').extract():
        #     department = dep.strip()
        #     yield{"Department":department}
        # for countr in response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract():
        #     country = countr.strip()
        #     yield{"Country":country}
        ti = response.xpath('//span[@class="jv-job-list-title"]/text()').extract()
        de = response.xpath('//span[@class="jv-job-list-category"]/text()').extract()
        co = response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract()
        yield{'titl':ti, 'Depa': de, "Cou": co}

Here is the solution:这是解决方案:

CODE:代码:

import scrapy
class TaskUs(scrapy.Spider):
    name = 'task'
    start_urls = ["https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0"]

    def parse(self, response):
        tables = response.xpath('//*[@class="jv-job-list jv-search-list"]/tbody/tr')
        for table in tables:
            yield {
                'Title':table.xpath('.//*[@class="jv-job-list-name"]/span[1]/text()').get()
            }

OUTPUT: OUTPUT:

{'Title': 'Real Time Analyst'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'VP of Workforce Analytics'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Management Positions'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.engine] INFO: Closing spider (finished)
2021-08-15 21:51:14 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 1,
 'downloader/request_bytes': 686,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 15293,
 'downloader/response_count': 1,
 'downloader/response_status_count/200'

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM