Collect data into a CSV file using Scrapy
I'm learning how to use Scrapy:
import scrapy

class TestSetSpider(scrapy.Spider):
    name = "test_spider"
    start_urls = ['https://example.html']

    def parse(self, response):
        for brickset in response.xpath('//div[@class="product-name"]'):
            yield {
                'name': brickset.xpath('h1/text()').extract_first(),
            }
I run this spider with the command:

scrapy crawl test_spider -o test.csv
This is working fine for //div[@class="product-name"], but I don't know how to add another CSS/XPath class in the same spider file.
I'm trying this, but it doesn't work:
import scrapy

class TestSetSpider(scrapy.Spider):
    name = "test_spider"
    start_urls = ['https://example.html']

    def parse(self, response):
        for test in response.xpath('//div[@class="product-name"]'):
            yield {
                'name': test.xpath('h1/text()').extract_first(),
            }

    def parse(self, response):
        for attempt in response.xpath('//div[@class="another-class"]'):
            yield {
                'color': attempt.xpath('h1/a/text()').extract_first(),
            }
Please help me to do this.
def parse(self, response):
    product_name_lst = []
    # append all data from the first class to product_name_lst
    for test in response.xpath('//div[@class="product-name"]'):
        product_name_lst.append({'name': test.xpath('h1/text()').extract_first()})

    another_product_name_lst = []
    # append all data from the second class to another_product_name_lst
    for test in response.xpath('//div[@class="another-product-name"]'):
        another_product_name_lst.append({'name': test.xpath('h1/text()').extract_first()})

    # after that, write all the data you need from product_name_lst
    # and another_product_name_lst to out.csv
    out_file = open('out.csv', 'a')  # 'a' means append to the file instead of overwriting it
    # write your rows to out.csv here:
    out_file.write(data)  # data is whatever you need to write
    # and close the file
    out_file.close()
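The manual file handling above can also be sketched with Python's csv module, which takes care of quoting and headers. The rows here are hypothetical placeholders standing in for what the XPath loops would collect:

```python
import csv

# Hypothetical rows; in the spider these would come from the two XPath loops.
product_name_lst = [{'name': 'Widget A'}, {'name': 'Widget B'}]
another_product_name_lst = [{'color': 'red'}]

with open('out.csv', 'a', newline='') as out_file:
    # DictWriter fills missing columns with an empty string by default
    writer = csv.DictWriter(out_file, fieldnames=['name', 'color'])
    writer.writeheader()
    for row in product_name_lst + another_product_name_lst:
        writer.writerow(row)
```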
You can't define parse twice in the same class; the second definition silently overrides the first. Just use two for loops inside one parse method:
import scrapy

class TestSetSpider(scrapy.Spider):
    name = "test_spider"
    start_urls = ['https://example.html']

    def parse(self, response):
        for brickset in response.xpath('//div[@class="product-name"]'):
            yield {
                'name': brickset.xpath('h1/text()').extract_first(),
            }
        for brickset in response.xpath('//div[@class="another-class"]'):
            yield {
                'name': brickset.xpath('h1/text()').extract_first(),
            }