Collect data into a CSV file using Scrapy
I'm learning how to use Scrapy:
import scrapy

class TestSetSpider(scrapy.Spider):
    name = "test_spider"
    start_urls = ['https://example.html']

    def parse(self, response):
        for brickset in response.xpath('//div[@class="product-name"]'):
            yield {
                'name': brickset.xpath('h1/text()').extract_first(),
            }
I run this spider with the command:

scrapy crawl test_spider -o test.csv
This is working fine for //div[@class="product-name"], but I don't know how to add another CSS/XPath class in the same spider file.
I'm trying this, but it doesn't work:
import scrapy

class TestSetSpider(scrapy.Spider):
    name = "test_spider"
    start_urls = ['https://example.html']

    def parse(self, response):
        for test in response.xpath('//div[@class="product-name"]'):
            yield {
                'name': test.xpath('h1/text()').extract_first(),
            }

    def parse(self, response):
        for attempt in response.xpath('//div[@class="another-class"]'):
            yield {
                'color': attempt.xpath('h1/a/text()').extract_first(),
            }
Please help me to do this.
def parse(self, response):
    product_name_lst = []
    # append all data from the first class to product_name_lst
    for test in response.xpath('//div[@class="product-name"]'):
        product_name_lst.append({'name': test.xpath('h1/text()').extract_first()})

    another_product_name_lst = []
    # append all data from the second class to another_product_name_lst
    for test in response.xpath('//div[@class="another-product-name"]'):
        another_product_name_lst.append({'name': test.xpath('h1/text()').extract_first()})

    # after that, write all the data you need from product_name_lst
    # and another_product_name_lst to out.csv
    out_file = open('out.csv', 'a')  # 'a' means append to the file instead of overwriting it
    # write your rows to out.csv here:
    out_file.write(data)  # data is whatever you need to write
    # and close the file
    out_file.close()
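The manual file handling above can also be sketched with Python's csv module, which takes care of quoting and headers. The rows here are hypothetical placeholders standing in for what the XPath loops would collect:

```python
import csv

# Hypothetical rows; in the spider these would come from the two XPath loops.
product_name_lst = [{'name': 'Widget A'}, {'name': 'Widget B'}]
another_product_name_lst = [{'color': 'red'}]

with open('out.csv', 'a', newline='') as out_file:
    # DictWriter fills missing columns with an empty string by default
    writer = csv.DictWriter(out_file, fieldnames=['name', 'color'])
    writer.writeheader()
    for row in product_name_lst + another_product_name_lst:
        writer.writerow(row)
```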
You can't define parse twice in the same class; the second definition silently overrides the first. Just use two for loops inside one parse method:
import scrapy

class TestSetSpider(scrapy.Spider):
    name = "test_spider"
    start_urls = ['https://example.html']

    def parse(self, response):
        for brickset in response.xpath('//div[@class="product-name"]'):
            yield {
                'name': brickset.xpath('h1/text()').extract_first(),
            }
        for brickset in response.xpath('//div[@class="another-class"]'):
            yield {
                'name': brickset.xpath('h1/text()').extract_first(),
            }