Python - How do I format scrapy data in a csv file?
I am new to Python and web scraping. I tried storing the scraped data in a csv file, but the output is not what I want.
Current csv output:
Title Image
Audi,Benz,BMW Image1,Image2,Image3
How I would like it to look in the csv file:
Title Image
Audi Image1
Benz Image2
BMW Image3
This is what I type in the terminal to run it:
scrapy crawl testscraper -t csv -o test.csv
Here is spider.py:
class TestSpiderSpider(scrapy.Spider):
    name = 'testscraper'
    page_number = 2
    start_urls = ['https://jamaicaclassifiedonline.com/auto/cars/']

    def parse(self, response):
        items = scrapeItem()
        product_title = response.css('.jco-card-title::text').extract()
        product_imagelink = response.css('.card-image img::attr(data-src)').getall()
        items['product_title'] = product_title
        items['product_imagelink'] = product_imagelink
        items.append('items')
        yield items
Here is the code for items.py:
class scrapeItem(scrapy.Item):
    product_title = scrapy.Field()
    product_imagelink = scrapy.Field()
    pass
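You can select every div element that contains a car, then loop over those elements and yield one item per car. Each yielded item becomes its own row in the csv file, whereas a single item holding whole lists ends up as a single row.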
def parse(self, response):
    for car in response.css('.col.l3.s12.m6'):
        item = scrapeItem()
        product_title = car.css('.jco-card-title::text').get()
        product_imagelink = car.css('.card-image img::attr(data-src)').get()
        # Some elements have no title or image link (ads, for example), so skip them.
        if product_title and product_imagelink:
            item['product_title'] = product_title.strip().replace('\n', '')
            item['product_imagelink'] = product_imagelink
            yield item
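To see the "one item per row" idea in isolation, here is a minimal sketch that uses only the standard library, not Scrapy. The `cards` list is made-up sample data standing in for what the per-car selectors might return, and `iter_rows` and `to_csv` are hypothetical helper names; the cleaning step mirrors the `strip().replace()` call in the spider above.

```python
import csv
import io

# Made-up sample data: one dict per car card, as the per-card selectors might return it.
cards = [
    {"product_title": "\nAudi\n", "product_imagelink": "Image1"},
    {"product_title": "\nBenz\n", "product_imagelink": "Image2"},
    {"product_title": None, "product_imagelink": None},  # e.g. an ad with no title or image
    {"product_title": "\nBMW\n", "product_imagelink": "Image3"},
]


def iter_rows(cards):
    """Yield one cleaned dict per card, skipping cards that lack a title or image."""
    for card in cards:
        title = card["product_title"]
        image = card["product_imagelink"]
        if title and image:
            yield {
                "product_title": title.strip().replace("\n", ""),
                "product_imagelink": image,
            }


def to_csv(rows):
    """Write each dict as its own csv row and return the csv text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["product_title", "product_imagelink"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(iter_rows(cards))
print(csv_text)
```

Because the title and image are read from the same card, they stay paired even when some cards are skipped. Collecting two separate page-wide lists gives no such guarantee if one card is missing a field.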