How to write scraped data into a CSV file in Scrapy?
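I am trying to scrape a website by extracting its sub-links and their titles, and then save the extracted titles and their associated links to a CSV file. When I run the following code, the CSV file is created but it is empty. Any help?
My Spider.py file looks like this: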
from scrapy import cmdline
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class HyperLinksSpider(CrawlSpider):
    name = "linksSpy"
    allowed_domains = ["some_website"]
    start_urls = ["some_website"]
    rules = (Rule(LinkExtractor(allow=()), callback='parse_obj', follow=True),)

    def parse_obj(self, response):
        items = []
        for link in LinkExtractor(allow=(), deny=self.allowed_domains).extract_links(response):
            item = ExtractlinksItem()
            for sel in response.xpath('//tr/td/a'):
                item['title'] = sel.xpath('/text()').extract()
                item['link'] = sel.xpath('/@href').extract()
            items.append(item)
        return items

cmdline.execute("scrapy crawl linksSpy".split())
My pipelines.py is:
import csv

class ExtractlinksPipeline(object):
    def __init__(self):
        self.csvwriter = csv.writer(open('Links.csv', 'wb'))

    def process_item(self, item, spider):
        self.csvwriter.writerow((item['title'][0]), item['link'][0])
        return item
My items.py is:
import scrapy

class ExtractlinksItem(scrapy.Item):
    # define the fields for your item here like:
    title = scrapy.Field()
    link = scrapy.Field()
    pass
I also changed settings.py:
ITEM_PIPELINES = {'extractLinks.pipelines.ExtractlinksPipeline': 1}
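To output all scraped data, Scrapy has a built-in feature called Feed Exports.
In short, all you need is two settings in your settings.py file: FEED_FORMAT, the format in which the feed is saved (csv in your case), and FEED_URI, the location where the feed is saved, e.g. ~/my_feed.csv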
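As a rough sketch of those two settings, the lines below could be added to the project's settings.py. The output path "Links.csv" is only an assumption borrowed from the question; any writable location should do.

```python
# settings.py (sketch; the output path below is only an example)

# Format in which the scraped items are exported.
FEED_FORMAT = "csv"

# Location the feed is written to; a relative path is resolved
# against the directory the crawl is started from.
FEED_URI = "Links.csv"
```

With these in place, running scrapy crawl linksSpy should write the items to the file without a custom CSV pipeline. The same export can also be requested for a single run with scrapy crawl linksSpy -o Links.csv. As far as I know, recent Scrapy releases prefer a single FEEDS dictionary over these two settings, so check the documentation for the version you have installed.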
My related answer covers a use case:
https://stackoverflow.com/a/41473241/3737009
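If you would rather keep a custom pipeline than rely on Feed Exports, here is a minimal sketch of how the one from the question might be reworked. It assumes Python 3 and the same title and link fields. Two things in the original stand out: writerow() accepts a single row (one sequence), not two separate arguments, and on Python 3 the file needs text mode rather than 'wb'. The csv_path attribute is a name introduced here for illustration only.

```python
import csv


class ExtractlinksPipeline:
    """Write the first title and link of each item to a CSV file."""

    # Hypothetical attribute; the file name is taken from the question.
    csv_path = "Links.csv"

    def open_spider(self, spider):
        # newline="" lets the csv module control line endings on Python 3.
        self.file = open(self.csv_path, "w", newline="", encoding="utf-8")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["title", "link"])

    def close_spider(self, spider):
        # Closing the file flushes any buffered rows to disk.
        self.file.close()

    def process_item(self, item, spider):
        # extract() returns lists; use "" when nothing was matched.
        title = item["title"][0] if item.get("title") else ""
        link = item["link"][0] if item.get("link") else ""
        # writerow() takes one sequence holding the whole row.
        self.writer.writerow([title, link])
        return item
```

Scrapy calls open_spider and close_spider once per crawl, so the file is opened and closed exactly once. The pipeline still has to be enabled through ITEM_PIPELINES, as in the question.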