Scrapy - How to export a CSV file with item keys in the header

I don't want to use the -o option to export the CSV; I want to create the file from my Scrapy script. The CSV file is written correctly with the items, but it has no header. I would like a header row that corresponds to my items' keys.

  • How can I write a header row from the items' keys?

I saw in several forums and tutorials that the header has to be defined in pipelines.py. I tried several approaches using open_spider, but none of them worked.

Here is my pipelines.py code:

class CsvWriterPipeline(object):
    def __init__(self):
        self.csvwriter = csv.writer(open(fichier1, 'wb'))

    def open_spider(self, spider):
        header_keys = item.fields.keys()
        self.csvwriter.writerow(header_keys)

    def process_item(self, item, spider):
        self.csvwriter.writerow(
            [item['nom_course'][0],
            item['nom_evenement'][0],
            item['distance'][0],
            item['date'][0],
            item['contact_1'][0],
            item['contact_2'][0],
            item['organisateur'][0],
            item['site_internet_evenement'][0],
            item['description'][0],
            item['prix'][0],
            item['nb_participant'][0],
            item['URL_Even'][0],
            item['pays'][0],
            item['region'][0],
            item['ville'][0],
            item['tag'][0]])
        return item 

settings.py

BOT_NAME = 'AHOTU_V2'

SPIDER_MODULES = ['AHOTU_V2.spiders']
NEWSPIDER_MODULE = 'AHOTU_V2.spiders'
ITEM_PIPELINES = {
    'AHOTU_V2.pipelines.CsvWriterPipeline': 800,
}

ROBOTSTXT_OBEY = True

When the spider is opened, no item has been scraped yet, and open_spider does not receive an item argument. So the function below doesn't work:

def open_spider(self, spider):
    header_keys = item.fields.keys()
    self.csvwriter.writerow(header_keys)
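
To make the ordering concrete, here is a minimal sketch, with no Scrapy involved, of the sequence of calls a pipeline receives. RecordingPipeline and run_pipeline are hypothetical stand-ins written for this illustration, not part of Scrapy; they only mimic the documented order in which the hook methods are called.

```python
class RecordingPipeline:
    """Hypothetical pipeline that records the order of the calls it receives."""

    def __init__(self):
        self.calls = []

    def open_spider(self, spider):
        # No item argument here: nothing has been scraped yet.
        self.calls.append("open_spider")

    def process_item(self, item, spider):
        # The first place where an item is actually available.
        self.calls.append("process_item")
        return item

    def close_spider(self, spider):
        self.calls.append("close_spider")


def run_pipeline(pipeline, items, spider=None):
    """Hypothetical stand-in for the engine: open, feed each item, close."""
    pipeline.open_spider(spider)
    for item in items:
        pipeline.process_item(item, spider)
    pipeline.close_spider(spider)


pipeline = RecordingPipeline()
run_pipeline(pipeline, [{"ville": ["Paris"]}, {"ville": ["Lyon"]}])
print(pipeline.calls)
```

Since open_spider runs before any process_item call, anything that depends on an item has to wait until process_item.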

What you should do instead is keep a flag that records whether the header has been written, and write it when the first item arrives:

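import csv

class CsvWriterPipeline(object):
    def __init__(self):
        self.csvfile = None
        self.csvwriter = None
        self.headers_written = False

    def open_spider(self, spider):
        # fichier1 is the output path from the question, defined elsewhere.
        # 'wb' is the Python 2 mode; on Python 3 use open(fichier1, 'w', newline='').
        self.csvfile = open(fichier1, 'wb')
        self.csvwriter = csv.writer(self.csvfile)

    def process_item(self, item, spider):
        if not self.headers_written:
            # Write the header once, as soon as the first item is available.
            # Note: the order of item.fields is not guaranteed to match the
            # column order of the row written below.
            header_keys = item.fields.keys()
            self.csvwriter.writerow(header_keys)
            self.headers_written = True

        self.csvwriter.writerow(
            [item['nom_course'][0],
            item['nom_evenement'][0],
            item['distance'][0],
            item['date'][0],
            item['contact_1'][0],
            item['contact_2'][0],
            item['organisateur'][0],
            item['site_internet_evenement'][0],
            item['description'][0],
            item['prix'][0],
            item['nb_participant'][0],
            item['URL_Even'][0],
            item['pays'][0],
            item['region'][0],
            item['ville'][0],
            item['tag'][0]])
        return item

    def close_spider(self, spider):
        # Close the file so that buffered rows are flushed to disk.
        self.csvfile.close()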
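
One way to keep the header and the rows in the same column order is to drive both from a single list of field names. The following is a minimal sketch under stated assumptions, not the original answer's method: it uses Python 3, plain dicts instead of Scrapy items, an in-memory io.StringIO instead of the real file, and a hypothetical three-field subset of the question's fields. OrderedCsvPipeline and FIELDS are names made up for this example.

```python
import csv
import io

# Hypothetical subset of the question's fields, in the desired column order.
FIELDS = ["nom_course", "distance", "ville"]


class OrderedCsvPipeline:
    """Sketch: one field list drives both the header row and every data row."""

    def __init__(self, stream):
        self.stream = stream  # any text-mode file-like object
        self.writer = None

    def open_spider(self, spider):
        self.writer = csv.writer(self.stream, lineterminator="\n")
        # The field list is fixed, so the header can be written before any item.
        self.writer.writerow(FIELDS)

    def process_item(self, item, spider):
        # Each value is a one-element list, as in the question's items.
        self.writer.writerow([item[key][0] for key in FIELDS])
        return item


buffer = io.StringIO()
pipeline = OrderedCsvPipeline(buffer)
pipeline.open_spider(spider=None)
pipeline.process_item(
    {"nom_course": ["Trail A"], "distance": ["10 km"], "ville": ["Paris"]},
    spider=None,
)
output = buffer.getvalue()
print(output)
```

Because the field names are known up front here, no headers_written flag is needed and the header can go back into open_spider. The trade-off is that the list has to be kept in sync with the item definition by hand.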
