
Exporting data to csv after scraping data using scrapy

I made this scraper, and it scrapes the data correctly, but I have a problem exporting it to CSV. The default -o filename.csv option doesn't write the data in the correct order. I need some guidance on how to do this. item['name'] should be in the first column and item['link'] in the second. This is the code.

# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
import re
from ..items import WebscItem


class YuSpider(CrawlSpider):
    name = 'yu'
    allowed_domains = ['farfeshplus.com',
                       'wintv.live']
    start_urls = ['https://www.farfeshplus.com/Video.asp?ZoneID=297']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//td[@class="text6"]'), callback='parse_item', follow=True),

    )

    def parse_item(self, response):
        items = WebscItem()
        for url in response.xpath('//html'):
            items['name'] = url.xpath('//h1/div/text()').extract()

            yield items

            frames = url.xpath('//iframe[@width="750"]/@src').extract_first()

            yield scrapy.Request(url=frames, callback=self.parse_frame)

    def parse_frame(self, response):
        items = WebscItem()
        URL = response.xpath('//body/script').extract_first()

        mp4 = re.compile(r"(?<=mp4:\s\[\')(.*)\'\]")
        link = mp4.findall(URL)[0]

        items['link'] = link
        yield items

You need to set FEED_EXPORT_FIELDS in your settings.py. It defines which fields are exported and in what order.
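As a minimal sketch of that setting, assuming the two field names used by the spider in the question ('name' and 'link'), the fragment in settings.py might look like this:

```python
# settings.py (fragment)
# The field names are taken from the spider in the question; adjust them
# to match the fields declared on your own item class.
FEED_EXPORT_FIELDS = ["name", "link"]
```

With this in place, scrapy crawl yu -o filename.csv should write the name column first and the link column second. Note that the spider in the question yields the name and the link as two separate items, so they would still end up on separate rows, each with one empty column.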

If you want to export data to a CSV file, you could use pandas.

First, build a pandas DataFrame from your data; then you can export that DataFrame to CSV.

Example from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html:

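import pandas as pd

df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
                   'mask': ['red', 'purple'],
                   'weapon': ['sai', 'bo staff']})
# With no path argument, to_csv() returns the CSV text as a string;
# pass a file name as the first argument to write a file instead.
df.to_csv(index=False)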
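Applied to the fields from the question, the same idea could be sketched as follows. The rows and the helper name items_to_csv are hypothetical, invented for illustration; the point is only that the columns argument of to_csv fixes the column order regardless of the key order in the scraped data.

```python
import pandas as pd

# Hypothetical rows standing in for scraped items (not real data).
rows = [
    {"link": "https://example.com/a.mp4", "name": "First video"},
    {"link": "https://example.com/b.mp4", "name": "Second video"},
]


def items_to_csv(rows, columns=("name", "link")):
    """Return CSV text with the columns in the requested order."""
    df = pd.DataFrame(rows)
    # index=False drops the row index; columns= selects and orders the fields.
    return df.to_csv(index=False, columns=list(columns))


csv_text = items_to_csv(rows)
print(csv_text)
```

To write a file rather than return a string, pass a path as the first argument to to_csv.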

I'm not sure if this is what you are looking for.
