Scrapy output JSON or CSV

I'm trying to scrape a website with Scrapy, using this settings.py:

FEED_EXPORT_ENCODING = 'utf-8'

import datetime
now = datetime.datetime.now()
formatted = now.strftime("%Y%m%d_%H%M")
FEED_URI = f'\\C:\\Users\\Acer\\Desktop\\{formatted}.csv'
FEED_TYPE = 'csv'

and this special_offers.py:

# -*- coding: utf-8 -*-
import scrapy
import datetime


class SpecialOffersSpider(scrapy.Spider):
    name = 'special_offers'
    allowed_domains = ['www.tinydeal.com']

    def start_requests(self):
        yield scrapy.Request(url='https://www.tinydeal.com/specials.html', callback=self.parse, headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
        })

    def parse(self, response):
        for product in response.xpath("//ul[@class='productlisting-ul']/div/li"):
            yield {
                'title': product.xpath(".//a[@class='p_box_title']/text()").get(),
                'url': response.urljoin(product.xpath(".//a[@class='p_box_title']/@href").get()),
                'discounted_price': product.xpath(".//div[@class='p_box_price']/span[1]/text()").get(),
                'original_price': product.xpath(".//div[@class='p_box_price']/span[2]/text()").get(),
                'User-Agent': response.request.headers['User-Agent'].decode('utf-8'),
                'datetime': datetime.datetime.now().strftime("%Y%m%d %H%M")

            }

        next_page = response.xpath("//a[@class='nextPage']/@href").get()

        if next_page:
            yield scrapy.Request(url=next_page, callback=self.parse, headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
            })

Then I open a terminal and run:

scrapy crawl special_offers

The problem is that when I export JSON, the data comes out without a comma between the objects (between } and {), so the file cannot be read by Power BI, for example.

When I export CSV, the data looks different from what I expect when I open it in Excel.

CSV data example

{"title": "ABS Plastic Case for Raspberry Pi 3 Model B & Raspberry Pi 2 E-524988", "url": "https://www.tinydeal.com/abs-plastic-case-for-raspberry-pi-3-model-b-raspberry-pi-2-p-163950.html", "discounted_price": "R$12.74", "original_price": "R$13.66 ", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2330"}
{"title": "3M 9001 KN90 Dust Masks Respirator Anti-dust PM2.5 Industrial Construction Polle RTH-562440", "url": "https://www.tinydeal.com/3m-9001-kn90-dust-masks-respirator-anti-dust-pm25-industrial-construction-polle-p-179487.html", "discounted_price": "R$10.29", "original_price": "R$12.40 ", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2330"}
{"title": "2-in-1 Vintage Blue Rhinestone Necklace + Earring Jewelry Set DJA-562974", "url": "https://www.tinydeal.com/2-in-1-vintage-blue-rhinestone-necklace-earring-jewelry-set-p-180097.html", "discounted_price": "R$11.77", "original_price": "R$30.77 ", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2330"}
{"title": "64GB USB 2.0 Flash Drive USB Pen Drive U Disk EFM-561923", "url": "https://www.tinydeal.com/64gb-usb-20-flash-drive-usb-pen-drive-u-disk-p-178875.html", "discounted_price": "R$34.83", "original_price": "R$99.43 ", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2330"}

JSON data example

{ "title": "ABS Plastic Case for Raspberry Pi 3 Model B & Raspberry Pi 2 E-524988", "url": "https://www.tinydeal.com/abs-plastic-case-for-raspberry-pi-3-model-b-raspberry-pi-2-p-163950.html", "discounted_price": "R$12.74", "original_price": "R$13.66 ", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2329" }
{ "title": "3M 9001 KN90 Dust Masks Respirator Anti-dust PM2.5 Industrial Construction Polle RTH-562440", "url": "https://www.tinydeal.com/3m-9001-kn90-dust-masks-respirator-anti-dust-pm25-industrial-construction-polle-p-179487.html", "discounted_price": "R$10.29", "original_price": "R$12.40 ", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2329" }
{ "title": "2-in-1 Vintage Blue Rhinestone Necklace + Earring Jewelry Set DJA-562974", "url": "https://www.tinydeal.com/2-in-1-vintage-blue-rhinestone-necklace-earring-jewelry-set-p-180097.html", "discounted_price": "R$11.77", "original_price": "R$30.77 ", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2329" }

Can anyone tell me where I'm going wrong with these outputs?

How are you getting the scraped data? From what you showed, I suspect that you copied it from the terminal. Is that right? If so, you can save it directly to a file with this command:

scrapy crawl special_offers -o <where save the file>/special_offers.json
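With -o, Scrapy picks the exporter from the file extension, so a .json target is written as one JSON array (with commas between the objects) and a .csv target as actual CSV. If you would rather keep the export configured in settings.py, here is a minimal sketch. It assumes the Desktop folder under your home directory as the output location, and it relies on my understanding that Scrapy reads FEED_FORMAT (or FEEDS from version 2.1 on) rather than FEED_TYPE, and falls back to JSON Lines when no format is set. Please check that against the feed export documentation for your version.

```python
# settings.py (sketch; the output folder is an assumption, adjust as needed)
import datetime
from pathlib import Path

FEED_EXPORT_ENCODING = "utf-8"

# Timestamped file name, as in the question.
formatted = datetime.datetime.now().strftime("%Y%m%d_%H%M")

# Build a file:// URI so a Windows drive letter such as C: is not
# mistaken for a URI scheme.
output_uri = (Path.home() / "Desktop" / f"{formatted}.csv").as_uri()

# Scrapy 2.1 and later: one FEEDS entry per output file.
FEEDS = {
    output_uri: {"format": "csv"},
}

# Older Scrapy versions use these two settings instead:
# FEED_URI = output_uri
# FEED_FORMAT = "csv"  # FEED_FORMAT, not FEED_TYPE
```

With this in place, running scrapy crawl special_offers without -o should write the CSV file on its own.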

Hopefully this solves your problem. Please let me know.
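As for the files you already have: objects written one after another with no commas look like JSON Lines (one object per line) rather than a JSON array. Below is a minimal sketch, assuming that layout and assuming every record has the same keys, that rewrites such a file as a JSON array and as CSV so Power BI or Excel can open it. The helper names and file names are hypothetical.

```python
import csv
import json


def read_json_lines(path):
    """Return a list of dicts from a file holding one JSON object per line."""
    with open(path, encoding="utf-8") as handle:
        # Skip blank lines so stray empty lines do not break parsing.
        return [json.loads(line) for line in handle if line.strip()]


def write_json_array(records, path):
    """Write the records as a single JSON array (commas between objects)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)


def write_csv(records, path):
    """Write the records as CSV, taking the header from the first record."""
    if not records:
        return
    # utf-8-sig adds a BOM that helps Excel detect the encoding;
    # newline="" avoids blank rows between records on Windows.
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


# Example usage (file names are hypothetical):
# records = read_json_lines("special_offers.jl")
# write_json_array(records, "special_offers.json")
# write_csv(records, "special_offers.csv")
```

This only repairs existing output; for new crawls, exporting with the right format from the start is simpler.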
