Python - Scrapy crawling the web

Well, here is my project/spider; it works fine:

    # -*- coding: utf-8 -*-
    import scrapy


    class SccbotakiSpider(scrapy.Spider):
        name = 'SccBotaki'
        start_urls = ['url']
        # time.sleep(1) in the class body runs only once, at import time, so it
        # never throttled requests; Scrapy's per-spider download_delay does
        download_delay = 1

        def parse(self, response):
            # Page-level values (not used in the loop below)
            daten = response.css('#daten').extract()
            cartext = response.css('div.car_header > b::text').extract()
            spacerimg = response.css('div.rechts > img::attr(src)').extract()

            # Per-product values, one entry per product
            inhalt = response.css('div.inhalt')
            prodname = inhalt.css('div.prod-name::text').extract()
            artnr = inhalt.css('div.art-nr > span::text').extract()
            avaible = inhalt.css('div.ampel > img::attr(src)').extract()
            price = inhalt.css('div.preis::text').extract()

            # One dict (one CSV row) per product
            for item in zip(prodname, artnr, avaible, price):
                scraped_info = {
                    'prodname': item[0],
                    'artnr': item[1],
                    'avaible': item[2],
                    'price': item[3],
                }
                yield scraped_info

Check out the URL in the image (URL Image), because I cannot use a shortened URL in this post.
But I also want to scrape `daten`, `cartext` and `spacerimg`, and when I do, I get different/bad results. By the way, this is how I export to a CSV file in settings.py:

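    # Export as CSV feed (FEED_FORMAT/FEED_URI are deprecated since Scrapy 2.1)
    FEED_FORMAT = "csv"
    FEED_URI = "UltraRacing.csv"
    # Equivalent setting on Scrapy 2.1+:
    # FEEDS = {"UltraRacing.csv": {"format": "csv"}}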

So my question is: why can't I scrape the page as shown in my image when I add `daten`, `cartext` and `spacerimg`? If I scrape all of them together, the CSV contains just one row with all of the information in one cell. If I remove `daten`, `cartext` and `spacerimg` from the loop, I get perfect results.

Hope this makes sense...

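You are trying to `zip` lists of different sizes: `prodname`, `artnr`, `avaible` and `price` have 41 elements, but `daten` and `cartext` have only 1 element and `spacerimg` has 9 elements. `zip()` stops at the shortest input, so adding `daten` or `cartext` to the loop leaves you with a single row. And because `#daten` selects a whole container element, `extract()` returns that element's full HTML, which is likely why everything ends up in one cell.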
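A minimal pure-Python sketch of the truncation and one possible fix, assuming the page-level value (`cartext`) applies to every product on the page. The sample lists and the `build_rows` helper are hypothetical illustrations, not taken from the question's site:

```python
# Hypothetical sample data shaped like the lists in the question:
# four per-product lists of equal length and two page-level lists
# that hold a single value each.
prodname = ["A", "B", "C"]
artnr = ["1", "2", "3"]
avaible = ["green.gif", "red.gif", "green.gif"]
price = ["9,99", "19,99", "29,99"]
cartext = ["Some car"]
daten = ['<div id="daten">...</div>']

# zip() stops at the shortest input, so adding a one-element list
# truncates the result to a single tuple (a single CSV row).
truncated = list(zip(prodname, artnr, avaible, price, cartext, daten))


def build_rows(prodname, artnr, avaible, price, cartext):
    """Hypothetical helper: zip only the per-product lists and copy the
    page-level value (first element of cartext, if any) into every row."""
    car = cartext[0] if cartext else None
    rows = []
    for name, nr, amp, pr in zip(prodname, artnr, avaible, price):
        rows.append({
            "prodname": name,
            "artnr": nr,
            "avaible": amp,
            "price": pr,
            "cartext": car,
        })
    return rows


rows = build_rows(prodname, artnr, avaible, price, cartext)
print(len(truncated))  # 1
print(len(rows))       # 3
```

Inside `parse()` you would `yield` each dict instead of collecting a list; `response.css('div.car_header > b::text').extract_first()` returns a single string (or `None`), which removes the need to unwrap a one-element list. `spacerimg` (9 values) cannot be paired one-to-one with 41 products without knowing which image belongs to which product, so it is left out of this sketch.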
