
Scrapy: How to substitute useless values for other items from a dictionary

I am currently scraping prices from a website. Most products have a maximum and a minimum price, but not all of them have a minimum price. Those without a minimum throw useless values, which I have been replacing with an empty string "". I would like to replace those empty values with the maximum price instead (basically because if a price doesn't change, the minimum and maximum are the same).

The code is extensive. I have the following libraries imported:

import os
import scrapy
from ..items import TutorialItem
import pandas as pd
from scrapy.http import Request
from scrapy.http import FormRequest
from scrapy.selector import Selector
from scrapy.utils.response import open_in_browser
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

BASE_DIR = os.path.dirname(os.path.abspath(__file__))

class KikoSpider2(scrapy.Spider):
    name = "kiko2"

    login_page = 'https://www.kikowireless.com/login'
    formdata = {'email': 'thisisan@email.com',
                 'password': 'QQntDXqK9'}

The code goes on...

The important part is here:

def parse_products(self, response):
        items = TutorialItem()
        category = response.meta['category']

        article_name = response.css('#content .name a::text').extract()
        article_price = [ x.replace('$', '').replace('\n', '').replace('\t', '').replace(' ', '') for x in response.css('.price::text').extract()]
        article_price_min = [x.replace('\t', '').replace(
        '$', '').replace('\n', 'n').split()[-1].replace('n', '') for x in response.css('.discount::text').extract()] 

        items['article_name'] = article_name
        items['article_price'] = article_price
        items['article_price_min'] = article_price_min
        for item in zip(article_name, article_price, article_price_min):
            # zip yields 3-tuples here: (name, max price, min price)
            scraped_info = {'supplier_item_name' : item[0],
                                'max_price' : item[1],
                                'min_price' : item[2],
                                  }
            # print(scraped_info)
            df_result = pd.DataFrame.from_dict(scraped_info.items())
            print(df_result)
            yield scraped_info

The line that builds article_price_min (the list comprehension over response.css('.discount::text').extract()) extracts the minimum price of each article. What can I do to fill the blank entries in it with the article_price of the same underlying item?

That's quite simple.

# This check passes only when article_price_min is not None and not empty
if article_price_min:
  items['article_price_min'] = article_price_min
else:
  items['article_price_min'] = article_price
