Scrapy: How to substitute useless values for other items from a dictionary
I am currently scraping prices from a website. Most products have a maximum and a minimum price, but not all of them have a minimum price. Those without a minimum return useless values, which I have been replacing with an empty string (""). I would like to replace those empty values with the maximum price instead, because if a price does not change, the minimum and the maximum are the same.
The code is extensive. I have the following libraries imported:
import os
import scrapy
from ..items import TutorialItem
import pandas as pd
from scrapy.http import Request
from scrapy.http import FormRequest
from scrapy.selector import Selector
from scrapy.utils.response import open_in_browser
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

BASE_DIR = os.path.dirname(os.path.abspath(__file__))


class KikoSpider2(scrapy.Spider):
    name = "kiko2"
    login_page = 'https://www.kikowireless.com/login'
    formdata = {'email': 'thisisan@email.com',
                'password': 'QQntDXqK9'}
The code goes on...
The important part comes here:
    def parse_products(self, response):
        items = TutorialItem()
        category = response.meta['category']
        article_name = response.css('#content .name a::text').extract()
        # Maximum price: strip the currency sign and all whitespace.
        article_price = [x.replace('$', '').replace('\n', '').replace('\t', '').replace(' ', '')
                         for x in response.css('.price::text').extract()]
        # Minimum price: keep only the last token of the discount text.
        article_price_min = [x.replace('\t', '').replace('$', '').replace('\n', 'n').split()[-1].replace('n', '')
                             for x in response.css('.discount::text').extract()]
        items['article_name'] = article_name
        items['article_price'] = article_price
        items['article_price_min'] = article_price_min
        # zip() yields 3-tuples (name, max price, min price), so indexing item[3]
        # would raise IndexError. No URL list is zipped in, so 'supplier_url' is left out.
        for name, price_max, price_min in zip(article_name, article_price, article_price_min):
            scraped_info = {'supplier_item_name': name,
                            'max_price': price_max,
                            'min_price': price_min,
                            }
            # print(scraped_info)
            df_result = pd.DataFrame.from_dict(scraped_info.items())
            print(df_result)
            yield scraped_info
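As an aside on the clean-up step, the chained replace calls could be condensed into one regular expression. This is only a minimal sketch, assuming the prices look like `$12.50` with no thousands separators; `last_price` is a hypothetical helper name and is not part of the original spider.

```python
import re

# Matches a plain decimal number such as 12 or 12.50.
# Assumption: prices carry no thousands separators (e.g. no "1,200.00").
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def last_price(text):
    """Return the last number found in text, or "" when there is none."""
    matches = _NUMBER.findall(text)
    return matches[-1] if matches else ""


print(last_price("\n\t$12.50 \n\t$9.99\n"))  # 9.99
print(repr(last_price("\n\t \n")))           # ''
```

Like the original expression, this keeps the last value in the text and yields an empty string when no price is present.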
The line that builds article_price_min (the one using the .discount selector) extracts the minimum price of each article. What can I do to fill the blank entries in it with the article_price corresponding to the same underlying item?
That's quite simple.
# This check is true when article_price_min is neither None nor empty
if article_price_min:
    items['article_price_min'] = article_price_min
else:
    items['article_price_min'] = article_price
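Note that this check tests the whole list at once, so it only falls back when no minimum prices were found at all. If individual entries are blank, a per-item fallback may be closer to what the question asks for. Here is a minimal sketch, assuming both lists have the same length and are aligned item by item (zip stops at the shorter list); `fill_missing_min` is a hypothetical helper name, not part of the original spider.

```python
def fill_missing_min(max_prices, min_prices):
    """Return min_prices with every empty entry replaced by the matching max price."""
    # `low or high` evaluates to `high` when `low` is "" or None.
    return [low or high for high, low in zip(max_prices, min_prices)]


# Made-up sample values standing in for the scraped lists.
article_price = ["10.00", "25.50", "7.99"]
article_price_min = ["8.00", "", "7.49"]

filled = fill_missing_min(article_price, article_price_min)
print(filled)  # ['8.00', '25.50', '7.49']
```

In the spider, the result could be assigned to items['article_price_min'] in place of the raw list. If some products have no .discount element at all, the two lists will not line up, and extracting both prices per product container would be safer.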