Write multiple lists to a dict in pandas

I am trying to parse a site and have run into the following problem. I am sure that the maximum number of images for each product is 7. Each image link is written to a list and then saved to Excel, so each link gets its own column, as in file 1.xlsx. But some products have only 3 or 5 images. If the number of images is less than 7, I want to fill the remaining fields with empty strings. Instead I get a result like the one in file 2.xlsx. Please help me fix this problem.

from datetime import datetime, timedelta
from time import sleep
import time, csv
from csv import reader
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
import requests, json
import pandas as pd


def get_html(url):
    headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
        }
    r = requests.get(url, headers=headers).content
    return r

goods_link = ['https://www.johnlewis.com/a-a-k-s-hana-raffia-cross-body-bag-navy-multi/p5559710']


Images1 = []
Images2 = []
Images3 = []
Images4 = []
Images5 = []
Images6 = []
Images7 = []
Img = []

for i in goods_link:
    soup = BeautifulSoup(get_html(i), 'html.parser')

    imgContainer = soup.find('div', {'class':'ProductImages_productImagesContainer__1v2kP'})
    imgAll = imgContainer.find_all('div', {'class':'ImageMagnifier_zoomable-image-container__db7jH'})
    for j in imgAll:
        imgSrc = j.find('img').get('src').split('?$rsp')[0]
        Img.append(imgSrc)

    [x.append(y) for x,y in zip([Images1, Images2, Images3, Images4, Images5, Images6, Images7], Img)]

info = {}


for ii in Images1:
    info.setdefault('Images1',[])
    info['Images1'].append(ii)
for ii in Images2:
    info.setdefault('Images2',[])
    info['Images2'].append(ii)
for ii in Images3:
    info.setdefault('Images3',[])
    info['Images3'].append(ii)
for ii in Images4:
    info.setdefault('Images4',[])
    info['Images4'].append(ii)
for ii in Images5:
    info.setdefault('Images5',[])
    info['Images5'].append(ii)
for ii in Images6:
    info.setdefault('Images6',[])
    info['Images6'].append(ii)
for ii in Images7:
    info.setdefault('Images7',[])
    info['Images7'].append(ii)

df = pd.DataFrame.from_dict(info)
df.to_excel('./output.xlsx')

print('Finish')

IIUC, you want to fill all 7 columns for each row, even if the row has fewer than 7 images.
The step of creating a dictionary is superfluous. You can collect the image links of each product in a list, append each of those lists to a list of lists, and create your DataFrame from that. You can specify the headers with columns=:

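import pandas as pd
import requests
from bs4 import BeautifulSoup


def get_html(url):
    # Send a browser-like User-Agent so the site serves the normal page.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
    r = requests.get(url, headers=headers).content
    return r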

goods_link = ['https://www.johnlewis.com/a-a-k-s-hana-raffia-cross-body-bag-navy-multi/p5559710']


headers = ["Images1", "Images2", "Images3", "Images4", "Images5", "Images6", "Images7"]
img_table = []

for link in goods_link:
    # Start every row with 7 empty slots so short rows stay aligned.
    img_row = [None] * len(headers)
    soup = BeautifulSoup(get_html(link), 'html.parser')

    imgContainer = soup.find('div', {'class':'ProductImages_productImagesContainer__1v2kP'})
    imgAll = imgContainer.find_all('div', {'class':'ImageMagnifier_zoomable-image-container__db7jH'})
    # Slice to the number of columns to avoid an IndexError on extra images.
    for j, div_obj in enumerate(imgAll[:len(headers)]):
        imgSrc = div_obj.find('img').get('src').split('?$rsp')[0]
        img_row[j] = imgSrc

    img_table.append(img_row)

df = pd.DataFrame(img_table, columns=headers)
df.to_excel('./output.xlsx')

print('Finish')

What was missing was creating a list of None with length 7, then using enumerate to replace the element at index j with the corresponding link.
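The padding step can be tried on its own, without any scraping. The following is only a minimal sketch: the helper name pad_row, its parameters, and the sample file names are made up for illustration and are not part of the original answer.

```python
def pad_row(links, width=7, fill=None):
    """Return a list of exactly `width` items: the links first, then `fill`."""
    row = [fill] * width
    # Copy at most `width` links into place; any extra links are dropped.
    for index, link in enumerate(links[:width]):
        row[index] = link
    return row


# Made-up image links for two products with different numbers of images.
scraped = [
    ["a1.jpg", "a2.jpg", "a3.jpg"],
    ["b1.jpg", "b2.jpg", "b3.jpg", "b4.jpg", "b5.jpg"],
]

# Every row now has the same length, so the rows line up as table columns.
img_table = [pad_row(links) for links in scraped]
```

Calling pad_row(links, fill="") would pad with empty strings, as asked in the question. The resulting img_table should be usable in the same way as in the answer, i.e. passed to pd.DataFrame together with columns=headers.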

Next time, please try to name your variables in a way that makes the code easier to understand.
