How to create multiple zip folders at once from a dataframe in Python

I have a dataframe consisting of users and a list of PDFs related to each of those users. The PDFs have no standard naming convention, a list can contain any number of PDFs, and the real list of users is much longer than the example below.

import pandas as pd
from zipfile import ZipFile

data = {'name': ['aaron', 'ben', 'charlie', 'daniel'],
        'pdfs': [['aaron1.pdf', 'aaron2.pdf', 'aaron3.pdf'],
                 ['ben1.pdf', 'ben2.pdf'],
                 ['charlie3.pdf', 'charlie5.pdf'],
                 ['dan_age.pdf', 'daniel1.pdf']]}

df = pd.DataFrame(data, columns = ['name', 'pdfs'])

Using ZipFile from the zipfile module, I want to run a loop that creates one zip folder per user, containing only the PDF documents relevant to that individual user.

I can successfully create a zip folder for each user in the dataframe using the first two lines of the for loop; however, I cannot map the PDF lists to the individual users so that only each user's PDFs appear in the correct zip file.

users = df['name']
pdfs = df['pdfs']

for user in users:
    zipfiles = ZipFile(user + ".zip", 'w'),
    for zip in zipfiles:
        for lists in pdfs:
            for pdf in lists:
                zip.write(pdf)

Using a for loop, I want to create separate zip folders named 'aaron.zip', 'ben.zip', 'charlie.zip' and 'daniel.zip', each containing only the PDFs related to that user.

I think you're making 3 mistakes:

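  1. You loop over a newly created, empty zip archive instead of over the user's files. Because of the trailing comma, zipfiles is actually a one-element tuple holding the ZipFile object, so the loop simply runs once with zip bound to that object (side note: don't use zip as a variable name; it shadows the built-in function):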

     zipfiles = ZipFile(user + ".zip", 'w'),
     for zip in zipfiles:
  2. You write every file in df["pdfs"] to each user's zip archive, not only the files belonging to that user:

     for lists in pdfs:
         for pdf in lists:
  3. You don't manage the newly created archive properly: you write to it through the loop variable zip, and the archive is never closed, so nothing guarantees it is finalized:

     zip.write(pdf)
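A quick way to see the first mistake is to inspect what the question's ZipFile(...) line with the trailing comma actually produces. This sketch uses an in-memory io.BytesIO buffer (an assumption, so nothing is written to disk) instead of a real user + ".zip" file:

```python
import io
from zipfile import ZipFile

# An in-memory buffer stands in for a file such as "aaron.zip".
buffer = io.BytesIO()

# The trailing comma turns the right-hand side into a one-element tuple.
zipfiles = ZipFile(buffer, "w"),

print(type(zipfiles).__name__)  # tuple
print(len(zipfiles))            # 1

# Looping over the tuple yields the ZipFile object itself, exactly once;
# it does not loop over the (empty) contents of the archive.
for archive in zipfiles:
    print(type(archive).__name__)  # ZipFile
    print(archive.namelist())      # []
    archive.close()

# Without the comma, a ZipFile object is not iterable at all.
try:
    for entry in ZipFile(io.BytesIO(), "w"):
        pass
except TypeError as exc:
    print("TypeError:", exc)
```

So the inner loops do run, but once per user and with every user's PDFs, which is why mistake 2 is the one that puts the wrong files in each archive.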

You could try the following instead:

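def zip_user_files(row):
    # Each row holds exactly two values: the user's name and their list of PDFs.
    user, files = row
    # The with block closes (and finalizes) the archive when it ends.
    with ZipFile(user + ".zip", "w") as archive:
        for file in files:
            archive.write(file)

# Called only for its side effect; apply returns a Series of None values.
df[["name", "pdfs"]].apply(zip_user_files, axis=1)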

I have used df[["name", "pdfs"]] instead of df in case the dataframe actually has more columns: the unpacking user, files = row only works when each row holds exactly two values. If that's not the case, just use df.
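The reason for the column selection can be checked directly. In this sketch the extra age column and the helper first_value are hypothetical, made up for illustration; the helper unpacks each row the same way zip_user_files does:

```python
import pandas as pd

# Hypothetical extra "age" column next to the two columns from the question.
df = pd.DataFrame({
    "name": ["aaron", "ben"],
    "pdfs": [["aaron1.pdf", "aaron2.pdf"], ["ben1.pdf"]],
    "age": [30, 41],
})

def first_value(row):
    # Hypothetical helper: unpacking a row Series iterates over its values,
    # one per column, exactly like zip_user_files does.
    user, files = row
    return user

# With three columns, each row has three values, so the unpacking fails.
try:
    df.apply(first_value, axis=1)
except ValueError as exc:
    print("ValueError:", exc)

# Selecting the two columns first gives each row exactly two values.
users = df[["name", "pdfs"]].apply(first_value, axis=1).tolist()
print(users)  # ['aaron', 'ben']
```

The exact wording of the ValueError message varies between Python versions, so the sketch prints it rather than relying on it.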

Alternatively, if you don't want to use .apply:

for user, files in zip(df["name"], df["pdfs"]):
    with ZipFile(user + ".zip", "w") as archive:
        for file in files:
            archive.write(file)
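To check that each archive really ends up with only its own user's files, the loop can be run end to end in a throwaway directory. This is a sketch under a few assumptions: the PDFs are empty placeholder files in a hypothetical pdfs subdirectory of a temporary folder, and plain lists from the question's data dict stand in for df["name"] and df["pdfs"], since the loop only needs two parallel iterables. Because the files are passed as full paths here, the sketch also sets ZipFile.write's arcname parameter; without it, each entry would store the whole path (minus drive and leading separator), temporary directory included:

```python
import os
import tempfile
from zipfile import ZipFile

# Data copied from the question; plain lists stand in for df["name"] and df["pdfs"].
data = {
    "name": ["aaron", "ben", "charlie", "daniel"],
    "pdfs": [
        ["aaron1.pdf", "aaron2.pdf", "aaron3.pdf"],
        ["ben1.pdf", "ben2.pdf"],
        ["charlie3.pdf", "charlie5.pdf"],
        ["dan_age.pdf", "daniel1.pdf"],
    ],
}

contents = {}
with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical layout: all PDFs live in a "pdfs" subdirectory.
    pdf_dir = os.path.join(workdir, "pdfs")
    os.makedirs(pdf_dir)
    for files in data["pdfs"]:
        for file in files:
            # Empty placeholder files; only their names matter here.
            open(os.path.join(pdf_dir, file), "wb").close()

    for user, files in zip(data["name"], data["pdfs"]):
        zip_path = os.path.join(workdir, user + ".zip")
        with ZipFile(zip_path, "w") as archive:
            for file in files:
                # arcname drops the directory part, so entries are bare file names.
                archive.write(os.path.join(pdf_dir, file), arcname=file)
        # Reopen the finished archive to check what it contains.
        with ZipFile(zip_path) as archive:
            contents[user] = archive.namelist()

for user, names in contents.items():
    print(user, names)
```

Each printed list should contain only that user's files; for example, aaron's archive holds exactly aaron1.pdf, aaron2.pdf and aaron3.pdf.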
