How to add files from multiple zip files into a single zip file
I want to put the files from multiple zip files that share a common substring into a single zip file.
I have a folder "temp" that contains some .zip files and a few other files:
filename1_160645.zip
filename1_165056.zip
filename1_195326.zip
filename2_120528.zip
filename2_125518.zip
filename3_171518.zip
test.xlsx
filename19_161518.zip
I have the following dataframe df_filenames, which contains the filename prefixes:
filename_prefix
filename1
filename2
filename3
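If the temp folder contains several .zip files whose prefix matches one of the prefixes in the dataframe df_filenames, I want to merge the contents of those files.
For example, filename1_160645.zip contains the following:
1a.csv
1b.csv
filename1_165056.zip contains the following:
1d.csv
and filename1_195326.zip contains the following:
1f.csv
After merging the contents of the two files above into filename1_160645.zip, the contents of filename1_160645.zip will be: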
1a.csv
1b.csv
1d.csv
1f.csv
In the end, only the following files will remain in the temp folder:
filename1_160645.zip
filename2_120528.zip
filename3_171518.zip
test.xlsx
filename19_161518.zip
I have written the following code, but it is not working:
import os
import zipfile as zf
import pandas as pd
df_filenames=pd.read_excel('filename_prefix.xlsx')
#Get the list of all the filenames in the temp folder
lst_fnames=os.listdir(r'C:\Users\XYZ\Downloads\temp')
#take only .zip files
lst_fnames=[fname for fname in lst_fnames if fname.endswith('.zip')]
#take distinct prefixes in the dataframe
df_prefixes=df_filenames['filename_prefix'].unique()
for prefix in df_prefixes:
    #this list will contain zip files with the same prefixes
    lst=[]
    #total count of files in the lst
    count=0
    for fname in lst_fnames:
        if prefix in fname:
            #print(prefix)
            lst.append(fname)
    #print(lst)
    #if the list has more than 1 zip files,merge them
    if len(lst)>1:
        print(lst)
        with zf.ZipFile(lst[0], 'a') as f1:
            print(f1.filename)
            for f in lst[1:]:
                with zf.ZipFile(path+'\\'+f, 'r') as f:
                    print(f.filename) #getting entire path of the file here,not just filename
                    [f1.writestr(t[0], t[1].read()) for t in ((n, f.open(n)) for n in f.namelist())]
            print(f1.namelist())
The contents of the files whose names contain filename1 should be merged into filename1_160645.zip, and the contents of filename1_160645.zip should be:
1a.csv
1b.csv
1d.csv
1f.csv
But nothing has changed when I double-click filename1_160645.zip.
Basically, 1a.csv, 1b.csv, 1d.csv, 1f.csv are not part of filename1_160645.zip.
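I would use shutil to get a higher-level view for handling archive files. In addition, pathlib provides nice methods and attributes for a given file path. Combined with groupby, we can easily extract the target files that are related to each other.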
import itertools
import shutil
from pathlib import Path
import pandas as pd
filenames = pd.read_excel('filename_prefix.xlsx')
prefixes = filenames['filename_prefix'].unique()
path = Path.cwd()  # or change to Path('path/to/desired/dir/')
zip_files = (file for file in path.iterdir() if file.suffix == '.zip')
# sorted() puts related files next to each other, which groupby requires
target_files = sorted(file for file in zip_files
                      if any(file.stem.startswith(pre) for pre in prefixes))
# group on the text before the first underscore, e.g. 'filename1'
file_groups = itertools.groupby(target_files, key=lambda x: x.stem.split('_')[0])
for _, group in file_groups:
    first, *rest = group
    if not rest:
        # only one archive with this prefix, nothing to merge
        continue
    # extract everything into a temporary directory named after the first archive
    temp_dir = path / first.stem
    temp_dir.mkdir()
    shutil.unpack_archive(first, extract_dir=temp_dir)
    for item in rest:
        shutil.unpack_archive(item, extract_dir=temp_dir)
        item.unlink()
    # re-create <first.stem>.zip from the merged contents, then clean up
    shutil.make_archive(temp_dir, 'zip', temp_dir)
    shutil.rmtree(temp_dir)
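As an alternative, the archives can also be merged without extracting anything to disk, by appending members with zipfile directly. The following is only a minimal sketch: the helper name merge_by_prefix and the rule of skipping duplicate member names are assumptions, not part of the original question or answer. It also addresses one likely reason the code in the question seemed to change nothing: zf.ZipFile(lst[0], 'a') was given a bare file name, which is resolved against the current working directory rather than the temp folder.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def merge_by_prefix(folder, prefixes):
    """Merge zip files that share a prefix into the first one (hypothetical helper)."""
    folder = Path(folder)
    wanted = set(prefixes)

    # Group on the exact text before the first underscore, so that
    # 'filename19_...' is never treated as part of 'filename1'.
    groups = defaultdict(list)
    for zip_path in sorted(folder.glob('*.zip')):
        prefix = zip_path.stem.split('_')[0]
        if prefix in wanted:
            groups[prefix].append(zip_path)

    for members in groups.values():
        first, *rest = members
        if not rest:
            continue  # a single archive needs no merging

        # Open the target by its full path; a bare file name would be
        # resolved against the current working directory instead.
        with zipfile.ZipFile(first, 'a') as target:
            existing = set(target.namelist())
            for other in rest:
                with zipfile.ZipFile(other) as source:
                    for info in source.infolist():
                        if info.filename in existing:
                            continue  # assumption: keep the first copy of a duplicate name
                        target.writestr(info, source.read(info))
                        existing.add(info.filename)
                other.unlink()  # delete the archive once its members are copied
```

With the folder from the question, a call might look like merge_by_prefix(r'C:\Users\XYZ\Downloads\temp', df_filenames['filename_prefix'].unique()). Because merged archives are deleted, it is worth trying this on a copy of the folder first.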