
Python - Writing zips of a specific size is very unreliable

I'm attempting to write a Python script that will zip up a directory. I want to add files to a zip until it is around 500MB, then start a new zip, and repeat until all the files have been zipped. The rules look like this:

1) Walk a directory finding all files (using os.walk)
2) Begin writing said files to a zip until the zip is ~500MB
3) Once I've reached that limit, start a new zip following same rules (~500MB limit)
4) End result being N zips all around ~500MB or less

My code right now looks like:

#!/usr/bin/env python3

from zipfile import ZipFile
import os
import math as m

current_dir = os.getcwd()
deal_name = current_dir.split("/")[-1:][0]
deal_folder = f'{current_dir}/{deal_name}'
deal_folder_exists = os.path.isdir(deal_folder)
file_paths = []
vol = 1
ZIP_MAX_SIZE = 500

if not deal_folder_exists:
    print(f'Can not find deal folder: {deal_folder}')
    raise Exception('Missing Deal Folder')

# Generate a list of all files to be written to zip
for root, directories, files in os.walk(deal_folder):
    if files:
        # We have files to add to the zip
        for file in files:
            file_paths.append(f'.{root.replace(deal_folder, "")}/{file}')

# Change into the deal folder
os.chdir(deal_folder)

# writing files to a zipfile
deal_zip_path = f'../{deal_name}-vol{vol}.zip'
deal_zip = ZipFile(deal_zip_path, 'w')

# Just a dict for keepin track of the end size of each zip
zip_data = {deal_zip_path: 0}

# Begin looping over the files, and writing the files to the zip
for file in file_paths:
    deal_zip.write(file)
    size = round(sum([info.file_size for info in deal_zip.infolist()]) / 1e+6)

    # Track the current size
    zip_data[deal_zip_path] = size

    # If the current size exceeds the max, bump the vol var and start a new zip
    if size > ZIP_MAX_SIZE:
        deal_zip.close()
        vol += 1
        deal_zip_path = f'../{deal_name}-vol{vol}.zip'
        deal_zip = ZipFile(deal_zip_path, 'w')

# Close the final zip
deal_zip.close()

# Log the deets
print(zip_data)
print('All files zipped successfully!')

The zip_data print looks like this:

{
    '../magical-holiday-goodies-vol1.zip': 542, 
    '../magical-holiday-goodies-vol2.zip': 503, 
    '../magical-holiday-goodies-vol3.zip': 505, 
    '../magical-holiday-goodies-vol4.zip': 545, 
    # sometime later
    '../magical-holiday-goodies-vol15.zip': 309
}

So it appears that the script is doing exactly what it should be doing. However, the sizes of the resulting zips are very unpredictable. For instance, the log above says vol1.zip should be 542MB, when in reality I get:

(Screenshot: note the 12.2 MB zip...)

Any idea why my logging shows the expected file sizes, when in reality the resulting zip sizes are all over the place?

It turns out ZipFile is just storing the files: its default compression mode is ZIP_STORED, which applies no compression at all.
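This is easy to check in isolation. The sketch below is an illustration under assumptions, not part of the original script: it builds a one-file archive in memory from synthetic, highly repetitive data (the payload bytes and the archive_stats helper are made up for the demo) and compares the default mode with ZIP_DEFLATED.

```python
import io
from zipfile import ZipFile, ZIP_STORED, ZIP_DEFLATED

# Synthetic, highly repetitive data (an assumption for the demo): about 1.6 MB.
payload = b"holiday goodies " * 100_000


def archive_stats(compression=None):
    """Zip `payload` in memory; return (compress_type, file_size, compress_size, archive_bytes)."""
    buffer = io.BytesIO()
    # Passing no `compression` argument mirrors the constructor call in the question.
    kwargs = {} if compression is None else {"compression": compression}
    with ZipFile(buffer, "w", **kwargs) as zf:
        zf.writestr("sample.txt", payload)
        info = zf.getinfo("sample.txt")
    return info.compress_type, info.file_size, info.compress_size, len(buffer.getvalue())


default_stats = archive_stats()
deflated_stats = archive_stats(ZIP_DEFLATED)

print("default is ZIP_STORED:", default_stats[0] == ZIP_STORED)
print("default  (file_size, compress_size, archive bytes):", default_stats[1:])
print("deflated (file_size, compress_size, archive bytes):", deflated_stats[1:])
```

file_size is identical in both runs because it always reports the uncompressed size; only compress_size and the length of the archive itself change with the compression mode.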

Replace each constructor call with: deal_zip = ZipFile(deal_zip_path, 'w', compression=ZIP_DEFLATED)

Also update the import: from zipfile import ZipFile, ZIP_DEFLATED
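Once compression is on, it may also be worth measuring the limit against the compressed size. The loop in the question sums info.file_size, which is the uncompressed size of each entry, so it would keep rotating at roughly 500MB of input regardless of how small the archive ends up. A minimal sketch of one way to do that, assuming a hypothetical helper write_volumes and a hypothetical zip_path_for callback (neither is in the original script), and summing info.compress_size instead:

```python
from zipfile import ZipFile, ZIP_DEFLATED


def write_volumes(file_paths, zip_path_for, max_bytes):
    """Write files into numbered archives, rotating on compressed size.

    zip_path_for is a hypothetical callback mapping a volume number (1, 2, ...)
    to an archive path. Returns {archive path: compressed payload bytes}.
    """
    sizes = {}
    vol = 0
    zf = None
    path = None
    try:
        for file_path in file_paths:
            if zf is None:
                # Open the next volume only when there is a file to put in it,
                # so no empty archive is left behind at the end.
                vol += 1
                path = zip_path_for(vol)
                zf = ZipFile(path, "w", compression=ZIP_DEFLATED)
            zf.write(file_path)
            # compress_size is the size inside the archive; file_size is the original size.
            sizes[path] = sum(info.compress_size for info in zf.infolist())
            if sizes[path] > max_bytes:
                zf.close()
                zf = None
    finally:
        if zf is not None:
            zf.close()
    return sizes
```

With the names from the question this could be called as write_volumes(file_paths, lambda vol: f'../{deal_name}-vol{vol}.zip', 500 * 10**6). Like the original loop, it checks the limit after each write, so a volume can overshoot by up to one file, and the archive on disk is slightly larger than the reported total because of per-entry headers and the central directory.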


 