Why does Python zipfile not give the same output .zip file size as command-line zip?

Here is the size of the file generated by zip:

$ seq 10000 > 1.txt 
$ zip 1 1.txt
  adding: 1.txt (deflated 54%)
$ ls -og 1.zip 
-rw-r--r-- 1 22762 Aug 29 10:04 1.zip

Here is an equivalent Python script:

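#!/usr/bin/env python3
import sys
import zipfile

# Usage: ./main.py ARCHIVE.zip MEMBER_NAME < input
z = zipfile.ZipFile(sys.argv[1], 'w', zipfile.ZIP_DEFLATED)
fn = sys.argv[2]  # name of the member inside the archive
z.writestr(zipfile.ZipInfo(fn), sys.stdin.read())
z.close()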

The size of the generated zip file is the following:

$ seq 10000 | ./main.py 2.zip 2.txt
$ ls -go 2.zip 
-rw-r--r-- 1 49002 Aug 29 10:15 2.zip

Does anybody know why the Python version does not generate a zip file as small as the one generated by zip?

It turns out (checked in Python 3) that when a ZipInfo is passed, writestr() does not use the compression and compresslevel given to zipfile.ZipFile.__init__(). This is an example of bad API design. It should have been designed so that, whether or not a ZipInfo is used, the compression and compresslevel from the constructor are always used.

When passing a ZipInfo instance as the zinfo_or_arcname parameter, the compression method used will be that specified in the compress_type member of the given ZipInfo instance. By default, the ZipInfo constructor sets this member to ZIP_STORED.

Because of this, there is basically no compression in the Python code shown in the original post. Therefore, the file generated by the Python code is large.
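The effect can be reproduced without touching the disk. The following is a minimal sketch, assuming an in-memory archive and a hypothetical helper named archive_size (neither is in the original post). It writes the same bytes that `seq 10000` prints, first through a bare ZipInfo, then through a ZipInfo whose compress_type has been set, then by plain member name.

```python
import io
import zipfile

# Same bytes that `seq 10000` prints: "1\n2\n...10000\n".
data = ''.join(f'{i}\n' for i in range(1, 10001)).encode()

def archive_size(member):
    """Write `data` as one member of an in-memory archive; return the archive size."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as z:
        z.writestr(member, data)
    return len(buf.getvalue())

# A bare ZipInfo keeps its default compress_type (ZIP_STORED),
# so the constructor's ZIP_DEFLATED is not applied to this member.
stored_size = archive_size(zipfile.ZipInfo('1.txt'))

# Setting compress_type on the ZipInfo turns compression back on.
info = zipfile.ZipInfo('1.txt')
info.compress_type = zipfile.ZIP_DEFLATED
deflated_size = archive_size(info)

# A plain member name inherits ZIP_DEFLATED from the constructor.
by_name_size = archive_size('1.txt')

print(len(data), stored_size, deflated_size, by_name_size)
```

For the script in the question, either of the last two variants should bring the output size close to what command-line zip produces; exact byte counts can differ because the two tools write different header fields.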

Another problem with this API design is that the compression parameter of the constructor means the same thing as the compress_type parameter of writestr(), yet the two are named differently. This is another poor design choice. There is no reason to give different names to literally the same thing.
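To make the naming mismatch concrete, here is a small sketch (the member name a.txt and the payload are made up for illustration). The archive-wide default is passed as `compression`, while the per-member override on writestr() and the ZipInfo attribute are both spelled `compress_type`.

```python
import io
import zipfile

buf = io.BytesIO()
# The constructor names the archive-wide setting `compression` ...
with zipfile.ZipFile(buf, 'w', compression=zipfile.ZIP_STORED) as z:
    # ... while writestr() names the per-member override `compress_type`.
    z.writestr(zipfile.ZipInfo('a.txt'), b'x' * 1000,
               compress_type=zipfile.ZIP_DEFLATED)
    # The ZipInfo attribute is also spelled `compress_type`.
    methods = {i.filename: i.compress_type for i in z.infolist()}

print(methods)
```

Passing compress_type to writestr() overrides whatever the ZipInfo carried, so this is another way to fix the script in the question without modifying the ZipInfo object directly.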
