Why does Python zipfile not give the same output .zip file size as command-line zip?
Here is the size of the file generated by zip:
$ seq 10000 > 1.txt
$ zip 1 1.txt
adding: 1.txt (deflated 54%)
$ ls -og 1.zip
-rw-r--r-- 1 22762 Aug 29 10:04 1.zip
Here is an equivalent Python script:
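#!/usr/bin/env python3
import sys
import zipfile

# Usage: seq 10000 | ./main.py ARCHIVE.zip MEMBER_NAME
z = zipfile.ZipFile(sys.argv[1], 'w', zipfile.ZIP_DEFLATED)
fn = sys.argv[2]  # name of the member stored inside the archive
z.writestr(zipfile.ZipInfo(fn), sys.stdin.read())
z.close()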
The size of the generated zip file is as follows:
$ seq 10000 | ./main.py 2.zip 2.txt
$ ls -go 2.zip
-rw-r--r-- 1 49002 Aug 29 10:15 2.zip
Does anybody know why the Python version does not generate a zip file as small as the one generated by zip?
It turns out (checked in Python 3) that when a ZipInfo is passed, writestr() does not use the compression and compresslevel arguments given to zipfile.ZipFile.__init__().
This is an example of bad API design. It should have been designed so that the compression and compresslevel passed to the constructor are always used, whether or not a ZipInfo is given.
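A minimal in-memory check of this behavior, assuming Python 3.6 or later. The member names by_name.txt and by_zipinfo.txt are hypothetical, and the payload is built to match the output of seq 10000. The archive is opened with ZIP_DEFLATED, then the same data is written once under a plain string name and once under a bare ZipInfo:

```python
import io
import zipfile

# Same bytes that `seq 10000` writes: "1\n2\n...\n10000\n".
data = "".join(f"{i}\n" for i in range(1, 10001)).encode()

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    # Plain string name: writestr() applies the constructor's compression.
    z.writestr("by_name.txt", data)
    # Bare ZipInfo: writestr() uses the ZipInfo's own compress_type instead.
    z.writestr(zipfile.ZipInfo("by_zipinfo.txt"), data)

# Read the archive back and report how each member was stored.
with zipfile.ZipFile(buf) as z:
    infos = {info.filename: info for info in z.infolist()}

for name, info in infos.items():
    method = "deflated" if info.compress_type == zipfile.ZIP_DEFLATED else "stored"
    print(name, method, info.compress_size)
```

The by_zipinfo.txt entry should come back as stored, with a compressed size equal to the full 48894-byte input. Only by_name.txt is actually deflated.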
The documentation of ZipFile.writestr() says: "When passing a ZipInfo instance as the zinfo_or_arcname parameter, the compression method used will be that specified in the compress_type member of the given ZipInfo instance. By default, the ZipInfo constructor sets this member to ZIP_STORED."
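If you want to keep using a ZipInfo (for example, to control its metadata), the sketch below shows two ways to get compression back. You can set compress_type on the ZipInfo yourself, or you can pass compress_type to writestr() for that call. The member names and sample payload are hypothetical placeholders:

```python
import io
import zipfile

payload = b"hello world\n" * 1000  # highly compressible sample data

# A freshly constructed ZipInfo defaults to no compression.
print(zipfile.ZipInfo("x.txt").compress_type == zipfile.ZIP_STORED)  # True

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    # Option 1: set compress_type on the ZipInfo before writing it.
    zinfo = zipfile.ZipInfo("option1.txt")
    zinfo.compress_type = zipfile.ZIP_DEFLATED
    z.writestr(zinfo, payload)

    # Option 2: keep the default ZipInfo but override it in the call.
    z.writestr(zipfile.ZipInfo("option2.txt"), payload,
               compress_type=zipfile.ZIP_DEFLATED)

# Both members should now be deflated and much smaller than the input.
with zipfile.ZipFile(buf) as z:
    infos = {info.filename: info for info in z.infolist()}

for name, info in infos.items():
    print(name, info.compress_size, "of", info.file_size, "bytes")
```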
Because of this, the Python code in the original post does not compress the data at all. seq 10000 produces 48,894 bytes, and the 49,002-byte archive is just those bytes stored as-is plus about 100 bytes of zip header and directory overhead. That is why the file generated by the Python code is so large.
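Here is one possible corrected version of the script from the question. It assumes the same invocation (seq 10000 | ./main.py 2.zip 2.txt). It passes the member name as a plain string, so the ZIP_DEFLATED given to the constructor actually takes effect. The helper build_zip is a hypothetical name introduced here. It builds the archive in memory to keep it easy to test, though for very large inputs writing straight to the output path would be more economical. The result should be in the same ballpark as the command-line zip output, but the sizes are not expected to match byte for byte, because the two tools write different header metadata.

```python
#!/usr/bin/env python3
import io
import sys
import zipfile


def build_zip(member_name, payload):
    """Return the bytes of a zip archive holding payload as member_name."""
    buf = io.BytesIO()
    # ZIP_DEFLATED takes effect only because writestr() below receives a
    # plain string; a bare ZipInfo would keep its default ZIP_STORED.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr(member_name, payload)
    return buf.getvalue()


# Usage: seq 10000 | ./main.py 2.zip 2.txt
# The command-line part only runs when both arguments are supplied.
if __name__ == "__main__" and len(sys.argv) == 3:
    data = sys.stdin.buffer.read()  # raw bytes, no text decoding
    with open(sys.argv[1], "wb") as out:
        out.write(build_zip(sys.argv[2], data))
```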
Another problem with this API design is that the constructor parameter compression means the same thing as the compress_type parameter of .writestr(), but the two are not named the same. This is another poor design choice: there is no reason to give different names to literally the same thing.