What Is The Best Python Zip Module To Handle Large Files?
EDIT: Specifically, compression and extraction speeds.
Any suggestions?
Thanks
So I made a random-ish large zipfile:
$ ls -l *zip
-rw-r--r-- 1 aleax 5000 115749854 Nov 18 19:16 large.zip
$ unzip -l large.zip | wc
23396 93633 2254735
i.e., 116 MB with about 23.4K files in it, and timed extracting it:
$ time unzip -d /tmp large.zip >/dev/null
real 0m14.702s
user 0m2.586s
sys 0m5.408s
This is the system-supplied command-line unzip binary, no doubt as finely tuned and optimized as a pure C executable can be. Then (after cleaning up /tmp ;-)...:
$ time py26 -c'from zipfile import ZipFile; z=ZipFile("large.zip"); z.extractall("/tmp")'
real 0m13.274s
user 0m5.059s
sys 0m5.166s
...and this is Python with its standard library: a bit more demanding of CPU time (about 10.2 s of user+sys versus 8.0 s for unzip), but roughly 10% faster in real, that is, elapsed, time (13.3 s versus 14.7 s).
You're welcome to repeat such measurements, of course, on your specific platform (if it's CPU-poor, e.g. a slow ARM chip, Python's extra demands on CPU time may end up making it slower) and with your specific zipfiles of interest, since each large zipfile will have a very different mix of contents and quite possibly different performance. But what this suggests to me is that there isn't much room to build a Python extension that is much faster than good old zipfile, since Python using it beats the pure-C, system-included unzip!-)
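If you want to repeat the measurement from inside Python, and also cover compression (which the EDIT asks about but the numbers above don't measure), here is a minimal sketch. It assumes Python 3; the helper name time_zip_roundtrip, the file count and sizes, and the half-random/half-repetitive payloads are all made up for illustration, so its absolute numbers are not comparable to the unzip timings above. Like the "real" line of time, it measures wall-clock time, here with time.perf_counter.

```python
import os
import tempfile
import time
import zipfile


def time_zip_roundtrip(n_files=100, file_size=64 * 1024,
                       compression=zipfile.ZIP_DEFLATED):
    """Return (compress_seconds, extract_seconds) for a synthetic archive.

    The payloads are invented for this sketch: each file is half random
    bytes (barely compressible) and half repeated text (highly compressible).
    """
    half = file_size // 2
    # Build payloads up front so data generation isn't counted as compression time.
    payloads = [os.urandom(half) + b"x" * (file_size - half) for _ in range(n_files)]

    with tempfile.TemporaryDirectory() as workdir:
        archive = os.path.join(workdir, "test.zip")

        start = time.perf_counter()  # wall-clock, like the "real" line of `time`
        with zipfile.ZipFile(archive, "w", compression=compression) as zf:
            for i, data in enumerate(payloads):
                zf.writestr("dir%02d/file%05d.bin" % (i % 10, i), data)
        compress_s = time.perf_counter() - start

        start = time.perf_counter()
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(os.path.join(workdir, "out"))  # creates subdirectories as needed
        extract_s = time.perf_counter() - start

    return compress_s, extract_s


if __name__ == "__main__":
    for label, method in [("stored", zipfile.ZIP_STORED),
                          ("deflated", zipfile.ZIP_DEFLATED)]:
        c, e = time_zip_roundtrip(compression=method)
        print("%-8s compress %.3fs  extract %.3fs" % (label, c, e))
```

Since the OS file cache and disk speed affect both phases, run it several times; comparing ZIP_STORED with ZIP_DEFLATED gives a rough idea of how much of the time goes into compression itself rather than I/O.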
For handling large files without loading them into memory, use the new stream-based methods in Python 2.6's version of zipfile, such as ZipFile.open. Don't use extract or extractall unless you have strongly sanitised the filenames in the ZIP (member names containing .. components, for example, could otherwise write files outside the target directory).
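One way to do that sanitising before calling extractall is to resolve every member's target path and refuse the whole archive if any of them falls outside the destination directory. This is a minimal, POSIX-oriented sketch in Python 3; safe_extractall is a hypothetical helper, not part of zipfile, and it does not try to handle every platform-specific edge case. The current zipfile documentation still warns against extracting archives from untrusted sources without prior inspection.

```python
import os
import zipfile


def safe_extractall(zip_path, dest_dir):
    """Hypothetical helper: extract all members, refusing the archive if any
    member's path is absolute or resolves outside dest_dir (e.g. via "..").
    """
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.infolist()
        # Check every member first, so a bad archive leaves no partial output.
        for info in members:
            name = info.filename
            if os.path.isabs(name) or name.startswith(("/", "\\")):
                raise ValueError("absolute path in archive: %r" % name)
            target = os.path.realpath(os.path.join(dest_root, name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("path escapes destination: %r" % name)
        zf.extractall(dest_root, members)
```

Validating all names before extracting anything means a malicious archive is rejected as a whole rather than half-extracted.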
(You used to have to read all the bytes into memory, or hack around it with something like zipstream; this is now obsolete.)
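As a sketch of the stream-based approach with ZipFile.open, the following copies or scans a single member while holding only a bounded amount of it in memory at a time. It uses Python 3 syntax; stream_member and count_lines are hypothetical helpers, and the 1 MiB default chunk size is an arbitrary choice.

```python
import io
import shutil
import zipfile


def stream_member(zip_path, member_name, dest_path, chunk_size=1024 * 1024):
    """Copy one member of zip_path to dest_path, reading chunk_size bytes at a time.

    ZipFile.open returns a file-like object that decompresses as it is read,
    so the whole member is never held in memory at once.
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member_name) as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst, chunk_size)


def count_lines(zip_path, member_name, encoding="utf-8"):
    """Count lines in a text member, decoding it incrementally without extracting it."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding)
            return sum(1 for _ in text)
```

Because decompression happens lazily as the member is read, count_lines-style processing never needs to write the member to disk at all.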