Large Zip Files with Zipfile Module Python

I have never used the zipfile module before. I have a directory that contains thousands of zip files I need to process. These files can be up to 6 GB each. I have looked through some documentation, but much of it is unclear about the best way to read large zip files without extracting them.

I stumbled upon this: Read a large zipped text file line by line in python

So in my solution I tried to emulate it and use it the way I would read a normal text file with the builtin open function:

with open(odfslogp_obj, 'rb', buffering=102400) as odfslog:

So I wrote the following based on the answer from that link:

for odfslogp_obj in odfslogs_plist:
    with zipfile.ZipFile(odfslogp_obj, mode='r') as z:
        with z.open(buffering=102400) as f:
            for line in f:
                print(line)

But this gives me an "unexpected keyword" error for z.open().

The question is: is there documentation that explains which keywords the z.open() function takes? I only found one for the ZipFile() function.

I want to make sure my code isn't using too much memory while processing these files line by line.

odfslogp_obj is a Path object, by the way.

When I take off the buffering and just have z.open(), I get an error saying: TypeError: open() missing 1 required positional argument: 'name'

Once you've opened the zipfile, you still need to open the individual files it contains. That is what the second call, z.open, is for, and it is the one you had problems with. It's not the builtin Python open and it doesn't have a "buffering" parameter. Its first argument, name, is the archive member to open and is required, which is why calling z.open() with no arguments raises the TypeError. See ZipFile.open.
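If the documentation isn't at hand, one way to see which keywords z.open() accepts is to ask Python for the method's signature. This is only a quick sketch using the standard inspect module; the exact parameter list can vary between Python versions, so treat the printed result as informational.

```python
import inspect
import zipfile

# Collect the parameter names that ZipFile.open() accepts.
params = list(inspect.signature(zipfile.ZipFile.open).parameters)
print(params)

# 'buffering' is a parameter of the builtin open(), not of ZipFile.open().
print("buffering" in params)
```

The list should include name (the archive member to open) and should not include buffering.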

Once the zipfile is opened you can enumerate its files and open them in turn. ZipFile.open opens in binary mode, which may be a different problem, depending on what you want to do with the file.

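import zipfile

# odfslogs_plist is the question's own iterable of Path objects, one per zip archive
for odfslogp_obj in odfslogs_plist:
    with zipfile.ZipFile(odfslogp_obj, mode='r') as z:
        # Open each member of the archive in turn
        for name in z.namelist():
            with z.open(name) as f:  # binary file-like object
                for line in f:  # each line is a bytes object
                    print(line)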
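Because ZipFile.open returns bytes, text processing needs a decoding step. A minimal sketch, assuming the archive members are UTF-8 text files, is to wrap the member in io.TextIOWrapper. The helper name iter_zip_lines and the utf-8 default are illustrative choices, not part of the zipfile module.

```python
import io
import zipfile


def iter_zip_lines(zip_path, encoding="utf-8"):
    """Yield (member_name, line) pairs for every file in the archive.

    Hypothetical helper: the name and the encoding default are assumptions.
    """
    with zipfile.ZipFile(zip_path, mode="r") as z:
        for info in z.infolist():
            if info.is_dir():  # skip directory entries, they hold no data
                continue
            with z.open(info) as raw:
                # Decode the binary stream into text as it is read
                with io.TextIOWrapper(raw, encoding=encoding) as text:
                    for line in text:
                        yield info.filename, line
```

It could be used as `for name, line in iter_zip_lines(odfslogp_obj): print(line, end="")`. Since the member is read and decompressed as a stream, memory use should stay modest even for large archives, as long as no single line is huge.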
