How to read from a zip file within a zip file in Python?
I have a file that I want to read that is itself zipped within a zip archive. For example, parent.zip contains child.zip, which contains child.txt. I am having trouble reading child.zip. Can anyone correct my code?
I assume that I need to create child.zip as a file-like object and then open it with a second instance of zipfile, but being new to Python, my zipfile.ZipFile(zfile.open(name)) is silly. It raises a zipfile.BadZipfile: "File is not a zip file" on the (independently validated) child.zip.
import logging
import re
import zipfile

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip
            # This is the call that raises zipfile.BadZipfile
            with zipfile.ZipFile(zfile.open(name)) as zfile2:
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info("Found internal internal file: " + name2)
                    print("Processing code goes here")
When you use the .open() call on a ZipFile instance you do indeed get an open file handle. However, to read a zip file, the ZipFile class needs a little more: it needs to be able to seek on that file, and the object returned by .open() is not seekable in your case. Only Python 3.7 and later produce a ZipExtFile object that supports seeking (provided the underlying file handle for the outer zip file is seekable, and nothing is trying to write to the ZipFile object).
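A quick way to see which case applies to your interpreter is to ask the handle itself. The sketch below is only an illustration: it builds a small nested archive in memory (the entry names mirror the question, and the helper names build_nested_zip and inner_handle_is_seekable are made up for this example) and reports whether the handle returned by .open() claims to support seeking. The answer depends on the Python version, so no particular output is assumed.

```python
import io
import zipfile


def build_nested_zip():
    """Return the bytes of a zip that contains child.zip, which contains child.txt."""
    child_buf = io.BytesIO()
    with zipfile.ZipFile(child_buf, "w") as child:
        child.writestr("child.txt", "hello from child.txt")
    parent_buf = io.BytesIO()
    with zipfile.ZipFile(parent_buf, "w") as parent:
        parent.writestr("child.zip", child_buf.getvalue())
    return parent_buf.getvalue()


def inner_handle_is_seekable(parent_bytes, name="child.zip"):
    """Report whether the handle from ZipFile.open() says it supports seek()."""
    with zipfile.ZipFile(io.BytesIO(parent_bytes)) as parent:
        with parent.open(name) as handle:
            return handle.seekable()


if __name__ == "__main__":
    print(inner_handle_is_seekable(build_nested_zip()))
```

If this prints False, passing the handle straight to ZipFile will fail in the way the question describes.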
The workaround is to read the whole zip entry into memory using .read(), store it in a BytesIO object (an in-memory file that is seekable) and feed that to ZipFile:
from io import BytesIO

# ...
        zfiledata = BytesIO(zfile.read(name))
        with zipfile.ZipFile(zfiledata) as zfile2:
            # ... work with zfile2 here
Or, in the context of your example:
import logging
import re
import zipfile
from io import BytesIO

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip
            # Read the inner archive into a seekable in-memory buffer
            zfiledata = BytesIO(zfile.read(name))
            with zipfile.ZipFile(zfiledata) as zfile2:
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info("Found internal internal file: " + name2)
                    print("Processing code goes here")
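For something you can run without any files on disk, here is a minimal sketch of the same BytesIO approach that goes one step further and reads child.txt. It creates parent.zip in memory first; the helper names make_parent_zip and read_nested_member and the sample text are illustrative, not part of the question.

```python
import io
import zipfile


def make_parent_zip():
    """Build parent.zip -> child.zip -> child.txt in memory and return its bytes."""
    child_buf = io.BytesIO()
    with zipfile.ZipFile(child_buf, "w") as child:
        child.writestr("child.txt", "hello from child.txt")
    parent_buf = io.BytesIO()
    with zipfile.ZipFile(parent_buf, "w") as parent:
        parent.writestr("child.zip", child_buf.getvalue())
    return parent_buf.getvalue()


def read_nested_member(parent_file, child_zip_name, member_name):
    """Read member_name from the zip stored as child_zip_name inside parent_file."""
    with zipfile.ZipFile(parent_file) as parent:
        # read() returns the whole inner archive as bytes;
        # wrapping it in BytesIO gives ZipFile a seekable file object.
        inner = io.BytesIO(parent.read(child_zip_name))
        with zipfile.ZipFile(inner) as child:
            return child.read(member_name)


parent_bytes = make_parent_zip()
content = read_nested_member(io.BytesIO(parent_bytes), "child.zip", "child.txt")
print(content.decode("utf-8"))
```

ZipFile accepts either a path or a file-like object, so passing "parent.zip" as parent_file should work the same way for an archive on disk. Keep in mind that the whole inner archive is held in memory.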
To get this to work with Python 3.3 (under Windows, though that might be irrelevant), I had to do:
import io
import re
import zipfile

# `file` is the path to the outer zip archive
with zipfile.ZipFile(file, 'r') as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            zfiledata = io.BytesIO(zfile.read(name))
            with zipfile.ZipFile(zfiledata) as zfile2:
                for name2 in zfile2.namelist():
                    print(name2)
cStringIO does not exist in Python 3, so I used io.BytesIO.
Here's a function I came up with. (Copied from here.)
import os
import zipfile
from io import BytesIO


def extract_nested_zipfile(path, parent_zip=None):
    """Returns a ZipFile specified by path, even if the path contains
    intermediary ZipFiles. For example, /root/gparent.zip/parent.zip/child.zip
    will return a ZipFile that represents child.zip
    """

    def extract_inner_zipfile(parent_zip, child_zip_path):
        """Returns a ZipFile specified by child_zip_path that exists inside
        parent_zip.
        """
        # BytesIO (rather than StringIO) holds the raw bytes and is seekable
        memory_zip = BytesIO()
        memory_zip.write(parent_zip.open(child_zip_path).read())
        return zipfile.ZipFile(memory_zip)

    if ('.zip' + os.sep) in path:
        (parent_zip_path, child_zip_path) = os.path.relpath(path).split(
            '.zip' + os.sep, 1)
        parent_zip_path += '.zip'

        if not parent_zip:
            # This is the top-level, so read from disk
            parent_zip = zipfile.ZipFile(parent_zip_path)
        else:
            # We're already in a zip, so pull it out and recurse
            parent_zip = extract_inner_zipfile(parent_zip, parent_zip_path)

        return extract_nested_zipfile(child_zip_path, parent_zip)
    else:
        if parent_zip:
            return extract_inner_zipfile(parent_zip, path)
        else:
            # If there is no nesting, it's easy!
            return zipfile.ZipFile(path)
Here's how I tested it:
echo hello world > hi.txt
zip wrap1.zip hi.txt
zip wrap2.zip wrap1.zip
zip wrap3.zip wrap2.zip
print(extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap1.zip').open('hi.txt').read())
print(extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap2.zip/wrap1.zip').open('hi.txt').read())
print(extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap3.zip/wrap2.zip/wrap1.zip').open('hi.txt').read())
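If you don't know the nesting path in advance, a different option is to walk the archive and descend into every entry whose name ends in .zip. This is only a sketch of that idea, not the function above: iter_nested_members and wrap are hypothetical helpers, and the wrap1/wrap2/wrap3 layout from the test is rebuilt in memory so the example runs on its own.

```python
import io
import zipfile


def iter_nested_members(zip_file, prefix=""):
    """Yield (path, data) for every file, descending into inner .zip entries."""
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.filename.endswith("/"):
                continue  # skip directory entries
            data = archive.read(info)
            path = prefix + info.filename
            if info.filename.lower().endswith(".zip"):
                # Recurse through a seekable in-memory copy of the inner archive
                yield from iter_nested_members(io.BytesIO(data), path + "/")
            else:
                yield path, data


def wrap(name, data):
    """Return the bytes of a zip archive holding a single entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr(name, data)
    return buf.getvalue()


wrap1 = wrap("hi.txt", b"hello world\n")
wrap2 = wrap("wrap1.zip", wrap1)
wrap3 = wrap("wrap2.zip", wrap2)

found = list(iter_nested_members(io.BytesIO(wrap3)))
for path, data in found:
    print(path, data)
```

Every inner archive is read fully into memory, and an entry that is named .zip but is not a valid archive will raise zipfile.BadZipFile, so you may want to catch that in real use.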