Python remove entry from zipfile
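I'm currently writing an open-source library for a container format, which involves modifying zip archives. For that I use Python's built-in zipfile module. Because of some limitations, I decided to modify the module and ship it with my library. These modifications include a patch from the Python issue tracker for removing entries from a zip file: zipfile.remove.2.patch
More specifically, I included zipfile.remove.2.patch by ubershmekel. After some modifications for Python 2.7, the patch works fine according to the unit tests shipped with it.
Nevertheless, I'm running into problems when removing, adding, and removing + adding files without closing the zipfile in between.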
Error
    Traceback (most recent call last):
      File "/home/martin/git/pyCombineArchive/tests/test_zipfile.py", line 1590, in test_delete_add_no_close
        self.assertEqual(zf.read(fname), data)
      File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 948, in read
        with self.open(name, "r", pwd) as fp:
      File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 1003, in open
        % (zinfo.orig_filename, fname))
    BadZipFile: File name in directory 'foo.txt' and header 'bar.txt' differ.
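This means the zip file itself is fine, but somehow the central directory or the entry headers get messed up. This unit test reproduces the error: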
    def test_delete_add_no_close(self):
        fname_list = ["foo.txt", "bar.txt", "blu.bla", "sup.bro", "rollah"]
        data_list = [''.join([chr(randint(0, 255)) for i in range(100)]) for i in range(len(fname_list))]
        # add some files to the zip
        with zipfile.ZipFile(TESTFN, "w") as zf:
            for fname, data in zip(fname_list, data_list):
                zf.writestr(fname, data)
        for no in range(0, 2):
            with zipfile.ZipFile(TESTFN, "a") as zf:
                zf.remove(fname_list[no])
                zf.writestr(fname_list[no], data_list[no])
                zf.remove(fname_list[no+1])
                zf.writestr(fname_list[no+1], data_list[no+1])
                # try to access prior deleted/added file and prior last file (which got moved, while delete)
                for fname, data in zip(fname_list, data_list):
                    self.assertEqual(zf.read(fname), data)
My modified zipfile module and the complete unit test file can be found in this gist: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4
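One way to narrow this kind of failure down is to compare every central directory entry with the local file header it points to, which is the same comparison that raises the BadZipFile in the traceback. The following is only a minimal sketch: it uses the standard-library zipfile module under Python 3 (the post itself targets Python 2.7), `find_header_mismatches` is a hypothetical helper name that is not part of zipfile or the patch, and the header layout string is assumed to match zipfile's structFileHeader.

```python
import io
import struct
import zipfile

# Local file header layout (assumed to match zipfile's structFileHeader).
LOCAL_HEADER = struct.Struct("<4s2B4HL2L2H")
LOCAL_MAGIC = b"PK\x03\x04"


def find_header_mismatches(archive_bytes):
    """Return (central_name, local_name) pairs that disagree.

    local_name is None when no valid local header sits at the recorded offset.
    Hypothetical helper for illustration only.
    """
    mismatches = []
    # Read names and offsets from the central directory.
    with zipfile.ZipFile(io.BytesIO(archive_bytes), "r") as zf:
        entries = [(i.orig_filename, i.header_offset) for i in zf.infolist()]

    for central_name, offset in entries:
        raw = archive_bytes[offset:offset + LOCAL_HEADER.size]
        if len(raw) < LOCAL_HEADER.size or raw[:4] != LOCAL_MAGIC:
            mismatches.append((central_name, None))
            continue
        fields = LOCAL_HEADER.unpack(raw)
        flags, name_len = fields[3], fields[10]
        start = offset + LOCAL_HEADER.size
        name_bytes = archive_bytes[start:start + name_len]
        # Bit 0x800 in the header's own flags marks a UTF-8 file name.
        local_name = name_bytes.decode("utf-8" if flags & 0x800 else "cp437")
        if local_name != central_name:
            mismatches.append((central_name, local_name))
    return mismatches
```

Calling this on the bytes of the test archive once the `with` block has exited may show which entries point at the wrong header, for example a result such as `[('foo.txt', 'bar.txt')]` would correspond to the error message above.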
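After some intense debugging, I'm fairly sure something goes wrong when moving the remaining chunks (the ones stored after the removed file). So I went ahead and rewrote this part of the code so that it copies these files/chunks one at a time. I also rewrite the file header of each of them (to make sure it is valid) and the central directory at the end of the zipfile. My remove function now looks like this: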
    def remove(self, member):
        """Remove a file from the archive. Only works if the ZipFile was opened
        with mode 'a'."""
        if "a" not in self.mode:
            raise RuntimeError('remove() requires mode "a"')
        if not self.fp:
            raise RuntimeError(
                "Attempt to modify ZIP archive that was already closed")
        fp = self.fp
        # Make sure we have an info object
        if isinstance(member, ZipInfo):
            # 'member' is already an info object
            zinfo = member
        else:
            # Get info object for member
            zinfo = self.getinfo(member)
        # start at the pos of the first member (smallest offset)
        position = min([info.header_offset for info in self.filelist])  # start at the beginning of first file
        for info in self.filelist:
            fileheader = info.FileHeader()
            # is member after delete one?
            if info.header_offset > zinfo.header_offset and info != zinfo:
                # rewrite FileHeader and copy compressed data
                # Skip the file header:
                fp.seek(info.header_offset)
                fheader = fp.read(sizeFileHeader)
                if fheader[0:4] != stringFileHeader:
                    raise BadZipFile("Bad magic number for file header")
                fheader = struct.unpack(structFileHeader, fheader)
                fname = fp.read(fheader[_FH_FILENAME_LENGTH])
                if fheader[_FH_EXTRA_FIELD_LENGTH]:
                    fp.read(fheader[_FH_EXTRA_FIELD_LENGTH])
                if zinfo.flag_bits & 0x800:
                    # UTF-8 filename
                    fname_str = fname.decode("utf-8")
                else:
                    fname_str = fname.decode("cp437")
                if fname_str != info.orig_filename:
                    if not self._filePassed:
                        fp.close()
                    raise BadZipFile(
                        'File name in directory %r and header %r differ.'
                        % (zinfo.orig_filename, fname))
                # read the actual data
                data = fp.read(fheader[_FH_COMPRESSED_SIZE])
                # modify info obj
                info.header_offset = position
                # jump to new position
                fp.seek(info.header_offset, 0)
                # write fileheader and data
                fp.write(fileheader)
                fp.write(data)
                if zinfo.flag_bits & _FHF_HAS_DATA_DESCRIPTOR:
                    # Write CRC and file sizes after the file data
                    fp.write(struct.pack("<LLL", info.CRC, info.compress_size,
                                         info.file_size))
                # update position
                fp.flush()
                position = fp.tell()
            elif info != zinfo:
                # move to next position
                position = position + info.compress_size + len(fileheader) + self._get_data_descriptor_size(info)
        # Fix class members with state
        self.start_dir = position
        self._didModify = True
        self.filelist.remove(zinfo)
        del self.NameToInfo[zinfo.filename]
        # write new central directory (includes truncate)
        fp.seek(position, 0)
        self._write_central_dir()
        fp.seek(self.start_dir, 0)  # jump to the beginning of the central directory, so it gets overridden at close()
You can find the complete code in the latest version of the gist: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4
or in the repo of the library I'm writing: https://github.com/FreakyBytes/pyCombineArchive
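When testing an in-place remove() like the one above, it can help to have a reference result to compare against. The sketch below produces one by copying every member except the removed one into a fresh archive with the standard-library zipfile module. Note that this is a different and simpler technique than the in-place block move described in this post, and it rewrites the whole archive. `remove_by_rewrite` is a hypothetical name, and the code assumes Python 3.

```python
import io
import zipfile


def remove_by_rewrite(archive_bytes, name):
    """Return the bytes of a new archive that lacks the member `name`.

    Hypothetical reference helper: copies the remaining members instead of
    moving blocks in place.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes), "r") as src:
        src.getinfo(name)  # raises KeyError if the member does not exist
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():
                if info.filename != name:
                    # Passing the ZipInfo keeps the name, timestamp and
                    # compression type; the offset is recomputed on write.
                    dst.writestr(info, src.read(info))
    return out.getvalue()
```

A unit test could then check that the member names and contents left behind by the patched remove() match those of the archive returned here.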