
Python remove entry from zipfile

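I am currently writing an open source library for a container format, which involves modifying zip archives. For that I used Python's built-in zipfile module. Because of some limitations, I decided to modify the module and ship it with my library. The modifications include a patch from the Python issue tracker for removing entries from a zip file. More specifically, I included zipfile.remove.2.patch from ubershmekel. After some adjustments for Python-2.7, the patch works fine according to the unit tests shipped with it.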

Nevertheless, I run into problems when removing, adding, and removing + adding files without closing the zipfile in between.

Error
Traceback (most recent call last):
  File "/home/martin/git/pyCombineArchive/tests/test_zipfile.py", line 1590, in test_delete_add_no_close
    self.assertEqual(zf.read(fname), data)
  File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 948, in read
    with self.open(name, "r", pwd) as fp:
  File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 1003, in open
    % (zinfo.orig_filename, fname))
BadZipFile: File name in directory 'foo.txt' and header 'bar.txt' differ.

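This means the zip file itself is fine, but somehow the central directory or the entry header gets messed up. This unit test reproduces the error: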

def test_delete_add_no_close(self):
    fname_list = ["foo.txt", "bar.txt", "blu.bla", "sup.bro", "rollah"]
    data_list = [''.join([chr(randint(0, 255)) for i in range(100)]) for i in range(len(fname_list))]

    # add some files to the zip
    with zipfile.ZipFile(TESTFN, "w") as zf:
        for fname, data in zip(fname_list, data_list):
            zf.writestr(fname, data)

    for no in range(0, 2):
        with zipfile.ZipFile(TESTFN, "a") as zf:
            zf.remove(fname_list[no])
            zf.writestr(fname_list[no], data_list[no])
            zf.remove(fname_list[no+1])
            zf.writestr(fname_list[no+1], data_list[no+1])

            # read back the removed/re-added files and the files that were moved during removal
            for fname, data in zip(fname_list, data_list):
                self.assertEqual(zf.read(fname), data)
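
The BadZipFile error above is raised when the name recorded in the central directory differs from the name found in the local file header at the recorded offset. As a diagnostic aid, the following sketch walks the central directory with the standard zipfile module and compares every entry against its local header on disk. The names find_header_mismatches, LOCAL_HEADER and LOCAL_MAGIC are made up for this illustration, and the sketch assumes Python 3 and an archive that has already been closed and flushed.

```python
import struct
import zipfile

# Fixed 30-byte part of a local file header: signature, version, flags,
# compression, mod time, mod date, CRC, compressed size, uncompressed size,
# file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
LOCAL_MAGIC = b"PK\x03\x04"


def find_header_mismatches(path):
    """Return (directory_name, header_name) pairs that disagree.

    header_name is None when no valid local header is found at the offset.
    """
    mismatches = []
    with zipfile.ZipFile(path) as zf, open(path, "rb") as raw:
        for info in zf.infolist():
            raw.seek(info.header_offset)
            fixed = raw.read(LOCAL_HEADER.size)
            if len(fixed) < LOCAL_HEADER.size:
                mismatches.append((info.orig_filename, None))
                continue
            fields = LOCAL_HEADER.unpack(fixed)
            if fields[0] != LOCAL_MAGIC:
                mismatches.append((info.orig_filename, None))
                continue
            raw_name = raw.read(fields[9])  # fields[9] is the name length
            # Bit 11 (0x800) of this entry's own flags marks a UTF-8 name.
            encoding = "utf-8" if fields[2] & 0x800 else "cp437"
            header_name = raw_name.decode(encoding, errors="replace")
            if header_name != info.orig_filename:
                mismatches.append((info.orig_filename, header_name))
    return mismatches
```

Running this on the archive after each remove() or writestr() step (with the file closed in between) should show which entry's offset points at the wrong local header.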

My modified zipfile module and the complete unit test file can be found in this gist: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4

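After some intensive debugging, I am fairly sure that something goes wrong while moving the remaining chunks (the ones stored after the removed file). So I went ahead and rewrote that part of the code so that it copies these files/chunks one at a time. I also rewrite the file header of each of them (to make sure it is valid) as well as the central directory at the end of the zipfile. My remove function now looks like this: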

def remove(self, member):
    """Remove a file from the archive. Only works if the ZipFile was opened
    with mode 'a'."""

    if "a" not in self.mode:
        raise RuntimeError('remove() requires mode "a"')
    if not self.fp:
        raise RuntimeError(
              "Attempt to modify ZIP archive that was already closed")
    fp = self.fp

    # Make sure we have an info object
    if isinstance(member, ZipInfo):
        # 'member' is already an info object
        zinfo = member
    else:
        # Get info object for member
        zinfo = self.getinfo(member)

    # start at the pos of the first member (smallest offset)
    position = min([info.header_offset for info in self.filelist])  # start at the beginning of first file
    for info in self.filelist:
        fileheader = info.FileHeader()
        # is member after delete one?
        if info.header_offset > zinfo.header_offset and info != zinfo:
            # rewrite FileHeader and copy compressed data
            # Skip the file header:
            fp.seek(info.header_offset)
            fheader = fp.read(sizeFileHeader)
            if fheader[0:4] != stringFileHeader:
                raise BadZipFile("Bad magic number for file header")

            fheader = struct.unpack(structFileHeader, fheader)
            fname = fp.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                fp.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            if zinfo.flag_bits & 0x800:
                # UTF-8 filename
                fname_str = fname.decode("utf-8")
            else:
                fname_str = fname.decode("cp437")

            if fname_str != info.orig_filename:
                if not self._filePassed:
                    fp.close()
                raise BadZipFile(
                      'File name in directory %r and header %r differ.'
                      % (zinfo.orig_filename, fname))

            # read the actual data
            data = fp.read(fheader[_FH_COMPRESSED_SIZE])

            # modify info obj
            info.header_offset = position
            # jump to new position
            fp.seek(info.header_offset, 0)
            # write fileheader and data
            fp.write(fileheader)
            fp.write(data)
            if zinfo.flag_bits & _FHF_HAS_DATA_DESCRIPTOR:
                # Write CRC and file sizes after the file data
                fp.write(struct.pack("<LLL", info.CRC, info.compress_size,
                        info.file_size))
            # update position
            fp.flush()
            position = fp.tell()

        elif info != zinfo:
            # move to next position
            position = position + info.compress_size + len(fileheader) + self._get_data_descriptor_size(info)

    # Fix class members with state
    self.start_dir = position
    self._didModify = True
    self.filelist.remove(zinfo)
    del self.NameToInfo[zinfo.filename]

    # write new central directory (includes truncate)
    fp.seek(position, 0)
    self._write_central_dir()
    fp.seek(self.start_dir, 0)  # jump to the beginning of the central directory, so it gets overridden at close()
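
For comparison, a simpler but slower way to drop an entry is to rebuild the archive without it, which avoids moving chunks and patching offsets in place. This is not what the patch does. It is only a sketch using the standard library, where remove_entry_by_rewrite is a hypothetical helper name. It assumes Python 3, unencrypted entries, and enough disk space for a second copy of the archive.

```python
import os
import tempfile
import zipfile


def remove_entry_by_rewrite(path, name):
    """Rewrite the archive at `path` so it no longer contains `name`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
            src.getinfo(name)  # raises KeyError if the entry does not exist
            dst.comment = src.comment
            for info in src.infolist():
                if info.filename != name:
                    # Each entry is decompressed and then recompressed using
                    # the compression type stored in its ZipInfo object.
                    dst.writestr(info, src.read(info))
        # Swap in the new archive only after it was written completely.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because every remaining entry is written through the regular writestr() path, the local headers and the central directory are produced by the same code and cannot drift apart. The cost is that the whole archive is copied on each removal.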

You can find the complete code in the latest revision of the gist: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4

or in the repo of the library I am writing: https://github.com/FreakyBytes/pyCombineArchive

