
Extract files from a zip without keeping the top-level folder with Python zipfile

I'm currently using the following code to extract the files from a zip file while keeping the directory structure:

zip_file = zipfile.ZipFile('archive.zip', 'r')
zip_file.extractall('/dir/to/extract/files/')
zip_file.close()

Here is a structure for an example zip file:

/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg

At the end I want this:

/dir/to/extract/file.jpg
/dir/to/extract/file1.jpg
/dir/to/extract/file2.jpg

But it should strip the top-level folder only if every file in the zip is inside it. So when I extract a zip with this structure:

/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg
/dir2/file.txt
/file.mp3

It should stay like this:

/dir/to/extract/dir1/file.jpg
/dir/to/extract/dir1/file1.jpg
/dir/to/extract/dir1/file2.jpg
/dir/to/extract/dir2/file.txt
/dir/to/extract/file.mp3

Any ideas?

If I understand your question correctly, you want to strip any common prefix directories from the items in the zip before extracting them.

If so, then the following script should do what you want:

import sys, os
from zipfile import ZipFile

def get_members(zip):
    parts = []
    # get all the path prefixes
    for name in zip.namelist():
        # only check files (not directories)
        if not name.endswith('/'):
            # keep list of path elements (minus filename)
            parts.append(name.split('/')[:-1])
    # now find the common path prefix (if any)
    prefix = os.path.commonprefix(parts)
    if prefix:
        # re-join the path elements
        prefix = '/'.join(prefix) + '/'
    # get the length of the common prefix
    offset = len(prefix)
    # now re-set the filenames
    for zipinfo in zip.infolist():
        name = zipinfo.filename
        # skip entries no longer than the prefix (the prefix directories themselves)
        if len(name) > offset:
            # remove the common prefix
            zipinfo.filename = name[offset:]
            yield zipinfo

args = sys.argv[1:]

if len(args):
    path = args[1] if len(args) > 1 else '.'
    # use a context manager so the archive is closed after extraction
    with ZipFile(args[0]) as zip:
        zip.extractall(path, get_members(zip))

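Read the entries returned by ZipFile.namelist() to see whether they are all in the same directory, then open and read each entry and write it to a file opened with open().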
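
The approach described above could look roughly like the following. This is only a sketch, assuming Python 3 and a trusted archive: the helper name `extract_without_root` and its parameters are made up for illustration, and unlike `extractall` it does not sanitize entry names such as `../x`.

```python
import os
import shutil
import zipfile


def extract_without_root(zip_path, dest_dir):
    """Extract zip_path into dest_dir, dropping a single shared top-level folder.

    Hypothetical helper, not part of the zipfile module.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Consider only file entries; directory entries end with "/".
        names = [n for n in archive.namelist() if not n.endswith("/")]
        # First path segment of every file; a file stored at the top level
        # contributes "" so that nothing is stripped in that case.
        roots = {n.split("/", 1)[0] if "/" in n else "" for n in names}
        prefix = ""
        if len(roots) == 1 and "" not in roots:
            prefix = roots.pop() + "/"
        for name in names:
            relative = name[len(prefix):]
            target = os.path.join(dest_dir, *relative.split("/"))
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            # Stream the entry to disk instead of loading it fully into memory.
            with archive.open(name) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
```

Calling `extract_without_root("archive.zip", "/dir/to/extract")` on the first example archive should place the three images directly in the destination, while the second example archive keeps its `dir1` and `dir2` folders because its files do not all share one top-level folder.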

This might be a problem with the zip archive itself. At a Python prompt, try the following to check whether the files are stored in the expected directories inside the zip file:

import zipfile

zf = zipfile.ZipFile("my_file.zip", 'r')
first_file = zf.filelist[0]
print(first_file.filename)

This should print something like "dir1/". Repeat the steps above with an index of 1, i.e. first_file = zf.filelist[1]. This time the output should look like 'dir1/file1.jpg'. If it does not, the zip file does not contain directories and everything will be unzipped into one single directory.
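
To look at every entry at once rather than indexing `filelist` one item at a time, something like the following may help. It is a small sketch assuming Python 3.6 or later (for `ZipInfo.is_dir()`); `list_entries` is a hypothetical helper name.

```python
import zipfile


def list_entries(zip_path):
    """Return (name, is_directory) pairs for every entry in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return [(info.filename, info.is_dir()) for info in zf.infolist()]


# Example usage (assumes "my_file.zip" exists in the current directory):
# for name, is_directory in list_entries("my_file.zip"):
#     print(name, "(directory)" if is_directory else "")
```

If no name in the output contains a "/", the archive stores all files at the top level.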

Based on @ekhumoro's answer, I came up with a simpler function that extracts everything to the same level. It is not exactly what you are asking for, but I think it may help someone.

import os
from zipfile import ZipFile

def _basename_members(zip_file: ZipFile):
    for zipinfo in zip_file.infolist():
        # skip directory entries; their basename would be empty
        if zipinfo.is_dir():
            continue
        # keep only the file name, dropping every parent folder
        zipinfo.filename = os.path.basename(zipinfo.filename)
        yield zipinfo

from_zip = "some.zip"
to_folder = "some_destination/"
with ZipFile(file=from_zip, mode="r") as zip_file:
    os.makedirs(to_folder, exist_ok=True)
    zip_infos = _basename_members(zip_file)
    zip_file.extractall(path=to_folder, members=zip_infos)
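
One caveat with flattening everything to the same level: entries from different folders that share a file name end up at the same destination path, so the later one overwrites the earlier one. A small sketch for checking this before extracting is shown below; `find_basename_collisions` is a hypothetical helper, and it assumes entry names use "/" as the separator.

```python
import posixpath
from collections import defaultdict


def find_basename_collisions(names):
    """Map each file name that occurs more than once to the entries using it."""
    groups = defaultdict(list)
    for name in names:
        # Directory entries end with "/" and have no file name of their own.
        if name.endswith("/"):
            continue
        groups[posixpath.basename(name)].append(name)
    return {base: paths for base, paths in groups.items() if len(paths) > 1}
```

It could be called as `find_basename_collisions(zip_file.namelist())`; an empty result means flattening will not overwrite anything.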

Basically you need to do two things:

  1. Identify the root directory in the zip.
  2. Remove the root directory from the paths of other items in the zip.

The following should retain the overall structure of the zip while removing the root directory:

import typing, zipfile

def _is_root(info: zipfile.ZipInfo) -> bool:
    if info.is_dir():
        parts = info.filename.split("/")
        # Handle directory names with and without trailing slashes.
        if len(parts) == 1 or (len(parts) == 2 and parts[1] == ""):
            return True
    return False

def _members_without_root(archive: zipfile.ZipFile, root_filename: str) -> typing.Generator:
    for info in archive.infolist():
        # Strip the root only from entries that actually start with it, so a
        # subdirectory with the same name deeper in the path is left untouched.
        if info.filename.startswith(root_filename) and len(info.filename) > len(root_filename):
            info.filename = info.filename[len(root_filename):]
            yield info

with zipfile.ZipFile("archive.zip", mode="r") as archive:
    # We will use the first directory with no more than one path segment as the root.
    # The None default keeps next() from raising StopIteration when no root exists.
    root = next((info for info in archive.infolist() if _is_root(info)), None)
    if root:
        archive.extractall(path="/dir/to/extract/", members=_members_without_root(archive, root.filename))
    else:
        print("No root directory found in zip.")
