I'm using the following code to extract the files from a zip file while keeping the directory structure:
zip_file = zipfile.ZipFile('archive.zip', 'r')
zip_file.extractall('/dir/to/extract/files/')
zip_file.close()
Here is a structure for an example zip file:
/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg
At the end I want this:
/dir/to/extract/file.jpg
/dir/to/extract/file1.jpg
/dir/to/extract/file2.jpg
But the top-level folder should be ignored only if the zip file has a single top-level folder with all files inside it, so when I extract a zip with this structure:
/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg
/dir2/file.txt
/file.mp3
It should stay like this:
/dir/to/extract/dir1/file.jpg
/dir/to/extract/dir1/file1.jpg
/dir/to/extract/dir1/file2.jpg
/dir/to/extract/dir2/file.txt
/dir/to/extract/file.mp3
Any ideas?
If I understand your question correctly, you want to strip any common prefix directories from the items in the zip before extracting them.
If so, then the following script should do what you want:
import sys, os
from zipfile import ZipFile

def get_members(archive):
    parts = []
    # get all the path prefixes
    for name in archive.namelist():
        # only check files (not directories)
        if not name.endswith('/'):
            # keep list of path elements (minus filename)
            parts.append(name.split('/')[:-1])
    # now find the common path prefix (if any)
    prefix = os.path.commonprefix(parts)
    if prefix:
        # re-join the path elements
        prefix = '/'.join(prefix) + '/'
    # get the length of the common prefix
    offset = len(prefix)
    # now re-set the filenames
    for zipinfo in archive.infolist():
        name = zipinfo.filename
        # skip entries that are the common prefix itself (or part of it)
        if len(name) > offset:
            # remove the common prefix
            zipinfo.filename = name[offset:]
            yield zipinfo

args = sys.argv[1:]
if len(args):
    path = args[1] if len(args) > 1 else '.'
    with ZipFile(args[0]) as archive:
        archive.extractall(path, get_members(archive))
Read the entries returned by ZipFile.namelist() to see whether they are all in the same directory, then open/read each entry and write it to a file opened with open().
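The manual approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name extract_flat_if_single_dir is hypothetical, "same directory" is taken to mean that every file entry has an identical directory part, and entry names are trusted (unlike extractall(), this sketch does not sanitize names containing ".." or absolute paths).

```python
import os
import shutil
import zipfile


def extract_flat_if_single_dir(zip_path, dest):
    """Extract zip_path into dest (hypothetical helper).

    If every file entry lives in the same directory, that directory part
    is dropped; otherwise the archive's structure is kept as-is.
    """
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        # Collect file entries only; directory entries end with '/'.
        names = [n for n in zf.namelist() if not n.endswith('/')]
        # The set of directory parts shows whether the files share one directory.
        dirs = {os.path.dirname(n) for n in names}
        same_dir = len(dirs) == 1
        for name in names:
            # Keep the full relative path unless all files share one directory.
            relative = os.path.basename(name) if same_dir else name
            target = os.path.join(dest, relative)
            os.makedirs(os.path.dirname(target) or '.', exist_ok=True)
            # Copy the entry's bytes into a regular file opened with open().
            with zf.open(name) as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Calling extract_flat_if_single_dir('archive.zip', '/dir/to/extract/') would then cover both layouts from the question, at the cost of reimplementing what extractall() already does.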
This might be a problem with the zip archive itself. At a Python prompt, try the following to see whether the files are in the correct directories within the zip file itself.
import zipfile
zf = zipfile.ZipFile("my_file.zip",'r')
first_file = zf.filelist[0]
print(first_file.filename)
This should print something like "dir1". Repeat the steps above, substituting an index of 1 into filelist, like so: first_file = zf.filelist[1]
This time the output should look like 'dir1/file1.jpg'. If this is not the case, then the zip file does not contain directories and everything will be unzipped into one single directory.
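Checking entries one index at a time gets tedious for larger archives. As a rough sketch of the same inspection done in one pass, the hypothetical helper below (top_level_names is not part of zipfile) collects the first path component of every entry:

```python
import zipfile


def top_level_names(zip_path):
    """Return the sorted, de-duplicated first path component of every entry."""
    with zipfile.ZipFile(zip_path) as zf:
        # 'dir1/file.jpg' contributes 'dir1'; 'file.mp3' contributes 'file.mp3'.
        return sorted({name.split('/')[0] for name in zf.namelist() if name})
```

If the result contains a single name and that name is a directory in the archive, all files sit under one top-level folder.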
Based on @ekhumoro's answer, I came up with a simpler function that extracts everything to the same level. It is not exactly what you are asking for, but I think it can help someone.
import os
from zipfile import ZipFile

def _basename_members(zip_file: ZipFile):
    for zipinfo in zip_file.infolist():
        # skip directory entries; their basename would be empty
        if zipinfo.is_dir():
            continue
        zipinfo.filename = os.path.basename(zipinfo.filename)
        yield zipinfo

from_zip = "some.zip"
to_folder = "some_destination/"
with ZipFile(file=from_zip, mode="r") as zip_file:
    os.makedirs(to_folder, exist_ok=True)
    zip_infos = _basename_members(zip_file)
    zip_file.extractall(path=to_folder, members=zip_infos)
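Basically you need to do two things: identify the root directory in the zip, and remove that root directory from the paths of the other items in the zip.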
The following should retain the overall structure of the zip while removing the root directory:
import typing, zipfile

def _is_root(info: zipfile.ZipInfo) -> bool:
    if info.is_dir():
        parts = info.filename.split("/")
        # Handle directory names with and without trailing slashes.
        if len(parts) == 1 or (len(parts) == 2 and parts[1] == ""):
            return True
    return False

def _members_without_root(archive: zipfile.ZipFile, root_filename: str) -> typing.Generator:
    for info in archive.infolist():
        parts = info.filename.split(root_filename)
        if len(parts) > 1 and parts[1]:
            # We join using the root filename, because there might be a subdirectory with the same name.
            info.filename = root_filename.join(parts[1:])
            yield info

with zipfile.ZipFile("archive.zip", mode="r") as archive:
    # We will use the first directory with no more than one path segment as the root.
    # Default to None so the else branch runs instead of next() raising StopIteration.
    root = next((info for info in archive.infolist() if _is_root(info)), None)
    if root:
        archive.extractall(path="/dir/to/extract/", members=_members_without_root(archive, root.filename))
    else:
        print("No root directory found in zip.")