Extract files from zip without keeping the structure using Python ZipFile?
I am trying to extract all files from a .zip archive that contains subfolders into a single folder. I want every file from the subfolders extracted into that one folder, without keeping the original structure. At the moment I extract everything, move the files to one folder, and then remove the leftover subfolders. Files with the same name are overwritten.
Is it possible to do this before the files are written?
Here is an example structure:
my_zip/file1.txt
my_zip/dir1/file2.txt
my_zip/dir1/dir2/file3.txt
my_zip/dir3/file4.txt
In the end I want this:
my_dir/file1.txt
my_dir/file2.txt
my_dir/file3.txt
my_dir/file4.txt
What can I add to this code?
import zipfile
my_dir = "D:\\Download\\"
my_zip = "D:\\Download\\my_file.zip"
zip_file = zipfile.ZipFile(my_zip, 'r')
for files in zip_file.namelist():
    zip_file.extract(files, my_dir)
zip_file.close()
If I rename the file paths returned by zip_file.namelist(), I get this error:
KeyError: "There is no item named 'file2.txt' in the archive"
This opens file handles for the members of the zip archive, takes the base filename of each one, and copies the data to a target file (this is how ZipFile.extract works internally, minus the handling of subdirectories).
import os
import shutil
import zipfile
my_dir = r"D:\Download"
my_zip = r"D:\Download\my_file.zip"
with zipfile.ZipFile(my_zip) as zip_file:
    for member in zip_file.namelist():
        filename = os.path.basename(member)
        # skip directories
        if not filename:
            continue
        # copy file (taken from zipfile's extract)
        source = zip_file.open(member)
        target = open(os.path.join(my_dir, filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)
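The question notes that files with the same name overwrite each other once the structure is flattened. Below is a minimal sketch of one way to avoid that with the same open-and-copy approach. The helper name extract_flat and the _1, _2 suffix scheme are assumptions made for illustration; they are not part of the answer above.

```python
import os
import shutil
import zipfile


def extract_flat(zip_path, target_dir):
    """Extract every file of zip_path directly into target_dir.

    Hypothetical helper: on a name clash a numeric suffix is appended
    (file.txt, file_1.txt, ...) instead of overwriting.
    Returns the list of file names that were written.
    """
    os.makedirs(target_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zip_file:
        for info in zip_file.infolist():
            # Directory entries carry no data of their own.
            if info.is_dir():
                continue
            name = os.path.basename(info.filename)
            if not name:
                continue
            # Find a name that does not exist yet in target_dir.
            stem, ext = os.path.splitext(name)
            candidate = name
            counter = 1
            while os.path.exists(os.path.join(target_dir, candidate)):
                candidate = f"{stem}_{counter}{ext}"
                counter += 1
            # Stream the member to disk without loading it fully into memory.
            target_path = os.path.join(target_dir, candidate)
            with zip_file.open(info) as source, open(target_path, "wb") as target:
                shutil.copyfileobj(source, target)
            written.append(candidate)
    return written
```

Called as extract_flat(my_zip, my_dir), it returns the names it wrote. Note that it also avoids files that already existed in the target folder before the call, which may or may not be what you want.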
It is possible to iterate over ZipFile.infolist(). On the returned ZipInfo objects you can then change the filename attribute to remove the directory part, and finally extract each one to the specified directory.
import zipfile
import os
my_dir = "D:\\Download\\"
my_zip = "D:\\Download\\my_file.zip"
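with zipfile.ZipFile(my_zip) as zip_file:
    for zip_info in zip_file.infolist():
        # skip directory entries (their names end with '/')
        if zip_info.filename.endswith('/'):
            continue
        # drop the directory part so the file lands directly in my_dir
        zip_info.filename = os.path.basename(zip_info.filename)
        zip_file.extract(zip_info, my_dir)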
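To see why this works while renaming the strings from namelist() fails, here is a small self-contained sketch. ZipFile.extract accepts either a member name or a ZipInfo object: a plain string is looked up in the archive's index, so a shortened name raises KeyError, whereas an edited ZipInfo still points at the original data and only the output path changes. The archive and file names below are made up for the demonstration.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway archive with one file inside a subfolder.
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("dir1/file2.txt", "hello")

    out_dir = os.path.join(tmp, "out")
    with zipfile.ZipFile(archive) as zf:
        # A shortened string is not a known member name.
        try:
            zf.extract("file2.txt", out_dir)
            lookup_failed = False
        except KeyError:
            lookup_failed = True

        # Editing the ZipInfo object changes only the output path.
        info = zf.getinfo("dir1/file2.txt")
        info.filename = os.path.basename(info.filename)
        extracted_path = zf.extract(info, out_dir)

    extracted_names = os.listdir(out_dir)
```

One thing to keep in mind: the ZipInfo objects belong to the open ZipFile, so the changed filename stays in effect for as long as that object is used.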
Just read the member into memory as bytes, compute the filename, and write the file there yourself instead of letting the library do it. In most cases this only means using the read() method instead of extract():
Python 3.6+ update (2020): the same code as the original answer, but using pathlib.Path, which eases file-path manipulation and other operations (such as write_bytes).
from pathlib import Path
import zipfile
my_dir = Path("D:\\Download\\")
my_zip = my_dir / "my_file.zip"
with zipfile.ZipFile(my_zip, 'r') as zip_file:
    for name in zip_file.namelist():
        # skip directory entries (their names end with "/")
        if name.endswith("/"):
            continue
        # read() takes the member name; its optional second argument is a password
        data = zip_file.read(name)
        myfile_path = my_dir / Path(name).name
        myfile_path.write_bytes(data)
Original code from the answer, without pathlib:
import zipfile
import os
my_dir = "D:\\Download\\"
my_zip = "D:\\Download\\my_file.zip"
zip_file = zipfile.ZipFile(my_zip, 'r')
for files in zip_file.namelist():
    # skip directory entries (their names end with "/")
    if files.endswith("/"):
        continue
    # read() takes only the member name (plus an optional password)
    data = zip_file.read(files)
    # the ZIP format uses "/" as the separator in member names, regardless of OS
    myfile_path = os.path.join(my_dir, files.split("/")[-1])
    myfile = open(myfile_path, "wb")
    myfile.write(data)
    myfile.close()
zip_file.close()
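Because member names inside an archive use "/" on every platform, pathlib.PurePosixPath can parse them the same way on any OS. The sketch below shows one way to compute the flattened name; the helper flat_name is hypothetical. It checks for directory entries first, because PurePosixPath ignores a trailing slash and would otherwise return the directory's own name.

```python
from pathlib import PurePosixPath


def flat_name(member_name):
    """Return the bare file name of an archive member, or None for a directory entry."""
    # Directory entries end with "/"; they must be filtered before parsing,
    # since PurePosixPath("a/b/").name is "b", not an empty string.
    if member_name.endswith("/"):
        return None
    return PurePosixPath(member_name).name


names = ["my_zip/file1.txt", "my_zip/dir1/", "my_zip/dir1/dir2/file3.txt"]
flat = [flat_name(n) for n in names]
```

Here flat contains the two file names with None in the position of the directory entry.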
A similar concept to Gerhard Götz's solution, but adapted for extracting a single file instead of the entire zip:
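import os
from zipfile import ZipFile
# zipPath, path_in_zip and destination are placeholders supplied by the caller
with ZipFile(zipPath, 'r') as zipObj:
    zipInfo = zipObj.getinfo(path_in_zip)
    zipInfo.filename = os.path.basename(destination)
    zipObj.extract(zipInfo, os.path.dirname(os.path.realpath(destination)))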
If you are getting a BadZipFile error, you can unpack the archive with 7-Zip in a subprocess. Assuming 7-Zip is installed, use the following code.
import subprocess
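# destFolder is assumed to be defined already and to hold the destination path
my_dir = destFolder  # destination folder
my_zip = destFolder + "/" + "filename.zip"  # file you want to extract
ziploc = "C:/Program Files/7-Zip/7z.exe"  # location where 7zip is installed
# "e" extracts without directory names; here only .txt files, from all subdirectories
cmd = [ziploc, 'e', my_zip, '-o' + my_dir, '*.txt', '-r']
sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
output, _ = sp.communicate()  # wait for 7-Zip to finish and collect its output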
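The same call can be wrapped so that a failed extraction does not go unnoticed. This is only a sketch: the function names are hypothetical, the switches are the ones already used above (e, -o, -r), and the path to the 7-Zip executable is whatever applies on your machine.

```python
import subprocess


def build_7z_flat_command(seven_zip_exe, archive, out_dir, pattern="*"):
    """Build the argument list for a flat ("e") extraction with 7-Zip."""
    # -o<dir> must be written without a space between the switch and the path.
    return [seven_zip_exe, "e", archive, "-o" + out_dir, pattern, "-r"]


def extract_flat_with_7z(seven_zip_exe, archive, out_dir, pattern="*"):
    """Run 7-Zip and return its combined output, raising if it reports failure."""
    cmd = build_7z_flat_command(seven_zip_exe, archive, out_dir, pattern)
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "7-Zip exited with code %d:\n%s" % (result.returncode, result.stdout)
        )
    return result.stdout
```

For example, extract_flat_with_7z(ziploc, my_zip, my_dir, "*.txt") would run the same command as above and wait for it to finish.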