I have a zip file with the following internal folder structure:
CODE
`-- CODE
    `-- CODE
        `-- CODE
            |-- 2019
            |   |-- file1.txt
            |   `-- file2.txt
            |-- 2020
            |   `-- file3.txt
            `-- 2021
                |-- file4.txt
                `-- file5.txt
I want to extract the files into this folder structure instead:
CODE
|-- 2019
|   |-- file1.txt
|   `-- file2.txt
|-- 2020
|   `-- file3.txt
`-- 2021
    |-- file4.txt
    `-- file5.txt
I could hard-code it, but since this is a recurring request, can I handle it programmatically so that only folders that contain files are extracted?
My current code is:
import os
import zipfile

def unzipfiles(incoming_path):
    for path, subdirs, files in os.walk(incoming_path):
        for name in files:
            if name.endswith('.zip'):
                with zipfile.ZipFile(os.path.join(incoming_path, name), 'r') as zip_ref:
                    for file in zip_ref.namelist():
                        out_path = os.path.join(incoming_path, file)
                        out_path = out_path.replace('CODE/', '')
                        if out_path[:-1] != incoming_path:
                            zip_ref.extract(file, out_path)
However, it does not work correctly: it creates more folders than there are in the zip file.
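The extra folders come from how ZipFile.extract(member, path) works: path is treated as a destination directory, and the member's full name inside the archive (every CODE level included) is recreated beneath it. Because out_path already ends with the file name, each file ends up inside its own nested folder tree. One way around this, without hard-coding folder names, is to skip extract() and write each file to a remapped location yourself. The sketch below uses a hypothetical helper, extract_collapsed (not a library function), and assumes a trusted archive (no '..' or absolute member names) and that, of the wrapper folders shared by every file, only the innermost CODE should be kept:

```python
import os
import shutil
import zipfile
from pathlib import PurePosixPath


def extract_collapsed(zip_path, dest_dir):
    """Hypothetical helper: extract the files in zip_path into dest_dir,
    collapsing the wrapper folders shared by every file to the innermost one.
    Assumes a trusted archive (no '..' or absolute member names)."""
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        # Keep only real files; directory entries (names ending in '/') are
        # skipped, so no folder is created unless a file lands in it.
        members = [m for m in zf.infolist() if not m.is_dir()]
        # Zip member names always use '/', so PurePosixPath splits them correctly.
        parts = [PurePosixPath(m.filename).parts for m in members]

        # Longest run of leading folder names that every file has in common,
        # e.g. ['CODE', 'CODE', 'CODE', 'CODE'] for the archive in the question.
        common = []
        for names in zip(*(p[:-1] for p in parts)):
            if len(set(names)) != 1:
                break
            common.append(names[0])

        # Drop every shared folder except the innermost one.
        drop = max(len(common) - 1, 0)
        for member, p in zip(members, parts):
            target = os.path.join(dest_dir, *p[drop:])
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Called as extract_collapsed(os.path.join(path, name), incoming_path) inside the existing os.walk loop, it would produce incoming_path/CODE/2019/file1.txt and so on. Since only file entries are written and their parent folders are created on demand, folders without files never appear.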
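This code works for me. It recursively deletes every empty folder below path, and path itself if it ends up empty and removeRoot is True: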
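import os

def removeEmptyFolders(path, removeRoot=True):
    # Nothing to do if path is not a directory
    if not os.path.isdir(path):
        return

    # Recurse into subfolders first, so empty ones are removed before this one is checked
    files = os.listdir(path)
    if len(files):
        for f in files:
            fullpath = os.path.join(path, f)
            if os.path.isdir(fullpath):
                removeEmptyFolders(fullpath)

    # List again: the recursion above may have removed some subfolders
    files = os.listdir(path)
    if len(files) == 0 and removeRoot:
        os.rmdir(path)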
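The same cleanup can also be written without explicit recursion, because os.walk(root, topdown=False) visits subfolders before their parents. A minimal sketch; remove_empty_dirs is a hypothetical name, and it checks each folder with os.listdir at visit time because the lists os.walk yields do not reflect folders deleted since they were generated:

```python
import os


def remove_empty_dirs(root, remove_root=True):
    """Hypothetical alternative to removeEmptyFolders: delete every empty
    folder under root, deepest first, optionally keeping root itself."""
    if not os.path.isdir(root):
        return
    # Bottom-up walk: children are visited before their parents, so a parent
    # that only contained empty folders is itself empty by the time we reach it.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root and not remove_root:
            continue
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
```

For example, remove_empty_dirs(some_dir, remove_root=False) prunes everything empty below some_dir but keeps some_dir itself even if nothing is left in it.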
The solution I use maps the full path of each folder in the archive to a shorter, relative one. For this solution I'll use the zip structure provided by the OP.
import os
import re
import pathlib
import shutil
import zipfile
from pprint import pprint

if __name__ == '__main__':
    toplevel = os.path.join('files')
    new_structure = dict()

    # Let's just extract everything
    with zipfile.ZipFile('CODE.zip', 'r') as zip_file:
        for zip_info in zip_file.infolist():
            path = pathlib.PurePath(zip_info.filename)

            # This writes the data from the old file to a new file.
            if str(path.parent) in new_structure:
                source = zip_file.open(zip_info)
                target = open(os.path.join(new_structure[str(path.parent)], path.name), "wb")
                with source, target:
                    shutil.copyfileobj(source, target)

            # Create the new folder structure mapping, based on the year name.
            # The matches are based on numbers in this example, but can be specified.
            # This relies on the archive listing each folder entry before the files inside it.
            if re.match(r'\d+', path.name):
                new_structure[str(path)] = os.path.join(toplevel, path.name)
                os.makedirs(new_structure[str(path)], exist_ok=True)

    pprint(new_structure)
The pprint output shows the remapping:
{'CODE\\CODE\\CODE\\CODE\\2019': 'files\\2019',
'CODE\\CODE\\CODE\\CODE\\2020': 'files\\2020',
'CODE\\CODE\\CODE\\CODE\\2021': 'files\\2021'}
The output is a new folder with the following structure:
files
|-- 2019
|   |-- file1.txt
|   `-- file2.txt
|-- 2020
|   `-- file3.txt
`-- 2021
    |-- file4.txt
    `-- file5.txt
There are two points worth noting:
1. Regex pattern matching ('\d+') is used to identify the year folders; it accepts any name that starts with a run of digits. To be more precise, use \d{4} with re.fullmatch, so that only names consisting of exactly four digits match (with re.match, \d{4} would also accept a name such as 20190).
2. This method assumes only one level below each matched folder; in other words, files nested more deeply will not be unpacked properly. To support that, the line if str(path.parent) in new_structure: has to be changed to take all of the entry's parent paths into account.
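For the second point, one possible replacement for the if str(path.parent) in new_structure: check is to look at every ancestor of the entry instead of only its direct parent, and keep whatever nesting sits below the mapped folder. A sketch with a hypothetical helper, remap, assuming the mapping keys are built with PurePosixPath (so they use '/' on every platform, unlike the backslashes in the Windows output above):

```python
import os
from pathlib import PurePosixPath


def remap(member, new_structure):
    """Hypothetical helper: return the new location of an archive member,
    or None if none of its ancestor folders is in new_structure."""
    path = PurePosixPath(member)
    # path.parents runs from the direct parent up to '.', so the nearest
    # mapped ancestor wins.
    for ancestor in path.parents:
        key = str(ancestor)
        if key in new_structure:
            # Keep any sub-folders between the mapped folder and the file.
            rest = path.relative_to(ancestor)
            return os.path.join(new_structure[key], *rest.parts)
    return None
```

In the loop, directory entries (zip_info.is_dir()) would need to be skipped before calling it, and the target's folder created with os.makedirs(os.path.dirname(target_path), exist_ok=True) before opening the file for writing.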