I have a file structure like the following:
There is always a top-level folder that contains an EXCEL folder and a number of text documents. Each EXCEL folder holds a number of .xlsx files.
This structure can repeat any number of times. I am trying to go into the EXCEL folder in each directory and remove every file with a .xlsx extension, continuing until all the EXCEL folders have been visited.
This is the code I am failing with:
def clean_out_excel_test_data():
    #For each folder in the test_log directory
        #Open each folder
            #for each_folder that contains the word EXCEL
                #open each_folder
                    #for each file in each_folder, remove it
    log_directory = "test_log_data/"
    for each_folder in sorted(os.listdir(log_directory)):
        print each_folder + ' is in the root'
        for each_folder2 in sorted(os.listdir('%s/%s'%(log_directory,each_folder))) if os.path.isdir(each_folder2):
            print '\t-' + each_folder2 + ' is a sub-folder'
            for each_excel_file in sorted(os.listdir('%s/%s/%s'%(log_directory,each_folder, each_folder2))):
                print '\t\t-' + each_excel_file + ' is a sub excel file'
I realize my code is garbage, but I wanted to at least show what I am going for.
Let os.walk handle the directory traversal for you:
import os

for root, dirs, files in os.walk('/path/to/test_log_data'):
    # Skip any directory whose path does not contain 'EXCEL'.
    if 'EXCEL' not in root:
        continue
    for fname in files:
        if fname.endswith('.xlsx'):
            os.remove(os.path.join(root, fname))
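Note that the 'EXCEL' not in root test matches the substring anywhere in the path, so a folder such as EXCEL_backup would be cleaned too. A minimal sketch of a stricter variant, assuming the folders are named exactly EXCEL; the function name remove_xlsx_in_excel_dirs and its dry_run parameter are hypothetical, introduced here only for illustration:

```python
import os


def remove_xlsx_in_excel_dirs(base_dir, dry_run=True):
    """Find .xlsx files that sit directly inside folders named exactly 'EXCEL'.

    Returns the list of matching paths. Files are only deleted when
    dry_run is False (dry_run is a hypothetical safety switch).
    """
    matched = []
    for root, dirs, files in os.walk(base_dir):
        # Compare only the last path component, not the whole path.
        if os.path.basename(os.path.normpath(root)) != 'EXCEL':
            continue
        for fname in files:
            if fname.endswith('.xlsx'):
                path = os.path.join(root, fname)
                matched.append(path)
                if not dry_run:
                    os.remove(path)
    return matched
```

Running it once with the default dry_run=True and inspecting the returned list before calling it again with dry_run=False is a cheap way to avoid deleting the wrong files.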
I would use os.walk(). You could do something like:
import os

for root, dirs, files in os.walk(YOUR_BASE_DIR):
    for f in files:
        if f.endswith(".xlsx"):
            os.remove(os.path.join(root, f))
The above will remove ALL .xlsx files, regardless of which sub-directory they're in. It should be easy enough to modify it to screen for the directory name.
If you want to use listdir(), I'd recommend recursively walking the directory, adding all files to a queue, then iterating over it and removing the appropriate ones.
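The listdir() approach could be sketched roughly as follows. This is one possible reading of the suggestion, not a definitive implementation: the queue here holds directories still to be scanned, files are gathered into a list, and the names collect_files and remove_xlsx are hypothetical.

```python
import os
from collections import deque


def collect_files(base_dir):
    """Return the paths of all files under base_dir, using os.listdir and a queue."""
    pending = deque([base_dir])  # directories still to be scanned
    found = []
    while pending:
        current = pending.popleft()
        for name in sorted(os.listdir(current)):
            path = os.path.join(current, name)
            if os.path.isdir(path) and not os.path.islink(path):
                pending.append(path)  # scan this directory later
            elif os.path.isfile(path):
                found.append(path)
            # Symlinked directories are skipped to avoid cycles.
    return found


def remove_xlsx(base_dir):
    """Delete every .xlsx file under base_dir and return the removed paths."""
    removed = []
    for path in collect_files(base_dir):
        if path.endswith('.xlsx'):
            os.remove(path)
            removed.append(path)
    return removed
```

Collecting first and deleting afterwards keeps the traversal separate from the removal, which makes it easy to add a directory-name filter in remove_xlsx.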
Your syntax error is coming from here:
for each_folder in sorted(os.listdir(log_directory)):
    print each_folder + ' is in the root'
    for each_folder2 in sorted(os.listdir('%s/%s'%(log_directory,each_folder))) if os.path.isdir(each_folder2):
    # ^ Here: the trailing "if ..." on the line above
        print '\t-' + each_folder2 + ' is a sub-folder'
        for each_excel_file in sorted(os.listdir('%s/%s/%s'%(log_directory,each_folder, each_folder2))):
            print '\t\t-' + each_excel_file + ' is a sub excel file'
You are trying to attach an if clause to a for statement, which Python does not allow (that form only works inside comprehensions and generator expressions). Simply move the if into its own block:
for each_folder in sorted(os.listdir(log_directory)):
    print(each_folder + ' is in the root')
    folder_path = os.path.join(log_directory, each_folder)
    for each_folder2 in sorted(os.listdir(folder_path)):
        sub_path = os.path.join(folder_path, each_folder2)
        # Test the full path; a bare name would be resolved against the current directory.
        if os.path.isdir(sub_path):
            print('\t-' + each_folder2 + ' is a sub-folder')
            for each_excel_file in sorted(os.listdir(sub_path)):
                print('\t\t-' + each_excel_file + ' is a sub excel file')
It's still messy code, which could undoubtedly be done a better way, but that should get rid of your current error.
Steven Rumbalski's answer seems a bit neater though :)
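For reference, an if filter is legal inside a generator expression, so the directory check can still live next to the loop by filtering the iterable first. A minimal sketch, assuming the same folder layout as in the question; the helper names subdirectories and list_excel_files are hypothetical:

```python
import os


def subdirectories(path):
    """Return the sorted names of the directories directly inside path."""
    # The `if` is allowed here because it belongs to the generator
    # expression, not to a `for` statement header.
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )


def list_excel_files(log_directory):
    """Map each <folder>/<sub-folder> path to the sorted file names inside it."""
    listing = {}
    for each_folder in subdirectories(log_directory):
        folder_path = os.path.join(log_directory, each_folder)
        for each_folder2 in subdirectories(folder_path):
            sub_path = os.path.join(folder_path, each_folder2)
            listing[sub_path] = sorted(os.listdir(sub_path))
    return listing
```

Returning a dictionary instead of printing keeps the traversal easy to check before any deletion logic is added.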