
Iterating through directories and removing files based on extension in Python

I have a file structure that looks like the following: [image of the directory tree]

I always have a folder that contains an EXCEL folder and a bunch of text documents. Each EXCEL folder holds a bunch of .xlsx files.

This same structure can repeat any number of times. I am trying to go into the EXCEL folder of every directory, remove all files with a .xlsx extension, and continue until every EXCEL folder has been visited.

Here is the code I have so far, which fails:

def clean_out_excel_test_data():
    #For each folder in the test_log directory
        #Open each folder
            #for each_folder that contains the word EXCEL
                #open each_folder
                    #for each file in each_folder, remove it

    log_directory = "test_log_data/"

    for each_folder in sorted(os.listdir(log_directory)):
        print each_folder + ' is in the root'
        for each_folder2 in sorted(os.listdir('%s/%s'%(log_directory,each_folder))) if os.path.isdir(each_folder2):
            print '\t-' + each_folder2 + ' is a sub-folder'
            for each_excel_file in sorted(os.listdir('%s/%s/%s'%(log_directory,each_folder, each_folder2))):
                print '\t\t-' + each_excel_file + ' is a sub excel file'

I realize my code is garbage, but I wanted to at least show what I am going for.

Let os.walk handle the directory traversal for you:

import os

# Walk every directory under the base path; skip any whose path does not contain "EXCEL"
for root, dirs, files in os.walk('/path/to/test_log_data'):
    if 'EXCEL' not in root:
        continue
    for fname in files:
        if fname.endswith('.xlsx'):
            os.remove(os.path.join(root, fname))

I would use os.walk().

You could do something like:

import os

base_dir = "test_log_data"  # top-level folder to search
for root, dirs, files in os.walk(base_dir):
    for f in files:
        if f.endswith(".xlsx"):
            os.remove(os.path.join(root, f))

The above will remove ALL .xlsx files, regardless of which sub-directory they're in. It should be easy enough to modify it to screen for the directory name.
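One way to add that screening, as a minimal sketch: only delete a file when its immediate parent directory is named exactly `EXCEL`. The helper name `remove_xlsx_in_excel_dirs` and the exact-name match are assumptions for illustration; adjust the check if your folders are named differently.

```python
import os


def remove_xlsx_in_excel_dirs(base_dir, folder_name="EXCEL", ext=".xlsx"):
    """Hypothetical helper: delete files ending in `ext` whose parent
    directory is named `folder_name`. Returns the list of removed paths."""
    removed = []
    for root, dirs, files in os.walk(base_dir):
        # Compare only the last path component. A substring test on the whole
        # path would also match e.g. "EXCEL_old" or a base path containing "EXCEL".
        if os.path.basename(os.path.normpath(root)) != folder_name:
            continue
        for fname in files:
            if fname.endswith(ext):
                path = os.path.join(root, fname)
                os.remove(path)
                removed.append(path)
    return removed
```

For the question's layout, `remove_xlsx_in_excel_dirs("test_log_data")` would empty every `EXCEL` folder of its .xlsx files while leaving the text documents, and any .xlsx file outside an `EXCEL` folder, untouched.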

If you want to use listdir(), I'd recommend recursively walking the directory tree, adding every file to a queue, and then iterating over the queue and removing the appropriate ones.
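A minimal sketch of that listdir()-based approach, assuming `collections.deque` as the queue and `.xlsx` as the target extension; both function names are hypothetical. Like the os.walk() version above, it removes every matching file regardless of folder name. Keeping collection and deletion as two separate passes makes it easy to print or review the queue before anything is deleted.

```python
import os
from collections import deque


def collect_files(directory, queue):
    """Recursively add every file path under `directory` to `queue`."""
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Don't follow symlinked directories (os.walk's default), avoiding loops
        if os.path.isdir(path) and not os.path.islink(path):
            collect_files(path, queue)  # descend into the sub-folder
        else:
            queue.append(path)


def remove_by_extension(base_dir, ext=".xlsx"):
    """Hypothetical helper: gather all file paths first, then delete those ending in `ext`."""
    queue = deque()
    collect_files(base_dir, queue)
    removed = []
    while queue:
        path = queue.popleft()
        if path.endswith(ext):
            os.remove(path)
            removed.append(path)
    return removed
```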

Your syntax error is coming from here:

for each_folder in sorted(os.listdir(log_directory)):
    print each_folder + ' is in the root' 
    for each_folder2 in sorted(os.listdir('%s/%s'%(log_directory,each_folder))) if os.path.isdir(each_folder2):
        print '\t-' + each_folder2 + ' is a sub-folder' #                       ^ Here
        for each_excel_file in sorted(os.listdir('%s/%s/%s'%(log_directory,each_folder, each_folder2))):
            print '\t\t-' + each_excel_file + ' is a sub excel file'

You are trying to tack an if statement onto the end of a for loop header, and Python doesn't allow that. Simply move the if into its own block inside the loop:

import os

for each_folder in sorted(os.listdir(log_directory)):
    print(each_folder + ' is in the root')
    folder_path = os.path.join(log_directory, each_folder)
    for each_folder2 in sorted(os.listdir(folder_path)):
        # isdir() needs the full path, not just the bare entry name
        sub_path = os.path.join(folder_path, each_folder2)
        if os.path.isdir(sub_path):
            print('\t-' + each_folder2 + ' is a sub-folder')
            for each_excel_file in sorted(os.listdir(sub_path)):
                print('\t\t-' + each_excel_file + ' is a sub excel file')

It's still messy code, which could undoubtedly be done a better way, but that should get rid of your current error.
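If you do want the filter on the same line as the loop, which is what the original attempt was reaching for, Python accepts an `if` inside a generator expression (or list comprehension), just not on a `for` statement header. A minimal sketch, reusing the question's `log_directory` name inside a hypothetical helper `list_sub_folders`:

```python
import os


def list_sub_folders(log_directory):
    """Hypothetical helper: print each top-level folder and its sub-folders."""
    for each_folder in sorted(os.listdir(log_directory)):
        folder_path = os.path.join(log_directory, each_folder)
        if not os.path.isdir(folder_path):
            continue  # skip loose files in the root
        print(each_folder + ' is in the root')
        # The `if` lives inside the generator expression, which is valid syntax,
        # unlike an `if` appended to the end of a `for` statement.
        sub_folders = (name for name in sorted(os.listdir(folder_path))
                       if os.path.isdir(os.path.join(folder_path, name)))
        for each_folder2 in sub_folders:
            print('\t-' + each_folder2 + ' is a sub-folder')
```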

Steven Rumbalski's answer seems a bit neater though :)


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM