
Scan files recursively and delete empty directories in python

I have the following structure:

Dir 1
|___Dir 2
   |___file 1
   |___file 2...
Dir 3
|___Dir 4
   |___file 3...

I would like to find each file recursively, process it in my own way, delete it once done, and move on to the next. Then, if a directory is empty, delete that as well, working my way up until nothing is left.

Just not sure how to proceed.

This is what I have:

import os

for root, dirs, files in os.walk(dir):
    for file in files:
        file = os.path.join(root, file)
        process_file(file)
        os.remove(file)

Which is fine, but I would then like to delete the subdirs if and only if they are empty.

Well, I guess this will do; I have to run os.walk twice though...

import os

src_dir = '/path/to/root'  # placeholder: top-level directory to process


def get_files(src_dir):
    # traverse the tree, process each file (process() is your own function), then delete it
    for root, dirs, files in os.walk(src_dir):
        for file in files:
            process(os.path.join(root, file))
            os.remove(os.path.join(root, file))


def del_dirs(src_dir):
    # topdown=False visits the deepest directories first
    for dirpath, _, _ in os.walk(src_dir, topdown=False):
        if dirpath == src_dir:
            break  # src_dir is yielded last; keep it
        try:
            os.rmdir(dirpath)  # only succeeds on an empty directory
        except OSError as ex:
            print(ex)


def main():
    get_files(src_dir)
    del_dirs(src_dir)


if __name__ == "__main__":
    main()
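
The two walks above can probably be merged into one. The following is only a sketch, assuming that files can be handled in bottom-up order; the name process_and_prune and the process argument are hypothetical stand-ins for your own code.

```python
import os


def process_and_prune(src_dir, process):
    """Process and delete every file under src_dir, then remove emptied subdirectories."""
    # topdown=False yields the deepest directories first, so a parent is
    # visited only after all of its children have been handled.
    for root, _dirs, files in os.walk(src_dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            process(path)  # caller-supplied callable (assumption)
            os.remove(path)
        if root == src_dir:
            continue  # keep the top-level directory itself
        try:
            os.rmdir(root)  # succeeds only if the directory is now empty
        except OSError as ex:
            print(ex)
```

Because each directory is emptied of files before os.rmdir() is attempted on it, a single pass should be enough as long as nothing else writes into the tree at the same time.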

I realize this post is older and there may be little point in adding another example, but I thought this one would be easier for a beginner to grasp than some of the others here: there's no path joining, it imports only one module, and it shows how to use some built-in functions [open() & len()] and the newer Python 3 string formatting with str.format. It also shows how simple it is to write output to a file with print(), using the file= argument.

This script scans a root directory with os.walk(), checks the number of subdirectories and files in each directory, and acts on what it finds. It also keeps counters for the number of used and empty directories and writes the results to a file. I wrote this example in Python 3.4, and it worked for my purposes. If anyone has ideas for improving the logic, please post in the comments so we can all learn a new perspective on solving the problem.

import os

# declare the root directory
root_dir = 'C:\\tempdir\\directory\\directory\\'
# initialize the counters
empty_count = 0
used_count = 0
# 'x' creates a new file and opens it for writing (it fails if the file already exists)
with open('C:\\tempdir\\directories.txt', 'x') as outfile:
    for curdir, subdirs, files in os.walk(root_dir):
        if len(subdirs) == 0 and len(files) == 0:  # check for empty directories
            empty_count += 1  # increment empty_count
            print('Empty directory: {}'.format(curdir), file=outfile)  # add empty results to file
            os.rmdir(curdir)  # delete the directory
        else:  # the directory contains files and/or subdirectories
            used_count += 1  # increment used_count
            print('Used directory: {}'.format(curdir), file=outfile)  # add used results to file

    # add the counters to the file
    print('empty_count: {}\nused_count: {}'.format(empty_count, used_count), file=outfile)
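
Note that os.walk() runs top-down by default, so a directory whose only content was an empty subdirectory is visited before that subdirectory is removed and is not deleted in the same run. A possible bottom-up variant is sketched below; the function name prune_empty_dirs is made up for illustration, and it returns the counters instead of writing them to a file.

```python
import os


def prune_empty_dirs(root_dir):
    """Remove empty directories below root_dir; return (empty_count, used_count)."""
    empty_count = 0
    used_count = 0
    # Bottom-up order: a parent is visited after all of its children.
    for curdir, _subdirs, _files in os.walk(root_dir, topdown=False):
        if curdir == root_dir:
            continue  # never remove the starting directory
        # Re-list the directory: the names os.walk() collected earlier may be
        # stale, because child directories could have been removed since.
        if not os.listdir(curdir):
            os.rmdir(curdir)
            empty_count += 1
        else:
            used_count += 1
    return empty_count, used_count
```

With this ordering, a chain of nested empty directories should disappear in one call rather than one level per run.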

Here is another solution which I think is efficient. Of course, efficiency can be improved by using os.scandir.

First, I define a general-purpose rec_rmdir function (recursive rmdir) which walks the directory tree recursively.

  • The function processes each file and each sub-directory first.
  • Then it tries to remove the current directory.
  • The preserve flag is used to preserve the root directory.

The algorithm is a classic depth-first search.

import os
import stat


def rec_rmdir(root, callback, preserve=True):
    for path in (os.path.join(root, p) for p in os.listdir(root)):
        st = os.stat(path)
        if stat.S_ISREG(st.st_mode):
            callback(path)
        elif stat.S_ISDIR(st.st_mode):
            rec_rmdir(path, callback, preserve=False)
    if not preserve:
        try:
            os.rmdir(root)
        except OSError:
            pass

Then, it is easy to define a function which processes the file and removes it.

def process_file_and_remove(path):
    # process the file
    # ...
    os.remove(path)

Classic usage:

rec_rmdir("/path/to/root", process_file_and_remove)
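
For reference, the os.scandir variant mentioned at the start of this answer might look roughly like the sketch below. It assumes Python 3.6+ (where os.scandir() can be used as a context manager), and the name rec_rmdir_scandir is hypothetical. Unlike the os.stat() version, it does not follow symbolic links.

```python
import os


def rec_rmdir_scandir(root, callback, preserve=True):
    """Depth-first: pass files to callback, recurse into subdirectories, then try to remove root."""
    # Materialize the entries first so that deleting files inside the
    # callback cannot interfere with the directory iterator.
    with os.scandir(root) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_file(follow_symlinks=False):
            callback(entry.path)
        elif entry.is_dir(follow_symlinks=False):
            rec_rmdir_scandir(entry.path, callback, preserve=False)
    if not preserve:
        try:
            os.rmdir(root)
        except OSError:
            pass  # directory is not empty or cannot be removed
```

It is called the same way as rec_rmdir, with the callback responsible for processing and deleting each file.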

Looks like I am late to the party. Nevertheless, here's another solution that can help beginners.

Imports

import os

from contextlib import suppress

Include in an appropriate function

# Loop for processing files
for root, _, files in os.walk(dir):
    path = root.split('/')
    for file in files:
        file = os.path.join(root, file)

        # Assuming process_file() returns True on success
        if process_file(file):
            os.remove(file)

Include in an appropriate function

# Loop for deleting empty directories
for root, _, _ in os.walk(dir):
    # Ignore "directory not empty" errors; nothing can be done about them if we
    # want to retain files that failed to be processed. The entire deletion
    # is hence silent.
    with suppress(OSError):
        os.removedirs(root)
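
One thing to be aware of: os.removedirs() also tries to remove every empty parent directory named in the path, so it can climb to the starting directory and beyond if those happen to be empty. If that is not wanted, a bounded variant could look like the sketch below, which uses os.rmdir() on a bottom-up walk. The function name remove_empty_subdirs is an assumption, not part of the answer above.

```python
import os
from contextlib import suppress


def remove_empty_subdirs(top):
    """Remove empty directories strictly below top, deepest first."""
    for root, _dirs, _files in os.walk(top, topdown=False):
        if root == top:
            continue  # leave the starting directory and everything above it alone
        # os.rmdir() raises OSError when the directory still holds files that
        # were kept because processing failed; that is silently ignored.
        with suppress(OSError):
            os.rmdir(root)
```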

This is just for removing empty directories and also pulling out single files of directories. It seems to only answer one part of the question, sorry.

I added a loop at the end that keeps retrying until it can't find any more. I made the function return a count of removed directories.

My access-denied errors were fixed by the handler from: shutil.rmtree fails on Windows with 'Access is denied'

import os
import shutil


def onerror(func, path, exc_info):
    """
    Error handler for ``shutil.rmtree``.

    If the error is due to an access error (read only file)
    it attempts to add write permission and then retries.

    If the error is for another reason it re-raises the error.

    Usage : ``shutil.rmtree(path, ignore_errors=False, onerror=onerror)``
    """
    import stat

    if not os.access(path, os.W_OK):
        # Is the error an access error ?
        os.chmod(path, stat.S_IWUSR)
        func(path)
    else:
        raise


def get_empty_dirs(path):
    # count of removed directories
    count = 0
    # traverse root directory, and list directories as dirs and files as files
    for root, dirs, files in os.walk(path):
        try:
            # if a directory is empty there will be no sub-directories or files
            if len(dirs) == 0 and len(files) == 0:
                print u"deleting " + root
                # os.rmdir(root)
                shutil.rmtree(root, ignore_errors=False, onerror=onerror)
                count += 1
            # if a directory has one file lets pull it out.
            elif len(dirs) == 0 and len(files) == 1:
                print u"moving " + os.path.join(root, files[0]) + u" to " + os.path.dirname(root)
                shutil.move(os.path.join(root, files[0]), os.path.dirname(root))
                print u"deleting " + root
                # os.rmdir(root)
                shutil.rmtree(root, ignore_errors=False, onerror=onerror)
                count += 1
        except WindowsError, e:
            # I'm getting access denied errors when removing directory.
            print e
        except shutil.Error, e:
            # The path you're moving to already exists
            print e
    return count


def get_all_empty_dirs(path):
    # loop till break
    total_count = 0
    while True:
        # count of removed directories
        count = get_empty_dirs(path)
        total_count += count
        # if no removed directories you are done.
        if count >= 1:
            print u"retrying till count is 0, currently count is: %d" % count
        else:
            break

    print u"Total directories removed: %d" % total_count
    return total_count


count = get_all_empty_dirs(os.getcwdu())  # current directory
count += get_all_empty_dirs(u"o:\\downloads\\")  # other directory
print u"Total of all directories removed: %d" % count

import os

#Top level of tree you wish to delete empty directories from.
currentDir = r'K:\AutoCAD Drafting Projects\USA\TX\Image Archive'

index = 0

for root, dirs, files in os.walk(currentDir):
    for dir in dirs:
        newDir = os.path.join(root, dir)
        index += 1
        print str(index) + " ---> " + newDir

        try:
            os.removedirs(newDir)
            print "Directory empty! Deleting..."
            print " "
        except OSError:
            print "Directory not empty and will not be removed"
            print " "

Nice and simple. The key is using os.removedirs under a try statement. It is already recursive.
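
To illustrate what "already recursive" means here, this small Python 3 demo (directory and file names are arbitrary) shows os.removedirs() deleting a leaf directory and then its empty ancestors, stopping at the first directory that is not empty:

```python
import os
import tempfile

# Build top/a/b/c plus a file directly inside top.
top = tempfile.mkdtemp()
leaf = os.path.join(top, "a", "b", "c")
os.makedirs(leaf)
with open(os.path.join(top, "keep.txt"), "w") as fh:
    fh.write("x")

# Removes c, then b, then a; stops at top because it still contains keep.txt.
os.removedirs(leaf)

print(os.path.exists(os.path.join(top, "a")))  # False
print(os.listdir(top))  # ['keep.txt']
```

Note that the upward pruning is based on the path passed in, so with an absolute path it continues until it reaches a directory it cannot remove.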
