
Zero all filesize in a large directory tree (delete file content, keep files)

How can I delete the content (zero the filesize) of a large directory tree (10 GB, 1K files) but keep the entire tree structure, filenames, and extensions? (If I can keep the original last write time [last content modification time], that's a bonus.)

I have seen several suggestions for individual files, but cannot figure out how to make this work for the entire CWD.

def deleteContent(fName):
    with open(fName, "w"):
        pass

Running the following as administrator should reset all content to empty files while retaining the LastWriteTime of each file:

gci c:\temp\test\*.* -recurse | % {    
    $LastWriteTime = $PSItem.LastWriteTime
    clear-content $PSItem;
    $PSItem.LastWriteTime = $LastWriteTime
}

os.walk() yields one tuple per directory in the tree, of the form:

(directory path, list of subdirectories in the directory, list of files in the directory)

When we combine your code with os.walk():

import os

for dirpath, dirnames, filenames in os.walk("top_directory"):
    for name in filenames:
        # Opening in "w" mode truncates the file to zero bytes.
        with open(os.path.join(dirpath, name), "w"):
            pass
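The same traversal can also be written with pathlib (Python 3.4+); this is a sketch of the identical truncate-in-place idea, not code from the original answer:

```python
from pathlib import Path

def clear_tree(top):
    """Truncate every regular file under `top`, keeping names and structure."""
    for path in Path(top).rglob('*'):
        if path.is_file():
            # Opening in 'w' mode truncates the file to zero bytes.
            with path.open('w'):
                pass
```

Note that Path.rglob('*') walks the whole tree unconditionally, so for very deep trees the depth-limiting approach further down may be preferable.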

All good answers, but I can see two more challenges with the answers provided:

When traversing a directory tree, you may want to limit the depth it descends to, to protect yourself from very large trees. Secondly, Windows has a limitation of 260 characters (MAX_PATH, enforced by Explorer and most of the Win32 API) on the combined path and filename. While exceeding this limit produces various OS errors, there is a workaround for it.

Let's start with the workaround for the maximum length of the file path; you can do something like the following:

import os
import platform


def full_path_windows(filepath):
    """
    Filenames and paths have a default limitation of 256 characters in Windows.
    By inserting '\\\\?\\' at the start of the path it removes this limitation.

    This function inserts '\\\\?\\' at the start of the path, on Windows only
    Only if the path starts with '<driveletter>:\\' e.g 'C:\\'.

    It will also normalise the characters/case of the path.

    """
    if platform.system() == 'Windows':
        if filepath[1:3] == ':\\':
            return u'\\\\?\\' + os.path.normcase(filepath)
    return os.path.normcase(filepath)
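A quick way to sanity-check the prefixing logic: on non-Windows platforms the path should pass through os.path.normcase unchanged, so the helper is safe to call unconditionally. The snippet below is a condensed, self-contained restatement of the function above:

```python
import os
import platform

def full_path_windows(filepath):
    # On Windows, prefix '\\?\' to lift the 260-character MAX_PATH limit;
    # elsewhere, just normalise the path.
    if platform.system() == 'Windows' and filepath[1:3] == ':\\':
        return u'\\\\?\\' + os.path.normcase(filepath)
    return os.path.normcase(filepath)

# On Windows this yields a '\\?\'-prefixed path; on other platforms the
# path comes back via os.path.normcase only.
long_ready = full_path_windows(r'C:\temp\file.txt')
```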

There are mentions of write protection, files in use, or other conditions that may prevent writing to a file. This can be checked (without actually writing) as follows:

import os

def write_access(filepath):
    """
    Usage:

    read_access(filepath)

    This function returns True if Write Access is obtained
    This function returns False if Write Access is not obtained
    This function returns False if the filepath does not exists

    filepath = must be an existing file
    """
    if os.path.isfile(filepath):
        return os.access(filepath, os.W_OK)
    return False
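A quick demonstration against a temporary file (a sketch; the temp-file name is whatever tempfile assigns, and the helper is restated here so the snippet is self-contained):

```python
import os
import tempfile

def write_access(filepath):
    # True only for an existing file that the current user may write to.
    return os.path.isfile(filepath) and os.access(filepath, os.W_OK)

# A freshly created temp file should be writable; a missing path is not.
fd, name = tempfile.mkstemp()
os.close(fd)
writable = write_access(name)           # fresh, owner-writable file
missing = write_access(name + '.gone')  # path does not exist
os.remove(name)
```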

For setting a minimum or maximum depth, you can do something like this:

import os


def get_all_files(rootdir, mindepth = 1, maxdepth = float('inf')):
    """
    Usage:

    get_all_files(rootdir, mindepth = 1, maxdepth = float('inf'))

    This returns a list of all files of a directory, including all files in
    subdirectories. Full paths are returned.

    WARNING: this may create a very large list if many files exist in the
    directory and subdirectories. Make sure you set maxdepth appropriately.

    rootdir  = existing directory to start from
    mindepth = int: the level to start at; 1 starts at the root dir, 2 starts
               at the subdirectories of the root dir, and so on.
    maxdepth = int: the level up to which to report. For example, if you only
               want the files of the subdirectories of the root dir,
               set mindepth = 2 and maxdepth = 2. If you only want the files
               of the root dir itself, set mindepth = 1 and maxdepth = 1.
    """    
    file_paths = []
    root_depth = rootdir.rstrip(os.path.sep).count(os.path.sep) - 1
    for dirpath, dirs, files in os.walk(rootdir):
        depth = dirpath.count(os.path.sep) - root_depth
        if mindepth <= depth <= maxdepth:
            for filename in files:
                file_paths.append(os.path.join(dirpath, filename))
        elif depth > maxdepth:
            del dirs[:]  # prune: do not descend below maxdepth
    return file_paths
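The depth arithmetic can be sanity-checked on a small throwaway tree; a sketch using tempfile (the directory and file names are illustrative, and the function is repeated so the snippet runs standalone):

```python
import os
import tempfile

def get_all_files(rootdir, mindepth=1, maxdepth=float('inf')):
    # Collect full paths of files whose directory depth (relative to
    # rootdir, where rootdir itself is depth 1) lies in [mindepth, maxdepth].
    file_paths = []
    root_depth = rootdir.rstrip(os.path.sep).count(os.path.sep) - 1
    for dirpath, dirs, files in os.walk(rootdir):
        depth = dirpath.count(os.path.sep) - root_depth
        if mindepth <= depth <= maxdepth:
            for filename in files:
                file_paths.append(os.path.join(dirpath, filename))
        elif depth > maxdepth:
            del dirs[:]  # prune: do not descend further
    return file_paths

# Build: root/top.txt and root/sub/nested.txt
root = tempfile.mkdtemp()
open(os.path.join(root, 'top.txt'), 'w').close()
os.makedirs(os.path.join(root, 'sub'))
open(os.path.join(root, 'sub', 'nested.txt'), 'w').close()
```

With this tree, `maxdepth=1` should report only `top.txt`, while `mindepth=2, maxdepth=2` should report only `nested.txt`.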

Now, rolling the above code up into a single function should give you the idea:

import os

def clear_all_files_content(rootdir, mindepth = 1, maxdepth = float('inf')):
    not_cleared = []
    root_depth = rootdir.rstrip(os.path.sep).count(os.path.sep) - 1
    for dirpath, dirs, files in os.walk(rootdir):
        depth = dirpath.count(os.path.sep) - root_depth
        if mindepth <= depth <= maxdepth:
            for filename in files:
                filename = os.path.join(dirpath, filename)
                if filename[1:3] == ':\\':
                    filename = u'\\\\?\\' + os.path.normcase(filename)            
                if (os.path.isfile(filename) and os.access(filename, os.W_OK)):
                    with open(filename, 'w'): 
                        pass
                else:
                    not_cleared.append(filename)
        elif depth > maxdepth:
            del dirs[:]  # prune: do not descend below maxdepth
    return not_cleared

This does not maintain the "last write time".

It will return the list not_cleared, which you can check for files that encountered a write-access problem.
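If the "last write time" bonus from the question matters, the timestamps can be captured with os.stat() before truncating and restored with os.utime() afterwards. A minimal sketch of that idea, not part of the original answer:

```python
import os

def clear_file_keep_mtime(filepath):
    """Truncate filepath to 0 bytes, preserving its access/modification times."""
    st = os.stat(filepath)            # capture timestamps before writing
    with open(filepath, 'w'):
        pass                          # opening in 'w' truncates the file
    os.utime(filepath, (st.st_atime, st.st_mtime))  # restore timestamps
```

This mirrors what the PowerShell answer above does with $PSItem.LastWriteTime, and could replace the bare `open(..., 'w')` call inside clear_all_files_content.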
