
Recursively delete all files except a certain number in each directory

I have a large collection of test files organized in directories. I need to keep the directory structure for my application but want to thin out the files for faster testing, so that each directory holds at most 3 files. How can I do that on Linux?

To clarify what I would like to accomplish, a solution in Python:

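import os
import sys

limit = int(sys.argv[2])  # maximum number of files to keep per directory
for root, dirs, files in os.walk(sys.argv[1]):
    # Sort so the kept files are predictable, then delete everything past the limit.
    for name in sorted(files)[limit:]:
        os.remove(os.path.join(root, name))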

Usage:

python thinout.py /path/to/thin\ out/ <maximum_number_of_files_per_directory>

Example:

python thinout.py testing\ data 3

I found a similar question about doing this for one directory, but not recursively.

I would do something like this in bash:

for dir in `find . -type d`; do pushd "$dir"; rm `ls | awk 'NR>3'`; popd; done

Or this version might be better:

for dir in `find . -type d`; do pushd "$dir"; rm `find . -maxdepth 1 -type f | tail -n +4`; popd; done

Of course, blindly deleting everything but the first 3 files in a directory is always a little risky. Buyer beware...

By the way, I did not test this myself. Just typed in what came to mind. You'll likely have to tweak it a little to get it to work right. Again, buyer beware.
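
Given that risk, it may help to preview what would be removed before deleting anything. The following is a minimal sketch in Python (the language the question uses), not a tested replacement for the loops above. The function name surplus_files and its keep parameter are made up for illustration, and "first" is assumed to mean first by sorted name.

```python
import os


def surplus_files(root, keep=3):
    """Yield the files beyond the first `keep` (sorted by name) in each directory.

    Nothing is deleted; this only reports what a thinning pass would remove.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk does not descend into symlinked directories by default.
        for name in sorted(filenames)[keep:]:
            yield os.path.join(dirpath, name)
```

Printing each path from surplus_files("testing data", 3) gives a list to review before running any of the destructive commands.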

This quite lengthy sequence will work with files containing spaces etc., and just leave the first three alphabetically sorted files in each subdir.

EDIT: applied mklement's improvement to cope with directories that need escaping.

find /var/testfiles/ -type d -print0 | while IFS= read -r -d '' subdir; \
do cd "$subdir"; find . -mindepth 1 -maxdepth 1 -type f -print0 | \
sort --zero-terminated | tr '\0' '\n' | tail -n+4 | tr '\n' '\0' | \
xargs --null --no-run-if-empty rm ; cd "$OLDPWD" ; done

Since my version of tail doesn't support a --zero or --null flag for line terminators, I had to work around that with tr. Suggestions for improvements are welcome.
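
One limit of the tr round trip is that a file name containing a newline is still split in two. A possible way around it, sketched here in Python because the question already uses it, is to never turn the names into text lines at all. The function name thin_out is hypothetical. Python sorts names by code point, so the order can differ from what sort produces under some locales.

```python
import os


def thin_out(root, keep=3):
    """Delete all but the first `keep` regular files (by name) in each directory.

    Returns the list of paths that were removed.
    """
    removed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        with os.scandir(dirpath) as entries:
            # Regular files only, similar to `find -type f`; symlinks are skipped.
            names = sorted(
                e.name for e in entries if e.is_file(follow_symlinks=False)
            )
        for name in names[keep:]:
            path = os.path.join(dirpath, name)
            os.remove(path)
            removed.append(path)
    return removed
```

Calling thin_out("/var/testfiles", 3) would correspond to the pipeline above. Names are passed around as Python strings rather than delimited text, so spaces, quotes, and newlines need no escaping.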
