How to find the most recently modified files and delete them using shell code

Question

I need some help with some shell code. Right now I have this:

find $dirname -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-

This code finds duplicated files (files with the same content) in a given directory. What I need to do is extend it: find the most recently modified file in each list of duplicates, print that file's name, and offer the option to delete that file from the terminal.

Answer 1

Doing this in pure bash is a tad awkward; it would be a lot easier to write in Perl or Python.

Also, doing this with a bash one-liner might be feasible, but I really don't know how.

Anyhoo, if you really want a pure bash solution, below is an attempt at doing what you describe.

Please note that:

I am not actually calling rm, just echoing it, because I don't want to destroy your files.
There's a "read -u 1" in there that I'm not entirely happy with.

Here's the code:

#!/bin/bash

buffer=''

function process {
    if test -n "$buffer"
    then
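        # Split the buffer into an array, one file name per entry, so that
        # names containing spaces are passed to ls intact.
        mapfile -t files <<< "${buffer#$'\n'}"
        nbFiles=${#files[@]}
        echo "================================================================================="
        echo "The following $nbFiles files are byte identical and sorted from oldest to newest:"
        # -t sorts by modification time, -r reverses so the newest comes last
        ls -ltr -- "${files[@]}"
        lastFile=$(ls -tr -- "${files[@]}" | tail -n 1)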
        echo

        while true
        do
            read -u 1 -p "Do you wish to delete the last file $lastFile (y/n/q)? " answer
            case $answer in
                [Yy]* ) echo rm -- "$lastFile"; break;;
                [Nn]* ) echo skipping; break;;
                [Qq]* ) exit;;
                * ) echo "please answer yes, no or quit";;
            esac
        done
        echo
    fi
}

find . -type f -exec md5sum '{}' ';' |
sort                                 |
uniq --all-repeated=separate -w 33   |
cut -c 35-                           |
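{
    while IFS= read -r line
    do
        if test -z "$line"
        then
            process
            buffer=''
        else
            buffer=$(printf "%s\n%s" "$buffer" "$line")
        fi
    done
    # Handle the last group here: the loop runs in a pipeline subshell,
    # so $buffer would be empty again once the pipeline has finished.
    process
}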

echo "done"
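
The prompt inside process cannot simply read from standard input, because inside the pipeline standard input is the list of file names. One possible alternative to reading from fd 1, sketched here under the assumption that fd 3 is free, is to duplicate the script's original stdin onto fd 3 before the pipeline starts and let the prompt read from there. The function name confirm_each is made up for this illustration, and nothing is deleted:

```shell
#!/bin/bash
# Sketch only: confirm_each and the choice of fd 3 are illustrative assumptions.
# Reads one file name per line on stdin and asks about each one on fd 3.
confirm_each() {
    local name answer
    while IFS= read -r name; do
        printf >&2 'Delete %s (y/n)? ' "$name"
        # Answers come from fd 3, so they never eat lines meant for the loop
        IFS= read -r answer <&3 || return
        case $answer in
            [Yy]*) echo "rm -- $name" ;;   # echoed only, nothing is deleted
            *)     echo "skipping $name" ;;
        esac
    done
}

# Usage: duplicate the real stdin onto fd 3, then feed names through the pipe:
#   exec 3<&0
#   printf '%s\n' a.txt b.txt | confirm_each
```

In bash, "read -u 3" would do the same job as the "<&3" redirection used here.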

Answer 2

Here's a "naive" solution implemented in bash (except for two external commands: md5sum, of course, and stat, which is used only for the user's comfort and is not part of the algorithm). It implements a 100% bash quicksort (that I'm kind of proud of):

#!/bin/bash

# Finds similar (based on md5sum) files (recursively) in given
# directory. If several files with same md5sum are found, sort
# them by modified (most recent first) and prompt user for deletion
# of the oldest

die() {
   printf >&2 '%s\n' "$@"
   exit 1
}

quicksort_files_by_mod_date() {
    if ((!$#)); then
        qs_ret=()
        return
    fi
    # the return array is qs_ret
    local first=$1
    shift
    local newers=()
    local olders=()
    qs_ret=()
    for i in "$@"; do
        if [[ $i -nt $first ]]; then
            newers+=( "$i" )
        else
            olders+=( "$i" )
        fi
    done
    quicksort_files_by_mod_date "${newers[@]}"
    newers=( "${qs_ret[@]}" )
    quicksort_files_by_mod_date "${olders[@]}"
    olders=( "${qs_ret[@]}" )
    qs_ret=( "${newers[@]}" "$first" "${olders[@]}" )
}

[[ -n $1 ]] || die "Must give an argument"
[[ -d $1 ]] || die "Argument must be a directory"

dirname=$1

shopt -s nullglob
shopt -s globstar

declare -A files
declare -A hashes

for file in "$dirname"/**; do
    [[ -f $file ]] || continue
    read md5sum _ < <(md5sum -- "$file")
    files[$file]=$md5sum
    ((hashes[$md5sum]+=1))
done

has_found=0
for hash in "${!hashes[@]}"; do
    ((hashes[$hash]>1)) || continue
    files_with_same_md5sum=()
    for file in "${!files[@]}"; do
        [[ ${files[$file]} = $hash ]] || continue
        files_with_same_md5sum+=( "$file" )
    done
    has_found=1
    echo "Found ${hashes[$hash]} files with md5sum=$hash, sorted by modified (most recent first):"
    # sort them by modified date (using quicksort :p)
    quicksort_files_by_mod_date "${files_with_same_md5sum[@]}"
    for file in "${qs_ret[@]}"; do
      printf "   %s %s\n" "$(stat --printf '%y' -- "$file")" "$file"
    done
    read -p "Do you want to remove the oldest? [yn] " answer
    if [[ ${answer,,} = y ]]; then
       # ${qs_ret[@]: -1} is the last element of the sorted list, i.e. the oldest file
       echo rm -fv -- "${qs_ret[@]: -1}"
    fi
done

if((!has_found)); then
    echo "Didn't find any similar files in directory \`$dirname'. Yay."
fi

I guess the script is self-explanatory (you can read it like a story). It uses the best practices I know of, and is 100% safe regarding any silly characters in file names (e.g., spaces, newlines, file names starting with hyphens, file names ending with a newline, etc.).

It uses bash's globs, so it might be a bit slow if you have a bloated directory tree.

There are a few error checks, but many are missing, so don't use it as-is in production! (It's a trivial but rather tedious task to add these.)

The algorithm is as follows: scan each file in the given directory tree; for each file, compute its md5sum and store it in two associative arrays:

files, whose keys are the file names and whose values are the md5sums.
hashes, whose keys are the md5sums and whose values are the number of files that have that md5sum.
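
To make the two arrays concrete, here is a toy sketch that fills them with made-up file names and shortened, invented digests instead of running md5sum. The function name report_duplicates is hypothetical; only the array names files and hashes come from the script:

```shell
#!/bin/bash
# Toy illustration: names and digests are invented, no real files are read.
report_duplicates() {
    local -A files=( [a.txt]=d41d [b.txt]=900f [c.txt]=d41d )
    local -A hashes=()
    local file hash

    # Count how many files share each digest.
    for file in "${!files[@]}"; do
        hash=${files[$file]}
        hashes[$hash]=$(( ${hashes[$hash]:-0} + 1 ))
    done

    # Report only the digests that were seen more than once.
    for hash in "${!hashes[@]}"; do
        if (( ${hashes[$hash]} > 1 )); then
            echo "$hash is shared by ${hashes[$hash]} files"
        fi
    done
}

report_duplicates
```

With this sample data the only line printed is "d41d is shared by 2 files", since 900f belongs to a single file.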

After this is done, we scan through all the md5sums found, select only the ones that correspond to more than one file, then select all files with that md5sum, quicksort them by modification date, and prompt the user.
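
The sort needs nothing more than the shell's -nt file test, which is true when the left-hand file has a later modification time than the right-hand one. As a minimal sketch of that primitive, assuming all you want is the single most recent file, a helper (here called newest_file, a made-up name) can walk its arguments once:

```shell
#!/bin/bash
# newest_file is a hypothetical helper, not part of the script above.
# Prints whichever of its arguments has the most recent modification time.
newest_file() {
    local newest=$1 f
    shift
    for f in "$@"; do
        # -nt: true when $f was modified more recently than $newest
        if [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    printf '%s\n' "$newest"
}

# Usage: newest_file "copy one.txt" "copy two.txt" "copy three.txt"
```

Because the names are passed as separate, quoted arguments, spaces in file names are not a problem here.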

A sweet effect when no dups are found: the script nicely informs the user about it.

I would not say it's the most efficient way of doing things (it might be better in, e.g., Perl), but it's really a lot of fun, surprisingly easy to read and follow, and you can potentially learn a lot by studying it!

It uses a few bashisms and features that are only available in bash version ≥ 4.
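
If the script might be run with an older bash, one option is to fail early with a clear message. This is only a sketch: the function name require_bash4 and the wording of the message are my own, while BASH_VERSINFO and BASH_VERSION are standard bash variables:

```shell
#!/bin/bash
# require_bash4 is a hypothetical guard, not part of the script above.
# Returns non-zero (with a message on stderr) when bash is older than 4.
require_bash4() {
    if (( ${BASH_VERSINFO[0]:-0} < 4 )); then
        printf >&2 '%s\n' "This script needs bash 4 or newer (found ${BASH_VERSION:-unknown})"
        return 1
    fi
}

# Usage, near the top of the script:
#   require_bash4 || exit 1
```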

Hope this helps!

Remark. If the date command on your system has the -r switch, you can replace the stat command with:

date -r "$file"

Remark. I left the echo in front of rm. Remove it if you're happy with how the script behaves. Then you'll have a script that uses 3 external commands :).
