How to find the most recently modified files and delete them using shell code

Question

I need some help with shell code. Right now I have this code:

find $dirname -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-

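This code finds duplicate files (files with identical content) in a given directory. What I need to do is extend it: from each list of duplicates, find the most recently modified file (by date), print its name, and offer the chance to delete it from the terminal.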

Answer 1

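Doing this in pure bash is a bit awkward; it would be much easier to write in Perl or Python.

Also, if you were hoping to do this with a bash one-liner, it might be feasible, but I really don't know how.

Anyhoo, if you really want a pure bash solution, below is an attempt at doing what you describe.

Please note that:

I am not actually calling rm, just echoing it, so as not to destroy your files.
I am not entirely happy with the "read -u 1" in there.

Here is the code: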

#!/bin/bash

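# Files of the duplicate group currently being collected.
group=()

function process {
    if ((${#group[@]}))
    then
        echo "================================================================================="
        echo "The following ${#group[@]} files are byte identical and sorted from oldest to newest:"
        # -t sorts by modification time, -r reverses so the newest comes last.
        ls -ltr -- "${group[@]}"
        # Without -l, ls prints bare names, so the last line is the newest file.
        lastFile=$(ls -tr -- "${group[@]}" | tail -n 1)
        echo

        while true
        do
            # stdin is fed by find, so the answer is read from fd 1 instead;
            # this only works while stdout is the terminal.
            read -r -u 1 -p "Do you wish to delete the last file $lastFile (y/n/q)? " answer
            case $answer in
                [Yy]* ) echo rm -- "$lastFile"; break;;
                [Nn]* ) echo skipping; break;;
                [Qq]* ) exit;;
                * ) echo "please answer yes, no or quit";;
            esac
        done
        echo
    fi
}

# Process substitution keeps the loop in the current shell, so the group
# array is still filled when the final call to process handles the last group.
while IFS= read -r line
do
    if test -z "$line"
    then
        process
        group=()
    else
        group+=( "$line" )
    fi
done < <(
    find . -type f -exec md5sum '{}' ';' |
    sort                                 |
    uniq --all-repeated=separate -w 33   |
    cut -c 35-
)
process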

echo "done"
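
The unease about "read -u 1" comes from standard input being occupied by the pipeline, so the answer has to come from somewhere else. One possible way around it, sketched here under the assumption that a controlling terminal is available as /dev/tty, is a small helper that reads from a named source. The function name ask_yes_no and its optional second argument are hypothetical and not part of the answer's script.

```shell
#!/bin/bash

# ask_yes_no PROMPT [SOURCE]   (hypothetical helper, not from the answer)
# Reads answers from SOURCE (default: /dev/tty) instead of stdin.
# Returns 0 for yes, 1 for no, 2 for quit or end of input.
# If SOURCE cannot be opened, the loop fails with status 1, i.e. "no".
ask_yes_no() {
    local prompt=$1 source=${2:-/dev/tty} answer
    while true; do
        # The prompt goes to stderr so it never mixes with piped output.
        printf '%s' "$prompt" >&2
        IFS= read -r answer || return 2
        case $answer in
            [Yy]*) return 0 ;;
            [Nn]*) return 1 ;;
            [Qq]*) return 2 ;;
            *) echo "please answer yes, no or quit" >&2 ;;
        esac
    done < "$source"
}
```

Inside the loop of the script above, the prompt could then be written as ask_yes_no "Do you wish to delete the last file $lastFile (y/n/q)? ", with the return status choosing between echoing rm, skipping, and exiting.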

Answer 2

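Here is a "naive" solution implemented in bash (apart from two external commands: md5sum, of course, and stat, which is used only for the user's comfort and is not part of the algorithm). The thing implements a 100% Bash quicksort (which I'm kind of proud of):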

#!/bin/bash

# Finds identical (based on md5sum) files (recursively) in given
# directory. If several files with same md5sum are found, sort
# them by modified (most recent first) and prompt user for deletion
# of all but the most recent

die() {
   printf >&2 '%s\n' "$@"
   exit 1
}

quicksort_files_by_mod_date() {
    if ((!$#)); then
        qs_ret=()
        return
    fi
    # the return array is qs_ret
    local first=$1
    shift
    local newers=()
    local olders=()
    qs_ret=()
    for i in "$@"; do
        if [[ $i -nt $first ]]; then
            newers+=( "$i" )
        else
            olders+=( "$i" )
        fi
    done
    quicksort_files_by_mod_date "${newers[@]}"
    newers=( "${qs_ret[@]}" )
    quicksort_files_by_mod_date "${olders[@]}"
    olders=( "${qs_ret[@]}" )
    qs_ret=( "${newers[@]}" "$first" "${olders[@]}" )
}

[[ -n $1 ]] || die "Must give an argument"
[[ -d $1 ]] || die "Argument must be a directory"

dirname=$1

shopt -s nullglob
shopt -s globstar

declare -A files
declare -A hashes

for file in "$dirname"/**; do
    [[ -f $file ]] || continue
    read md5sum _ < <(md5sum -- "$file")
    files[$file]=$md5sum
    ((hashes[$md5sum]+=1))
done

has_found=0
for hash in "${!hashes[@]}"; do
    ((hashes[$hash]>1)) || continue
    files_with_same_md5sum=()
    for file in "${!files[@]}"; do
        [[ ${files[$file]} = $hash ]] || continue
        files_with_same_md5sum+=( "$file" )
    done
    has_found=1
    echo "Found ${hashes[$hash]} files with md5sum=$hash, sorted by modified (most recent first):"
    # sort them by modified date (using quicksort :p)
    quicksort_files_by_mod_date "${files_with_same_md5sum[@]}"
    for file in "${qs_ret[@]}"; do
      printf "   %s %s\n" "$(stat --printf '%y' -- "$file")" "$file"
    done
    read -p "Do you want to remove all but the most recent? [yn] " answer
    if [[ ${answer,,} = y ]]; then
       echo rm -fv -- "${qs_ret[@]:1}"
    fi
done

if((!has_found)); then
    echo "Didn't find any similar files in directory \`$dirname'. Yay."
fi

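I guess the script is self-explanatory (you can read it like a story). It uses the best practices I know of, and is 100% safe regarding any silly characters in file names (e.g., spaces, newlines, file names starting with hyphens, file names ending with a newline, etc.).

It uses bash's globs, so it might be a bit slow if you have a bloated directory tree.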
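
If the glob turns out to be a bottleneck, one possible alternative is to let find walk the tree and hand the names over NUL-delimited, which stays safe for the same odd file names. The sketch below is an assumption-laden variant, not the answer's method: the function name list_regular_files and the array found_files are hypothetical, and no timing has been measured. Note that find -type f skips symlinks, whereas the [[ -f ]] test in the script follows them.

```shell
#!/bin/bash

# list_regular_files DIR   (hypothetical helper, not from the answer)
# Fills the global array found_files with every regular file below DIR.
# Assumes DIR does not begin with a dash, which find would take as an option.
list_regular_files() {
    local dir=$1 file
    found_files=()
    # -print0 ends each name with a NUL byte, the only byte a path cannot
    # contain, so names with spaces or newlines arrive intact.
    while IFS= read -r -d '' file; do
        found_files+=( "$file" )
    done < <(find "$dir" -type f -print0)
}
```

The main loop of the script could then iterate over "${found_files[@]}" instead of "$dirname"/**.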

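There are very few error checks, and many are missing, so don't use it as-is in production! (Adding them is trivial but tedious.)

The algorithm is as follows: scan each file in the given directory tree; for each file, compute its md5sum and store it in two associative arrays:

files, with the file names as keys and the md5sums as values.
hashes, with the hashes as keys and, as values, the number of files whose md5sum is that key.

Once this is done, we scan through all the md5sums found, keep only those that correspond to more than one file, select all the files with that md5sum, quicksort them by modification date, and prompt the user.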
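
The question only asks for the most recently modified duplicate, and for that a full sort is not strictly needed: a single pass with the same -nt test is enough. The following is a minimal sketch of that idea, not part of the answer's script; the function name newest_file and the result variable newest_ret (modelled on qs_ret) are hypothetical.

```shell
#!/bin/bash

# newest_file FILE...   (hypothetical helper, not from the answer)
# Stores the most recently modified of the given files in newest_ret.
# Returns 1 when called without arguments. On equal modification times
# the file listed first wins.
newest_file() {
    newest_ret=''
    (($#)) || return 1
    local candidate
    newest_ret=$1
    shift
    for candidate in "$@"; do
        # -nt is true when the left file was modified after the right one.
        if [[ $candidate -nt $newest_ret ]]; then
            newest_ret=$candidate
        fi
    done
}
```

Called as newest_file "${files_with_same_md5sum[@]}", it would leave the newest duplicate in newest_ret, ready to be printed and offered for deletion.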

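A sweet touch when no duplicates are found: the script nicely informs the user about it.

I wouldn't say it's the most efficient way of doing things (it might be better in, e.g., Perl), but it really is a lot of fun, surprisingly easy to read and follow, and you can potentially learn a lot by studying it!

It uses a few bashisms and features that are only available in bash version ≥ 4.

Hope this helps!

Remark. If the date command on your system has the -r switch, you can replace the stat command with: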

date -r "$file"

Remark. I've put an echo in front of rm. Remove it once you're happy with how the script behaves. You'll then have a script that uses 3 external commands :).
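
Instead of editing the script to drop the echo, the dry-run behaviour could be made switchable. This is only a sketch of one way to do it: the variable DRY_RUN and the function run are hypothetical names, and DRY_RUN is assumed to hold 0 or 1.

```shell
#!/bin/bash

# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it.
# Both the variable and the run helper are hypothetical, not from the answer.
DRY_RUN=${DRY_RUN:-1}

run() {
    if ((DRY_RUN)); then
        # %q quotes each argument so the printed command shows exactly
        # which file names would be affected, odd characters included.
        printf 'would run:'
        printf ' %q' "$@"
        printf '\n'
    else
        "$@"
    fi
}
```

The deletion line would then read run rm -fv -- "${qs_ret[@]:1}", and invoking the script with DRY_RUN=0 in the environment would perform the removal for real.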


Answer 1
0 2013-10-30 23:16:22

Answer 2
0 2013-10-31 09:48:13
