How to find the most recently modified files and delete them with shell code
I need some help with some shell code. Right now I have this code:
find "$dirname" -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-
This code finds duplicated files (files with the same content) in a given directory. What I need to do is update it: find the most recently modified file (by date) in each list of duplicates, print that file name, and also offer the chance to delete that file from the terminal.
Doing this in pure bash is a tad awkward; it would be a lot easier to write this in Perl or Python.
Also, if you were looking to do this with a bash one-liner, it might be feasible, but I really don't know how.
Anyhoo, if you really want a pure bash solution, below is an attempt at doing what you describe.
Please note that:
Here's the code:
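#!/bin/bash
# Lists groups of byte-identical files under the current directory and offers
# to delete the most recently modified file of each group.

files=()

process() {
    # Nothing to do if no group has been collected.
    ((${#files[@]})) || return
    echo "================================================================================="
    echo "The following ${#files[@]} files are byte identical and sorted from oldest to newest:"
    ls -ltr -- "${files[@]}"
    # ls -tr sorts by modification time, oldest first, so the last line is the newest file.
    lastFile=$(ls -tr -- "${files[@]}" | tail -n 1)
    echo
    while true
    do
        # stdin is the pipeline, so the answer is read from the terminal instead.
        read -r -p "Do you wish to delete the last file $lastFile (y/n/q)? " answer </dev/tty
        case $answer in
            # echo makes this a dry run; drop it to really delete the file.
            [Yy]* ) echo rm -- "$lastFile"; break;;
            [Nn]* ) echo skipping; break;;
            [Qq]* ) exit;;
            * ) echo "please answer yes, no or quit";;
        esac
    done
    echo
}

find . -type f -exec md5sum '{}' ';' |
sort |
uniq --all-repeated=separate -w 33 |
cut -c 35- |
{
    # The braces keep the loop and the final call to process in the same
    # subshell, so the last group of duplicates is not lost.
    while IFS= read -r line
    do
        if test -z "$line"
        then
            process
            files=()
        else
            files+=("$line")
        fi
    done
    process
}
echo "done"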
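One bash detail is worth keeping in mind for any script shaped like the one above, where a loop is fed by a pipeline: bash runs the stages of a pipeline in subshells by default, so variables set inside the loop are gone once the pipeline ends. The following is only a minimal sketch of that behaviour and of one possible workaround (grouping the loop with the code that needs the variable); the variable names are made up for the illustration.

```shell
#!/bin/bash
# Variables changed inside a pipeline stage do not survive the pipeline.
count_lost=0
printf '%s\n' a b c | while read -r line; do count_lost=$((count_lost + 1)); done
echo "after plain pipeline: $count_lost"

# Grouping the loop with the code that uses the variable keeps both in the
# same subshell, so the value is still available when it is printed.
count_kept=$(printf '%s\n' a b c | {
    n=0
    while read -r line; do n=$((n + 1)); done
    echo "$n"
})
echo "inside grouped pipeline: $count_kept"
```

With bash's default settings the first counter is still 0 after the pipeline, while the grouped version reports 3.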
Here's a "naive" solution implemented in bash (except for two external commands: md5sum, of course, and stat, which is used only for the user's comfort and is not part of the algorithm). The thing implements a 100% Bash quicksort (that I'm kind of proud of):
#!/bin/bash

# Finds similar (based on md5sum) files (recursively) in given
# directory. If several files with same md5sum are found, sort
# them by modified (most recent first) and prompt user for deletion
# of the oldest

die() {
    printf >&2 '%s\n' "$@"
    exit 1
}

quicksort_files_by_mod_date() {
    if ((!$#)); then
        qs_ret=()
        return
    fi
    # the return array is qs_ret
    local first=$1
    shift
    local i
    local newers=()
    local olders=()
    qs_ret=()
    for i in "$@"; do
        if [[ $i -nt $first ]]; then
            newers+=( "$i" )
        else
            olders+=( "$i" )
        fi
    done
    quicksort_files_by_mod_date "${newers[@]}"
    newers=( "${qs_ret[@]}" )
    quicksort_files_by_mod_date "${olders[@]}"
    olders=( "${qs_ret[@]}" )
    qs_ret=( "${newers[@]}" "$first" "${olders[@]}" )
}

[[ -n $1 ]] || die "Must give an argument"
[[ -d $1 ]] || die "Argument must be a directory"

dirname=$1

shopt -s nullglob
shopt -s globstar

declare -A files    # file name -> md5sum
declare -A hashes   # md5sum -> number of files with that md5sum

for file in "$dirname"/**; do
    [[ -f $file ]] || continue
    read md5sum _ < <(md5sum -- "$file")
    files[$file]=$md5sum
    ((hashes[$md5sum]+=1))
done

has_found=0
for hash in "${!hashes[@]}"; do
    ((hashes[$hash]>1)) || continue
    files_with_same_md5sum=()
    for file in "${!files[@]}"; do
        [[ ${files[$file]} = "$hash" ]] || continue
        files_with_same_md5sum+=( "$file" )
    done
    has_found=1
    echo "Found ${hashes[$hash]} files with md5sum=$hash, sorted by modified (most recent first):"
    # sort them by modified date (using quicksort :p)
    quicksort_files_by_mod_date "${files_with_same_md5sum[@]}"
    for file in "${qs_ret[@]}"; do
        printf "   %s %s\n" "$(stat --printf '%y' -- "$file")" "$file"
    done
    read -p "Do you want to remove the oldest? [yn] " answer
    if [[ ${answer,,} = y ]]; then
        # qs_ret is sorted most recent first, so the oldest file is its last element
        echo rm -fv -- "${qs_ret[@]: -1}"
    fi
done

if ((!has_found)); then
    echo "Didn't find any similar files in directory \`$dirname'. Yay."
fi
I guess the script is self-explanatory (you can read it like a story). It uses the best practices I know of, and is 100% safe regarding any silly characters in file names (e.g., spaces, newlines, file names starting with hyphens, file names ending with a newline, etc.).
It uses bash's globs, so it might be a bit slow if you have a bloated directory tree.
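If the glob turns out to be the slow part, one alternative (not what the script above does) would be to let find walk the tree and read its NUL-delimited output, which stays safe for odd file names. This is only a sketch: collect_files and found_files are names invented here, and -print0 is assumed to be supported by the local find (it is in the GNU and BSD versions). Unlike the ** glob with default settings, find also returns hidden files.

```shell
#!/bin/bash
# collect_files DIR: fills the global array found_files with every regular
# file under DIR, reading NUL-delimited names so any character is safe.
collect_files() {
    local dir=$1 file
    found_files=()
    while IFS= read -r -d '' file; do
        found_files+=("$file")
    done < <(find "$dir" -type f -print0)
}

# Demonstration on a throwaway directory (the names are made up).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub dir"
touch "$demo_dir/a.txt" "$demo_dir/sub dir/b c.txt"
collect_files "$demo_dir"
printf 'found %d files\n' "${#found_files[@]}"
rm -rf -- "$demo_dir"
```

The loop body of the main script could then iterate over "${found_files[@]}" instead of "$dirname"/**.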
There is some error checking, but much of it is missing, so don't use it as-is in production! (Adding the rest is a trivial but rather tedious task.)
The algorithm is as follows: scan each file in the given directory tree; for each file, compute its md5sum and store it in two associative arrays: files, whose keys are the file names and whose values are the md5sums; and hashes, whose keys are the md5sums and whose values are the number of files that have that md5sum. After this is done, we scan through all the md5sums found, keep only the ones that correspond to more than one file, select all the files with that md5sum, quicksort them by modification date, and prompt the user.
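The bookkeeping described above can be tried in isolation. The sketch below uses made-up file names and checksums instead of real md5sum output, so it runs without touching the disk; add_file and dup_report are hypothetical names used only for the illustration.

```shell
#!/bin/bash
# Toy version of the two-table bookkeeping (requires bash 4 or newer).
declare -A files    # file name -> checksum
declare -A hashes   # checksum  -> number of files having it

add_file() {        # add_file NAME CHECKSUM
    files[$1]=$2
    ((hashes[$2] += 1))
}

add_file a.txt 1111
add_file b.txt 2222
add_file c.txt 1111

# Keep only the checksums shared by more than one file, then list the
# files carrying each of them.
dup_report=''
for hash in "${!hashes[@]}"; do
    ((hashes[$hash] > 1)) || continue
    for file in "${!files[@]}"; do
        if [[ ${files[$file]} = "$hash" ]]; then
            dup_report+="$file "
        fi
    done
done
echo "duplicates: $dup_report"
```

Only a.txt and c.txt end up in the report; their order is not guaranteed, because bash does not define an iteration order for associative arrays. That is why the real script sorts each group explicitly.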
A sweet effect when no dups are found: the script nicely informs the user about it.
I would not say it's the most efficient way of doing things (it might be better in, e.g., Perl), but it's really a lot of fun, surprisingly easy to read and follow, and you can potentially learn a lot by studying it!
It uses a few bashisms and features that are only available in bash version ≥ 4.
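The bash 4 features in question include associative arrays (declare -A), the globstar option and the ${answer,,} lowercase expansion. A script that depends on them could check the version up front; the following is a minimal sketch, with bash_at_least being a helper name invented here.

```shell
#!/bin/bash
# bash_at_least MAJOR: succeeds when the running bash is at least version MAJOR.
bash_at_least() {
    ((BASH_VERSINFO[0] >= $1))
}

if bash_at_least 4; then
    version_ok=yes
else
    version_ok=no
    echo "This script needs bash 4 or newer (found ${BASH_VERSION:-unknown})." >&2
fi
echo "bash >= 4: $version_ok"
```

In a real script the else branch would typically stop execution (for instance with the die function) instead of just recording the result.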
Hope this helps!
Remark. If date on your system has the -r switch, you can replace the stat command by:
date -r "$file"
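To avoid choosing between the two by hand, a small helper could try one and fall back to the other. This is just a sketch assuming the GNU coreutils versions of stat and date (other systems use different options); mod_time is a name made up for the example.

```shell
#!/bin/bash
# mod_time FILE: prints FILE's modification time, preferring GNU stat and
# falling back to GNU date -r when stat does not accept these options.
mod_time() {
    stat --printf '%y' -- "$1" 2>/dev/null || date -r "$1"
}

demo_file=$(mktemp)
stamp=$(mod_time "$demo_file")
echo "modified: $stamp"
rm -f -- "$demo_file"
```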
Remark. I left the echo in front of rm. Remove it if you're happy with how the script behaves. Then you'll have a script that uses 3 external commands :).
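Instead of editing the script to drop the echo, the dry run could be made switchable. The sketch below is one possible way to do that; DRY_RUN and run are names invented for the illustration, not part of the script above.

```shell
#!/bin/bash
# run CMD [ARGS...]: prints the command when DRY_RUN is 1, otherwise runs it.
DRY_RUN=1
run() {
    if [[ $DRY_RUN = 1 ]]; then
        printf 'would run:'
        printf ' %q' "$@"
        printf '\n'
    else
        "$@"
    fi
}

demo_file=$(mktemp)
run rm -f -- "$demo_file"     # dry run: only prints, the file is still there
if [[ -e $demo_file ]]; then echo "still exists"; fi
DRY_RUN=0
run rm -f -- "$demo_file"     # real run: the file is removed
if [[ ! -e $demo_file ]]; then echo "removed"; fi
```

The %q format prints each argument quoted the way bash would need it, so the printed command can be inspected safely even for odd file names.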