How to remove only the duplicate files under a directory (files with the same cksum)
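I wrote the following script to remove files that have the same cksum (or content).
The problem is that the script can remove files twice, as shown in the following example output.
My goal is to remove only the duplicate copies, not the source file.
Script output: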
Starting:
Same: /tmp/File_inventury.out /tmp/File_inventury.out.1
Remove: /tmp/File_inventury.out.1
Same: /tmp/File_inventury.out.1 /tmp/File_inventury.out
Remove: /tmp/File_inventury.out
Same: /tmp/File_inventury.out.2 /tmp/File_inventury.out.3
Remove: /tmp/File_inventury.out.3
Same: /tmp/File_inventury.out.3 /tmp/File_inventury.out.2
Remove: /tmp/File_inventury.out.2
Same: /tmp/File_inventury.out.4 /tmp/File_inventury.out
Remove: /tmp/File_inventury.out
Done.
My script:
#!/bin/bash
DIR="/tmp"
echo "Starting:"
for file1 in ${DIR}/File_inventury.out*; do
    for file2 in ${DIR}/File_inventury.out*; do
        if [ $file1 != $file2 ]; then
            diff "$file1" "$file2" 1>/dev/null
            STAT=$?
            if [ $STAT -eq 0 ]
            then
                echo "Same: $file1 $file2"
                echo "Remove: $file2"
                rm "$file1"
                break
            fi
        fi
    done
done
echo "Done."
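In any case, I would like to hear about other options for removing files that have the same content or cksum (really only the duplicate files need to be removed, not the primary file).
Please advise how this can be done under the Solaris operating system (options: a find one-liner, awk, sed, etc.).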
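As a note on the script above: the nested loops visit every pair of files in both orders, which appears to be why both members of a pair show up under "Remove:". The script also prints `$file2` but removes `$file1`. A minimal sketch of one way around this, assuming the first file in glob order should be kept, is to compare each file only against the files that come after it and to skip files that are already gone. The function name `remove_later_duplicates` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep the first file of each identical group and
# remove the later ones. The files to examine are passed as arguments.
remove_later_duplicates() {
    for keep in "$@"; do
        [ -f "$keep" ] || continue        # already removed as a duplicate
        seen_self=0
        for cand in "$@"; do
            if [ "$cand" = "$keep" ]; then
                seen_self=1               # from here on, candidates come after $keep
                continue
            fi
            [ "$seen_self" -eq 1 ] || continue
            [ -f "$cand" ] || continue
            if cmp -s "$keep" "$cand"; then   # byte-for-byte comparison
                echo "Same: $keep $cand"
                echo "Remove: $cand"
                rm "$cand"
            fi
        done
    done
}
```

It could be called as `remove_later_duplicates /tmp/File_inventury.out*`. This is still quadratic in the number of files, like the original script.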
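This version should be more efficient. I was uneasy about relying on paste to match up the correct rows, but it looks like POSIX specifies that glob expansion is sorted by default.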
for i in *; do
date -u +%Y-%m-%dT%TZ -r "$i";
done > .stat; #store the last modification time in a sortable format
cksum * > .cksum; #store the cksum, size, and filename
paste .stat .cksum | #data for each file, 1 per row
sort | #sort by mtime so original comes first
awk '{
if($2 in f)
system("rm -v " $4); #rm if we have seen an occurrence of this cksum
else
f[$2]++ #count the first occurrence
}'
This should run in O(n * log(n)) time, reading each file only once.
You can put it in a shell script like this:
#!/bin/sh
for i in *; do
date -u +%Y-%m-%dT%TZ -r "$i";
done > .stat;
cksum * > .cksum;
paste .stat .cksum | sort | awk '{if($2 in f) system("rm -v " $4); else f[$2]++}';
rm .stat .cksum;
exit 0;
Or run it as a one-liner:
for i in *; do date -u +%Y-%m-%dT%TZ -r "$i"; done > .stat; cksum * > .cksum; paste .stat .cksum | sort | awk '{if($2 in f) system("rm -v " $4); else f[$2]++}'; rm .stat .cksum;
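The pipeline above uses `date -r FILE` and `rm -v`, neither of which is specified by POSIX, so they may be missing from the stock Solaris utilities. A rough sketch of the same idea without them follows. It assumes that `ls -dtr` ordering (oldest first) is good enough to decide which copy is the original, that file names contain no newlines, and that the arguments are regular files. The function name `list_newer_duplicates` is invented for this example. It only prints the candidates and deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print every file whose checksum and size match
# an older file among the arguments. Nothing is deleted here.
list_newer_duplicates() {
    # List the given names oldest first (assumes no newlines in names).
    ls -dtr "$@" |
    while IFS= read -r f; do
        cksum "$f"                        # prints: CRC size name
    done |
    awk '{
        key = $1 " " $2                   # CRC plus size identifies a group
        name = $0
        sub(/^[0-9]+[ \t]+[0-9]+[ \t]+/, "", name)   # keep the whole name, spaces included
        if (key in seen) print name       # a newer file with an already-seen key
        else seen[key] = 1                # the oldest file of the group is kept
    }'
}
```

To actually delete, the output could be fed to a loop such as `list_newer_duplicates /tmp/File_inventury.out* | while IFS= read -r f; do rm "$f"; done`, after reviewing the list first.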
I use an array as an index map, so I think this is just O(n)?
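#!/bin/bash
# $1 is presumably a quoted glob pattern; it is left unquoted below so it expands.
arr=()
for f in $1; do
    # cksum prints "CRC size name"; fn takes the rest of the line
    read -r ck x fn <<< "$(cksum "$f")"
    if [[ -z ${arr[$ck]} ]]; then
        arr[$ck]=$fn    # first file seen with this checksum is kept
    else
        echo "Same: ${arr[$ck]} $fn"
        echo "Remove: $fn"
        rm "$fn"
    fi
done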
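One caveat with any cksum-based approach: cksum is a 32-bit CRC, so two files with the same checksum are very likely, but not guaranteed, to have the same content. A small sketch of a guard that confirms with `cmp` before removing anything is shown below. The function name `remove_if_identical` and its argument order are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the file to keep, $2 is the suspected duplicate.
# The duplicate is removed only if the two files are byte-for-byte identical.
remove_if_identical() {
    if [ "$1" = "$2" ]; then
        return 1                          # never remove the file we want to keep
    fi
    if cmp -s "$1" "$2"; then
        echo "Remove: $2"
        rm "$2"
    else
        echo "Checksum matched but contents differ, keeping: $2" >&2
        return 1
    fi
}
```

In the script above, the `rm` call could be swapped for something like `remove_if_identical "${arr[$ck]}" "$fn"`.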