Script to delete old files and leave the newest one in a directory in Linux
I have a backup tool that takes a database backup daily and stores the files using the following naming format:
*_DATE_*.*.sql.gz
with DATE being in YYYY-MM-DD format.
How can I delete the old files matching the pattern above (by comparing the YYYY-MM-DD in the filenames), leaving only the newest one?
Example:
wordpress_2020-01-27_06h25m.Monday.sql.gz
wordpress_2020-01-28_06h25m.Tuesday.sql.gz
wordpress_2020-01-29_06h25m.Wednesday.sql.gz
At the end, only the last file, wordpress_2020-01-29_06h25m.Wednesday.sql.gz, should remain.
Assuming the substring preceding the _DATE_ portion does not contain underscores, would you try the following:
for f in *.sql.gz; do
    echo "$f"
done | sort -t "_" -k 2 | head -n -1 | xargs rm --
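Since the pipeline above deletes files immediately, it may be worth previewing the result first. What follows is a minimal sketch under the same assumptions (no underscores before the date, no newlines in the filenames). The function name list_old_backups is made up for illustration, and sed '$d' stands in for head -n -1 only because it is also available on non-GNU systems:

```shell
# Hypothetical helper: print every *.sql.gz file in the current directory
# except the one that sorts last on the date field.
list_old_backups() {
    for f in *.sql.gz; do
        [ -e "$f" ] || continue      # skip the unexpanded pattern if nothing matches
        printf '%s\n' "$f"
    done | LC_ALL=C sort -t "_" -k 2 | sed '$d'   # sed '$d' drops the last (newest) line
}
```

Running list_old_backups inside the backup directory prints the deletion candidates without touching anything.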
If your head and cut commands support the -z option, the following code will be more robust against special characters in the filenames:
for f in *.sql.gz; do
    [[ $f =~ _([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})_ ]] && \
        printf "%s\t%s\0" "${BASH_REMATCH[1]}" "$f"
done | sort -z | head -z -n -1 | cut -z -f 2- | xargs -0 rm --
It uses the NUL character as a line delimiter, which allows any special characters in the filenames.
It first extracts the DATE portion from the filename, then prepends it to the filename as a first field separated by a tab character.
Then it sorts the files by the DATE string, excludes the last (newest) one, retrieves the filenames by cutting the first field off, and removes those files.
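The same decorate, sort, undecorate steps can also be sketched inside bash for systems where head -z and cut -z are unavailable. This is only an illustration: the function name collect_old_backups and the old_backups array are hypothetical, and it still assumes a sort that accepts -z:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: fill the old_backups array with every dated *.sql.gz
# file in the current directory except the one carrying the newest date.
old_backups=()
collect_old_backups() {
    local f rec tab=$'\t' records=()
    old_backups=()
    while IFS= read -r -d '' rec; do
        records+=("$rec")
    done < <(
        for f in *.sql.gz; do
            # Step 1: prepend the extracted date as a tab-separated first field.
            [[ $f =~ _([0-9]{4}-[0-9]{2}-[0-9]{2})_ ]] || continue
            printf '%s\t%s\0' "${BASH_REMATCH[1]}" "$f"
        done | LC_ALL=C sort -z      # Step 2: sort the NUL-delimited records by date
    )
    (( ${#records[@]} > 1 )) || return 0     # zero or one dated file: nothing to delete
    unset "records[${#records[@]}-1]"        # Step 3: drop the newest record
    old_backups=("${records[@]#*$tab}")      # Step 4: cut the date field off each record
}
```

After calling collect_old_backups and inspecting the array, rm -- "${old_backups[@]}" would remove the listed files.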
Go to the folder where you have the *_DATE_*.*.sql.gz files and try the command below:
ls -ltr *.sql.gz|awk '{print $9}'|awk '/2020/{print $0}' |xargs rm
or use:
`ls -ltr |grep '2019-05-20'|awk '{print $9}'|xargs rm`
Replace /2020/ with the pattern you want to delete. For example, to delete the files dated 2020-05-01, use /2020-05-01/.
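Parsing ls -l output with awk breaks as soon as a filename contains whitespace, so as an alternative sketch the shell glob itself can select the files for a given date. The function name backups_for_date is hypothetical, and the *_DATE_*.sql.gz layout from the question is assumed:

```shell
# Hypothetical helper: print the backups in the current directory whose
# name contains _DATE_, where DATE is given as the first argument.
backups_for_date() {
    for f in *_"$1"_*.sql.gz; do
        [ -e "$f" ] || continue      # the pattern stays literal when nothing matches
        printf '%s\n' "$f"
    done
}
```

backups_for_date 2020-05-01 previews the matches; rm -- *_2020-05-01_*.sql.gz would then delete them.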
Since the pattern (glob) you present is very generic, we have to make an assumption here.
Assumption: the date pattern is the first sequence that matches the regex [0-9]{4}-[0-9]{2}-[0-9]{2}.
Files are of the form: constant_string_<DATE>_*.sql.gz
Glob expansion is sorted alphabetically, so with a constant prefix the last array element is the newest file:
a=( *.sql.gz )
# drop the last (newest) entry from the deletion list
unset "a[${#a[@]}-1]"
rm -- "${a[@]}"
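A slightly more defensive sketch of the same idea prints the candidates instead of deleting them, and does nothing unless at least two files match. The function name older_than_last and its prefix argument are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print all files named PREFIX_*.sql.gz except the last
# one in glob order (the newest, given a constant prefix and YYYY-MM-DD dates).
older_than_last() {
    local a=( "$1"_*.sql.gz )
    [[ -e ${a[0]} ]] || return 0          # the pattern did not match anything
    (( ${#a[@]} > 1 )) || return 0        # a single backup is always kept
    unset "a[${#a[@]}-1]"                 # glob results are sorted; drop the last
    printf '%s\n' "${a[@]}"
}
```

For the files in the question, older_than_last wordpress lists the Monday and Tuesday backups and leaves out the Wednesday one.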
Files are of the form: *_<DATE>_*.sql.gz
Using this assumption, it is easily done in the following way:
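a=( *.sql.gz )
cnt=0; ref="0000-00-00"
for f in "${a[@]}"; do
    # remember the index of the entry with the greatest date seen so far
    [[ "$f" =~ [0-9]{4}(-[0-9]{2}){2} ]] \
        && [[ "$BASH_REMATCH" > "$ref" ]] \
        && ref="${BASH_REMATCH}" && refi=$cnt
    ((++cnt))
done
# drop the newest file (index refi, not cnt) from the deletion list
unset "a[refi]"
rm -- "${a[@]}"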
[[ expression ]]
<snip>
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise. If the regular expression is syntactically incorrect, the conditional expression's return value is 2. If the shell option nocasematch is enabled, the match is performed without regard to the case of alphabetic characters. Any part of the pattern may be quoted to force it to be matched as a string. Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.
source: man bash
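To make the role of BASH_REMATCH concrete, here is a small sketch that pulls the date out of a single filename. The function name extract_date is hypothetical, and the underscore-delimited YYYY-MM-DD layout from the question is assumed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the YYYY-MM-DD date embedded in a filename,
# or return 1 when the name carries no _DATE_ portion.
extract_date() {
    local re='_([0-9]{4}-[0-9]{2}-[0-9]{2})_'
    [[ $1 =~ $re ]] || return 1
    # BASH_REMATCH[0] holds the whole match, underscores included;
    # BASH_REMATCH[1] holds only the text captured by the first group.
    printf '%s\n' "${BASH_REMATCH[1]}"
}
```

Keeping the regex in a variable avoids quoting surprises on the right-hand side of =~.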
I found this in another question. Although it serves the purpose, it does not select the files based on their filenames; ls -t sorts them by modification time instead.
ls -tp | grep -v '/$' | tail -n +2 | xargs -I {} rm -- {}
Using two for loops:
#!/bin/bash
shopt -s nullglob ##: This might not be needed but just in case
                  ##: If there are no files the glob will not expand
latest=
latestdate=
allfiles=()
unwantedfiles=()
for file in *_????-??-??_*.sql.gz; do
    if [[ $file =~ _([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})_ ]]; then
        allfiles+=("$file")
        ##: The > is magical inside [[ ; compare the dates, not the whole names
        [[ ${BASH_REMATCH[1]} > $latestdate ]] && latestdate=${BASH_REMATCH[1]} latest=$file
    fi
done
n=${#allfiles[@]}
if ((n <= 1)); then ##: No files or only one file don't remove it!!
    printf '%s\n' "Found ${n:-0} ${allfiles[@]:-*sql.gz} file, bye!"
    exit 0 ##: Exit gracefully instead
fi
for f in "${allfiles[@]}"; do
    [[ $latest == "$f" ]] && continue ##: Skip the latest file in the loop.
    unwantedfiles+=("$f") ##: Save all files in an array without the latest.
done
printf 'Deleting the following files: %s\n' "${unwantedfiles[*]}"
echo rm -rf "${unwantedfiles[@]}"
This relies heavily on the > test operator inside [[, which compares strings lexicographically; that works here because YYYY-MM-DD dates sort in chronological order.
You can create a new file with an earlier date and the result should still be correct.
The echo is there just to show what is going to happen. Remove it once you're satisfied with the output.
I'm actually using this script via cron now, except for the *.sql.gz part: since I only have directories to match, with the same date format, I use ????-??-??/ as the glob and only ([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}) as the regex pattern.
You can use my Python script "rotate-archives" to delete backups selectively (https://gitlab.com/k11a/rotate-archives).
An example of starting archive deletion:
rotate-archives.py test_mode=off age_from-period-amount_for_last_timeslot=7-5,31-14,365-180-5 archives_dir=/mnt/archives
As a result, the following archives will remain: those 7 to 30 days old, at an interval of 5 days between archives; those 31 to 364 days old, at an interval of 14 days; and those 365 days old or more, at an interval of 180 days, with an amount of 5 for that last time slot.
However, this requires either moving _date_ to the beginning of the filename or letting the script add the current date to new files.