
Script to delete old files and leave the newest one in a directory in Linux

I have a backup tool that takes a database backup daily and stores the backups with names in the following format:

*_DATE_*.*.sql.gz

with DATE being in YYYY-MM-DD format.

How could I delete old files matching the pattern above (by comparing the YYYY-MM-DD in the filenames), while leaving only the newest one?

Example:

wordpress_2020-01-27_06h25m.Monday.sql.gz
wordpress_2020-01-28_06h25m.Tuesday.sql.gz
wordpress_2020-01-29_06h25m.Wednesday.sql.gz

At the end, only the last file, wordpress_2020-01-29_06h25m.Wednesday.sql.gz, should remain.

Assuming:

  • The substring to the left of the _DATE_ portion does not contain underscores.
  • The filenames do not contain newline characters.

Then you could try the following:

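# Requires GNU head and xargs. Sorts on the text after the first underscore (the date),
# drops the last (newest) name from the list, and removes the rest.
for f in *.sql.gz; do
    printf '%s\n' "$f"
done | sort -t "_" -k 2 | head -n -1 | xargs -r -d '\n' rm --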

If your sort, head and cut commands support the -z option, the following code will be more robust against special characters in the filenames:

for f in *.sql.gz; do
    [[ $f =~ _([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})_ ]] && \
        printf "%s\t%s\0" "${BASH_REMATCH[1]}" "$f"
done | sort -z | head -z -n -1 | cut -z -f 2- | xargs -0 rm --
  • It uses the NUL character as a line delimiter, which allows any special characters in the filenames.
  • It first extracts the DATE portion from the filename, then prepends it to the filename as a first field separated by a tab character.
  • Then it sorts the entries by the DATE string, excludes the last (newest) one, cuts the first field off to recover the filenames, and removes those files.
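
The steps in the list above can also be sketched without the GNU-only -z options. This is a minimal sketch, assuming filenames contain no newline characters; the function name list_old_backups is hypothetical, and it only prints the files that would be removed instead of deleting them:

```shell
# Print every dated *.sql.gz file in a directory except the one with the newest date.
# Assumption: filenames contain no newline characters.
list_old_backups() {
    dir=${1:-.}
    for f in "$dir"/*.sql.gz; do
        [ -e "$f" ] || continue          # skip the literal pattern when nothing matches
        name=${f##*/}
        # Extract the YYYY-MM-DD that sits between two underscores.
        date=$(printf '%s\n' "$name" |
            sed -n 's/.*_\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)_.*/\1/p')
        if [ -n "$date" ]; then
            # Decorate: the date becomes the first tab-separated field.
            printf '%s\t%s\n' "$date" "$f"
        fi
    done |
        LC_ALL=C sort |                  # oldest first, newest last
        sed '$d' |                       # drop the last (newest) entry
        cut -f 2-                        # undecorate: keep only the path
}
```

Once the printed list looks right, it could be fed to a loop such as `list_old_backups . | while IFS= read -r f; do rm -- "$f"; done`.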

Go to the folder that contains your *_DATE_*.*.sql.gz files and try the command below:

ls -ltr *.sql.gz|awk '{print $9}'|awk '/2020/{print $0}' |xargs rm

or use

`ls -ltr |grep '2019-05-20'|awk '{print $9}'|xargs rm` 

Replace /2020/ with the pattern you want to delete; for example, for 2020-05-01 use /2020-05-01/. Note that this removes every file matching the pattern, so choose a pattern that does not match the file you want to keep.

Since the pattern (glob) you present us is very generic, we have to make an assumption here.

Assumption: the date pattern is the first sequence that matches the regex [0-9]{4}-[0-9]{2}-[0-9]{2}

Files are of the form: constant_string_<DATE>_*.sql.gz

a=( *.sql.gz )
unset a[${#a[@]}-1]
rm "${a[@]}"

Files are of the form: *_<DATE>_*.sql.gz

Using the assumption above, it is easily done in the following way:

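a=( *.sql.gz )
cnt=0; ref="0000-00-00"; refi=-1
for f in "${a[@]}"; do
   # remember the index of the file with the greatest date seen so far
   [[ "$f" =~ [0-9]{4}(-[0-9]{2}){2} ]] \
   && [[ "${BASH_REMATCH[0]}" > "$ref" ]]    \
   && ref="${BASH_REMATCH[0]}" && refi=$cnt
   ((++cnt))
done
if ((refi >= 0)); then
   unset 'a[refi]'      # take the newest file out of the list
   rm -- "${a[@]}"      # remove everything that is left
fi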

[[ expression ]] <snip>
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise. If the regular expression is syntactically incorrect, the conditional expression's return value is 2. If the shell option nocasematch is enabled, the match is performed without regard to the case of alphabetic characters. Any part of the pattern may be quoted to force it to be matched as a string. Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.

source: man bash
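
To illustrate the BASH_REMATCH behaviour quoted above, here is a small sketch that pulls the date out of a backup filename. It needs bash; the function name extract_date is hypothetical, and the regex is stored in a variable so that none of it has to be quoted inside [[ ]]:

```shell
#!/usr/bin/env bash
# Print the YYYY-MM-DD found between two underscores in a name, or fail if there is none.
extract_date() {
    local re='_([0-9]{4})-([0-9]{2})-([0-9]{2})_'
    if [[ $1 =~ $re ]]; then
        # Index 0 would hold the whole match (underscores included);
        # indices 1..3 hold the three parenthesized groups: year, month, day.
        printf '%s-%s-%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        return 1
    fi
}
```

For example, `extract_date wordpress_2020-01-29_06h25m.Wednesday.sql.gz` prints 2020-01-29, while a name without a date makes the function return a non-zero status.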

I found this in another question. Although it serves the purpose, it selects files by modification time rather than by the date in their filenames.

ls -tp | grep -v '/$' | tail -n +2 | xargs -I {} rm -- {}

Using two for loops

#!/bin/bash
shopt -s nullglob  ##: This might not be needed but just in case
                   ##: If there are no files the glob will not expand
latest=
allfiles=()
unwantedfiles=()

for file in *_????-??-??_*.sql.gz; do
  if [[ $file =~ _([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})_ ]]; then
    allfiles+=("$file")
    [[ ${BASH_REMATCH[1]} > $latestdate ]] && latestdate=${BASH_REMATCH[1]} latest=$file  ##: > compares strings inside [[; compare the dates, not the whole names
  fi
done

n=${#allfiles[@]}

if ((n <= 1)); then  ##: No files or only one file don't remove it!!
  printf '%s\n' "Found ${n:-0} ${allfiles[@]:-*sql.gz} file, bye!"
  exit 0    ##: Exit gracefully instead
fi

for f in "${allfiles[@]}"; do
  [[ $latest == $f ]] && continue  ##: Skip the latest file in the loop.
  unwantedfiles+=("$f")  ##: Save all files in an array without the latest.
done

printf 'Deleting the following files: %s\n' "${unwantedfiles[*]}"

echo rm -rf "${unwantedfiles[@]}"

This relies heavily on the > test operator inside [[.

You can create a new file with an older date in its name and the result should still be correct.

The echo is there just to show what is going to happen. Remove it once you're satisfied with the output.

I'm actually using this script via cron now, except for the *.sql.gz part: I only have directories to match, with the same date format, so I use ????-??-??/ as the glob and only ([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}) as the regex pattern.

You can use my Python script "rotate-archives" for smart deletion of backups ( https://gitlab.com/k11a/rotate-archives ).

An example of starting archive deletion:

rotate-archives.py test_mode=off age_from-period-amount_for_last_timeslot=7-5,31-14,365-180-5 archives_dir=/mnt/archives

As a result, the following archives remain: those 7 to 30 days old, at intervals of 5 days; those 31 to 364 days old, at intervals of 14 days; and those 365 days old or older, at intervals of 180 days and limited to 5 archives.

However, this requires either moving the _date_ part to the beginning of the file name or letting the script add the current date to new files.
