
Bash script, find command, using wildcards or regex

I am writing a bash script that goes over all files in a certain directory and:

  1. Picks the files with names that match a specified pattern
  2. Sorts them by date and time (date and time are part of the filename)
  3. Takes the X oldest files
  4. Performs certain operations on them

The pattern used to match the files is passed to the script and looks like:

someprefix_[cats|dogs]_[oranges|apples|tomatos]_[2|3]*.txt

I tried to implement it as follows (fields 6 and 7 of the filename are assumed to contain the date and time):

FILES=`find . -name "$PATTERN" | sort -t_ -k6 | head -n $NUM_OF_FILES`

It doesn't work. I tried various options with -name and -regex, but most examples online are for much less complicated patterns. Since there might be hundreds of thousands of files to go through, I am looking for a solution that works efficiently. I would like to avoid using sed for readability reasons.

Your find regex must match the entire path returned by find. For example, if you are searching somedir/ for your files, then your regex must match the whole path, e.g.:

somedir/prefix_cats_apples_2.txt
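A minimal sketch of the whole-path rule, assuming a scratch directory created with mktemp (the layout is hypothetical and only mirrors the example path above). A regex that covers only the basename finds nothing, while one with a leading `.*/` matches:

```shell
#!/usr/bin/env bash
# Hypothetical scratch layout, used only for illustration.
tmp=$(mktemp -d)
mkdir "$tmp/somedir"
touch "$tmp/somedir/prefix_cats_apples_2.txt"
cd "$tmp" || exit 1

# No match: the regex covers only the basename, not the whole path.
basename_only=$(find somedir/ -regex 'prefix_cats_apples_2\.txt')

# Match: the leading '.*/' absorbs the directory part of the path.
whole_path=$(find somedir/ -regex '.*/prefix_cats_apples_2\.txt')

echo "basename only: '$basename_only'"
echo "whole path:    '$whole_path'"
```

This is also why -name behaves differently: it compares a shell glob against the basename only, whereas -regex always sees the full path.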

To complicate matters, find supports several regex types, which you select with the -regextype option, e.g. emacs (default), posix-awk, posix-basic, posix-egrep, posix-extended (strict POSIX basic regular expressions have no alternation operator).
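As a rough illustration of how the notation changes between regex types, the sketch below runs the same alternation twice, once in the default Emacs-style syntax and once with posix-extended. It assumes GNU find (-regextype is a GNU option), and the three file names are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Hypothetical test files, used only for illustration.
tmp=$(mktemp -d)
touch "$tmp/prefix_cats_1.txt" "$tmp/prefix_dogs_1.txt" "$tmp/prefix_fish_1.txt"

# Default (Emacs-style) syntax: groups and alternation are backslash-escaped.
emacs_hits=$(find "$tmp" -regex '.*/prefix_\(cats\|dogs\)_.*' | sort)

# posix-extended syntax: plain parentheses and '|'.
# -regextype must appear before the -regex it should affect.
ere_hits=$(find "$tmp" -regextype posix-extended \
                -regex '.*/prefix_(cats|dogs)_.*' | sort)

printf 'emacs:\n%s\n' "$emacs_hits"
printf 'posix-extended:\n%s\n' "$ere_hits"
```

Both commands should select the cats and dogs files and skip the fish one; only the escaping differs.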

posix-egrep is probably the most transferable between tools like grep, sed, find, etc. A posix-egrep regex for your pattern, searching for the files in somedir/, would be:

'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$'

To test this against your filenames, the following example files were used (the trailing number ranges from 0 to 3 to show that files ending in 0 or 1 are excluded):

$ ls -1 somedir/
prefix_cats_apples_0.txt
prefix_cats_apples_1.txt
prefix_cats_apples_2.txt
prefix_cats_apples_3.txt
prefix_cats_oranges_0.txt
prefix_cats_oranges_1.txt
prefix_cats_oranges_2.txt
prefix_cats_oranges_3.txt
prefix_cats_tomatos_0.txt
prefix_cats_tomatos_1.txt
prefix_cats_tomatos_2.txt
prefix_cats_tomatos_3.txt
prefix_dogs_apples_0.txt
prefix_dogs_apples_1.txt
prefix_dogs_apples_2.txt
prefix_dogs_apples_3.txt
prefix_dogs_oranges_0.txt
prefix_dogs_oranges_1.txt
prefix_dogs_oranges_2.txt
prefix_dogs_oranges_3.txt
prefix_dogs_tomatos_0.txt
prefix_dogs_tomatos_1.txt
prefix_dogs_tomatos_2.txt
prefix_dogs_tomatos_3.txt

Matching only the files that satisfy your criteria and piping the result through a plain sort yields:

$ find somedir/ -regextype posix-egrep -regex 'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$' | sort
somedir/prefix_cats_apples_2.txt
somedir/prefix_cats_apples_3.txt
somedir/prefix_cats_oranges_2.txt
somedir/prefix_cats_oranges_3.txt
somedir/prefix_cats_tomatos_2.txt
somedir/prefix_cats_tomatos_3.txt
somedir/prefix_dogs_apples_2.txt
somedir/prefix_dogs_apples_3.txt
somedir/prefix_dogs_oranges_2.txt
somedir/prefix_dogs_oranges_3.txt
somedir/prefix_dogs_tomatos_2.txt
somedir/prefix_dogs_tomatos_3.txt

Since you didn't provide an example of where the time/date is in the filenames, the sorting by time/date is left to you. Let me know if you have further questions.
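For what it's worth, here is a sketch of how the whole pipeline from the question might look once the date and time positions are known. Everything about the file names is an assumption: they are invented so that, when split on underscores, field 6 holds a YYYYMMDD date and field 7 an HHMMSS time, as the question describes. The regex is also tightened slightly compared with the one above, and the loop body is only a placeholder for the real operation. GNU find is assumed for -regextype and -maxdepth.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical names: field 6 is the date, field 7 the time.
touch prefix_cats_apples_2_a_20240105_101500.txt \
      prefix_dogs_oranges_3_a_20240101_235900.txt \
      prefix_cats_tomatos_2_a_20240101_080000.txt \
      prefix_dogs_apples_1_a_20230101_000000.txt

NUM_OF_FILES=2

# Searching '.' keeps underscores out of the directory part of each path,
# so the field numbers seen by sort match those of the bare filename.
oldest=$(find . -maxdepth 1 -regextype posix-egrep \
              -regex '\./prefix_(cats|dogs)_(apples|oranges|tomatos)_[23].*\.txt' \
         | LC_ALL=C sort -t_ -k6,7 \
         | head -n "$NUM_OF_FILES")

# Placeholder for the real per-file operation.
printf '%s\n' "$oldest" | while IFS= read -r file; do
    echo "processing $file"
done
```

Because the date and time are zero-padded digits, a plain byte-wise sort on those two fields puts the oldest files first; LC_ALL=C keeps the comparison independent of the locale.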

Assuming that

  • your sorting/filtering logic is OK
  • you do not require a recursive search
  • you have no newlines in your filenames

I would use this:

printf '%s\n' someprefix_{cats,dogs}_{oranges,apples,tomatos}_[23]*.txt \
    | sort -t_ -k6 \
    | head -n $NUM_OF_FILES

This uses the shell's built-in glob expansion capability to generate the list of files. Each result is printed on a separate line. The output is processed using the same pipeline as in your question.
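One caveat with this approach: brace expansion runs first and produces six separate glob patterns, and by default bash leaves any pattern that matches nothing in the output as literal text. The sketch below, which uses two made-up file names, collects the matches into an array with nullglob enabled so that unmatched patterns simply disappear:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical files: only two of the six brace-expanded patterns match.
touch someprefix_cats_apples_2_a.txt someprefix_dogs_oranges_3_b.txt

# With nullglob set, patterns without a match expand to nothing
# instead of being passed through literally.
shopt -s nullglob
matches=( someprefix_{cats,dogs}_{oranges,apples,tomatos}_[23]*.txt )
shopt -u nullglob

echo "matched ${#matches[@]} file(s)"
if (( ${#matches[@]} > 0 )); then
    printf '%s\n' "${matches[@]}"
fi
```

The array can then be fed to the same sort and head pipeline as before.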

The default regex type used by find is Emacs regex, so the notation for patterns differs a bit.

If I understood your pattern correctly, here is a matching command that works:

find . -regex '.*_\(cats\|dogs\)_\(oranges\|apples\|tomatos\)_\(2\|3\).*\.txt'

You can find any information you need about regex types and the Emacs syntax here.

Hope that helped
