Bash script, find command, using wildcards or regular expressions

Question

I am writing a bash script that goes over all files in a certain directory and:

Picks the files with names that match a specified pattern
Sorts them by date and time (date and time are part of the filename)
Takes the X oldest files
Performs certain operations on them

The pattern used to match the files is passed to the script and looks like:

someprefix_[cats|dogs]_[oranges|apples|tomatos]_[2|3]*.txt

I tried to implement it as follows (fields 6 and 7 of the pattern are assumed to contain the date and time):

FILES=`find . -name "$PATTERN" | sort -t_ -k6 | head -n $NUM_OF_FILES`

It doesn't work. I tried various options with -name and -regex. Most examples online are for much less complicated patterns. Since there might be hundreds of thousands of files to go through, I am looking for a solution that works efficiently. I would like to avoid using sed for readability reasons.

Answer 1

Your find regex must match the entire path returned by find. For example, if you are searching somedir/ for your files, then your regex must match, e.g.,

somedir/prefix_cats_apples_2.txt

To complicate the picture, there are multiple types of regex you can use by changing the -regextype option to find, e.g. emacs (default), posix-awk, posix-basic, posix-egrep, posix-extended. (posix-basic has no alternation capability.)
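
As a rough illustration of how the notation differs between regex types, the sketch below runs the same alternation under the default Emacs syntax and under posix-egrep. It assumes GNU find (the -regextype option is a GNU extension) and bash; the temporary directory and the file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Assumes GNU find; -regextype is not part of POSIX find.
demo=$(mktemp -d)
touch "$demo/prefix_cats_1.txt" "$demo/prefix_dogs_1.txt" "$demo/prefix_fish_1.txt"

# Emacs syntax (the default): grouping and alternation need backslashes.
emacs_hits=$(find "$demo" -regex '.*/prefix_\(cats\|dogs\)_.*\.txt' | sort)

# posix-egrep syntax: grouping and alternation are written bare.
egrep_hits=$(find "$demo" -regextype posix-egrep \
    -regex '.*/prefix_(cats|dogs)_.*\.txt' | sort)

# Both variables should list the cats and dogs files, but not the fish file.
printf '%s\n' "$emacs_hits"

rm -rf "$demo"
```

Both patterns start with .*/ because the regex has to cover the whole path that find prints, not just the file name.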

posix-egrep is probably the most transferable between tools like grep, sed, find, etc. A posix-egrep regex for your pattern, searching for the files in somedir/, would be:

'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$'

To test it against your filenames, the following example files were used (the ending number ranges from 0 to 3 to show that files ending in 0 or 1 are excluded):

$ ls -1 somedir/
prefix_cats_apples_0.txt
prefix_cats_apples_1.txt
prefix_cats_apples_2.txt
prefix_cats_apples_3.txt
prefix_cats_oranges_0.txt
prefix_cats_oranges_1.txt
prefix_cats_oranges_2.txt
prefix_cats_oranges_3.txt
prefix_cats_tomatos_0.txt
prefix_cats_tomatos_1.txt
prefix_cats_tomatos_2.txt
prefix_cats_tomatos_3.txt
prefix_dogs_apples_0.txt
prefix_dogs_apples_1.txt
prefix_dogs_apples_2.txt
prefix_dogs_apples_3.txt
prefix_dogs_oranges_0.txt
prefix_dogs_oranges_1.txt
prefix_dogs_oranges_2.txt
prefix_dogs_oranges_3.txt
prefix_dogs_tomatos_0.txt
prefix_dogs_tomatos_1.txt
prefix_dogs_tomatos_2.txt
prefix_dogs_tomatos_3.txt

Now matching only the files that satisfy your criteria and passing them to a general sort yields:

$ find somedir/ -regextype posix-egrep -regex 'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$' | sort
somedir/prefix_cats_apples_2.txt
somedir/prefix_cats_apples_3.txt
somedir/prefix_cats_oranges_2.txt
somedir/prefix_cats_oranges_3.txt
somedir/prefix_cats_tomatos_2.txt
somedir/prefix_cats_tomatos_3.txt
somedir/prefix_dogs_apples_2.txt
somedir/prefix_dogs_apples_3.txt
somedir/prefix_dogs_oranges_2.txt
somedir/prefix_dogs_oranges_3.txt
somedir/prefix_dogs_tomatos_2.txt
somedir/prefix_dogs_tomatos_3.txt

Since you didn't provide an example of where the time/date is in the filenames, the sorting by time/date is left to you. Let me know if you have further questions.
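
The sort step depends on where the date and time sit in the name. As a sketch only: the question says they are the sixth and seventh underscore-separated fields, so the example below assumes a hypothetical layout such as someprefix_cats_apples_2_x_20190614_152146.txt, with zero-padded date and time values so that plain text order is also chronological. It also assumes GNU find and that no directory in the printed path contains an underscore, since that would shift the field numbers.

```shell
#!/usr/bin/env bash
# Hypothetical file names: fields 6 and 7 (split on "_") hold date and time.
work=$(mktemp -d)
touch "$work/someprefix_cats_apples_2_x_20190614_152146.txt" \
      "$work/someprefix_dogs_oranges_3_x_20190101_090000.txt" \
      "$work/someprefix_cats_tomatos_2_x_20190614_080000.txt"
NUM_OF_FILES=2

# Run find from inside the directory so every path starts with "./" and the
# directory name cannot add extra underscores to the sort fields.
oldest=$(
    cd "$work" &&
    find . -regextype posix-egrep \
        -regex '\./someprefix_(cats|dogs)_(oranges|apples|tomatos)_[23].*\.txt' |
    LC_ALL=C sort -t_ -k6,7 |   # key = date field through time field
    head -n "$NUM_OF_FILES"
)

printf '%s\n' "$oldest"
rm -rf "$work"
```

LC_ALL=C makes sort compare bytes, so the result does not change with the locale.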

Answer 2

Assuming that

your sorting/filtering logic is OK
you do not require a recursive search
you have no newlines in your filenames

I would use this:

printf '%s\n' someprefix_{cats,dogs}_{oranges,apples,tomatos}_[23]*.txt \
    | sort -t_ -k6 \
    | head -n $NUM_OF_FILES

This uses the shell's built-in brace and glob expansion to generate the list of files. Each result is printed on a separate line. The output is processed using the same pipeline as in your question.
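
One caveat with this approach, sketched below: brace expansion produces every combination whether or not such files exist, and by default bash leaves a glob that matches nothing as literal text, so unmatched patterns would reach sort as if they were file names. Enabling nullglob drops them. The directory and file names here are invented for the demonstration, and the date and time are again assumed to sit in fields 6 and 7.

```shell
#!/usr/bin/env bash
# Hypothetical files; only two of the six brace combinations exist.
work=$(mktemp -d)
touch "$work/someprefix_cats_apples_2_x_20190614_152146.txt" \
      "$work/someprefix_dogs_apples_3_x_20190101_090000.txt"
NUM_OF_FILES=5

cd "$work" || exit 1
shopt -s nullglob   # a glob with no match expands to nothing, not to itself

# Brace expansion builds six patterns; the glob keeps only those that match.
files=(someprefix_{cats,dogs}_{oranges,apples,tomatos}_[23]*.txt)

selected=""
if [ "${#files[@]}" -gt 0 ]; then
    selected=$(printf '%s\n' "${files[@]}" |
        LC_ALL=C sort -t_ -k6,7 |
        head -n "$NUM_OF_FILES")
fi

printf '%s\n' "$selected"

shopt -u nullglob
cd - >/dev/null && rm -rf "$work"
```

Because printf is a shell builtin, the expanded list is not passed through an external command's argument list.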

Answer 3

The default regex type used by find is the Emacs syntax, so the notation for patterns varies a bit.

If I understood your pattern correctly, here is a matching command that works:

find . -regex '.*_\(cats\|dogs\)_\(oranges\|apples\|tomatos\)_\(2\|3\).*\.txt'

You can find any information you need about regex types and the Emacs syntax here.

Hope that helped.

3 answers

Answer 1
2 (accepted) 2019-06-14 15:21:46

Answer 2
1 2019-06-14 14:28:09

Answer 3
-1 2019-06-14 13:45:14
