Bash script, find command, using wildcards or regex
I am writing a bash script that goes over all files in a certain directory and:
The pattern used to match the files is passed to the script and looks like:
someprefix_[cats|dogs]_[oranges|apples|tomatos]_[2|3]*.txt
I tried to implement it as follows (fields 6 and 7 of the pattern are assumed to contain the date and time):
FILES=`find . -name "$PATTERN" | sort -t_ -k6 | head -n $NUM_OF_FILES`
It doesn't work. I tried various options with -name and -regex. Most examples online are for much less complicated patterns. Since there might be hundreds of thousands of files to go through, I am looking for a solution that works efficiently. I would like to avoid using sed for readability reasons.
Your find regex must match the entire path returned by find. For example, if you are searching somedir/ for your files, then your regex must match the whole of a path such as:
somedir/prefix_cats_apples_2.txt
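To illustrate the whole-path rule, here is a small sketch. It assumes GNU find (for -regextype) and uses a throwaway directory created with mktemp; the single file name is made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: -regex is matched against the whole path that find prints,
# not just the file name. Assumes GNU find; the file is a made-up example.
demo=$(mktemp -d)
mkdir "$demo/somedir"
touch "$demo/somedir/prefix_cats_apples_2.txt"

# Pattern covers only the file name: nothing matches.
name_only=$(cd "$demo" && find somedir/ -regextype posix-egrep \
  -regex 'prefix_cats_apples_2\.txt')

# Pattern covers the full path as find prints it: the file matches.
full_path=$(cd "$demo" && find somedir/ -regextype posix-egrep \
  -regex 'somedir/prefix_cats_apples_2\.txt')

printf 'name only: [%s]\n' "$name_only"
printf 'full path: [%s]\n' "$full_path"
rm -rf "$demo"
```

If the starting directory is not known in advance, a leading `.*/` in the regex is a common way to absorb whatever path prefix find prints.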
Complicating the picture, there are multiple regex types you can use by changing the -regextype option to find, e.g. emacs (default), posix-awk, posix-basic, posix-egrep, posix-extended. (posix-basic, as specified by POSIX, has no alternation operator.)
posix-egrep is probably the most transferable between tools like grep, sed, find, etc. A posix-egrep regex for your pattern, searching for the files in somedir/, would be:
'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$'
To test this against your filenames, the following example files were used (the ending number ranges from 0 to 3 to show that files ending in 0 or 1 are excluded):
$ ls -1 somedir/
prefix_cats_apples_0.txt
prefix_cats_apples_1.txt
prefix_cats_apples_2.txt
prefix_cats_apples_3.txt
prefix_cats_oranges_0.txt
prefix_cats_oranges_1.txt
prefix_cats_oranges_2.txt
prefix_cats_oranges_3.txt
prefix_cats_tomatos_0.txt
prefix_cats_tomatos_1.txt
prefix_cats_tomatos_2.txt
prefix_cats_tomatos_3.txt
prefix_dogs_apples_0.txt
prefix_dogs_apples_1.txt
prefix_dogs_apples_2.txt
prefix_dogs_apples_3.txt
prefix_dogs_oranges_0.txt
prefix_dogs_oranges_1.txt
prefix_dogs_oranges_2.txt
prefix_dogs_oranges_3.txt
prefix_dogs_tomatos_0.txt
prefix_dogs_tomatos_1.txt
prefix_dogs_tomatos_2.txt
prefix_dogs_tomatos_3.txt
Now matching only the files that satisfy your criteria and passing them through a general sort would yield:
$ find somedir/ -regextype posix-egrep -regex 'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$' | sort
somedir/prefix_cats_apples_2.txt
somedir/prefix_cats_apples_3.txt
somedir/prefix_cats_oranges_2.txt
somedir/prefix_cats_oranges_3.txt
somedir/prefix_cats_tomatos_2.txt
somedir/prefix_cats_tomatos_3.txt
somedir/prefix_dogs_apples_2.txt
somedir/prefix_dogs_apples_3.txt
somedir/prefix_dogs_oranges_2.txt
somedir/prefix_dogs_oranges_3.txt
somedir/prefix_dogs_tomatos_2.txt
somedir/prefix_dogs_tomatos_3.txt
Since you didn't provide an example of where the time/date was in the filenames, the sorting by time/date is left to you. Let me know if you have further questions.
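As a rough sketch of how the date/time sort could be bolted on: the file-name layout below is an assumption and is not taken from the question. It supposes that underscore-separated fields 6 and 7 of the base name hold a date (YYYYMMDD) and a time (HHMM). It also assumes GNU find, for -regextype and -printf.

```shell
#!/usr/bin/env bash
# Sketch only. Hypothetical layout (an assumption, not from the question):
#   someprefix_<animal>_<fruit>_<n>_<tag>_<YYYYMMDD>_<HHMM>.txt
demo=$(mktemp -d)
touch "$demo/someprefix_cats_apples_2_x_20240102_0900.txt" \
      "$demo/someprefix_dogs_oranges_3_x_20240101_1800.txt" \
      "$demo/someprefix_cats_tomatos_2_x_20240101_0700.txt" \
      "$demo/someprefix_cats_apples_1_x_20230101_0000.txt"
NUM_OF_FILES=2

# -printf '%f\n' prints only the base name, so directory names containing "_"
# cannot shift the field numbers that sort sees.
oldest=$(find "$demo" -regextype posix-egrep \
           -regex '.*/someprefix_(cats|dogs)_(oranges|apples|tomatos)_[23][^/]*\.txt' \
           -printf '%f\n' |
         sort -t_ -k6,6 -k7,7 |
         head -n "$NUM_OF_FILES")

printf '%s\n' "$oldest"
rm -rf "$demo"
```

Sorting on `-k6,6 -k7,7` rather than a bare `-k6` limits each key to one field, so the comparison is by date first and time second. This relies on both fields being fixed-width and zero-padded.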
Assuming that
I would use this:
printf '%s\n' someprefix_{cats,dogs}_{oranges,apples,tomatos}_[23]*.txt \
| sort -t_ -k6 \
| head -n "$NUM_OF_FILES"
This uses the shell's built-in brace and glob expansion to generate the list of files. Each result is printed on a separate line. The output is processed using the same pipeline as in your question.
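One caveat with the glob approach: by default, bash leaves a pattern that matches nothing in the output as a literal string. The sketch below, which assumes bash and uses invented file names in a temporary directory, turns on nullglob so that unmatched patterns expand to nothing.

```shell
#!/usr/bin/env bash
# Sketch: brace expansion generates six glob patterns; any pattern that
# matches no file is normally passed through literally. nullglob drops it.
demo=$(mktemp -d)
touch "$demo/someprefix_cats_apples_2_a.txt" "$demo/someprefix_dogs_oranges_3_b.txt"

cd "$demo" || exit 1
shopt -s nullglob
matches=( someprefix_{cats,dogs}_{oranges,apples,tomatos}_[23]*.txt )
shopt -u nullglob
cd - >/dev/null || exit 1

# Print one name per line only if something matched; with an empty array,
# printf '%s\n' would still print a single blank line.
if (( ${#matches[@]} > 0 )); then
  printf '%s\n' "${matches[@]}"
fi
rm -rf "$demo"
```

Because printf is a shell builtin, the expanded list is not passed through exec, so the kernel's argument-length limit should not apply even with a very large number of matches.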
The default regex type used by find is Emacs-style regex, so the notation for the patterns varies a bit.
If I understood your pattern correctly, here is the matching command that works:
find . -regex '.*_\(cats\|dogs\)_\(oranges\|apples\|tomatos\)_\(2\|3\).*\.txt'
You can find any information you need about regex types and syntax for emacs here.
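To make the notational difference concrete, here is a small sketch that runs the same match in the default Emacs-style syntax and in posix-extended. It assumes GNU find; the file names are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: one pattern written in two of find's regex syntaxes.
# Assumes GNU find; the three files below are made-up examples.
demo=$(mktemp -d)
touch "$demo/a_cats_apples_2.txt" "$demo/a_dogs_tomatos_3.txt" "$demo/a_cats_apples_1.txt"

# Emacs syntax (the default): grouping and alternation need backslashes.
emacs_hits=$(find "$demo" \
  -regex '.*_\(cats\|dogs\)_\(oranges\|apples\|tomatos\)_[23].*\.txt' | sort)

# posix-extended: ( ) and | are operators without backslashes.
ere_hits=$(find "$demo" -regextype posix-extended \
  -regex '.*_(cats|dogs)_(oranges|apples|tomatos)_[23].*\.txt' | sort)

[ "$emacs_hits" = "$ere_hits" ] && echo "both syntaxes select the same files"
printf '%s\n' "$emacs_hits"
rm -rf "$demo"
```

Note that -regextype has to appear before the -regex it is meant to affect.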
Hope that helped.