'find' files containing an integer in a specified range (in bash)

You'd think I could find an answer to this somewhere already, but I am struggling to do so. I want to find some log files with names like

myfile_3.log

However, I only want to find the ones with numbers in a certain range. I tried things like this:

find <path> -name myfile_{0..67}.log #error: find: paths must precede expression
find <path> -name myfile_[0-67].log #only return 0-7, not 67
find <path> -name myfile_[0,67].log #only returns 0,6,7
find <path> -name myfile_*([0,67]).log # returns only 0,6,7,60,66,67,70,76,77

Any other ideas?

If you want to match an integer range using a regular expression, use the -regex option in your find command.

For example, to match all files from 0 to 67, use this:

find <path> -regextype egrep -regex '.*file([0-5][0-9]|6[0-7])\.txt'

There are 2 parts in the regex:

  • [0-5][0-9] matches the two-digit numbers 00-59
  • 6[0-7] matches the range 60-67

As written, a single-digit name such as file7.txt is not matched; adding [0-9] as a third alternative covers 0-9.

Note the option -regextype egrep, which enables extended regular expressions (this option is specific to GNU find).
Note also that -regex matches against the whole path, not just the file name, which is the reason for the .* at the beginning of the regex.
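
To make the single-digit point concrete, here is a minimal sketch adapted to the myfile_N.log names from the question. It assumes GNU find and uses -regextype posix-extended rather than egrep; the function name find_regex_range is made up for illustration.

```shell
# Sketch only: assumes GNU find (for -regextype) and the myfile_N.log
# naming from the question. find_regex_range is a hypothetical helper.
find_regex_range() {
    # $1 is the directory to search.
    # [0-9]       covers 0-9
    # [1-5][0-9]  covers 10-59
    # 6[0-7]      covers 60-67
    find "$1" -type f -regextype posix-extended \
        -regex '.*/myfile_([0-9]|[1-5][0-9]|6[0-7])\.log'
}
```

Calling find_regex_range . would list matching files under the current directory. The leading .*/ is there because the regex has to cover the whole path.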

You can do this simply and concisely, but admittedly not very efficiently, with GNU Parallel:

parallel find . -name "*file{}.txt" ::: {0..67}

In case you are wondering why I say it is not that efficient: it starts 68 separate instances of find, each looking for a different number in the filename... but that may be OK.
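
If starting one find per number is a concern, a single find invocation can be given one -name test per number instead. This is a rough sketch, assuming the myfile_N.log naming from the question; the function name find_in_range and its arguments are made up for illustration.

```shell
# Sketch only: find_in_range is a hypothetical helper, not part of find.
# Usage: find_in_range DIR LOW HIGH
find_in_range() {
    dir=$1 lo=$2 hi=$3
    # Nothing to do for an empty range.
    [ "$lo" -le "$hi" ] || return 0
    # Reuse the function's positional parameters as the argument list.
    set --
    n=$lo
    while [ "$n" -le "$hi" ]; do
        set -- "$@" -o -name "myfile_$n.log"
        n=$((n + 1))
    done
    shift   # drop the leading -o
    # One find process, one -name test per number in the range.
    find "$dir" -type f \( "$@" \)
}
```

Calling find_in_range . 0 67 would then search the current directory once. The range limits are plain arguments, so changing the range does not require rewriting a pattern.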

The following will find all files named myfile_X.log, where the X part is a number ranging from 0 to 67.

find <path> -type f | grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"

Explanation:

  • -type f restricts the results to regular files.

  • | pipes the filepath(s) to grep for filtering.

  • grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$" applies an extended ( -E ) regexp to match the last part of the path (i.e. the filename), which:

    • begins with myfile_
    • is followed by a number ranging from 0 to 67
    • ends with .log
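
A quick way to check the boundaries of the pattern is to feed it a few sample paths without touching the file system. The paths below and the function name filter_range are made up for illustration.

```shell
# Sketch only: filter_range is a hypothetical wrapper around the grep above.
filter_range() {
    grep -E '/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$'
}

# Made-up sample paths covering both ends of the range.
printf '%s\n' \
    ./logs/myfile_0.log \
    ./logs/myfile_42.log \
    ./logs/myfile_67.log \
    ./logs/myfile_68.log \
    ./logs/myfile_100.log \
    ./logs/notmyfile_5.log |
    filter_range
```

Only the 0, 42 and 67 entries are printed; 68 and 100 fall outside the range, and notmyfile_5.log is rejected because the pattern requires a slash directly before myfile_.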

Edit:

Alternatively, as suggested by @ghoti in the comments, you can use the -regex option of the find command instead of piping to grep. For example:

find -E <path> -type f -regex ".*/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"

Note: The regexp is very similar to the one in the previous grep example. However, it begins with .*/ to match all parts of the filepath up to and including the final forward slash. For some reason, unknown to me, the .*/ part is not necessary with grep (see footnote 1). The -E flag shown here is the BSD/macOS form; GNU find uses -regextype posix-extended instead.


Footnotes:

1. If any readers know why the ERE used with find's -regex option requires the initial .* and the same ERE with grep does not, then please leave a comment. You'll make me sleep better at night ;)
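
A likely explanation for the footnote: grep searches for the pattern anywhere in the line, whereas find's -regex has to match the whole path. The difference can be reproduced with grep alone, because grep -x also requires a whole-line match. The sample path below is made up for illustration.

```shell
# Sketch only: the path is a made-up example.
path='./logs/myfile_42.log'
ere='myfile_([0-9]|[0-5][0-9]|6[0-7])\.log'

# Plain grep looks for the pattern anywhere in the line: this matches.
printf '%s\n' "$path" | grep -E "$ere"

# grep -x requires the whole line to match, like find -regex does with the
# whole path: without a leading .*/ this does not match.
printf '%s\n' "$path" | grep -xE "$ere" || echo 'no whole-line match'

# With the leading .*/ the whole line is covered again: this matches.
printf '%s\n' "$path" | grep -xE ".*/$ere"
```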


One possibility is to build up the range from several ranges that can be matched by glob patterns. For example:

find . -name 'myfile_[0-9].log' -o -name 'myfile_[1-5][0-9].log' -o -name 'myfile_6[0-7].log'

A regular expression cannot express a numeric range in general, although you can craft a regex for a specific range. It is better to use find to get the files that carry a number and then filter the output with another tool that performs the range check, such as awk or a shell loop.

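START=0
END=67
# Read NUL-delimited paths so that unusual file names are handled safely.
while IFS= read -r -d '' file
do
    # Strip everything except the number between "myfile_" and ".log".
    N=$(printf '%s\n' "$file" | sed 's/.*myfile_\([0-9][0-9]*\)\.log$/\1/')
    # Compare numerically against the range limits.
    if [ "$N" -ge "$START" ] && [ "$N" -le "$END" ]
    then
        echo "$file"
    fi
done < <(find <path> -name "myfile_*.log" -print0)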

In that script, find collects all the files that match the desired name pattern; the loop then goes through the found files, and sed is used to capture the number in the filename. Finally, that number is compared with the range limits. If both comparisons succeed, the file is printed.
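
The same range check can also be written with awk, as mentioned above. This is a minimal sketch, assuming one path per line (so file names containing newlines are not supported); the function name filter_by_range is made up for illustration.

```shell
# Sketch only: filter_by_range is a hypothetical helper.
# Usage: ... | filter_by_range LOW HIGH   (reads one path per line on stdin)
filter_by_range() {
    awk -F/ -v lo="$1" -v hi="$2" '
        {
            name = $NF                          # last path component
            if (name ~ /^myfile_[0-9]+\.log$/) {
                n = name
                sub(/^myfile_/, "", n)          # drop the prefix
                sub(/\.log$/, "", n)            # drop the suffix
                # n + 0 forces a numeric comparison.
                if (n + 0 >= lo && n + 0 <= hi) print
            }
        }'
}
```

It could be used as find . -type f -name 'myfile_*.log' | filter_by_range 0 67. Names whose middle part is not purely numeric are skipped instead of causing an error.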

There are many other answers that give you a regex for the specific range in the example, but they are not general. None of them allows for easy modification of the range involved.
