Trying to use GNU find to search recursively for filenames only (not directories) containing a string in any portion of the file name

I am trying to find a command that is flexible enough to allow some variations of the string, but not others.

For instance, I am looking for audio files that have some variation of "rain" in the filename only (rains, raining, rained, rainbow, rainfall, a dark rain cloud, etc), whether at the beginning, end or middle of the filename.

However, this also includes words like "brain", "train", "grain", "drain", "Lorraine" and so on, which are not wanted (basically any word that has nothing to do with the concept of rain).

Something like this fails:

find . -name '*rain*' ! -name '*brain*'| more

And I'm having no luck even getting started on a regex variant, because I cannot wrap my mind around regex ... for instance, this doesn't do anything:

# this is incomplete, just a stub of where I was going
# -type f also still includes a directory name
find . -regextype findutils-default -iregex '\(*rain*\)' -type f  

Any help would be greatly appreciated. If I could see a regex command that does everything I want, with an explanation of each character in the command, it would help me learn more about using regex with the find command in general.


edit 1:

Taking cues from all the feedback so far from jhnc and Seth Falco, I have tried this:

find . -type f | grep -Pi '(?<![a-zA-Z])rain'

I think this pretty much works (I don't think it is missing anything). My only issue with it is that it also matches occurrences of "rain" further up the path, not only in the file name. So I get example output like this:

./Linux/path/to/radiohead - 2007 - in rainbows/09 Jigsaw Falling Into Place.mp3

Since "rain" is not in the filename itself, this is a result I'd rather not see. So I tried this:

find . -type f -printf '%f\n' | grep -Pi '(?<![a-zA-Z])rain'

That does ensure that only filenames are matched, but it also does not output the paths to the files, which I would still like to see so that I know where each file is.

So I guess what I really need is a PCRE (PCRE2?) pattern that takes the seemingly successful look-behind method but applies it only after the last path delimiter (/ since I am on Linux), and I am still stumped.

specification:

  1. match "rain"
  2. in filename
  3. only at start of a word
  4. case-insensitive

assumptions:

  5. define "word" to be a sequence of letters (no punctuation, digits, etc)
  6. paths have the form prefix/name, where prefix can have one or more levels delimited by / and name does not contain /

constraints:

  7. find -iregex matches against the entire path (-name only matches the filename)
  8. find -iregex must match the entirety of the path (eg. "c" is only a partial match and does not match path "a/b/c")
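
Constraints 7 and 8 can be checked on a scratch directory. The following is a minimal sketch, assuming GNU find and mktemp are available; the a/b/c layout mirrors the example in constraint 8 and the variable names are made up for illustration.

```shell
# Scratch tree: a directory "a/b" holding one regular file named "c".
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/a/b/c"
cd "$tmp" || exit 1

# -name is compared with the last path component only.
by_name=$(find . -type f -name 'c')

# -regex is compared with the whole path, so a bare 'c' matches nothing...
partial=$(find . -type f -regex 'c')

# ...while a pattern that also covers the prefix does match.
whole=$(find . -type f -regex '.*/c')

printf 'name:    %s\n' "$by_name"
printf 'partial: %s\n' "$partial"
printf 'whole:   %s\n' "$whole"

cd / && rm -rf "$tmp"
```

The same behaviour applies to -iregex, which differs from -regex only in ignoring case.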

method:

find can return matches against non-files (eg. directories). Given assumption 6, the path alone does not tell us whether name is a directory or an ordinary file. To satisfy 2, we can exclude non-files using find's -type f predicate.

We can compare the paths found by find against our specification by using find's case-insensitive regex matching predicate (-iregex). The "grep" flavour (-regextype grep) is sufficiently expressive.

Just using 1, a suitable regex is: rain

2+6+7 say that no / may follow "rain": rain[^/]*$

  • [/] matches a character in the set (ie. /)
  • [^/] : a leading ^ inverts the set, ie. it matches a character that is not /
  • * repeats the preceding item zero or more times
  • $ constrains the preceding match to occur at the end of input
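
The effect of rain[^/]*$ can be tried on plain text before involving find. This is a small sketch using grep on made-up sample paths; grep searches within each line, so no prefix pattern is needed here.

```shell
# Sample paths (made up); only the second has "rain" in the final component.
paths='./in rainbows/09 Jigsaw.mp3
./weather/heavy rain.mp3
./rainy/day/thunder.mp3'

# "rain", then any run of non-slash characters, then end of line:
# "rain" must therefore sit after the last "/".
matched=$(printf '%s\n' "$paths" | grep -i 'rain[^/]*$')
printf '%s\n' "$matched"
```

At this stage a name such as brain.mp3 would still match; the next step deals with that.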

3+5 say there must be no immediately preceding word character: [^a-z]rain[^/]*$

  • a-z is a shortcut for the range a to z

8 requires matching the prefix explicitly: ^.*[^a-z]rain[^/]*$

  • ^ outside of [...] constrains the subsequent match to occur at the beginning of input
  • . matches any single character
  • [^a-z] matches a non-alphabetic character (because -iregex ignores case, uppercase letters are excluded as well)

Final command-line:

find . -type f -regextype grep -iregex '^.*[^a-z]rain[^/]*$'

Note: The leading ^ and trailing $ are not actually required, given 8, and could be omitted.
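
To see the final command at work, it can be run against a throwaway tree. This is a minimal sketch, assuming GNU find and mktemp; every file and directory name below is made up for illustration, and LC_ALL=C sort is there only to give a stable listing order.

```shell
# Scratch tree with made-up names; -regextype and -iregex need GNU find.
tmp=$(mktemp -d)
mkdir -p "$tmp/in rainbows" "$tmp/nature/rain sounds"
touch "$tmp/in rainbows/09 Jigsaw.mp3" \
      "$tmp/nature/Rainfall.ogg" \
      "$tmp/nature/a dark rain cloud.wav" \
      "$tmp/nature/brain.mp3" \
      "$tmp/nature/night_train.mp3" \
      "$tmp/nature/rain sounds/thunder.flac"
cd "$tmp" || exit 1

# The final command from above, with the output sorted bytewise.
found=$(find . -type f -regextype grep -iregex '^.*[^a-z]rain[^/]*$' | LC_ALL=C sort)
printf '%s\n' "$found"

cd / && rm -rf "$tmp"
```

Only Rainfall.ogg and "a dark rain cloud.wav" are listed. The directory "rain sounds" would satisfy the regex on its own, and it is -type f that keeps it out of the results.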


exercise for the reader:

  1. extend "word" to non-ASCII characters (eg. UTF-8)

You probably want to use either a character class, a word boundary, or a negative look-behind for alphabetic characters.

Look Behind

^.+(?<![a-zA-Z])rain[^\/]*$

Matches any instance of rain, but only if it does not follow [a-zA-Z] and has no slashes after it. Unfortunately, find doesn't support look-ahead or look-behind, so we'll use a character class instead.
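
Look-behind can still be used outside find, for example by piping find's output through grep -P as the question's edit does. The following is a minimal sketch, assuming GNU grep built with PCRE support and paths without embedded newlines; appending [^/]*$ confines the match to the last path component. The sample paths are made up and stand in for real find output.

```shell
# Stand-in for the output of `find . -type f` (made-up paths).
paths='./radiohead - in rainbows/09 Jigsaw Falling Into Place.mp3
./sfx/Rain on roof.wav
./sfx/brain freeze.mp3
./sfx/04-rainfall.ogg'

# (?<![a-z])  the character before "rain" must not be a letter
#             (-i makes the class ignore case as well)
# [^/]*$      no "/" between "rain" and the end, i.e. filename only
hits=$(printf '%s\n' "$paths" | grep -Pi '(?<![a-z])rain[^/]*$')
printf '%s\n' "$hits"
```

In real use the printf would be replaced by find . -type f, so the full path of each matching file is still printed.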

Character Class

^.+(?:^|[^a-zA-Z])rain[^\/]*$

Matches the start of the line or a character that isn't [a-zA-Z], then matches rain immediately after it, so long as there are no slashes afterwards.

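You can use it in find like this. The non-capturing group (?:...) is PCRE syntax that find's regex flavours do not provide, so it is written here as a plain group with -regextype posix-extended, and the backslash inside [^\/] is dropped because / needs no escaping in a bracket expression:

find ./ -regextype posix-extended -iregex '.+(^|[^a-zA-Z])rain[^/]*'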

The ^ at the start and $ at the end of the pattern are implied when using find with -iregex, so you can omit them.
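
The character-class approach can be checked on a scratch directory. This sketch assumes GNU find, selects -regextype posix-extended and uses a plain (...) group, since the PCRE-style (?:...) group is not part of find's regex syntaxes; it also adds -type f to skip directories. The file names are made up for illustration.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/rainy days"
touch "$tmp/rainy days/drizzle.mp3" "$tmp/Rain.mp3" "$tmp/grain.mp3"
cd "$tmp" || exit 1

# posix-extended: (...) groups and | alternates without backslashes.
# Every path printed by `find ./` begins with "./", so a filename that
# starts with "rain" is still preceded by a non-letter, the "/".
res=$(find ./ -type f -regextype posix-extended \
        -iregex '.+(^|[^a-zA-Z])rain[^/]*')
printf '%s\n' "$res"

cd / && rm -rf "$tmp"
```

Only ./Rain.mp3 is printed: grain.mp3 has a letter before "rain", and in "rainy days/drizzle.mp3" a slash follows "rain".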
