
Recursively matching filenames with glob argument

I have been trying to get a list of files matching a glob pattern in a command-line argument (sys.argv[1]) recursively, using glob.glob and os.walk. The problem is that bash (and, it seems, many other shells) auto-expands glob patterns into filenames.
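For reference, the os.walk half of that approach might look like the following minimal sketch. It assumes the pattern reaches the script unexpanded in sys.argv[1] and is matched against bare filenames with fnmatch; the function name find_files and its root parameter are hypothetical, not taken from the question.

```python
import fnmatch
import os
import sys


def find_files(pattern, root="."):
    """Return sorted paths under root whose base name matches the glob pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter applies shell-style wildcards (*, ?, [seq]) to names.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumes sys.argv[1] holds the literal, unexpanded pattern.
    for path in find_files(sys.argv[1]):
        print(path)
```

Invoked as, say, python3 find.py '*.txt' (the script name is hypothetical), the quotes keep the shell from expanding the pattern before Python sees it.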

How do standard Unix programs (e.g. grep -R) do this, then? I realize they're not written in Python, but if this happens at the shell level, that shouldn't matter, right? Is there a way for a script to tell the shell not to auto-expand glob patterns? It looks like set -f will disable globbing, but I'm not sure how to run it early enough, so to speak.

I've seen "Use a Glob() to find files recursively in Python?", but that doesn't cover actually getting the glob patterns from command-line arguments.
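On Python 3.5 and later, glob.glob itself can recurse when the pattern contains a "**" component and recursive=True is passed. A minimal sketch along those lines, again assuming the pattern arrives unexpanded (find_files_glob is a hypothetical name):

```python
import glob
import os
import sys


def find_files_glob(pattern, root="."):
    """Recursively match pattern below root using glob's ** support."""
    # With recursive=True, "**" matches root itself and every subdirectory.
    return sorted(glob.glob(os.path.join(root, "**", pattern), recursive=True))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumes sys.argv[1] is the literal pattern, e.g. passed as '*.txt'.
    print("\n".join(find_files_glob(sys.argv[1])))
```

Unlike a plain fnmatch filter, glob skips names that start with a dot unless the pattern itself starts with one, which is closer to what the shell does.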

Thanks!

Edit:

The grep-like Perl script ack accepts a Perl regex as one of its arguments. Thus, ack .* prints out every line of every file. But .* should expand to all hidden files in a directory. I tried reading the script, but I don't know Perl; how can it do this?

The shell performs glob expansion before it even thinks of invoking the command. Programs such as grep don't do anything to prevent globbing: they can't. You, as the caller of these programs, must tell the shell that you want to pass special characters such as * and ? to the program, and not let the shell interpret them. You do that by putting them inside quotes:

grep -E 'ba(na)* split' *.txt

(This looks for ba split, bana split, etc., in all files called <something>.txt.) In this case, either single quotes or double quotes will do the trick. Between single quotes, the shell expands nothing. Between double quotes, $, ` and \ are still interpreted. You can also protect a single character from shell expansion by preceding it with a backslash. It's not only wildcard characters that need to be protected; for example, above, the space in the pattern is in quotes, so it's part of the argument to grep and not an argument separator. Alternative ways to write the snippet above include:

grep -E "ba(na)* split" *.txt
grep -E ba\(na\)\*\ split *.txt
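To see that these spellings hand grep the same arguments, Python's shlex.split can serve as a rough stand-in for the shell's word splitting. It follows POSIX quoting and backslash rules but, as an important simplification, performs no globbing or variable expansion, so *.txt stays literal here:

```python
import re
import shlex

commands = [
    "grep -E 'ba(na)* split' *.txt",
    'grep -E "ba(na)* split" *.txt',
    r"grep -E ba\(na\)\*\ split *.txt",
]

# shlex.split applies POSIX-style quoting rules, then strips the quotes.
argvs = [shlex.split(cmd) for cmd in commands]
for argv in argvs:
    print(argv)  # ['grep', '-E', 'ba(na)* split', '*.txt']

# The quoted word arrives as one argument: the regex grep will use.
pattern = argvs[0][2]
print(bool(re.search(pattern, "one banana split, please")))  # True
```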

With most shells, if an argument contains wildcards but the pattern doesn't match any file, the pattern is left unchanged and passed to the underlying command. So a command like

grep b[an]*a *.txt

has a different effect depending on what files are present on the system. If the current directory doesn't contain any file whose name begins with b, the command searches for the pattern b[an]*a in the files whose names match *.txt. If the current directory contains files named baclava, bnm and hello.txt, the glob b[an]*a matches only baclava (bnm doesn't end in a), so the command expands to grep baclava hello.txt and searches for the pattern baclava in hello.txt. Needless to say, it's a bad idea to rely on this in scripts; on the command line it can occasionally save typing, but it's risky.
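The pass-through rule can be modelled in a few lines of Python. This is a deliberately simplified sketch: the helper shell_expand is hypothetical, it looks at a single directory listing, and unlike a real shell it lets * match names that start with a dot.

```python
from fnmatch import fnmatchcase


def shell_expand(argv, names):
    """Rough model of default shell globbing (no nullglob) over one directory."""
    result = []
    for word in argv:
        if any(ch in word for ch in "*?["):
            matches = sorted(n for n in names if fnmatchcase(n, word))
            # No match: the pattern itself is passed through unchanged.
            result.extend(matches or [word])
        else:
            result.append(word)
    return result


cmd = ["grep", "b[an]*a", "*.txt"]
print(shell_expand(cmd, ["hello.txt"]))             # ['grep', 'b[an]*a', 'hello.txt']
print(shell_expand(cmd, ["baclava", "hello.txt"]))  # ['grep', 'baclava', 'hello.txt']
```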

When you run ack .* in a directory containing no dot file, the shell runs ack . .. instead. The behavior of the ack command is then to print out all non-empty lines (the pattern . matches any one character) in all files under .. (the parent of the current directory), recursively. Contrast this with ack '.*', which searches for the pattern .* (which matches anything) in the current directory and its subdirectories (because that is what ack does when you don't pass any filename argument).
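The difference between the patterns . and .* can be checked with Python's re module, on the assumption that Perl regexes (which ack uses) behave the same way for these two trivial patterns:

```python
import re

lines = ["first line", "", "third line"]

# Pattern ".": needs at least one character, so the empty line is skipped.
nonempty = [line for line in lines if re.search(".", line)]
# Pattern ".*": also matches the empty string, so every line is selected.
everything = [line for line in lines if re.search(".*", line)]

print(nonempty)    # ['first line', 'third line']
print(everything)  # ['first line', '', 'third line']
```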

When it comes to grep, it simply accepts a list of filenames and doesn't do the glob expansion itself. If you really need to pass a pattern as an argument, it has to be quoted on the command line with single quotes. But before you do that, consider letting the shell do the job it was designed for.

Yes, with set -f you're on the right track.

It sounds like you are going to call your Python program from a shell.

Any time you use a shell to issue a command, it scans the command line and processes wildcards, command substitution and a whole bunch of other things.

So you have to turn off globbing before you run the program on the command line:

set -f
echo *
*

myprogram *.txt

will pass the string '*.txt' to your program. Then you can use Python's own globbing (for example the glob module) to get your files.
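The effect of set -f can be checked from Python by running two small scripts through sh in a scratch directory. This sketch assumes a POSIX sh is on the PATH; run_sh is a hypothetical helper name.

```python
import os
import subprocess
import tempfile


def run_sh(script, cwd):
    """Run a POSIX sh script in cwd and return its standard output."""
    return subprocess.run(
        ["sh", "-c", script], cwd=cwd, capture_output=True, text=True, check=True
    ).stdout


with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(d, name), "w").close()
    expanded = run_sh("echo *.txt", d)         # the shell expands the glob
    literal = run_sh("set -f; echo *.txt", d)  # globbing off: pattern kept

print(expanded.strip())  # a.txt b.txt
print(literal.strip())   # *.txt
```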

Or you can do essentially the same thing by creating a wrapper script:

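#!/bin/bash
# Disable pathname expansion inside this script.
set -f
# "$@" forwards every argument unchanged (no word splitting or globbing).
myProgram "$@"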

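Here the script forwards the arguments you pass in when you start it, whether from the command line, from crontab or via exec(...) from another process. Keep in mind that set -f only affects the wrapper's own shell: when you start the wrapper from an interactive shell or a crontab line, that calling shell still expands unquoted patterns before the wrapper runs, so quote them there.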
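When a program is started through exec without a shell in between, nothing expands the arguments at all. As an illustration (not part of the original answer), Python's subprocess.run with an argument list and the default shell=False works this way:

```python
import subprocess
import sys

# Launch a child Python that just reports the arguments it received.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])", "*.txt"]

# With an argument list and shell=False (the default), no shell is involved,
# so nothing expands the wildcard and the child sees the literal pattern.
out = subprocess.run(child, capture_output=True, text=True, check=True).stdout
print(out.strip())  # ['*.txt']
```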

I hope this helps.
