How would you implement ant-style patternsets in python to select groups of files?

Ant has a nice way to select groups of files, most handily using ** to indicate a directory tree. E.g.

**/CVS/*            # All files immediately under a CVS directory.
mydir/mysubdir/**   # All files recursively under mysubdir

More examples can be seen here:

http://ant.apache.org/manual/dirtasks.html

How would you implement this in Python, so that you could do something like:

files = get_files("**/CVS/*")
for file in files:
    print(file)

=>
CVS/Repository
mydir/mysubdir/CVS/Entries
mydir/mysubdir/foo/bar/CVS/Entries

Sorry, this is quite a long time after your OP. I have just released a Python package which does exactly this - it's called Formic and it's available at the PyPI Cheeseshop. With Formic, your problem is solved with:

import formic
fileset = formic.FileSet(include="**/CVS/*", default_excludes=False)
for file_name in fileset.qualified_files():
    print(file_name)

There is one slight complexity: default_excludes. Formic, just like Ant, excludes CVS directories by default (as for the most part collecting files from them for a build is dangerous), so the default answer to the question would result in no files. Setting default_excludes=False disables this behaviour.

As soon as you come across a **, you're going to have to recurse through the whole directory structure, so I think at that point, the easiest method is to iterate through the directory with os.walk, construct a path, and then check if it matches the pattern. You can probably convert to a regex by something like:

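import os
import re

def glob_to_regex(pat, dirsep=os.sep):
    # Escape the separator so it can be embedded safely in the regex.
    dirsep = re.escape(dirsep)
    # Order matters: "**/" and "**" must be replaced before the single "*".
    regex = (re.escape(pat).replace("\\*\\*"+dirsep,".*")
                           .replace("\\*\\*",".*")
                           .replace("\\*","[^%s]*" % dirsep)
                           .replace("\\?","[^%s]" % dirsep))
    return re.compile(regex+"$")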

(Though note that this isn't that fully featured - it doesn't support [a-z] style glob patterns for instance, though this could probably be added.) (The first **/ match is to cover cases like **/CVS matching ./CVS, as well as having just ** to match at the tail.)
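To make the os.walk idea concrete, here is a minimal sketch of get_files built on a regex conversion. It is not the function above: ant_to_regex is a hypothetical, slightly stricter variant that turns "**/" into an optional run of whole directories, so "**/CVS/*" does not also match a directory such as "notCVS". The root parameter and the choice to return '/'-separated relative paths are assumptions for illustration (Python 3).

```python
import os
import re


def ant_to_regex(pattern):
    """Translate an Ant-style pattern (with '/' separators) to a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, across directories
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")       # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def get_files(pattern, root="."):
    """Yield files under root whose relative path matches the pattern."""
    regex = ant_to_regex(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")   # compare with '/' on every OS
            if regex.match(rel):
                yield rel
```

With this, `for f in get_files("**/CVS/*"): print(f)` walks the current directory once and prints only the files that sit directly inside a CVS directory at any depth.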

However, obviously you don't want to recurse through everything below the current dir when not processing a ** pattern, so I think you'll need a two-phase approach. I haven't tried implementing the below, and there are probably a few corner cases, but I think it should work:

  1. Split the pattern on your directory separator, i.e. pat.split('/') -> ['**','CVS','*']

  2. Recurse through the directories, and look at the relevant part of the pattern for this level, i.e. n levels deep -> look at pat[n].

  3. If pat[n] == '**', switch to the above strategy:

    • Reconstruct the pattern with dirsep.join(pat[n:])
    • Convert to a regex with glob_to_regex()
    • Recursively os.walk through the current directory, building up the path relative to the level you started at. If the path matches the regex, yield it.
  4. If pat[n] isn't "**", and it is the last element in the pattern, then yield all files/dirs matching glob.glob(os.path.join(curpath,pat[n]))

  5. If pat[n] isn't "**", and it is NOT the last element in the pattern, then for each directory, check if it matches (with glob) pat[n]. If so, recurse down through it, incrementing depth (so it will look at pat[n+1])
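The steps above could be sketched roughly as follows. This is an untested-in-anger illustration, not a finished library: find, _find and _tail_regex are hypothetical names, the pattern is assumed to use '/' as its separator, and fnmatch.fnmatchcase over os.listdir stands in for the glob.glob call in steps 4 and 5. Note that the ** phase only yields files, because it looks at the filenames from os.walk (Python 3).

```python
import fnmatch
import os
import re


def _tail_regex(parts):
    """Build a regex for the pattern elements from the first '**' onwards."""
    out = []
    for i, part in enumerate(parts):
        last = i == len(parts) - 1
        if part == "**":
            # Trailing '**' matches everything; otherwise zero or more dirs.
            out.append(".*" if last else "(?:.*/)?")
        else:
            seg = re.escape(part).replace(r"\*", "[^/]*").replace(r"\?", "[^/]")
            out.append(seg if last else seg + "/")
    return re.compile("".join(out) + r"\Z")


def find(pattern, root="."):
    """Yield '/'-separated paths, relative to root, that match the pattern."""
    return _find(pattern.split("/"), 0, root, "")


def _find(pat, n, curpath, prefix):
    if pat[n] == "**":
        # Step 3: walk everything below curpath and match the rest by regex.
        regex = _tail_regex(pat[n:])
        for dirpath, dirnames, filenames in os.walk(curpath):
            rel_dir = os.path.relpath(dirpath, curpath).replace(os.sep, "/")
            for name in filenames:
                rel = name if rel_dir == "." else rel_dir + "/" + name
                if regex.match(rel):
                    yield prefix + rel
        return
    try:
        entries = sorted(os.listdir(curpath))
    except OSError:
        return  # unreadable directory: nothing to report
    for name in entries:
        if not fnmatch.fnmatchcase(name, pat[n]):
            continue
        full = os.path.join(curpath, name)
        if n == len(pat) - 1:
            yield prefix + name      # step 4: last element, files and dirs
        elif os.path.isdir(full):
            # Step 5: descend one level and look at pat[n + 1].
            yield from _find(pat, n + 1, full, prefix + name + "/")
```

The point of the split is that a pattern such as "mydir/*/CVS/*" only lists the directories it actually needs, while "mydir/mysubdir/**" starts the full walk two levels down instead of at the top.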

os.walk is your friend. Look at the example in the Python manual (https://docs.python.org/2/library/os.html#os.walk) and try to build something from that.

To match "**/CVS/*" against a file name you get, you can do something like this:

import fnmatch

def match(pattern, filename):
    if pattern.startswith("**"):
        # Drop one of the two stars; fnmatch's "*" already spans directories.
        return fnmatch.fnmatch(filename, pattern[1:])
    else:
        return fnmatch.fnmatch(filename, pattern)

In fnmatch.fnmatch, "*" matches anything (including slashes).
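A quick check of what that means in practice, as a small sketch (Python 3). The paths are made-up examples, and fnmatch.fnmatchcase is used instead of fnmatch.fnmatch only so that the results do not depend on the operating system.

```python
import fnmatch

# fnmatchcase skips the OS-specific case/separator normalisation that
# fnmatch.fnmatch applies, so these results do not depend on the platform.
nested = fnmatch.fnmatchcase("mydir/mysubdir/CVS/Entries", "*/CVS/*")
too_deep = fnmatch.fnmatchcase("mydir/mysubdir/CVS/sub/deep", "*/CVS/*")
top_level = fnmatch.fnmatchcase("CVS/Repository", "*/CVS/*")
walked = fnmatch.fnmatchcase("./CVS/Repository", "*/CVS/*")

print(nested)     # True
print(too_deep)   # True: the last "*" also swallows "sub/deep"
print(top_level)  # False: nothing provides the "/" before "CVS"
print(walked)     # True: paths from os.walk(".") start with "./"
```

So this is looser than Ant, where a single * stays inside one directory, and a top-level CVS directory only matches if the file names carry a leading "./".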

There's an implementation in the 'waf' build system source code. http://code.google.com/p/waf/source/browse/trunk/waflib/Node.py?r=10755#471 Maybe this should be wrapped up in a library of its own?

Yup. Your best bet is, as has already been suggested, to work with 'os.walk'. Or, write wrappers around the 'glob' and 'fnmatch' modules, perhaps.
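For what it's worth, on Python 3.5 and later the glob module itself understands ** when called with recursive=True, so a wrapper can be very thin. A minimal sketch, where get_files_glob and its root parameter are hypothetical names for illustration; note that glob skips dot-prefixed files and directories by default, which differs from Ant:

```python
import glob
import os


def get_files_glob(pattern, root="."):
    """Return files under root matching pattern, as '/'-separated relative paths."""
    # glob.escape keeps special characters in root from being read as wildcards.
    full_pattern = os.path.join(glob.escape(root), pattern)
    hits = glob.glob(full_pattern, recursive=True)
    return sorted(
        os.path.relpath(path, root).replace(os.sep, "/")
        for path in hits
        if os.path.isfile(path)   # '**' and '*' also match directories
    )
```

Calling `get_files_glob("**/CVS/*")` then gives the list from the question without any hand-written regex.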

os.walk is your best bet for this. I did the example below with .svn because I had that handy, and it worked great:

import os
import re

for (dirpath, dirnames, filenames) in os.walk("."):
    # Only report files that sit directly inside a .svn directory.
    if re.search(r'\.svn$', dirpath):
        for file in filenames:
            print(file)
