Pythonic way to handle multiple for loops with different filters on the same list?

Question

Here's part of a program I'm writing that creates a CSV categorizing a directory of files:

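import fnmatch
import os
import re

# directory: the folder the user picked (defined elsewhere in the program)
matches = []
for root, dirnames, filenames in os.walk(directory):
    for filename in fnmatch.filter(filenames, '*[A-Z]*'):
        matches.append([os.path.join(root, filename), "No Capital Letters!"])

    # raw string for the regex; $ anchors the match to the file extension
    test = re.compile(r".*\.(py|php)$", re.IGNORECASE)
    for filename in filter(test.search, filenames):
        matches.append([os.path.join(root, filename), "Invalid File type!"])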
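The snippet builds `matches` but stops before the CSV step mentioned above. A minimal sketch of that last step with the standard `csv` module, assuming a hypothetical default output file `violations.csv` and a two-column (path, problem) layout:

```python
import csv


def write_matches(matches, csv_path="violations.csv"):
    """Write [path, problem] rows to a CSV file.

    csv_path is a hypothetical default; pass whatever output file you need.
    """
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "problem"])  # header row (assumed layout)
        writer.writerows(matches)
```

Calling `write_matches(matches)` after the walk finishes would produce one row per flagged file.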

Basically, the user picks a folder and the program flags problem files, which can be of several types (just two are listed here: no filenames with uppercase letters, no PHP or Python files). There will probably be five or six cases.

While this works, I want to refactor. Is it possible to do something like

for filename in itertools.izip(fnmatch.filter(filenames, '*[A-Z]*'), filter(test.search, filenames), ...):
    matches.append([os.path.join(root, filename), "Violation"])

while still being able to keep track of which of the original unzipped lists caused the "violation"?
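A sketch of something close to this, offered as an assumption rather than anything from the answers below: `izip` would pair up the n-th element of each filtered list, which is not what is wanted here. Chaining (filename, label) pairs with `itertools.chain.from_iterable` instead concatenates the filtered lists while each name carries the label of the filter that matched it. `find_violations` and `PY_OR_PHP` are hypothetical names, and `fnmatch.fnmatchcase` is used because `fnmatch.filter` case-normalizes names on Windows.

```python
import fnmatch
import itertools
import os
import re

PY_OR_PHP = re.compile(r"\.(py|php)$", re.IGNORECASE)


def find_violations(directory):
    """Return [path, label] pairs, one per (file, failed filter)."""
    matches = []
    for root, dirnames, filenames in os.walk(directory):
        # Each filtered sequence is tagged with the label that explains it.
        labeled = [
            ("No Capital Letters!",
             [n for n in filenames if fnmatch.fnmatchcase(n, "*[A-Z]*")]),
            ("Invalid File type!", filter(PY_OR_PHP.search, filenames)),
        ]
        # zip(names, repeat(label)) binds the label to every name in that
        # sequence; chain.from_iterable concatenates the tagged sequences.
        tagged = itertools.chain.from_iterable(
            zip(names, itertools.repeat(label)) for label, names in labeled
        )
        for filename, label in tagged:
            matches.append([os.path.join(root, filename), label])
    return matches
```

A file that fails both filters appears twice, once per label.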

Answer 1

A simpler solution would probably be to iterate over the files first and then apply your checks one by one:

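import fnmatch
import os
import re

matches = []
# $ anchors the match to the end, so "notes.python.txt" is not flagged
reTest = re.compile(r".*\.(py|php)$", re.IGNORECASE)
for root, dirnames, filenames in os.walk(directory):
    for filename in filenames:
        error = None
        # fnmatchcase: fnmatch() case-normalizes on Windows, which would
        # make [A-Z] match lowercase letters too
        if fnmatch.fnmatchcase(filename, '*[A-Z]*'):
            error = 'No capital letters!'
        elif reTest.search(filename):
            error = 'Invalid file type!'

        if error:
            matches.append([os.path.join(root, filename), error])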

This not only makes the logic a lot simpler, since you only ever need to check a single file (instead of working out each time how to call your check on a sequence of filenames); it also iterates through the list of filenames only once.
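Since the question expects five or six cases, one possible extension of this per-file approach (an assumption, not part of the answer) is to keep the checks in an ordered list of (predicate, message) pairs, so adding a case means adding one entry. `CHECKS` and `check_file` are hypothetical names; the two predicates mirror the checks above.

```python
import fnmatch
import re

PY_OR_PHP = re.compile(r"\.(py|php)$", re.IGNORECASE)

# Ordered (predicate, message) pairs; the first failing check wins,
# just like the if/elif chain in the answer.
CHECKS = [
    (lambda name: fnmatch.fnmatchcase(name, "*[A-Z]*"), "No capital letters!"),
    (lambda name: PY_OR_PHP.search(name) is not None, "Invalid file type!"),
]


def check_file(filename):
    """Return the message of the first check that filename fails, or None."""
    for predicate, message in CHECKS:
        if predicate(filename):
            return message
    return None
```

Inside the walk, `error = check_file(filename)` then replaces the whole if/elif block.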

Furthermore, it avoids generating multiple matches for a single filename; it adds at most one error (the first). If you don't want this, you could make `error` a list instead and append to it in your checks; of course, you would then change the `elif` to `if` so that all the checks are evaluated.
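The list-of-errors variant described above might look like this minimal sketch, where `collect_errors` is a hypothetical name and the checks are the same two from the answer. Every check uses `if`, so one file can collect several messages.

```python
import fnmatch
import re

RE_TEST = re.compile(r"\.(py|php)$", re.IGNORECASE)


def collect_errors(filename):
    """Return every error message that applies to filename (possibly none)."""
    errors = []
    # Plain if (not elif), so each check runs independently.
    if fnmatch.fnmatchcase(filename, "*[A-Z]*"):
        errors.append("No capital letters!")
    if RE_TEST.search(filename):
        errors.append("Invalid file type!")
    return errors
```

In the walk loop, `for error in collect_errors(filename): matches.append([os.path.join(root, filename), error])` adds one row per problem.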

Answer 2

I recommend you look at these slides.

David Beazley gives an example of using `yield` to process log files.

Edit: here are two examples from the PDF, one without a generator:

# Sum the last column (bytes transferred) of a web server access log
total = 0
with open("access-log") as wwwlog:
    for line in wwwlog:
        bytestr = line.rsplit(None, 1)[1]
        if bytestr != '-':
            total += int(bytestr)
print("Total", total)

and one with generators (you can use a function with `yield` for more complex cases):

with open("access-log") as wwwlog:
    bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
    byte_counts = (int(x) for x in bytecolumn if x != '-')
    # Nothing is read until sum() pulls values through the pipeline
    print("Total", sum(byte_counts))
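Applied to the question, the same pipeline idea could be sketched with small `yield` functions chained together. This is an adaptation, not code from the slides; `gen_files`, `gen_violations`, and `PY_OR_PHP` are hypothetical names, and the checks are the two from the question.

```python
import fnmatch
import os
import re

PY_OR_PHP = re.compile(r"\.(py|php)$", re.IGNORECASE)


def gen_files(directory):
    """Yield (root, filename) for every file under directory."""
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            yield root, filename


def gen_violations(files):
    """Yield [path, message] for the first check each file fails."""
    for root, filename in files:
        if fnmatch.fnmatchcase(filename, "*[A-Z]*"):
            yield [os.path.join(root, filename), "No Capital Letters!"]
        elif PY_OR_PHP.search(filename):
            yield [os.path.join(root, filename), "Invalid File type!"]
```

`matches = list(gen_violations(gen_files(directory)))` drives the whole pipeline; as in the log example, nothing is walked until `list()` starts pulling items, and further stages can be stacked the same way.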

2 answers

Answer 1: score 4, accepted, 2015-05-27 19:53:55

Answer 2: score -1, 2015-05-27 19:59:21