
Pythonic way to process multiple for loops with different filters against the same list?

Here's part of a program I'm writing that creates a CSV categorizing a directory of files:

import fnmatch
import os
import re

matches = []
for root, dirnames, filenames in os.walk(directory):
    # Flag names containing at least one uppercase letter.
    for filename in fnmatch.filter(filenames, '*[A-Z]*'):
        matches.append([os.path.join(root, filename), "No Capital Letters!"])

    # Flag .py and .php files, ignoring case.
    test = re.compile(r".*\.(py|php)", re.IGNORECASE)
    for filename in filter(test.search, filenames):
        matches.append([os.path.join(root, filename), "Invalid File type!"])

Basically, the user picks a folder and the program flags problem files, which can be of several types (just two are listed here: no files with uppercase letters, and no PHP or Python files). There will probably be five or six cases.

While this works, I want to refactor. Is it possible to do something like

for filename in itertools.izip(fnmatch.filter(filenames, '*[A-Z]*'), filter(test.search, filenames), ...):
    matches.append([os.path.join(root, filename), "Violation"])

while still being able to keep track of which of the original unzipped lists caused the "violation"?

A simpler solution would probably be to iterate over the files first and then apply your checks one by one:

import fnmatch
import os
import re

reTest = re.compile(r".*\.(py|php)", re.IGNORECASE)
matches = []
for root, dirnames, filenames in os.walk(directory):
    for filename in filenames:
        # Record at most one error per file: the first check that fails.
        error = None
        if fnmatch.fnmatch(filename, '*[A-Z]*'):
            error = 'No capital letters!'
        elif reTest.search(filename):
            error = 'Invalid file type!'

        if error:
            matches.append([os.path.join(root, filename), error])

This not only makes the logic a lot simpler, since each check only ever has to look at a single file name (instead of you having to figure out, for every check, how to apply it to a sequence of file names); it also iterates through the list of file names only once.

Furthermore, it avoids generating multiple matches for a single file name; it adds at most one error (the first). If you don't want this, you could make error a list instead and append to it in your checks. In that case you would also change the elif to an if, so that all the checks are evaluated.
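The list-of-errors variant could be sketched roughly as below, with the checks kept in a table so that a fifth or sixth case is one more entry. The names CHECKS and find_violations are made up for this illustration. The sketch also swaps in fnmatch.fnmatchcase, because fnmatch.fnmatch case-normalizes both arguments on case-insensitive platforms such as Windows, which would defeat an uppercase test there.

```python
import fnmatch
import os
import re

RE_TEST = re.compile(r".*\.(py|php)", re.IGNORECASE)

# Hypothetical rule table: each entry is (predicate, message).
# A predicate takes a bare file name and returns True on a violation.
CHECKS = [
    (lambda name: fnmatch.fnmatchcase(name, "*[A-Z]*"), "No capital letters!"),
    (lambda name: RE_TEST.search(name) is not None, "Invalid file type!"),
]


def find_violations(directory):
    """Return [path, message] pairs, one per failed check per file."""
    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            # Every check runs (no elif), so one file can appear several times.
            for is_violation, message in CHECKS:
                if is_violation(filename):
                    matches.append([os.path.join(root, filename), message])
    return matches
```

Each row of the result still records which check fired, which is what the zipped-filters idea in the question was trying to preserve.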

I recommend you look at these slides.

David Beazley gives an example of using yield to process log files.

Edit: here are two examples from the PDF, one without a generator:

wwwlog = open("access-log")
total = 0
for line in wwwlog:
    # The byte count is the last whitespace-separated field.
    bytestr = line.rsplit(None, 1)[1]
    if bytestr != '-':
        total += int(bytestr)
print("Total", total)

and one with generators (you can use a function with yield for more complex examples):

wwwlog = open("access-log")
bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
bytes = (int(x) for x in bytecolumn if x != '-')
print("Total", sum(bytes))
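As a rough illustration of the "function with yield" variant mentioned above, the same byte-count pipeline might be written as a generator function. This is a sketch and is not taken from the slides: the name byte_counts and the sample lines are invented here, and the guard for lines with fewer than two fields is an added assumption about malformed input.

```python
def byte_counts(lines):
    """Yield the trailing byte count of each log line as an int.

    Lines whose last field is '-' are skipped, as are lines with
    fewer than two whitespace-separated fields (an added assumption).
    """
    for line in lines:
        fields = line.rsplit(None, 1)
        if len(fields) < 2:
            continue
        bytestr = fields[1]
        if bytestr != '-':
            yield int(bytestr)


# Made-up sample lines standing in for the contents of "access-log".
sample_lines = [
    'host1 - - "GET /a HTTP/1.0" 200 512\n',
    'host2 - - "GET /b HTTP/1.0" 304 -\n',
    'host3 - - "GET /c HTTP/1.0" 200 1024\n',
]
total = sum(byte_counts(sample_lines))
```

Because byte_counts accepts any iterable of lines, an open file object can be passed in place of the sample list, and values are still produced one at a time.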
