

Alternative for nested loop operation in Python?

I want a fast alternative to a nested loop in which the second loop runs after some operation in the first loop.

For example:

import pandas as pd

# start_date and end_date are assumed to be defined elsewhere
target_date_list = pd.date_range(start=start_date, end=end_date).strftime('year=%Y/month=%m/day=%d')

for date in target_date_list:
    folder = f'path_to_folder/{date}'
    for file in folder:
        # some operation
        pass

There is no meaningfully faster alternative here. The inner loop's values depend on the value generated by the outer loop, so the micro-optimization of using itertools.product isn't available.
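A minimal sketch of that distinction, using made-up folder and file names (`fake_fs` is a hypothetical stand-in for the file system): itertools.product can flatten two loops only when the inner iterable is fixed; once the inner iterable is derived from the outer value, the nested loop remains, even when written as a nested comprehension.

```python
from itertools import product

# Independent loops: the inner iterable never changes, so product()
# can replace the nested loop with a single flat loop.
days = ["day=01", "day=02"]
hours = ["00", "12"]
independent = list(product(days, hours))

# Dependent loops: the inner iterable is looked up from the outer value.
# fake_fs is a hypothetical stand-in for a directory tree (folder -> file names).
fake_fs = {"day=01": ["a.txt", "b.txt"], "day=02": ["c.txt"]}
dependent = [(folder, name) for folder in fake_fs for name in fake_fs[folder]]

for folder, name in dependent:
    print(folder, name)
```

The nested comprehension is still two loops; it only changes the syntax, not the amount of work.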

If you're actually iterating a directory (not the characters of a string describing a directory), I'd strongly recommend using os.scandir over os.listdir (assuming, like many folks, you were using the latter without knowing the former existed), as it's much faster when:

  1. You're operating on large directories
  2. You're filtering the contents based on stat info (in particular entry types, which usually come for free without any stat call at all)

With os.scandir, an inner loop previously implemented like:

import os

for file in os.listdir(folder):
    path = os.path.join(folder, file)
    if file.endswith('.txt') and os.path.isfile(path) and os.path.getsize(path) >= 4096:
        # do stuff with 4+ KB file described by "path"
        pass

can be simplified slightly and sped up by changing it to:

import os
with os.scandir(folder) as direntries:
    for entry in direntries:
        # short-circuiting means stat() only runs for regular .txt files
        if entry.name.endswith('.txt') and entry.is_file() and entry.stat().st_size >= 4096:
            # do stuff with 4+ KB file described by "entry.path"
            pass
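Putting the pieces together for the question's folder layout, here is a sketch of the date loop with os.scandir as the inner loop. It rests on a few assumptions: the root folder is passed in as `root`, dates are generated with the standard library's datetime instead of pd.date_range so the sketch has no pandas dependency, and days without a folder are skipped rather than treated as errors. The names `date_partitions` and `large_txt_files` are hypothetical.

```python
import os
from datetime import date, timedelta


def date_partitions(start, end):
    """Yield 'year=YYYY/month=MM/day=DD' strings for every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.strftime("year=%Y/month=%m/day=%d")
        day += timedelta(days=1)


def large_txt_files(root, start, end, min_size=4096):
    """Return paths of .txt files of at least min_size bytes in each daily folder under root."""
    found = []
    # Outer loop: one folder per day; its path depends on the date.
    for partition in date_partitions(start, end):
        folder = os.path.join(root, partition)
        if not os.path.isdir(folder):
            continue  # assumption: a missing day is skipped, not an error
        # Inner loop: the os.scandir entries of that day's folder.
        with os.scandir(folder) as entries:
            for entry in entries:
                if entry.name.endswith(".txt") and entry.is_file() and entry.stat().st_size >= min_size:
                    found.append(entry.path)
    return found


if __name__ == "__main__":
    for path in large_txt_files("path_to_folder", date(2024, 1, 1), date(2024, 1, 31)):
        print(path)
```

The structure is still one loop nested in another; the gain comes only from os.scandir avoiding extra system calls per entry.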

But fundamentally, this optimization has nothing to do with avoiding nested loops; if you want to iterate all the files, you have to iterate all the files. A nested loop will need to occur somehow even if you hide it behind utility methods, and its overhead is negligible relative to the cost of file system access.
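For example, itertools.chain.from_iterable can make iteration over many folders look like a single loop. A sketch, with hypothetical helper names `scan` and `all_entries`:

```python
import os
from itertools import chain


def scan(folder):
    """Yield the DirEntry objects of one folder, closing the scandir iterator afterwards."""
    with os.scandir(folder) as entries:
        yield from entries


def all_entries(folders):
    """Flatten the entries of several folders into a single iterator.

    chain.from_iterable hides the nested loop, but every folder and every
    entry is still visited, exactly as in an explicit nested for loop.
    """
    return chain.from_iterable(scan(folder) for folder in folders)
```

Consuming `for entry in all_entries(folders): ...` still performs one os.scandir call per folder and one step per entry; only the syntax is flatter.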

As a rule of thumb, your best bet for better performance in a for loop is to use a generator expression. However, I suspect that the performance boost for your particular example will be minimal, since your outer loop only does the trivial task of building a string and assigning it to a variable.

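import pandas as pd

# start_date and end_date are assumed to be defined elsewhere
target_date_list = pd.date_range(start=start_date, end=end_date).strftime('year=%Y/month=%m/day=%d')

for folder in (f'path_to_folder/{date}' for date in target_date_list):
    # some operation on each folder path
    pass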
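To check whether the generator expression helps in a given case, a minimal timing sketch with timeit can compare it against the plain loop. Here `target_date_list` is a hypothetical stand-in built from plain strings rather than pd.date_range, and timings depend on the machine, so no numbers are claimed.

```python
import timeit

# Hypothetical stand-in for target_date_list (the real one comes from pd.date_range).
target_date_list = [f"year=2024/month=01/day={d:02d}" for d in range(1, 32)]


def plain_loop():
    """Build the folder paths with an ordinary for loop."""
    folders = []
    for date in target_date_list:
        folders.append(f"path_to_folder/{date}")
    return folders


def generator_loop():
    """Build the same folder paths by iterating a generator expression."""
    folders = []
    for folder in (f"path_to_folder/{date}" for date in target_date_list):
        folders.append(folder)
    return folders


if __name__ == "__main__":
    assert plain_loop() == generator_loop()
    for fn in (plain_loop, generator_loop):
        print(fn.__name__, timeit.timeit(fn, number=10_000))
```

Both functions produce identical paths, so any measured difference reflects loop overhead only, not the file system work that would follow.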

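Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.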

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM