
Alternative for nested loop operation in python?

I want a fast alternative of a nested loop operation in which the second loop occurs after some operation in first loop.

For example:

import pandas as pd

# start_date and end_date are assumed to be defined elsewhere
target_date_list = pd.date_range(start=start_date, end=end_date).strftime('year=%Y/month=%m/day=%d')

for date in target_date_list:
    folder = f'path_to_folder/{date}'
    for file in folder:
        ...  # some operation

There is no meaningfully faster alternative here. The inner loop's values are dependent on the value generated by the outer loop, so the micro-optimization of using itertools.product isn't available.
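To illustrate the distinction, here is a minimal sketch. The names years, months and files_by_folder are made up for the example, and the dictionary stands in for a real directory listing. itertools.product can replace a nested loop only when the inner iterable does not depend on the outer loop's current value:

```python
from itertools import product

# Independent iterables: the inner loop is the same for every outer value,
# so product() can stand in for the nested loop.
years = [2023, 2024]
months = [1, 2, 3]
pairs_nested = [(y, m) for y in years for m in months]
pairs_product = list(product(years, months))

# Dependent iterables: the inner values are looked up from the outer value
# (like listing the files of each date folder), so product() does not apply
# and a nested loop is still required.
files_by_folder = {  # hypothetical stand-in for a directory listing
    "day=01": ["a.txt"],
    "day=02": ["b.txt", "c.txt"],
}
dependent = [
    (folder, name)
    for folder, names in files_by_folder.items()
    for name in names
]
```

The two independent versions produce the same pairs in the same order; the dependent case has no equivalent product() form.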

If you're actually iterating a directory (not characters in a string describing a directory), I'd strongly recommend using os.scandir over os.listdir (assuming like many folks you were using the latter without knowing the former existed), as it's much faster when:

  1. You're operating on large directories
  2. You're filtering the contents based on stat info (in particular entry types, which come for free without a stat at all)

With os.scandir, an inner loop previously implemented like:

for file in os.listdir(dir):
    path = os.path.join(dir, file)
    if file.endswith('.txt') and os.path.isfile(path) and os.path.getsize(path) >= 4096:
        # do stuff with 4+KB file described by "path"
        ...

can simplify slightly and speed up by changing to:

with os.scandir(dir) as direntries:
    for entry in direntries:
        if entry.name.endswith('.txt') and entry.is_file() and entry.stat().st_size >= 4096:
            # do stuff with 4+KB file described by "entry.path"
            ...

but fundamentally, this optimization has nothing to do with avoiding nested loops; if you want to iterate all the files, you have to iterate all the files. A nested loop will need to occur somehow even if you hide it behind utility methods, and the cost will not be meaningful relative to the cost of file system access.
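Putting the pieces together, a sketch of the full nested loop over date folders might look like the following. The function name iter_large_txt_files and its parameters are hypothetical, and skipping folders that do not exist is an assumption about the desired behavior, not something the question specifies:

```python
import os

def iter_large_txt_files(base_dir, date_parts, min_size=4096):
    """Yield paths of .txt files of at least min_size bytes in each date folder."""
    for date_part in date_parts:          # outer loop: one folder per date
        folder = os.path.join(base_dir, date_part)
        try:
            entries = os.scandir(folder)
        except FileNotFoundError:
            continue                      # assumption: dates without a folder are skipped
        with entries:
            for entry in entries:         # inner loop: depends on the outer value
                if (entry.name.endswith('.txt')
                        and entry.is_file()
                        and entry.stat().st_size >= min_size):
                    yield entry.path
```

The date strings produced by strftime in the question could be passed as date_parts, with 'path_to_folder' as base_dir. The nested loop is still there; it is only wrapped in a generator so the caller sees a single flat loop.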

As a rule of thumb, your best bet for better performance in a for loop is to use a generator expression. However, I suspect that the performance boost for your particular example will be minimal, since your outer loop only does the trivial task of building a string and assigning it to a variable.

target_date_list = pd.date_range(start=start_date, end=end_date).strftime('year=%Y/month=%m/day=%d')

# each value produced by the generator expression is a folder path
for folder in (f'path_to_folder/{date}' for date in target_date_list):
    ...  # some operation
